AWS SWF – Simple Workflow Overview – Certification

AWS SWF – Simple Workflow

  • AWS SWF makes it easy to build applications that coordinate work across distributed components
  • SWF makes it easier to develop asynchronous and distributed applications by providing a programming model and infrastructure for coordinating distributed components, tracking and maintaining their execution state in a reliable way
  • SWF does the following
    • stores metadata about a workflow and its component parts.
    • stores task for workers and queues them until a Worker needs them.
    • assigns task to workers, which can run either on cloud or on-premises
    • routes information between executions of a workflow and the associated Workers.
    • tracks the progress of workers on Tasks, with configurable timeouts.
    • maintains workflow state in a durable fashion
  • SWF helps coordinating tasks across the application which involves managing intertask dependencies, scheduling, and concurrency in accordance with the logical flow of the application.
  • SWF gives full control over implementing tasks and coordinating them without worrying about underlying complexities such as tracking their progress and maintaining their state.
  • SWF tracks and maintains the workflow state in a durable fashion, so that the application is resilient to failures in individual components, which can be implemented, deployed, scaled, and modified independently
  • SWF offers capabilities to support a variety of application requirements and is suitable for a range of use cases that require coordination of tasks, including media processing, web application back-ends, business process workflows, and analytics pipelines.

Simple Workflow Concepts

AWS SWF Components

  • Workflow
    • Fundamental concept in SWF is the Workflow, which is the automation of a business process
    • A workflow is a set of activities that carry out some objective, together with logic that coordinates the activities.
  • Workflow Execution
    • A workflow execution is a running instance of a workflow
  • Workflow History
    • SWF maintains the state and progress of each workflow execution in its Workflow History, which saves the application from having to store the state in a durable way.
    • It enables applications to be stateless as all information about a workflow execution is stored in its workflow history.
    • For each workflow execution, the history provides a record of which activities were scheduled, their current status, and their results. The workflow execution uses this information to determine next steps.
    • History provides a detailed audit trail that can be used to monitor running workflow executions and verify completed workflow executions.
    • Operations that do not change the state of the workflow for e.g. polling execution do not typically appear in the workflow history
    • Markers can be used to record information in the workflow history of a workflow execution that is specific to the use case
  • Domain
    • Each workflow runs in an AWS resource called a Domain, which controls the workflow’s scope
    • An AWS account can have multiple domains, with each containing multiple workflows
    • Workflows in different domains cannot interact with each other
  • Activities
    • Designing an SWF workflow, Activities need to be precisely defined and then registered with SWF as an activity type with information such as name, version and timeout
  • Activity Task & Activity Worker
    • An Activity Worker is a program that receives activity tasks, performs them, and provides results back. An activity worker can be a program or even a person who performs the task using an activity worker software
    • Activity tasks—and the activity workers that perform them can
      • run synchronously or asynchronously, can be distributed across multiple computers, potentially in different geographic regions, or run on the same computer,
      • be written in different programming languages and run on different operating systems
      • be created that are long-running, or that may fail, time out require restarts or that may complete with varying throughput & latency
  • Decider
    • A Decider implements a Workflow’s coordination logic.
    • Decider schedules activity tasks, provides input data to the activity workers, processes events that arrive while the workflow is in progress, and ends (or closes) the workflow when the objective has been completed.
    • Decider directs the workflow by receiving decision tasks from SWF and responding back to SWF with decisions. A decision represents an action or set of actions which are the next steps in the workflow which can either be to schedule an activity task, set timers to delay the execution of an activity task, to request cancellation of activity tasks already in progress, and to complete or close the workflow.
  • Workers and Deciders are both stateless, and can respond to increased traffic by simply adding additional Workers and Deciders as needed
  • Role of  SWF service is to function as a reliable central hub through which data is exchanged between the decider, the activity workers, and other relevant entities such as the person administering the workflow.
  • Mechanism by which both the activity workers and the decider receive their tasks (activity tasks and decision tasks resp.) is by polling the SWF
  • SWF allows “long polling”, requests will be held open for up to 60 seconds if necessary, to reduce network traffic and unnecessary processing
  • SWF informs the decider of the state of the workflow by including with each decision task, a copy of the current workflow execution history. The workflow execution history is composed of events, where an event represents a significant change in the state of the workflow execution for e.g events would be the completion of a task, notification that a task has timed out, or the expiration of a timer that was set earlier in the workflow execution. The history is a complete, consistent, and authoritative record of the workflow’s progress

Workflow Implementation & Execution

  1. Implement Activity workers with the processing steps in the Workflow.
  2. Implement Decider with the coordination logic of the Workflow.
  3. Register the Activities and workflow with SWF.
  4. Start the Activity workers and Decider. Once started, the decider and activity workers should start polling Amazon SWF for tasks.
  5. Start one or more executions of the Workflow. Each execution runs independently and can be provided with its own set of input data.
  6. When an execution is started, SWF schedules the initial decision task. In response, the decider begins generating decisions which initiate activity tasks. Execution continues until your decider makes a decision to close the execution.
  7. View and Track workflow executions

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What does Amazon SWF stand for?
    1. Simple Web Flow
    2. Simple Work Flow
    3. Simple Wireless Forms
    4. Simple Web Form
  2. Regarding Amazon SWF, the coordination logic in a workflow is contained in a software program called a ____.
    1. Handler
    2. Decider
    3. Coordinator
    4. Worker
  3. For which of the following use cases are Simple Workflow Service (SWF) and Amazon EC2 an appropriate solution? Choose 2 answers
    1. Using as an endpoint to collect thousands of data points per hour from a distributed fleet of sensors
    2. Managing a multi-step and multi-decision checkout process of an e-commerce website
    3. Orchestrating the execution of distributed and auditable business processes
    4. Using as an SNS (Simple Notification Service) endpoint to trigger execution of video transcoding jobs
    5. Using as a distributed session store for your web application
  4. Amazon SWF is designed to help users…
    1. … Design graphical user interface interactions
    2. … Manage user identification and authorization
    3. … Store Web content
    4. … Coordinate synchronous and asynchronous tasks which are distributed and fault tolerant.
  5. What does a “Domain” refer to in Amazon SWF?
    1. A security group in which only tasks inside can communicate with each other
    2. A special type of worker
    3. A collection of related Workflows
    4. The DNS record for the Amazon SWF service
  6. Your company produces customer commissioned one-of-a-kind skiing helmets combining nigh fashion with custom technical enhancements Customers can show oft their Individuality on the ski slopes and have access to head-up-displays. GPS rear-view cams and any other technical innovation they wish to embed in the helmet. The current manufacturing process is data rich and complex including assessments to ensure that the custom electronics and materials used to assemble the helmets are to the highest standards Assessments are a mixture of human and automated assessments you need to add a new set of assessment to model the failure modes of the custom electronics using GPUs with CUD across a cluster of servers with low latency networking. What architecture would allow you to automate the existing process using a hybrid approach and ensure that the architecture can support the evolution of processes over time? [PROFESSIONAL]
    1. Use AWS Data Pipeline to manage movement of data & meta-data and assessments. Use an auto-scaling group of G2 instances in a placement group. (Involves mixture of human assessments)
    2. Use Amazon Simple Workflow (SWF) to manage assessments, movement of data & meta-data. Use an autoscaling group of G2 instances in a placement group. (Human and automated assessments with GPU and low latency networking)
    3. Use Amazon Simple Workflow (SWF) to manage assessments movement of data & meta-data. Use an autoscaling group of C3 instances with SR-IOV (Single Root I/O Virtualization). (C3 and SR-IOV won’t provide GPU as well as Enhanced networking needs to be enabled)
    4. Use AWS data Pipeline to manage movement of data & meta-data and assessments use auto-scaling group of C3 with SR-IOV (Single Root I/O virtualization). (Involves mixture of human assessments)
  7. Your startup wants to implement an order fulfillment process for selling a personalized gadget that needs an average of 3-4 days to produce with some orders taking up to 6 months you expect 10 orders per day on your first day. 1000 orders per day after 6 months and 10,000 orders after 12 months. Orders coming in are checked for consistency men dispatched to your manufacturing plant for production quality control packaging shipment and payment processing. If the product does not meet the quality standards at any stage of the process employees may force the process to repeat a step Customers are notified via email about order status and any critical issues with their orders such as payment failure. Your case architecture includes AWS Elastic Beanstalk for your website with an RDS MySQL instance for customer data and orders. How can you implement the order fulfillment process while making sure that the emails are delivered reliably? [PROFESSIONAL]
    1. Add a business process management application to your Elastic Beanstalk app servers and re-use the ROS database for tracking order status use one of the Elastic Beanstalk instances to send emails to customers. (Would use a SWF instead of BPM)
    2. Use SWF with an Auto Scaling group of activity workers and a decider instance in another Auto Scaling group with min/max=1. Use the decider instance to send emails to customers. (Decider sending emails might not be reliable)
    3. Use SWF with an Auto Scaling group of activity workers and a decider instance in another Auto Scaling group with min/max=1. Use SES to send emails to customers.
    4. Use an SQS queue to manage all process tasks. Use an Auto Scaling group of EC2 Instances that poll the tasks and execute them. Use SES to send emails to customers. (Does not provide an ability to repeat a step)
  8. Select appropriate use cases for SWF with Amazon EC2? (Choose 2)
    1. Video encoding using Amazon S3 and Amazon EC2. In this use case, large videos are uploaded to Amazon S3 in chunks. Application is built as a workflow where each video file is handled as one workflow execution.
    2. Processing large product catalogs using Amazon Mechanical Turk. While validating data in large catalogs, the products in the catalog are processed in batches. Different batches can be processed concurrently.
    3. Order processing system with Amazon EC2, SQS, and SimpleDB. Use SWF notifications to orchestrate an order processing system running on EC2, where notifications sent over HTTP can trigger real-time processing in related components such as an inventory system or a shipping service.
    4. Using as an SQS (Simple Queue Service) endpoint to trigger execution of video transcoding jobs.
  9. When you register an activity in Amazon SWF, you provide the following information, except:
    1. a name
    2. timeout values
    3. a domain
    4. version
  10. Regarding Amazon SWF, at times you might want to record information in the workflow history of a workflow execution that is specific to your use case. ____ enable you to record information in the workflow execution history that you can use for any custom or scenario-specific purpose.
    1. Markers
    2. Tags
    3. Hash keys
    4. Events
  11. Which of the following statements about SWF are true? Choose 3 answers.
    1. SWF tasks are assigned once and never duplicated
    2. SWF requires an S3 bucket for workflow storage
    3. SWF workflow executions can last up to a year
    4. SWF triggers SNS notifications on task assignment
    5. SWF uses deciders and workers to complete tasks
    6. SWF requires at least 1 EC2 instance per domain

References