AWS Blue Green Deployment Whitepaper

March 21, 2017 ~ Last updated on : April 12, 2023 ~ jayendrapatil ~ 7 Comments

AWS Blue Green Deployment

Blue/green deployments provide near zero-downtime release and rollback capabilities.
Blue/green deployment works by shifting traffic between two identical environments that are running different versions of the application
- Blue environment represents the current application version serving production traffic.
- In parallel, the green environment is staged running a different version of your application.
- After the green environment is ready and tested, production traffic is redirected from blue to green.
- If any problems are identified, you can roll back by reverting traffic back to the blue environment.

NOTE: Advanced Topic required for DevOps Professional Exam Only

AWS Services

Route 53

Route 53 is a highly available and scalable authoritative DNS service that route user requests

Route 53 with its DNS service allows administrators to direct traffic by simply updating DNS records in the hosted zone
TTL can be adjusted for resource records to be shorter which allow record changes to propagate faster to clients

Elastic Load Balancing

Elastic Load Balancing distributes incoming application traffic across EC2 instances
Elastic Load Balancing scales in response to incoming requests, performs health checking against Amazon EC2 resources, and naturally integrates with other AWS tools, such as Auto Scaling.

ELB also helps perform health checks of EC2 instances to route traffic only to the healthy instances

Auto Scaling

Auto Scaling allows different versions of launch configuration, which define templates used to launch EC2 instances, to be attached to an Auto Scaling group to enable blue/green deployment.
Auto Scaling’s termination policies and Standby state enable blue/green deployment
- Termination policies in Auto Scaling groups to determine which EC2 instances to remove during a scaling action.
- Auto Scaling also allows instances to be placed in Standby state, instead of termination, which helps with quick rollback when required
Auto Scaling with Elastic Load Balancing can be used to balance and scale the traffic

Elastic Beanstalk

Elastic Beanstalk makes it easy to run multiple versions of the application and provides capabilities to swap the environment URLs, facilitating blue/green deployment.
Elastic Beanstalk supports Auto Scaling and Elastic Load Balancing, both of which enable blue/green deployment

OpsWorks

OpsWorks has the concept of stacks, which are logical groupings of AWS resources with a common purpose & should be logically managed together

Stacks are made of one or more layers with each layer represents a set of EC2 instances that serve a particular purpose, such as serving applications or hosting a database server.
OpsWorks simplifies cloning entire stacks when preparing for blue/green environments.

CloudFormation

CloudFormation helps describe the AWS resources through JSON formatted templates and provides automation capabilities for provisioning blue/green environments and facilitating updates to switch traffic, whether through Route 53 DNS, Elastic Load Balancing, etc

CloudFormation provides infrastructure as code strategy, where infrastructure is provisioned and managed using code and software development techniques, such as version control and continuous integration, in a manner similar to how application code is treated

CloudWatch

CloudWatch monitoring can provide early detection of application health in blue/green deployments

Deployment Techniques

DNS Routing using Route 53

Route 53 DNS service can help switch traffic from the blue environment to the green and vice versa, if rollback is necessary

Route 53 can help either switch the traffic completely or through a weighted distribution
Weighted distribution
- helps distribute percentage of traffic to go to the green environment and gradually update the weights until the green environment carries the full production traffic
- provides the ability to perform canary analysis where a small percentage of production traffic is introduced to a new environment
- helps manage cost by using auto scaling for instances to scale based on the actual demand
Route 53 can handle Public or Elastic IP address, Elastic Load Balancer, Elastic Beanstalk environment web tiers etc.

Auto Scaling Group Swap Behind Elastic Load Balancer

AWS Blue Green Deployment - Auto Scaling Group

Elastic Load Balancing with Auto Scaling to manage EC2 resources as per the demand can be used for Blue Green deployments
Multiple Auto Scaling groups can be attached to the Elastic Load Balancer
Green ASG can be attached to an existing ELB while Blue ASG is already attached to the ELB to serve traffic

ELB would start routing requests to the Green Group as for HTTP/S listener it uses a least outstanding requests routing algorithm
Green group capacity can be increased to process more traffic while the Blue group capacity can be reduced either by terminating the instances or by putting the instances in a standby mode
Standby is a good option because if roll back to the blue environment needed, blue server instances can be put back in service and they’re ready to go

If no issues with the Green group, the blue group can be decommissioned by adjusting the group size to zero

Update Auto Scaling Group Launch Configurations

AWS Blue Green Deployment - Auto Scaling Launch

Auto Scaling groups have their own launch configurations which define template for EC2 instances to be launched
Auto Scaling group can have only one launch configuration at a time, and it can’t be modified. If needs modification, a new launch configuration can be created and attached to the existing Auto Scaling Group

After a new launch configuration is in place, any new instances that are launched use the new launch configuration parameters, but existing instances are not affected.
When Auto Scaling removes instances (referred to as scaling in) from the group, the default termination policy is to remove instances with the oldest launch configuration
To deploy the new version of the application in the green environment, update the Auto Scaling group with the new launch configuration, and then scale the Auto Scaling group to twice its original size.

Then, shrink the Auto Scaling group back to the original size
To perform a rollback, update the Auto Scaling group with the old launch configuration. Then, do the preceding steps in reverse

Elastic Beanstalk Application Environment Swap

AWS Blue Green Deployment - Elastic Beanstalk

Elastic Beanstalk multiple environment and environment url swap feature helps enable Blue Green deployment

Elastic Beanstalk can be used to host the blue environment exposed via URL to access the environment
Elastic Beanstalk provides several deployment policies, ranging from policies that perform an in-place update on existing instances, to immutable deployment using a set of new instances.
Elastic Beanstalk performs an in-place update when the application versions are updated, however application may become unavailable to users for a short period of time.

To avoid the downtime, a new version can be deployed to a separate Green environment with its own URL, launched with the existing environment’s configuration
Elastic Beanstalk’s Swap Environment URLs feature can be used to promote the green environment to serve production traffic
Elastic Beanstalk performs a DNS switch, which typically takes a few minutes

To perform a rollback, invoke Swap Environment URL again.

Clone a Stack in AWS OpsWorks and Update DNS

OpsWorks can be used to create
- Blue environment stack with the current version of the application and serving production traffic
- Green environment stack with the newer version of the application and is not receiving any traffic

To promote to the green environment/stack into production, update DNS records to point to the green environment/stack’s load balancer

Labs

Qwiklabs free labs
- Blue/Green Deployment Pattern with AWS Elastic Beanstalk

AWS Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
Open to further feedback, discussion and correction.

What is server immutability?
1. Not updating a server after creation. (During the new release, a new set of EC2 instances are rolled out by terminating older instances and are disposable. EC2 instance usage is considered temporary or ephemeral in nature for the period of deployment until the current release is active)
2. The ability to change server counts.
3. Updating a server after creation.
4. The inability to change server counts.
You need to deploy a new application version to production. Because the deployment is high-risk, you need to roll the new version out to users over a number of hours, to make sure everything is working correctly. You need to be able to control the proportion of users seeing the new version of the application down to the percentage point. You use ELB and EC2 with Auto Scaling Groups and custom AMIs with your code pre-installed assigned to Launch Configurations. There are no database-level changes during your deployment. You have been told you cannot spend too much money, so you must not increase the number of EC2 instances much at all during the deployment, but you also need to be able to switch back to the original version of code quickly if something goes wrong. What is the best way to meet these requirements?
1. Create a second ELB, Auto Scaling Launch Configuration, and Auto Scaling Group using the Launch Configuration. Create AMIs with all code pre-installed. Assign the new AMI to the second Auto Scaling Launch Configuration. Use Route53 Weighted Round Robin Records to adjust the proportion of traffic hitting the two ELBs. (Use Weighted Round Robin DNS Records and reverse proxies allow such fine-grained tuning of traffic splits. Blue-Green option does not meet the requirement that we mitigate costs and keep overall EC2 fleet size consistent, so we must select the 2 ELB and ASG option with WRR DNS tuning)
2. Use the Blue-Green deployment method to enable the fastest possible rollback if needed. Create a full second stack of instances and cut the DNS over to the new stack of instances, and change the DNS back if a rollback is needed. (Full second stack is expensive)
3. Create AMIs with all code pre-installed. Assign the new AMI to the Auto Scaling Launch Configuration, to replace the old one. Gradually terminate instances running the old code (launched with the old Launch Configuration) and allow the new AMIs to boot to adjust the traffic balance to the new code. On rollback, reverse the process by doing the same thing, but changing the AMI on the Launch Config back to the original code. (Cannot modify the existing launch config)
4. Migrate to use AWS Elastic Beanstalk. Use the established and well-tested Rolling Deployment setting AWS provides on the new Application Environment, publishing a zip bundle of the new code and adjusting the wait period to spread the deployment over time. Re-deploy the old code bundle to rollback if needed.

When thinking of AWS Elastic Beanstalk, the ‘Swap Environment URLs’ feature most directly aids in what?
1. Immutable Rolling Deployments
2. Mutable Rolling Deployments
3. Canary Deployments
4. Blue-Green Deployments (Complete switch from one environment to other)
You were just hired as a DevOps Engineer for a startup. Your startup uses AWS for 100% of their infrastructure. They currently have no automation at all for deployment, and they have had many failures while trying to deploy to production. The company has told you deployment process risk mitigation is the most important thing now, and you have a lot of budget for tools and AWS resources. Their stack: 2-tier API Data stored in DynamoDB or S3, depending on type, Compute layer is EC2 in Auto Scaling Groups, They use Route53 for DNS pointing to an ELB, An ELB balances load across the EC2 instances. The scaling group properly varies between 4 and 12 EC2 servers. Which of the following approaches, given this company’s stack and their priorities, best meets the company’s needs?
1. Model the stack in AWS Elastic Beanstalk as a single Application with multiple Environments. Use Elastic Beanstalk’s Rolling Deploy option to progressively roll out application code changes when promoting across environments. (Does not support DynamoDB also need Blue Green deployment for zero downtime deployment as cost is not a constraint)
2. Model the stack in 3 CloudFormation templates: Data layer, compute layer, and networking layer. Write stack deployment and integration testing automation following Blue-Green methodologies.
3. Model the stack in AWS OpsWorks as a single Stack, with 1 compute layer and its associated ELB. Use Chef and App Deployments to automate Rolling Deployment. (Does not support DynamoDB also need Blue Green deployment for zero downtime deployment as cost is not a constraint)
4. Model the stack in 1 CloudFormation template, to ensure consistency and dependency graph resolution. Write deployment and integration testing automation following Rolling Deployment methodologies. (Need Blue Green deployment for zero downtime deployment as cost is not a constraint)

You are building out a layer in a software stack on AWS that needs to be able to scale out to react to increased demand as fast as possible. You are running the code on EC2 instances in an Auto Scaling Group behind an ELB. Which application code deployment method should you use?
1. SSH into new instances those come online, and deploy new code onto the system by pulling it from an S3 bucket, which is populated by code that you refresh from source control on new pushes. (is slow and manual)
2. Bake an AMI when deploying new versions of code, and use that AMI for the Auto Scaling Launch Configuration. (Pre baked AMIs can help to get started quickly)
3. Create a Dockerfile when preparing to deploy a new version to production and publish it to S3. Use UserData in the Auto Scaling Launch configuration to pull down the Dockerfile from S3 and run it when new instances launch. (is slow)
4. Create a new Auto Scaling Launch Configuration with UserData scripts configured to pull the latest code at all times. (is slow)
You company runs a complex customer relations management system that consists of around 10 different software components all backed by the same Amazon Relational Database (RDS) database. You adopted AWS OpsWorks to simplify management and deployment of that application and created an AWS OpsWorks stack with layers for each of the individual components. An internal security policy requires that all instances should run on the latest Amazon Linux AMI and that instances must be replaced within one month after the latest Amazon Linux AMI has been released. AMI replacements should be done without incurring application downtime or capacity problems. You decide to write a script to be run as soon as a new Amazon Linux AMI is released. Which solutions support the security policy and meet your requirements? Choose 2 answers
1. Assign a custom recipe to each layer, which replaces the underlying AMI. Use AWS OpsWorks life-cycle events to incrementally execute this custom recipe and update the instances with the new AMI.
2. Create a new stack and layers with identical configuration, add instances with the latest Amazon Linux AMI specified as a custom AMI to the new layer, switch DNS to the new stack, and tear down the old stack. (Blue-Green Deployment)
3. Identify all Amazon Elastic Compute Cloud (EC2) instances of your AWS OpsWorks stack, stop each instance, replace the AMI ID property with the ID of the latest Amazon Linux AMI ID, and restart the instance. To avoid downtime, make sure not more than one instance is stopped at the same time.
4. Specify the latest Amazon Linux AMI as a custom AMI at the stack level, terminate instances of the stack and let AWS OpsWorks launch new instances with the new AMI.
5. Add new instances with the latest Amazon Linux AMI specified as a custom AMI to all AWS OpsWorks layers of your stack, and terminate the old ones.
Your company runs an event management SaaS application that uses Amazon EC2, Auto Scaling, Elastic Load Balancing, and Amazon RDS. Your software is installed on instances at first boot, using a tool such as Puppet or Chef, which you also use to deploy small software updates multiple times per week. After a major overhaul of your software, you roll out version 2.0 new, much larger version of the software of your running instances. Some of the instances are terminated during the update process. What actions could you take to prevent instances from being terminated in the future? (Choose two)
1. Use the zero downtime feature of Elastic Beanstalk to deploy new software releases to your existing instances. (No such feature, you can perform environment url swap)
2. Use AWS CodeDeploy. Create an application and a deployment targeting the Auto Scaling group. Use CodeDeploy to deploy and update the application in the future. (Refer link)
3. Run “aws autoscaling suspend-processes” before updating your application. (Refer link)
4. Use the AWS Console to enable termination protection for the current instances. (Termination protection does not work with Auto Scaling)
5. Run “aws autoscaling detach-load-balancers” before updating your application. (Does not prevent Auto Scaling to terminate the instances)

References

AWS Blue/Green Deployment Whitepaper

AWS Elastic Transcoder – Certification

January 14, 2017 ~ Last updated on : January 23, 2017 ~ jayendrapatil ~ 4 Comments

AWS Elastic Transcoder

Amazon Elastic Transcoder is a highly scalable, easy-to-use and cost-effective way for developers and businesses to convert (or “transcode”) video files from their source format into versions that will play back on multiple devices like smartphones, tablets and PCs.

Elastic Transcoder is for any customer with media assets stored in S3 for e.g. developers creating apps or websites that publish user-generated content, enterprises and educational establishments converting training and communication videos, and content owners and broadcasters needing to convert media assets into web-friendly formats.
Elastic Transcoder features
- can be used to convert files from different media formats into H.264/AAC/MP4 files at different resolutions, bitrates, and frame rates, and set up transcoding pipelines to transcode files in parallel.
- can be configured to overlay up to four graphics, known as watermarks, over a video during transcoding
- can be configured to transcode captions, or subtitles, from one format to another and supports embedded and sidebar caption types
- provides clip stitching ability to stitch together parts, or clips, from multiple input files to create a single output
- can be configured to create Thumbnails
Elastic Transcoder is integrated with CloudTrail, an AWS service that captures information about every request that is sent to the Elastic Transcoder API by your AWS account, including your IAM users

Elastic Transcoder Components

Presets
- are templates that contain most of the settings for transcoding media files from one format to another.
- Elastic Transcoder includes some default presets for common formats and ability to create customized presets
Jobs
- do the work of transcoding and converts a file into up to 30 formats.
- takes the input file to be transcoded, names of the transcoded files and several other settings as input
- For each transcoded format a preset needs to be specified

Pipelines
- are queues that manage the transcoding jobs.
- Elastic Transcoder starts processing the jobs and transcoding into format (for multiple formats) in the order they are added.
- can be paused to temporarily stop processing jobs
Notifications
- help keep you apprised of the status of a job, i.e. started, completed, encounters warning or error
- eliminate the need for polling to determine when a job has finished and can be configured during pipeline creation

AWS Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

Your website is serving on-demand training videos to your workforce. Videos are uploaded monthly in high resolution MP4 format. Your workforce is distributed globally often on the move and using company-provided tablets that require the HTTP Live Streaming (HLS) protocol to watch a video. Your company has no video transcoding expertise and it required you might need to pay for a consultant. How do you implement the most cost-efficient architecture without compromising high availability and quality of video delivery?
1. Elastic Transcoder to transcode original high-resolution MP4 videos to HLS. S3 to host videos with lifecycle Management to archive original flies to Glacier after a few days. CloudFront to serve HLS transcoded videos from S3
2. A video transcoding pipeline running on EC2 using SQS to distribute tasks and Auto Scaling to adjust the number or nodes depending on the length of the queue S3 to host videos with Lifecycle Management to archive all files to Glacier after a few days CloudFront to serve HLS transcoding videos from Glacier
3. Elastic Transcoder to transcode original high-resolution MP4 videos to HLS EBS volumes to host videos and EBS snapshots to incrementally backup original rues after a few days. CloudFront to serve HLS transcoded videos from EC2.
4. A video transcoding pipeline running on EC2 using SQS to distribute tasks and Auto Scaling to adjust the number of nodes depending on the length of the queue. EBS volumes to host videos and EBS snapshots to incrementally backup original files after a few days. CloudFront to serve HLS transcoded videos from EC2

References

Elastic_Transcoder_Developer_Guide

AWS CloudSearch – Certification

January 13, 2017 ~ Last updated on : January 15, 2017 ~ jayendrapatil

AWS CloudSearch

CloudSearch is a fully-managed, full-featured search service in the AWS Cloud that makes it easy to set up, manage, and scale a search solution

CloudSearch
- automatically provisions the required resources
- deploys a highly tuned search index
- easy configuration and can be up & running in less than one hour
- search and ability to upload searchable data
- automatically scales for data and traffic
- self-healing clusters, and
- high availability with Multi-AZ

CloudSearch uses Apache Solr as the underlying text search engine and
- can be used to index and search both structured and unstructured data.
- content can come from multiple sources and can include database fields along with files in a variety of formats, web pages, and so on.
- supports indexing features like algorithmic stemming, dictionary stemming, stopword dictionary
- can support customizable result ranking i.e. relevancy
- supports search features for text search, different query types (range, boolean etc), sorting, facets for filtering, grouping etc
- supports enhanced features for auto suggestions, highlighting, spatial search, fuzzy search etc

CloudSearch supports Multi-AZ option and it deploys additional instances in a second AZ in the same region.
CloudSearch can offer significantly lower total cost of ownership compared to operating and managing your own search environment

CloudSearch Search Domains, Data & Indexing

CloudSearch Architecture

Search domain is a data container and a set of services that make the data searchable
- Document service that allows data uploading to domain for indexing
- Search service that enables search requests against the indexed data
- Configuration service for controlling the domains behavior (include relevance ranking)
Search domain can’t be automatically migrated from one region to another. New domain in the target region needs to be created, configured and data uploaded, and then the original domain deleted
Indexed data to be made searchable
- can be submitted through a REST based web service url
- has to be in JSON or XML format
- is represented as a document with a unique document ID and multiple fields either to be search on to needed to be just retrieved
CloudSearch generates a search index from the document data according to the index fields configured for the domain

Data updates can be submitted by to add, update and delete documents
Data can be uploaded using secure and encrypted SSL HTTPS connection

CloudSearch Auto Scaling

CloudSearch Scaling

Search domains scale in two dimensions: data and traffic
A search instance is a single search engine in the cloud that indexes documents and responds to search requests with a finite amount of RAM and CPU resources for indexing data and processing requests.

Search domain can have one or more search partitions, portion of the data which fits on a single search instance, and the number of search partitions can change as the documents are indexed
CloudSearch can determine the size and number of search instances required to deliver low latency, high throughput search performance
When a search domain is created , a single instance is deployed

CloudSearch automatically scales the domain by adding instances as the volume of data or traffic increases
Scaling for data
- CloudSearch handles scaling for data by
  - Vertical scaling by increasing the size of the instance, when the amount of data exceeds a single search instance
  - Horizontal scaling using search partitions, when the amount of data exceeds the capacity of the largest search instance type
- Number of search instances required to hold the index partitions is sometimes referred to as the domain’s width.
- CloudSearch reduces the number of partitions and size of search instances if the amount of data reduces
Scaling for traffic
- CloudSearch handles Scaling for traffic by
  - Vertical scaling by increasing the size of the instance, when the amount of traffic exceeds a single search instance
  - Horizontal scaling by deploying a duplicate search instance to provide additional processing power i.e. the complete number of partitions are duplicated
- CloudSearch reduces the number of partitions and size of search instances if the traffic reduces
- Number of duplicate search instances is sometimes referred to as the domain’s depth.

CloudSearch Search Features

CloudSearch provides features to index and search both structured data and plain text as well as unstructured data like pdf, word documents
CloudSearch provides near real-time indexing for document updates
Indexing features include
- tokenization,
- stopwords,
- stemming and
- synonyms
Search features include
- faceted search, free text search, Boolean search expressions,
- customizable relevance ranking, query time rank expressions,
- grouping
- field weighting, searching and sorting
- Other features like
  - Autocomplete suggestions
  - Highlighting
  - Geospatial search
  - New data types: date, double, 64 bit signed int, LatLon
  - Dynamic fields
  - Index field statistics
  - Sloppy phrase search
  - Term boosting
  - Enhanced range searching for all field types
  - Search filters that don’t affect relevance
  - Support for multiple query parsers: simple, structured, lucene, dismax
  - Query parser configuration options

AWS Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

A newspaper organization has an on-premises application which allows the public to search its back catalogue and retrieve individual newspaper pages via a website written in Java. They have scanned the old newspapers into JPEGs (approx. 17TB) and used Optical Character Recognition (OCR) to populate a commercial search product. The hosting platform and software is now end of life and the organization wants to migrate its archive to AWS and produce a cost efficient architecture and still be designed for availability and durability. Which is the most appropriate?
1. Use S3 with reduced redundancy to store and serve the scanned files, install the commercial search application on EC2 Instances and configure with auto-scaling and an Elastic Load Balancer. (Reusing Commercial search application which is nearing end of life not a good option for cost)
2. Model the environment using CloudFormation. Use an EC2 instance running Apache webserver and an open source search application, stripe multiple standard EBS volumes together to store the JPEGs and search index. (storing JPEGs on EBS volumes not cost effective also answer does not address Open source solution availability)
3. Use S3 with standard redundancy to store and serve the scanned files, use CloudSearch for query processing, and use Elastic Beanstalk to host the website across multiple availability zones. (Cost effective S3 storage, CloudSearch for Search and Highly available and durable web application)
4. Use a single-AZ RDS MySQL instance to store the search index and the JPEG images use an EC2 instance to serve the website and translate user queries into SQL. (MySQL not an ideal solution to sore index and JPEG images for cost and performance)
5. Use a CloudFront download distribution to serve the JPEGs to the end users and Install the current commercial search product, along with a Java Container for the website on EC2 instances and use Route53 with DNS round-robin. (Web Application not scalable, whats the source for JPEGs files through CloudFront)

References

AWS_CloudSearch_Developer_Guide

AWS Elastic Beanstalk vs OpsWorks vs CloudFormation – Certification

January 10, 2017 ~ Last updated on : June 17, 2026 ~ jayendrapatil ~ 14 Comments

⚠️ PARTIAL DEPRECATION NOTICE

AWS OpsWorks (all variants) reached End of Life (EOL) in 2024.

AWS OpsWorks for Puppet Enterprise – EOL March 31, 2024
AWS OpsWorks for Chef Automate – EOL May 5, 2024

AWS OpsWorks Stacks – EOL May 26, 2024

The OpsWorks services have been disabled for both new and existing customers. The comparison sections involving OpsWorks are maintained for historical reference.

Current Deployment & Management Options:

AWS CloudFormation – Infrastructure as Code (still actively supported and enhanced)
AWS Elastic Beanstalk – Simplified application deployment (still actively supported)

AWS CDK – Programmatic infrastructure definition using familiar languages
AWS Systems Manager – Configuration management and automation (OpsWorks replacement)
AWS CodeDeploy – Application deployment automation
AWS App Runner – Fully managed container application service

For OpsWorks migration guidance, refer to: AWS OpsWorks EOL Documentation

AWS Elastic Beanstalk vs CloudFormation vs CDK – Deployment & Management Services Comparison

AWS offers multiple options for provisioning IT infrastructure and application deployment and management, varying from convenience & ease of setup to low-level granular control.
Deployment and Management - Elastic Beanstalk vs CloudFormation vs CDK

AWS Elastic Beanstalk

AWS Elastic Beanstalk is a higher-level service which allows you to quickly deploy with minimum management effort a web or worker-based environment using EC2, Docker using ECS, Elastic Load Balancing, Auto Scaling, RDS, CloudWatch, etc.
Elastic Beanstalk is the fastest and simplest way to get an application up and running on AWS, perfect for developers who want to deploy code and not worry about underlying infrastructure.
Elastic Beanstalk provides an environment to easily deploy and run applications in the cloud. It is integrated with developer tools and provides a one-stop experience for application lifecycle management.

Elastic Beanstalk requires minimal configuration and will help deploy, monitor, and handle the elasticity/scalability of the application.
A user doesn’t need to do much more than write application code and configure some settings on Elastic Beanstalk.
Supports platforms including Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on Amazon Linux 2 and Amazon Linux 2023.

AI-Powered Environment Analysis (2026) – Elastic Beanstalk now offers AI-powered analysis that automatically diagnoses environment health issues, identifies root causes, and provides recommended solutions when health status is Warning, Degraded, or Severe.
Dual-Stack IPv6 Support (2025) – Supports dual-stack configuration for Application Load Balancers and Network Load Balancers, allowing environments to serve both IPv4 and IPv6 traffic.

AWS OpsWorks (Deprecated – EOL 2024)

⚠️ All AWS OpsWorks services reached End of Life in 2024 and have been disabled for all customers.

AWS OpsWorks was an application management service that simplified software configuration, application deployment, scaling, and monitoring using Chef or Puppet.

OpsWorks was recommended for managing infrastructure with a configuration management system such as Chef.
OpsWorks enabled writing custom Chef recipes, utilized self-healing, and worked with layers.
Although OpsWorks was a deployment management service that helped deploy applications with Chef recipes, it was not primarily meant to manage scaling out of the box and needed to be handled explicitly.

Migration Paths:
- AWS Systems Manager – For configuration management and automation (recommended by AWS)
- Chef SaaS – For customers who want to continue using Chef recipes
- Puppet Enterprise – Self-hosted Puppet for existing Puppet users
- AWS CodeDeploy – For application deployment workflows
- Amazon ECS/EKS – For containerized workloads

AWS CloudFormation

AWS CloudFormation enables modeling, provisioning, and version-controlling of a wide range of AWS resources ranging from a single EC2 instance to a complex multi-tier, multi-region application.
CloudFormation is a low-level service and provides granular control to provision and manage stacks of AWS resources based on templates (JSON or YAML).
CloudFormation templates enable version control of the infrastructure and make deployment of environments easy and repeatable.

CloudFormation supports infrastructure needs of many different types of applications such as existing enterprise applications, legacy applications, applications built using a variety of AWS resources, and container-based solutions (including those built using AWS Elastic Beanstalk).
CloudFormation is not just an application deployment tool but can provision any kind of AWS resource.
CloudFormation is designed to complement Elastic Beanstalk and other AWS services.

CloudFormation with Elastic Beanstalk
- CloudFormation supports Elastic Beanstalk application environments as one of the AWS resource types.
- This allows you, for example, to create and manage an AWS Elastic Beanstalk–hosted application along with an RDS database to store the application data. In addition to RDS instances, any other supported AWS resource can be added to the group as well.

Key Updates (2024-2025):
- Stack Refactoring (2025) – Move resources between stacks, rename logical IDs, and decompose monolithic templates into focused components without disrupting running infrastructure.
- 40% Faster Deployments (2024) – Optimistic stabilization with CONFIGURATION_COMPLETE event enables parallel creation of dependent resources.
- IaC Generator – Generate CloudFormation templates from existing AWS resources (reverse-engineer existing infrastructure into IaC).
- Configuration Drift Management – Improved drift detection and remediation capabilities.
- AI Integration – IaC context integrated with AI-powered development tools.

AWS Cloud Development Kit (AWS CDK)

AWS CDK is an open-source software development framework that allows you to define cloud infrastructure using familiar programming languages (TypeScript, JavaScript, Python, Java, C#/.NET, Go).
CDK synthesizes into CloudFormation templates, providing the reliability of CloudFormation with the expressiveness of general-purpose programming languages.
CDK provides high-level constructs (L2/L3) that encapsulate AWS best practices and reduce the amount of boilerplate code needed.

CDK is ideal for teams who prefer imperative programming over declarative YAML/JSON templates.
Same infrastructure that takes 500+ lines of CloudFormation YAML can be expressed in ~15 lines of CDK TypeScript.
CDK Refactoring (2025) – Refactor CDK code (rename constructs, move resources between stacks) while preserving deployed resources.

CDK Mixins (2026) – Add composable, reusable abstractions to any construct (L1, L2, or custom) without rebuilding existing infrastructure code.
CDK Aspects – Apply organization-wide policies (security rules, tagging standards, compliance requirements) across entire infrastructure.

AWS Serverless Application Model (AWS SAM)

AWS SAM is an open-source framework specifically designed for building serverless applications using infrastructure as code.

SAM extends CloudFormation with shorthand syntax to express Lambda functions, APIs, databases, and event source mappings with fewer lines of code.
During deployment, SAM transforms the SAM syntax into CloudFormation syntax, then CloudFormation provisions the resources.
SAM CLI provides local testing, debugging, packaging, and deployment capabilities.

SAM Accelerate – Speeds up local development and cloud testing.
WebSocket API Support (2026) – Define complete WebSocket APIs for API Gateway with minimal configuration.
SAM CLI integrates with AWS CDK and Terraform.

AWS App Runner

AWS App Runner is a fully managed container application service that lets you build, deploy, and run containerized web applications and API services without prior infrastructure or container experience.

App Runner connects directly to your code or image repository and provides an automatic CI/CD pipeline with fully managed operations, high performance, scalability, and security.
App Runner automatically handles load balancing, auto-scaling (including scale to zero), encryption, and health monitoring.
Ideal for web applications and APIs that need to deploy quickly from source code or container images without managing infrastructure.

Supports deployment from GitHub, Bitbucket, or Amazon ECR.

AWS Proton (Deprecated – EOL October 7, 2026)

⚠️ AWS Proton will reach End of Life on October 7, 2026. Plan migration accordingly.

AWS Proton was a fully managed deployment service that standardized how organizations deploy microservices and infrastructure from approved templates.
Proton sat on top of CloudFormation (or Terraform) and added self-service deployment, versioning, parameter validation, and standardization.
Designed for platform teams to provide standardized templates while giving developers self-service deployment speed.

After EOL, the Proton console, API, and pipeline management will be permanently unavailable, though deployed infrastructure will remain intact.

Comparison Summary

Service	Level	Best For	Status
Elastic Beanstalk	High-level PaaS	Developers who want to deploy code without managing infrastructure	✅ Active
CloudFormation	Low-level IaC	Granular control over all AWS resources via declarative templates	✅ Active
AWS CDK	High-level IaC	Teams who prefer defining infrastructure in programming languages	✅ Active
AWS SAM	Serverless IaC	Serverless applications (Lambda, API Gateway, DynamoDB)	✅ Active
App Runner	Fully Managed	Container web apps/APIs without any infrastructure management	✅ Active
OpsWorks	Configuration Mgmt	Chef/Puppet based configuration management	❌ EOL (2024)
AWS Proton	Template Orchestration	Standardized microservice deployment templates	⚠️ EOL Oct 2026

AWS Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

Your team is excited about the use of AWS because now they have access to programmable infrastructure. You have been asked to manage your AWS infrastructure in a manner similar to the way you might manage application code. You want to be able to deploy exact copies of different versions of your infrastructure, stage changes into different environments, revert back to previous versions, and identify what versions are running at any particular time (development, test, QA, production). Which approach addresses this requirement?
1. Use cost allocation reports and AWS OpsWorks to deploy and manage your infrastructure.
2. Use AWS CloudWatch metrics and alerts along with resource tagging to deploy and manage your infrastructure.
3. Use AWS Elastic Beanstalk and a version control system like GIT to deploy and manage your infrastructure.
4. Use AWS CloudFormation and a version control system like GIT to deploy and manage your infrastructure.

An organization is planning to use AWS for their production roll out. The organization wants to implement automation for deployment such that it will automatically create a LAMP stack, download the latest PHP installable from S3 and setup the ELB. Which of the below mentioned AWS services meets the requirement for making an orderly deployment of the software?
1. AWS Elastic Beanstalk
2. AWS CloudFront
3. AWS CloudFormation
4. AWS DevOps
You are working with a customer who is using Chef configuration management in their data center. Which service is designed to let the customer leverage existing Chef recipes in AWS?
Note: AWS OpsWorks reached EOL in 2024. For Chef-based configuration management on AWS, customers should now use Chef SaaS or AWS Systems Manager with Chef recipes via Application Manager.
1. Amazon Simple Workflow Service
2. AWS Elastic Beanstalk
3. AWS CloudFormation
4. AWS OpsWorks (Historical answer – service now deprecated)
A company wants to define their infrastructure using a programming language like TypeScript instead of writing YAML templates. They want the same reliability as CloudFormation but with less boilerplate code. Which AWS service should they use?
1. AWS Elastic Beanstalk
2. AWS Cloud Development Kit (CDK)
3. AWS SAM
4. AWS App Runner

A startup wants to deploy a containerized web application with minimal infrastructure management. They want automatic scaling, load balancing, and a CI/CD pipeline connected to their GitHub repository. Which AWS service provides the simplest solution?
1. Amazon ECS with Fargate
2. AWS Elastic Beanstalk
3. AWS App Runner
4. AWS CloudFormation
A team is building a serverless application using Lambda functions, API Gateway, and DynamoDB. They want to define their infrastructure using a simplified template syntax with built-in local testing capabilities. Which tool is most appropriate?
1. AWS CloudFormation
2. AWS CDK
3. AWS SAM
4. AWS Elastic Beanstalk
An organization has a large monolithic CloudFormation stack that they want to split into smaller, focused stacks without recreating their existing infrastructure. Which CloudFormation feature enables this?
1. CloudFormation StackSets
2. CloudFormation Change Sets
3. CloudFormation Stack Refactoring
4. CloudFormation Nested Stacks

References

AWS High Availability & Fault Tolerance Architecture – Certification

January 7, 2017 ~ Last updated on : June 12, 2026 ~ jayendrapatil ~ 40 Comments

AWS High Availability & Fault Tolerance Architecture

📅 Content Update – June 2025

This post has been updated to reflect modern AWS HA/FT services and best practices including AWS Resilience Hub, Application Recovery Controller (ARC), Fault Injection Service (FIS), Multi-AZ DB Clusters, DynamoDB Global Tables with Multi-Region Strong Consistency (MRSC), and current ELB types (ALB/NLB/GWLB).

Amazon Web Services provides services and infrastructure to build reliable, fault-tolerant, and highly available systems in the cloud.
Fault-tolerance defines the ability for a system to remain in operation even if some of the components used to build the system fail.

Most of the higher-level services, such as S3, DynamoDB, SQS, and ELB, have been built with fault tolerance and high availability in mind.
Services that provide basic infrastructure, such as EC2 and EBS, provide specific features, such as availability zones, elastic IP addresses, and snapshots, that a fault-tolerant and highly available system must take advantage of and use correctly.

AWS High Availability and Fault Tolerance

NOTE: Topic mainly for Professional Exam Only

Regions & Availability Zones

Amazon Web Services are available in geographic Regions and with multiple Availability Zones (AZs) within a region, which provide easy access to redundant deployment locations.

AZs are distinct geographical locations that are engineered to be insulated from failures in other AZs.
Regions and AZs help achieve greater fault tolerance by distributing the application geographically and help build multi-site solutions.
AZs provide inexpensive, low latency network connectivity to other Availability Zones in the same Region. All traffic between AZs is encrypted.

By placing EC2 instances in multiple AZs, an application can be protected from failure at a single data center.
It is important to run independent application stacks in more than one AZ, either in the same region or in another region, so that if one zone fails, the application in the other zone can continue to run.
AWS recommends deploying production workloads across at least 3 AZs for optimal fault isolation and static stability.

Amazon Machine Image – AMIs

EC2 is a web service within Amazon Web Services that provides computing resources.
Amazon Machine Image (AMI) provides a Template that can be used to define the service instances.
Template basically contains a software configuration (i.e., OS, application server, and applications) and is applied to an instance type.

AMI can either contain all the softwares, applications and the code bundled or can be configured to have a bootstrap script (user data) to install the same on startup.
A single AMI can be used to create server resources of different instance types and start creating new instances or replacing failed instances.
EC2 Image Builder can automate the creation, testing, and distribution of AMIs across regions, enabling faster recovery through pre-built golden images.

Auto Scaling

Auto Scaling helps to automatically scale EC2 capacity up or down based on defined rules.
Auto Scaling also enables addition of more instances in response to an increasing load; and when those instances are no longer needed, they will be automatically terminated.
Auto Scaling enables terminating server instances at will, knowing that replacement instances will be automatically launched.

Auto Scaling can work across multiple AZs within an AWS Region.
Predictive Scaling uses machine learning to proactively scale out ASGs ahead of anticipated demand spikes, improving availability and reducing the need for over-provisioning.
Target Tracking Scaling policies provide a simplified way to configure dynamic scaling based on a specific metric target (e.g., average CPU utilization at 50%).

Auto Scaling groups support warm pools to pre-initialize instances for faster scaling, reducing cold-start times during demand surges.
Amazon Application Recovery Controller (ARC) supports zonal autoshift with EC2 Auto Scaling, automatically shifting traffic away from impaired AZs.

Elastic Load Balancing – ELB

Elastic Load Balancing is an effective way to increase the availability of a system and distributes incoming traffic to applications across several EC2 instances.

ELB supports health checks on hosts, distribution of traffic to EC2 instances across multiple availability zones, and dynamic addition and removal of EC2 hosts from the load-balancing rotation.
Elastic Load Balancing detects unhealthy instances within its pool and automatically reroutes traffic to healthy instances, until the unhealthy instances have been restored seamlessly using Auto Scaling.
Auto Scaling and Elastic Load Balancing are an ideal combination – while ELB gives a single DNS name for addressing, Auto Scaling ensures there is always the right number of healthy EC2 instances to accept requests.

ELB can be used to balance across instances in multiple AZs of a region.

ELB Types

Application Load Balancer (ALB) – Layer 7 (HTTP/HTTPS); supports path-based routing, host-based routing, mutual TLS authentication (2023), one-click AWS WAF integration, URL and host header rewrites (2025), Automatic Target Weights, and LCU Capacity Reservation for handling sharp traffic spikes.
Network Load Balancer (NLB) – Layer 4 (TCP/UDP/TLS); ultra-low latency, static IPs per AZ, weighted target groups for blue/green deployments, and subnet removal/addition capability (2025).

Gateway Load Balancer (GWLB) – Layer 3 gateway + Layer 4 load balancer; used to deploy, scale, and manage third-party virtual network appliances (firewalls, IDS/IPS).
Classic Load Balancer (CLB) – Previous generation; deprecated for new workloads. AWS recommends migrating to ALB or NLB. CLBs in EC2-Classic were retired in August 2022.

Elastic IPs – EIPs

Elastic IP addresses are public static IP addresses that can be mapped programmatically between instances within a region.

EIPs are associated with the AWS account and not with a specific instance or lifetime of an instance.
Elastic IP addresses can be used for instances and services that require consistent endpoints, such as master databases, central file servers, and EC2-hosted load balancers.
Elastic IP addresses can be used to work around host or availability zone failures by quickly remapping the address to another running instance or a replacement instance that was just started.

Reserved Instances & Savings Plans

Reserved Instances help reserve and guarantee computing capacity is available at a lower cost always.
Savings Plans provide a more flexible pricing model with up to 72% savings in exchange for committing to a consistent amount of compute usage (measured in $/hour) over a 1 or 3-year term.
On-Demand Capacity Reservations (ODCRs) ensure EC2 capacity is available in a specific AZ when needed for HA without requiring a term commitment.

Elastic Block Store – EBS

Elastic Block Store (EBS) offers persistent off-instance storage volumes that persist independently from the life of an instance and are about an order of magnitude more durable than on-instance storage.
EBS volumes store data redundantly and are automatically replicated within a single availability zone.
EBS helps in failover scenarios where if an EC2 instance fails and needs to be replaced, the EBS volume can be attached to the new EC2 instance.

Valuable data should never be stored only on instance (ephemeral) storage without proper backups, replication, or the ability to re-create the data.
EBS Multi-Attach (for io1/io2 volumes) allows a single volume to be attached to up to 16 Nitro-based instances within the same AZ for shared storage HA scenarios.

EBS Snapshots

EBS volumes are highly reliable, but to further mitigate the possibility of a failure and increase durability, point-in-time Snapshots can be created to store data on volumes in S3, which is then replicated to multiple AZs.

Snapshots can be used to create new EBS volumes, which are an exact replica of the original volume at the time the snapshot was taken.
Snapshots provide an effective way to deal with disk failures or other host-level issues, as well as with problems affecting an AZ.
Snapshots are incremental and back up only changes since the previous snapshot, so it is advisable to hold on to recent snapshots.

Snapshots are tied to the region, while EBS volumes are tied to a single AZ.
EBS Snapshots Archive provides up to 75% lower storage costs for snapshots stored 90+ days and rarely accessed.
Fast Snapshot Restore (FSR) eliminates the need for initializing volumes from snapshots, enabling full-performance volumes immediately upon creation for faster failover.

Relational Database Service – RDS

RDS makes it easy to run relational databases in the cloud.
RDS Multi-AZ instance deployments provision a synchronous standby replica in a different AZ, providing high availability and automatic failover protection.
In case of a failover scenario, the standby is promoted to be the primary seamlessly and will handle the database operations.

RDS Multi-AZ DB Cluster deployments (for MySQL and PostgreSQL) provide a primary instance and two readable standby instances across 3 AZs. This offers improved write latency, faster failover (typically under 35 seconds), and the standby instances can serve read traffic.
Automated backups, enabled by default, provide point-in-time recovery for the database instance.
RDS will back up your database and transaction logs and store both for a user-specified retention period.

In addition to the automated backups, manual RDS backups can also be performed which are retained until explicitly deleted.
Backups help recover from higher-level faults such as unintentional data modification, either by operator error or by bugs in the application.
RDS Read Replicas provide read-only replicas of the database and the ability to scale out beyond the capacity of a single database deployment for read-heavy database workloads.

RDS Read Replicas is a scalability and not a High Availability solution. However, cross-region Read Replicas can be manually promoted for disaster recovery.
Amazon RDS now supports ENA Express for Multi-AZ replication (2026), using Scalable Reliable Datagram (SRD) to improve replication performance by distributing traffic across multiple network paths.

Simple Storage Service – S3

S3 provides highly durable (99.999999999% / 11 9s), fault-tolerant and redundant object store.

S3 stores objects redundantly on multiple devices across multiple facilities in an S3 Region.
S3 is a great storage solution for somewhat static or slow-changing objects, such as images, videos, and other static media.
S3 also supports edge caching and streaming of these assets by interacting with the Amazon CloudFront service.

S3 Cross-Region Replication (CRR) automatically replicates objects to a bucket in another region, enabling disaster recovery and low-latency access for globally distributed users.
S3 Express One Zone delivers up to 10x faster performance with single-digit millisecond latency for frequently accessed data, but note it stores data in a single AZ (not suitable as the sole copy for fault tolerance).

Simple Queue Service – SQS

Simple Queue Service (SQS) is a highly reliable distributed messaging system that can serve as the backbone of a fault-tolerant application.

SQS is engineered to provide “at least once” delivery of all messages in standard queues. FIFO queues provide exactly-once processing and strict message ordering.
Messages sent to a queue are retained for up to 4 days (by default, can be extended up to 14 days) or until they are read and deleted by the application.
Messages can be polled by multiple workers and processed, while SQS takes care that a request is processed by only one worker at a time using a configurable time interval called visibility timeout.

If the number of messages in a queue starts to grow or if the average time to process a message becomes too high, workers can be scaled upwards by simply adding additional EC2 instances.
Dead-letter queues (DLQs) capture messages that cannot be processed successfully. DLQ redrive allows moving messages back to source queues for reprocessing.
FIFO queues support up to 70,000 messages per second with high throughput mode and up to 120K in-flight messages (increased from 20K in November 2024).

Route 53

Amazon Route 53 is a highly available and scalable DNS web service.
Queries for the domain are automatically routed to the nearest DNS server and thus are answered with the best possible performance.
Route 53 resolves requests for your domain name (for example, www.example.com) to your Elastic Load Balancer, as well as your zone apex record (example.com).

Route 53 supports multiple routing policies for HA: Failover (active-passive), Latency-based, Weighted, Geolocation, Geoproximity (expanded to public/private hosted zones in 2024), and Multivalue Answer.
Route 53 health checks can monitor endpoint health and trigger DNS failover automatically.
Route 53 Accelerated Recovery (2026) ensures customers can continue making DNS changes even during regional AWS outages, providing greater predictability for mission-critical applications.

CloudFront

CloudFront can be used to deliver website content, including dynamic, static and streaming content using a global network of edge locations.
Requests for your content are automatically routed to the nearest edge location, so content is delivered with the best possible performance.
CloudFront is optimized to work with other Amazon Web Services, like S3 and EC2.

CloudFront also works seamlessly with any non-AWS origin server, which stores the original, definitive versions of your files.
CloudFront Functions run lightweight JavaScript at edge locations for request/response customization. Lambda@Edge provides full compute capabilities at Regional Edge Caches.
VPC Origins allow CloudFront to fetch content directly from private resources within a VPC without exposing them to the public internet.

Origin Shield acts as an additional caching layer to reduce the load on origins and improve cache hit ratios for multi-region architectures.

DynamoDB Global Tables

DynamoDB Global Tables provide a fully managed, multi-Region, multi-active database solution for globally distributed applications.
Global Tables automatically replicate data across your choice of AWS Regions. Every replica table in every Region can accept both reads and writes.

Changes made to an item in one Region are typically replicated to all other replica Regions within a second.
Multi-Region Strong Consistency (MRSC), generally available since June 2025, provides zero RPO (Recovery Point Objective) by enabling strongly consistent reads from any Region. This is the highest level of application resilience for DynamoDB.
Global Tables now support cross-account replication (2026), enabling multi-account multi-region architectures.

Global Tables replace the previous cross-region replication approach (DynamoDB Streams-based) with a fully managed, zero-administration solution.

AWS Resilience Hub

AWS Resilience Hub is a central location to define, track, and manage the resilience of applications.
It enables you to define resilience goals (RTO/RPO), assess your resilience posture against those goals, and implement recommendations based on the AWS Well-Architected Framework.

Resilience Hub performs automated resilience assessments and identifies gaps in your architecture, such as missing Multi-AZ deployments or lack of backup strategies.
Integrates with AWS Fault Injection Service (FIS) to run chaos experiments directly from the Resilience Hub console.
The next generation of Resilience Hub (GA May 2026) uses generative AI to provide a structured resilience journey for SRE and development teams.

AWS Fault Injection Service (FIS)

AWS FIS is a managed chaos engineering service that enables you to perform controlled fault injection experiments on your AWS workloads.
FIS helps simulate real-world failures (AZ disruptions, instance failures, network degradation, API throttling) to validate fault tolerance of your architecture.
Supports actions targeting EC2, ECS, EKS, RDS, Lambda functions (native integration since October 2024), and more.

Amazon.com ran 733 AWS FIS experiments to prepare for Prime Day 2024.
Experiments can be generated using natural language through Amazon Bedrock integration (2025).

Amazon Application Recovery Controller (ARC)

ARC helps manage and coordinate recovery for applications across AWS Regions and Availability Zones.

Zonal Shift allows you to quickly shift traffic for a resource (ALB, NLB, EKS, Auto Scaling group) away from an impaired AZ to healthy AZs.
Zonal Autoshift enables AWS to automatically shift traffic away from an AZ when internal telemetry detects a potential impairment — without manual intervention.
Routing Controls provide manual override capabilities for cross-region failover of applications.

Zonal shift and zonal autoshift are available at no additional cost.
Supported resources include ALB, NLB, EC2 Auto Scaling groups, EKS clusters, and Karpenter (2026).

AWS Certification Exam Practice Questions

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated.

Open to further feedback, discussion and correction.

You are moving an existing traditional system to AWS, and during the migration discover that there is a master server which is a single point of failure. Having examined the implementation of the master server you realize there is not enough time during migration to re-engineer it to be highly available, though you do discover that it stores its state in a local MySQL database. In order to minimize down-time you select RDS to replace the local database and configure master to use it, what steps would best allow you to create a self-healing architecture[PROFESSIONAL]
1. Migrate the local database into multi-AZ RDS database. Place master node into a multi-AZ auto-scaling group with a minimum of one and maximum of one with health checks.
2. Replicate the local database into a RDS read replica. Place master node into a Cross-Zone ELB with a minimum of one and maximum of one with health checks. (Read Replica does not provide HA and write capability and ELB does not have feature for Min and Max 1 and Cross Zone allows just the equal distribution of load across instances)
3. Migrate the local database into multi-AZ RDS database. Place master node into a Cross-Zone ELB with a minimum of one and maximum of one with health checks. (ELB does not have feature for Min and Max 1 and Cross Zone allows just the equal distribution of load across instances)
4. Replicate the local database into a RDS read replica. Place master node into a multi-AZ auto-scaling group with a minimum of one and maximum of one with health checks. (Read Replica does not provide HA and write capability)
You are designing Internet connectivity for your VPC. The Web servers must be available on the Internet. The application must have a highly available architecture. Which alternatives should you consider? (Choose 2 answers)
1. Configure a NAT instance in your VPC. Create a default route via the NAT instance and associate it with all subnets. Configure a DNS A record that points to the NAT instance public IP address (NAT is for internet connectivity for instances in private subnet)
2. Configure a CloudFront distribution and configure the origin to point to the private IP addresses of your Web servers. Configure a Route53 CNAME record to your CloudFront distribution.
3. Place all your web servers behind ELB. Configure a Route53 CNAME to point to the ELB DNS name.
4. Assign EIPs to all web servers. Configure a Route53 record set with all EIPs. With health checks and DNS failover.

When deploying a highly available 2-tier web application on AWS, which combination of AWS services meets the requirements? 1. AWS Direct Connect 2. Amazon Route 53 3. AWS Storage Gateway 4. Elastic Load Balancing 4. Amazon EC2 5. Auto scaling 6. Amazon VPC 7. AWS Cloud Trail [PROFESSIONAL]
1. 2,4,5 and 6
2. 3,4,5 and 8
3. 1 through 8
4. 1,3,5 and 7
5. 1,2,5 and 6
Company A has hired you to assist with the migration of an interactive website that allows registered users to rate local restaurants. Updates to the ratings are displayed on the home page, and ratings are updated in real time. Although the website is not very popular today, the company anticipates that It will grow rapidly over the next few weeks. They want the site to be highly available. The current architecture consists of a single Windows Server 2008 R2 web server and a MySQL database running on Linux. Both reside inside an on-premises hypervisor. What would be the most efficient way to transfer the application to AWS, ensuring performance and high-availability? [PROFESSIONAL]
1. Export web files to an Amazon S3 bucket in us-west-1. Run the website directly out of Amazon S3. Launch a multi-AZ MySQL Amazon RDS instance in us-west-1a. Import the data into Amazon RDS from the latest MySQL backup. Use Route 53 and create an alias record pointing to the elastic load balancer. (Its an Interactive website, although it can be implemented using Javascript SDK, its a migration and the application would need changes. Also no use of ELB if hosted on S3)
2. Launch two Windows Server 2008 R2 instances in us-west-1b and two in us-west-1a. Copy the web files from on premises web server to each Amazon EC2 web server, using Amazon S3 as the repository. Launch a multi-AZ MySQL Amazon RDS instance in us-west-2a. Import the data into Amazon RDS from the latest MySQL backup. Create an elastic load balancer to front your web servers. Use Route 53 and create an alias record pointing to the elastic load balancer. (Although RDS instance is in a different region which will impact performance, this is the only option that works.)
3. Use AWS VM Import/Export to create an Amazon Elastic Compute Cloud (EC2) Amazon Machine Image (AMI) of the web server. Configure Auto Scaling to launch two web servers in us-west-1a and two in us-west-1b. Launch a Multi-AZ MySQL Amazon Relational Database Service (RDS) instance in us-west-1b. Import the data into Amazon RDS from the latest MySQL backup. Use Amazon Route 53 to create a hosted zone and point an A record to the elastic load balancer. (does not create a load balancer)
4. Use AWS VM Import/Export to create an Amazon EC2 AMI of the web server. Configure auto-scaling to launch two web servers in us-west-1a and two in us-west-1b. Launch a multi-AZ MySQL Amazon RDS instance in us-west-1a. Import the data into Amazon RDS from the latest MySQL backup. Create an elastic load balancer to front your web servers. Use Amazon Route 53 and create an A record pointing to the elastic load balancer. (Need to create an aliased record without which the Route 53 pointing to ELB would not work)

Your company runs a customer facing event registration site. This site is built with a 3-tier architecture with web and application tier servers and a MySQL database. The application requires 6 web tier servers and 6 application tier servers for normal operation, but can run on a minimum of 65% server capacity and a single MySQL database. When deploying this application in a region with three availability zones (AZs) which architecture provides high availability? [PROFESSIONAL]
1. A web tier deployed across 2 AZs with 3 EC2 (Elastic Compute Cloud) instances in each AZ inside an Auto Scaling Group behind an ELB (elastic load balancer), and an application tier deployed across 2 AZs with 3 EC2 instances in each AZ inside an Auto Scaling Group behind an ELB. and one RDS (Relational Database Service) instance deployed with read replicas in the other AZ.
2. A web tier deployed across 3 AZs with 2 EC2 (Elastic Compute Cloud) instances in each AZ inside an Auto Scaling Group behind an ELB (elastic load balancer) and an application tier deployed across 3 AZs with 2 EC2 instances in each AZ inside an Auto Scaling Group behind an ELB and one RDS (Relational Database Service) Instance deployed with read replicas in the two other AZs.
3. A web tier deployed across 2 AZs with 3 EC2 (Elastic Compute Cloud) instances in each AZ inside an Auto Scaling Group behind an ELB (elastic load balancer) and an application tier deployed across 2 AZs with 3 EC2 instances m each AZ inside an Auto Scaling Group behind an ELB and a Multi-AZ RDS (Relational Database Service) deployment.
4. A web tier deployed across 3 AZs with 2 EC2 (Elastic Compute Cloud) instances in each AZ Inside an Auto Scaling Group behind an ELB (elastic load balancer). And an application tier deployed across 3 AZs with 2 EC2 instances in each AZ inside an Auto Scaling Group behind an ELB. And a Multi-AZ RDS (Relational Database services) deployment.
For a 3-tier, customer facing, inclement weather site utilizing a MySQL database running in a Region which has two AZs which architecture provides fault tolerance within the region for the application that minimally requires 6 web tier servers and 6 application tier servers running in the web and application tiers and one MySQL database? [PROFESSIONAL]
1. A web tier deployed across 2 AZs with 6 EC2 (Elastic Compute Cloud) instances in each AZ inside an Auto Scaling Group behind an ELB (elastic load balancer), and an application tier deployed across 2 AZs with 6 EC2 instances in each AZ inside an Auto Scaling Group behind an ELB. and a Multi-AZ RDS (Relational Database Service) deployment. (As it needs Fault Tolerance with minimal 6 servers always available)
2. A web tier deployed across 2 AZs with 3 EC2 (Elastic Compute Cloud) instances in each A2 inside an Auto Scaling Group behind an ELB (elastic load balancer) and an application tier deployed across 2 AZs with 3 EC2 instances in each AZ inside an Auto Scaling Group behind an ELB and a Multi-AZ RDS (Relational Database Service) deployment.
3. A web tier deployed across 2 AZs with 3 EC2 (Elastic Compute Cloud) instances in each AZ inside an Auto Scaling Group behind an ELB (elastic load balancer) and an application tier deployed across 2 AZs with 6 EC2 instances in each AZ inside an Auto Scaling Group behind an ELB and one RDS (Relational Database Service) Instance deployed with read replicas in the other AZs.
4. A web tier deployed across 1 AZs with 6 EC2 (Elastic Compute Cloud) instances in each AZ Inside an Auto Scaling Group behind an ELB (elastic load balancer). And an application tier deployed in the same AZs with 6 EC2 instances inside an Auto scaling group behind an ELB and a Multi-AZ RDS (Relational Database services) deployment, with 6 stopped web tier EC2 instances and 6 stopped application tier EC2 instances all in the other AZ ready to be started if any of the running instances in the first AZ fails.
You are designing a system which needs, at minimum, 8 m4.large instances operating to service traffic. When designing a system for high availability in the us-east-1 region, which has 6 Availability Zones, you company needs to be able to handle death of a full availability zone. How should you distribute the servers, to save as much cost as possible, assuming all of the EC2 nodes are properly linked to an ELB? Your VPC account can utilize us-east-1’s AZ’s a through f, inclusive.
1. 3 servers in each of AZ’s a through d, inclusive.
2. 8 servers in each of AZ’s a and b.
3. 2 servers in each of AZ’s a through e, inclusive. (You need to design for N+1 redundancy on Availability Zones. ZONE_COUNT = (REQUIRED_INSTANCES / INSTANCE_COUNT_PER_ZONE) + 1. To minimize cost, spread the instances across as many possible zones as you can. By using a though e, you are allocating 5 zones. Using 2 instances, you have 10 total instances. If a single zone fails, you have 4 zones left, with 2 instances each, for a total of 8 instances. By spreading out as much as possible, you have increased cost by only 25% and significantly de-risked an availability zone failure. Refer link)
4. 4 servers in each of AZ’s a through c, inclusive.
You need your API backed by DynamoDB to stay online during a total regional AWS failure. You can tolerate a couple minutes of lag or slowness during a large failure event, but the system should recover with normal operation after those few minutes. What is a good approach? [PROFESSIONAL]
1. Set up DynamoDB Global Tables in a multi-active configuration across two regions. Create an Auto Scaling Group behind an ELB in each of the two regions. Add a Route53 Latency DNS Record with DNS Failover, using the ELBs in the two regions as the resource records. (Use DynamoDB Global Tables (multi-active replication) with two ELBs and ASGs with Route53 Failover and Latency DNS. Note: DynamoDB Global Tables now also support Multi-Region Strong Consistency (MRSC) for zero RPO since June 2025.)
2. Set up a DynamoDB Multi-Region table. Create an Auto Scaling Group behind an ELB in each of the two regions DynamoDB is running in. Add a Route53 Latency DNS Record with DNS Failover, using the ELBs in the two regions as the resource records. (This is now essentially correct with DynamoDB Global Tables being the multi-region solution. However at the time of the question, this option was considered incorrect.)
3. Set up a DynamoDB Multi-Region table. Create a cross-region ELB pointing to a cross-region Auto Scaling Group, and direct a Route53 Latency DNS Record with DNS Failover to the cross-region ELB. (No such thing as Cross Region ELB or cross-region ASG)
4. Set up DynamoDB cross-region replication in a master-standby configuration, with a single standby in another region. Create a cross-region ELB pointing to a cross-region Auto Scaling Group, and direct a Route53 Latency DNS Record with DNS Failover to the cross-region ELB. (No such thing as cross-region ELB or cross-region ASG)
You are putting together a WordPress site for a local charity and you are using a combination of Route53, Elastic Load Balancers, EC2 & RDS. You launch your EC2 instance, download WordPress and setup the configuration files connection string so that it can communicate to RDS. When you browse to your URL however, nothing happens. Which of the following could NOT be the cause of this.
1. You have forgotten to open port 80/443 on your security group in which the EC2 instance is placed.
2. Your elastic load balancer has a health check, which is checking a webpage that does not exist; therefore your EC2 instance is not in service.
3. You have not configured an ALIAS for your A record to point to your elastic load balancer
4. You have locked port 22 down to your specific IP address therefore users cannot access your site using HTTP/HTTPS

A development team that is currently doing a nightly six-hour build which is lengthening over time on-premises with a large and mostly under utilized server would like to transition to a continuous integration model of development on AWS with multiple builds triggered within the same day. However, they are concerned about cost, security and how to integrate with existing on-premises applications such as their LDAP and email servers, which cannot move off-premises. The development environment needs a source code repository; a project management system with a MySQL database resources for performing the builds and a storage location for QA to pick up builds from. What AWS services combination would you recommend to meet the development team’s requirements? [PROFESSIONAL]
1. A Bastion host Amazon EC2 instance running a VPN server for access from on-premises, Amazon EC2 for the source code repository with attached Amazon EBS volumes, Amazon EC2 and Amazon RDS MySQL for the project management system, EIP for the source code repository and project management system, Amazon SQS for a build queue, An Amazon Auto Scaling group of Amazon EC2 instances for performing builds and Amazon Simple Email Service for sending the build output. (Bastion is not for VPN connectivity also SES should not be used)
2. An AWS Storage Gateway for connecting on-premises software applications with cloud-based storage securely, Amazon EC2 for the resource code repository with attached Amazon EBS volumes, Amazon EC2 and Amazon RDS MySQL for the project management system, EIPs for the source code repository and project management system, Amazon Simple Notification Service for a notification initiated build, An Auto Scaling group of Amazon EC2 instances for performing builds and Amazon S3 for the build output. (Storage Gateway does not provide secure connectivity, still needs VPN. SNS alone cannot handle builds)
3. An AWS Storage Gateway for connecting on-premises software applications with cloud-based storage securely, Amazon EC2 for the resource code repository with attached Amazon EBS volumes, Amazon EC2 and Amazon RDS MySQL for the project management system, EIPs for the source code repository and project management system, Amazon SQS for a build queue, An Amazon Elastic Map Reduce (EMR) cluster of Amazon EC2 instances for performing builds and Amazon CloudFront for the build output. (Storage Gateway does not provide secure connectivity, still needs VPN. EMR is not ideal for performing builds as it needs normal EC2 instances)
4. A VPC with a VPN Gateway back to their on-premises servers, Amazon EC2 for the source-code repository with attached Amazon EBS volumes, Amazon EC2 and Amazon RDS MySQL for the project management system, EIPs for the source code repository and project management system, SQS for a build queue, An Auto Scaling group of EC2 instances for performing builds and S3 for the build output. (VPN gateway is required for secure connectivity. SQS for build queue and EC2 for builds)
Which of the following AWS services and features are essential for building a modern, highly available fault-tolerant architecture? (Choose 3) [NEW – 2025]
1. Amazon Application Recovery Controller (ARC) with zonal autoshift
2. AWS CloudTrail
3. AWS Fault Injection Service (FIS) for resilience testing
4. RDS Multi-AZ DB Cluster with readable standbys
5. Amazon Inspector
A company needs its DynamoDB-backed application to survive a complete regional failure with zero data loss (zero RPO). Which approach best achieves this requirement? [NEW – 2025]
1. Use DynamoDB Streams to replicate data to another region manually.
2. Use DynamoDB point-in-time recovery (PITR) with cross-region backups.
3. Use DynamoDB Global Tables with Multi-Region Strong Consistency (MRSC). (MRSC, GA since June 2025, enables zero RPO with strongly consistent reads from any region.)
4. Use DynamoDB On-Demand backup and restore to a secondary region.
An application runs behind an Application Load Balancer across 3 AZs. During an AZ impairment detected by AWS, what feature can automatically redirect traffic away from the affected AZ without manual intervention? [NEW – 2025]
1. Route 53 health check failover
2. ALB Cross-Zone load balancing
3. Amazon Application Recovery Controller (ARC) zonal autoshift (ARC zonal autoshift automatically shifts traffic away from an impaired AZ when AWS internal telemetry detects issues, without requiring manual intervention.)
4. Auto Scaling AZ rebalancing

References

AWS Intrusion Detection & Prevention System IDS/IPS

December 22, 2016 ~ Last updated on : April 4, 2023 ~ jayendrapatil ~ 13 Comments

AWS Intrusion Detection & Prevention System IDS/IPS

An Intrusion Prevention System IPS
- is an appliance that monitors and analyzes network traffic to detect malicious patterns and potentially harmful packets and prevent vulnerability exploits
- Most IPS offer firewall, unified threat management and routing capabilities

An Intrusion Detection System IDS is
- an appliance or capability that continuously monitors the environment
- sends alerts when it detects malicious activity, policy violations or network & system attack from someone attempting to break into or compromise the system
- produces reports for analysis.

Approaches for AWS IDS/IPS

Network Tap or SPAN

Traditional approach involves using a network Test Access Point (TAP) or Switch Port Analyzer (SPAN) to access & monitor all network traffic.

Connection between the AWS Internet Gateway (IGW) and the Elastic Load Balancer would be an ideal place to capture all network traffic.
However, there is no place to plug this in between IGW and ELB as there are no SPAN ports, network taps, or a concept of Layer 2 bridging

Packet Sniffing

It is not possible for a virtual instance running in promiscuous mode to receive or sniff traffic that is intended for a different virtual instance.

While interfaces can be placed into promiscuous mode, the hypervisor will not deliver any traffic to an instance that is not addressed to it.
Even two virtual instances that are owned by the same customer located on the same physical host cannot listen to each other’s traffic
So, promiscuous mode is not allowed

Host Based Firewall – Forward Deployed IDS

Deploy a network-based IDS on every instance you deploy IDS workload scales with your infrastructure

Host-based security software works well with highly distributed and scalable application architectures because network packet inspection is distributed across the entire software fleet
However, CPU-intensive process is deployed onto every single machine.

Host Based Firewall – Traffic Replication

An Agent is deployed on every instance to capture & replicate traffic for centralized analysis
Actual workload of network traffic analysis is not performed on the instance but on a separate server

Traffic capture and replication is still CPU-intensive (particularly on Windows machines.)
It significantly increases the internal network traffic in the environment as every inbound packet is duplicated in the transfer from the instance that captures the traffic to the instance that analyzes the traffic

In-Line Firewall – Inbound IDS Tier

Add another tier to the application architecture where a load balancer sends all inbound traffic to a tier of instances that performs the network analysis for e.g. Third Party Solution Fortinet FortiGate

IDS workload is now isolated to a horizontally scalable tier in the architecture You have to maintain and manage another mission-critical elastic tier in the architecture

AWS Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

A web company is looking to implement an intrusion detection and prevention system into their deployed VPC. This platform should have the ability to scale to thousands of instances running inside of the VPC. How should they architect their solution to achieve these goals?
1. Configure an instance with monitoring software and the elastic network interface (ENI) set to promiscuous mode packet sniffing to see an traffic across the VPC. (virtual instance running in promiscuous mode to receive or“sniff” traffic)
2. Create a second VPC and route all traffic from the primary application VPC through the second VPC where the scalable virtualized IDS/IPS platform resides.
3. Configure servers running in the VPC using the host-based ‘route’ commands to send all traffic through the platform to a scalable virtualized IDS/IPS (host based routing is not allowed)
4. Configure each host with an agent that collects all network traffic and sends that traffic to the IDS/IPS platform for inspection.

You are designing an intrusion detection prevention (IDS/IPS) solution for a customer web application in a single VPC. You are considering the options for implementing IDS/IPS protection for traffic coming from the Internet. Which of the following options would you consider? (Choose 2 answers)
1. Implement IDS/IPS agents on each Instance running In VPC
2. Configure an instance in each subnet to switch its network interface card to promiscuous mode and analyze network traffic. (virtual instance running in promiscuous mode to receive or“sniff” traffic)
3. Implement Elastic Load Balancing with SSL listeners In front of the web applications (ELB with SSL does not serve as IDS/IPS)
4. Implement a reverse proxy layer in front of web servers and configure IDS/IPS agents on each reverse proxy server

References

Network based intrusion detection in AWS

AWS Risk and Compliance – Whitepaper – Certification

November 7, 2016 ~ Last updated on : February 9, 2017 ~ jayendrapatil ~ 2 Comments

AWS Risk and Compliance Whitepaper Overview

AWS Risk and Compliance Whitepaper is intended to provide information to assist AWS customers with integrating AWS into their existing control framework supporting their IT environment.

AWS does communicate its security and control environment relevant to customers. AWS does this by doing the following:
- Obtaining industry certifications and independent third-party attestations described in this document
- Publishing information about the AWS security and control practices in whitepapers and web site content
- Providing certificates, reports, and other documentation directly to AWS customers under NDA (as required)

Shared Responsibility model

AWS’ part in the shared responsibility includes
- providing its services on a highly secure and controlled platform and providing a wide array of security features customers can use
- relieves the customer’s operational burden as AWS operates, manages and controls the components from the host operating system and virtualization layer down to the physical security of the facilities in which the service operates
Customers’ responsibility includes
- configuring their IT environments in a secure and controlled manner for their purposes
- responsibility and management of the guest operating system (including updates and security patches), other associated application software as well as the configuration of the AWS provided security group firewall
- stringent compliance requirements by leveraging technology such as host based firewalls, host based intrusion detection/prevention, encryption and key management
- relieve customer burden of operating controls by managing those controls associated with the physical infrastructure deployed in the AWS environment

Risk and Compliance Governance

AWS provides a wide range of information regarding its IT control environment to customers through white papers, reports, certifications, and other third-party attestations
AWS customers are required to continue to maintain adequate governance over the entire IT control environment regardless of how IT is deployed.

Leading practices include
- an understanding of required compliance objectives and requirements (from relevant sources),
- establishment of a control environment that meets those objectives and requirements,
- an understanding of the validation required based on the organization’s risk tolerance,
- and verification of the operating effectiveness of their control environment.
Strong customer compliance and governance might include the following basic approach:
- Review information available from AWS together with other information to understand as much of the entire IT environment as possible, and then document all compliance requirements.
- Design and implement control objectives to meet the enterprise compliance requirements.
- Identify and document controls owned by outside parties.
- Verify that all control objectives are met and all key controls are designed and operating effectively.

Approaching compliance governance in this manner helps companies gain a better understanding of their control environment and will help clearly delineate the verification activities to be performed.

AWS Certifications, Programs, Reports, and Third-Party Attestations

AWS engages with external certifying bodies and independent auditors to provide customers with considerable information regarding the policies, processes, and controls established and operated by AWS.
AWS provides third-party attestations, certifications, Service Organization Controls (SOC) reports and other relevant compliance reports directly to our customers under NDA.

Key Risk and Compliance Questions

Shared Responsibility
- AWS controls the physical components of that technology.
- Customer owns and controls everything else, including control over connection points and transmissions
Auditing IT
- Auditing for most layers and controls above the physical controls remains the responsibility of the customer
- AWS ISO 27001 and other certifications are available for auditors review
- AWS-defined logical and physical controls is documented in the SOC 1 Type II report and available for review by audit and compliance teams

Data location
- AWS customers control which physical region their data and their servers will be located
- AWS replicates the data only within the region
- AWS will not move customers’ content from the selected Regions without notifying the customer, unless required to comply with the law or requests of governmental entities

Data center tours
- As AWS host multiple customers, AWS does not allow data center tours by customers, as this exposes a wide range of customers to physical access of a third party.
- An independent and competent auditor validates the presence and operation of controls as part of our SOC 1 Type II report.
- This third-party validation provides customers with the independent perspective of the effectiveness of controls in place.
- AWS customers that have signed a non-disclosure agreement with AWS may request a copy of the SOC 1 Type II report.
Third-party access
- AWS strictly controls access to data centers, even for internal employees.
- Third parties are not provided access to AWS data centers except when explicitly approved by the appropriate AWS data center manager per the AWS access policy
Multi-tenancy
- AWS environment is a virtualized, multi-tenant environment.
- AWS has implemented security management processes, PCI controls, and other security controls designed to isolate each customer from other customers.
- AWS systems are designed to prevent customers from accessing physical hosts or instances not assigned to them by filtering through the virtualization software.
Hypervisor vulnerabilities
- Amazon EC2 utilizes a highly customized version of Xen hypervisor.
- Hypervisor is regularly assessed for new and existing vulnerabilities and attack vectors by internal and external penetration teams, and is well suited for maintaining strong isolation between guest virtual machines
Vulnerability management
- AWS is responsible for patching systems supporting the delivery of service to customers, such as the hypervisor and networking services
Encryption
- AWS allows customers to use their own encryption mechanisms for nearly all the services, including S3, EBS, SimpleDB, and EC2.
- IPSec tunnels to VPC are also encrypted

Data isolation
- All data stored by AWS on behalf of customers has strong tenant isolation security and control capabilities
Composite services
- AWS does not leverage any third-party cloud providers to deliver AWS services to customers.
Distributed Denial Of Service (DDoS) attacks
- AWS network provides significant protection against traditional network security issues and the customer can implement further protection

Data portability
- AWS allows customers to move data as needed on and off AWS storage
Service & Customer provider business continuity
- AWS does operate a business continuity program
- AWS data centers incorporate physical protection against environmental risks.
- AWS’ physical protection against environmental risks has been validated by an independent auditor and has been certified
- AWS provides customers with the capability to implement a robust continuity plan with multi region/AZ deployment architectures, backups, data redundancy replication
Capability to scale
- AWS cloud is distributed, highly secure and resilient, giving customers massive scale potential.
- Customers may scale up or down, paying for only what they use
Service availability
- AWS does commit to high levels of availability in its service level agreements (SLA) for e.g. S3 99.9%
Application Security
- AWS system development lifecycle incorporates industry best practices which include formal design reviews by the AWS Security Team, source code analysis, threat modeling and completion of a risk assessment
- AWS does not generally outsource development of software.

Threat and Vulnerability Management
- AWS Security regularly engages independent security firms to perform external vulnerability threat assessments
- AWS Security regularly scans all Internet facing service endpoint IP addresses for vulnerabilities, but do not include customer instances
- AWS Security notifies the appropriate parties to remediate any identified vulnerabilities.
- Customers can request permission to conduct scans and Penetration tests of their cloud infrastructure as long as they are limited to the customer’s instances and do not violate the AWS Acceptable Use Policy. Advance approval for these types of scans is required
Data Security

AWS Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

When preparing for a compliance assessment of your system built inside of AWS. What are three best practices for you to prepare for an audit? Choose 3 answers
1. Gather evidence of your IT operational controls (Customer still needs to gather all the IT operation controls inline with their environment)
2. Request and obtain applicable third-party audited AWS compliance reports and certifications (Customers can request the reports and certifications produced by our third-party auditors or can request more information about AWS Compliance)
3. Request and obtain a compliance and security tour of an AWS data center for a pre-assessment security review (AWS does not allow data center tour)
4. Request and obtain approval from AWS to perform relevant network scans and in-depth penetration tests of your system’s Instances and endpoints (AWS requires prior approval to be taken to perform penetration tests)
5. Schedule meetings with AWS’s third-party auditors to provide evidence of AWS compliance that maps to your control objectives (Customers can request the reports and certifications produced by our third-party auditors or can request more information about AWS Compliance)
In the shared security model, AWS is responsible for which of the following security best practices (check all that apply) :
1. Penetration testing
2. Operating system account security management
3. Threat modeling
4. User group access management
5. Static code analysis

You are running a web-application on AWS consisting of the following components an Elastic Load Balancer (ELB) an Auto-Scaling Group of EC2 instances running Linux/PHP/Apache, and Relational DataBase Service (RDS) MySQL. Which security measures fall into AWS’s responsibility?
1. Protect the EC2 instances against unsolicited access by enforcing the principle of least-privilege access (Customer owned)
2. Protect against IP spoofing or packet sniffing
3. Assure all communication between EC2 instances and ELB is encrypted (Customer owned)
4. Install latest security patches on ELB, RDS and EC2 instances (Customer owned)
Which of the following statements is true about achieving PCI certification on the AWS platform? (Choose 2)
1. Your organization owns the compliance initiatives related to anything placed on the AWS infrastructure
2. Amazon EC2 instances must run on a single-tenancy environment (dedicated instance)
3. AWS manages card-holder environments
4. AWS Compliance provides assurance related to the underlying infrastructure

References

AWS_Risk_and_Compliance_Whitepaper.pdf

AWS Import/Export – Certification

October 14, 2016 ~ Last updated on : December 5, 2017 ~ jayendrapatil ~ 11 Comments

AWS Import/Export Disk

AWS Import/Export accelerates moving large amounts of data into and out of AWS using portable storage devices for transport

AWS transfers the data directly onto and off of storage devices using Amazon’s high-speed internal network, bypassing the Internet, and can be much faster and more cost effective than upgrading connectivity.
AWS Import/Export can be implemented in two different ways
- AWS Import/Export Disk (Disk)
  - originally the only service offered by AWS for data transfer by mail
  - Disk supports transfers data directly onto and off of storage devices you own using the Amazon high-speed internal network
- AWS Snowball
  - is generally faster and cheaper to use than Disk for importing data into Amazon S3
AWS Import/Export supports
- importing data to several types of AWS storage, including EBS snapshots, S3 buckets, and Glacier vaults.
- exporting data out from S3 only
Data load typically begins the next business day after the storage device arrives at AWS and after the data export or import completes, the storage device is returned

Ideal Usage Patterns

AWS Import/Export is ideal for transferring large amounts of data in and out of the AWS cloud, especially in cases where transferring the data over the Internet would be too slow (a week or more) or too costly.

Common use cases include
- first time migration – initial data upload to AWS
- content distribution or regular data interchange to/from your customers or business associates,
- off-site backup – transfer to Amazon S3 or Amazon Glacier for off-site backup and archival storage, and
- disaster recovery – quick retrieval (export) of large backups from Amazon S3 or Amazon Glacier

AWS Import/Export Disk Jobs

AWS Import/Export jobs can be created in 2 steps
- Submit a Job request to AWS where each job corresponds to exactly one storage device
- Send your storage device to AWS, which after the data is uploaded or downloaded is returned back
AWS Import/Export jobs can be created
- using a command line tool, which requires no programming or
- programmatically using the AWS SDK for Java or the REST API to send requests to AWS or
- even through third party tools
AWS Import/Export Data Encrption
- supports data encryption methods
  - PIN-code encryption, Hardware-based device encryption that uses a physical PIN pad for access to the data.
  - TrueCrypt software encryption, Disk encryption using TrueCrypt, which is an open-source encryption application.
- Creating an import or export job with encryption requires providing the PIN code or password for the selected encryption method
- Although is is not mandatory for the data to be encrypted for import jobs, it is highly recommended
- All export jobs require data encryption can use either hardware encryption or software encryption or both methods.
AWS Import/Export supported Job Types
- Import to S3
- ~~Import to Glacier~~ (Import to Glacier is no longer supported by AWS. Refer Updates)
- Import to EBS
- Export to S3
AWS erases the device after every import job prior to return shipping.

Guidelines and Limitations

AWS Import/Export does not support Server-Side Encryption (SSE) when importing data.

Maximum file size of a single file or object to be imported is 5 TB. Files and objects larger than 5 TB won’t be imported.
Maximum device capacity is 16 TB for Amazon Simple Storage Service (Amazon S3) and Amazon EBS jobs.
Maximum device capacity is 4 TB for Amazon Glacier jobs.

AWS Import/Export exports only the latest version from an Amazon S3 bucket that has versioning turned on.

AWS Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

You are working with a customer who has 10 TB of archival data that they want to migrate to Amazon Glacier. The customer has a 1-Mbps connection to the Internet. Which service or feature provides the fastest method of getting the data into Amazon Glacier?
1. Amazon Glacier multipart upload
2. AWS Storage Gateway
3. VM Import/Export
4. AWS Import/Export (Normal upload will take ~900 days as the internet max speed is capped)

References

AWS Import & Export Developer Guide

AWS Associate Certification Exams – Preparation – Sample Questions

October 2, 2016 ~ Last updated on : July 18, 2017 ~ jayendrapatil ~ 181 Comments

AWS Solution Architect & SysOps Associate Certification Exams Preparation & Sample Questions

I recently passed AWS Solution Architect – Associate (90%) & SysOps – Associate (81%) certification exams.

I would like to share my preparation leading to and experience for the exams

AWS Certification exams are pretty tough to crack as they cover a lot of topics from a wide range of services offered by them.

I cleared both the Solution Architect and SysOps Associate certifications in a time frame of 2 months.
I had 6 months of prior hands-on experience with AWS primarily on IAM, VPC, EC2, S3 & RDS which helped a lot
There are lot of resources online which can be helpful but are overwhelming as well as misguide you (I found lot of dumps which have sample exam questions but the answers are marked wrong)

AWS Associate certifications although can be cleared with complete theoretical knowledge, a bit of hands on really helps a lot.
Also, AWS services are update literally everyday with new features being added, issues resolved and so on, which the exam questions surely don’t keep a track off. Not sure how often the exam questions are updated.
So my suggestion is if you see a question which focuses on a scenario which added latest by AWS within a month, still don’t go with that answer and stick to the answer which was relevant before the update for e.g. encryption of Root volume usually made in the certification exam with options to use external tools and was enabled by AWS recently.

AWS Certification Exam Preparation

As I mentioned there are lot of resources and courses online for the Certification exam which can be overwhelming, this is what I did for my preparation to clear the exams

- Went through AWS Certification Preparation guide
- Went through the AWS Solution Architect & SysOps blue print thoroughly as it mentions the topics and the weightage in the exam
- Purchased the acloud guru course from udemy (got it for $10 on discount) for both the AWS Certified Solutions Architect – Associate 2017 and AWS Certified SysOps Administrator – Associate 2017 course, which greatly helped to have a clear picture of the the format, topics and relevant sections
- Signed up with AWS for the Free Tier account which provides a lot of the Services to be tried for free with certain limits which are more then enough to get things going. Be sure to decommission anything, if you using any thing beyond the free limits, preventing any surprises 🙂
- Also, used the QwikLabs for all the introductory courses which are free and allow you to try out the services multiple times (I think its max 5, as I got the warnings couple of times)
- Update: Qwiklabs seems to have reduced the free courses quite a lot and now provide targeted labs for AWS Certification exams which are charged
- Went through the few Whitepapers especially the
  - DDOS
  - Security Best Practices
- Read the FAQs atleast for the important topics, as they cover important points and are good for quick review
- Went through multiple sites to consolidate the Sample exam questions and worked on them to get the correct answers. I have tried to consolidate them further in this blog topic wise.
- Went through multiple discussion topics on the acloud guru course which are pretty interesting and provides further insights and some of them are actually certification exam questions
- I did not purchase the AWS Practice exams, as the questions are available all around. But if you want to check the format, it might be useful.
- Opinion : acloud guru course are good by itself but is not sufficient to pass the exam but might help to counter about 50-60% of exam questions
- Also, if you are well prepared the time for the certification exam is more then enough and I could answer all the questions within an hour and was able to run a review on all them once.
- Important Exam Time Tip: Only mark the questions which you doubt as Mark for Review and then go through them only. I did the mistake marking quite a few as Mark for Review, even though I was confident on the answers, and wasting time on them again.
- You can also check on
  - Braincert AWS Solution Architect – Associate Practice Exam
    - Set of extensive questions, with very nice, accurate & detailed explanation
  - Whizlabs AWS Solutions Architect Associate Exam and AWS SysOps Administrator Associate Exam exams which has practice exams

AWS Associate Certification Exam Important Topics

Both Solution Architect & Sysops concentrate on a variety of AWS services
Important topics with 70-80% coverage
- IAM
  - IAM Roles
  - IAM Best Practices
- VPC
- EC2
- S3
  - S3 General
  - S3 Storage classes, Permissions, Object lifecycle & Versioning
- Whitepapers
- CloudWatch Monitoring & Troubleshooting – Primarily for SysOps

Other topics with 20-30% coverage
Can expect questions from SWF, AWS Support, Cloud HSM (supported services), Trusted Advisor, Storage Gateway, Direct Connect, SNS, Consolidated Billing

AWS SWF – Simple Workflow Overview – Certification

September 29, 2016 ~ Last updated on : March 24, 2017 ~ jayendrapatil ~ 11 Comments

AWS SWF – Simple Workflow

AWS SWF makes it easy to build applications that coordinate work across distributed components

SWF makes it easier to develop asynchronous and distributed applications by providing a programming model and infrastructure for coordinating distributed components, tracking and maintaining their execution state in a reliable way
SWF does the following
- stores metadata about a workflow and its component parts.
- stores task for workers and queues them until a Worker needs them.
- assigns task to workers, which can run either on cloud or on-premises
- routes information between executions of a workflow and the associated Workers.
- tracks the progress of workers on Tasks, with configurable timeouts.
- maintains workflow state in a durable fashion
SWF helps coordinating tasks across the application which involves managing intertask dependencies, scheduling, and concurrency in accordance with the logical flow of the application.
SWF gives full control over implementing tasks and coordinating them without worrying about underlying complexities such as tracking their progress and maintaining their state.

SWF tracks and maintains the workflow state in a durable fashion, so that the application is resilient to failures in individual components, which can be implemented, deployed, scaled, and modified independently
SWF offers capabilities to support a variety of application requirements and is suitable for a range of use cases that require coordination of tasks, including media processing, web application back-ends, business process workflows, and analytics pipelines.

Simple Workflow Concepts

AWS SWF Components

Workflow
- Fundamental concept in SWF is the Workflow, which is the automation of a business process
- A workflow is a set of activities that carry out some objective, together with logic that coordinates the activities.
Workflow Execution
- A workflow execution is a running instance of a workflow
Workflow History
- SWF maintains the state and progress of each workflow execution in its Workflow History, which saves the application from having to store the state in a durable way.
- It enables applications to be stateless as all information about a workflow execution is stored in its workflow history.
- For each workflow execution, the history provides a record of which activities were scheduled, their current status, and their results. The workflow execution uses this information to determine next steps.
- History provides a detailed audit trail that can be used to monitor running workflow executions and verify completed workflow executions.
- Operations that do not change the state of the workflow for e.g. polling execution do not typically appear in the workflow history
- Markers can be used to record information in the workflow history of a workflow execution that is specific to the use case
Domain
- Each workflow runs in an AWS resource called a Domain, which controls the workflow’s scope
- An AWS account can have multiple domains, with each containing multiple workflows
- Workflows in different domains cannot interact with each other
Activities
- Designing an SWF workflow, Activities need to be precisely defined and then registered with SWF as an activity type with information such as name, version and timeout
Activity Task & Activity Worker
- An Activity Worker is a program that receives activity tasks, performs them, and provides results back. An activity worker can be a program or even a person who performs the task using an activity worker software
- Activity tasks—and the activity workers that perform them can
  - run synchronously or asynchronously, can be distributed across multiple computers, potentially in different geographic regions, or run on the same computer,
  - be written in different programming languages and run on different operating systems
  - be created that are long-running, or that may fail, time out require restarts or that may complete with varying throughput & latency
Decider
- A Decider implements a Workflow’s coordination logic.
- Decider schedules activity tasks, provides input data to the activity workers, processes events that arrive while the workflow is in progress, and ends (or closes) the workflow when the objective has been completed.
- Decider directs the workflow by receiving decision tasks from SWF and responding back to SWF with decisions. A decision represents an action or set of actions which are the next steps in the workflow which can either be to schedule an activity task, set timers to delay the execution of an activity task, to request cancellation of activity tasks already in progress, and to complete or close the workflow.
Workers and Deciders are both stateless, and can respond to increased traffic by simply adding additional Workers and Deciders as needed
Role of SWF service is to function as a reliable central hub through which data is exchanged between the decider, the activity workers, and other relevant entities such as the person administering the workflow.

Mechanism by which both the activity workers and the decider receive their tasks (activity tasks and decision tasks resp.) is by polling the SWF
SWF allows “long polling”, requests will be held open for up to 60 seconds if necessary, to reduce network traffic and unnecessary processing
SWF informs the decider of the state of the workflow by including with each decision task, a copy of the current workflow execution history. The workflow execution history is composed of events, where an event represents a significant change in the state of the workflow execution for e.g events would be the completion of a task, notification that a task has timed out, or the expiration of a timer that was set earlier in the workflow execution. The history is a complete, consistent, and authoritative record of the workflow’s progress

Workflow Implementation & Execution

Implement Activity workers with the processing steps in the Workflow.
Implement Decider with the coordination logic of the Workflow.
Register the Activities and workflow with SWF.
Start the Activity workers and Decider. Once started, the decider and activity workers should start polling Amazon SWF for tasks.
Start one or more executions of the Workflow. Each execution runs independently and can be provided with its own set of input data.
When an execution is started, SWF schedules the initial decision task. In response, the decider begins generating decisions which initiate activity tasks. Execution continues until your decider makes a decision to close the execution.

View and Track workflow executions

AWS Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

What does Amazon SWF stand for?
1. Simple Web Flow
2. Simple Work Flow
3. Simple Wireless Forms
4. Simple Web Form
Regarding Amazon SWF, the coordination logic in a workflow is contained in a software program called a ____.
1. Handler
2. Decider
3. Coordinator
4. Worker
For which of the following use cases are Simple Workflow Service (SWF) and Amazon EC2 an appropriate solution? Choose 2 answers
1. Using as an endpoint to collect thousands of data points per hour from a distributed fleet of sensors
2. Managing a multi-step and multi-decision checkout process of an e-commerce website
3. Orchestrating the execution of distributed and auditable business processes
4. Using as an SNS (Simple Notification Service) endpoint to trigger execution of video transcoding jobs
5. Using as a distributed session store for your web application
Amazon SWF is designed to help users…
1. … Design graphical user interface interactions
2. … Manage user identification and authorization
3. … Store Web content
4. … Coordinate synchronous and asynchronous tasks which are distributed and fault tolerant.

What does a “Domain” refer to in Amazon SWF?
1. A security group in which only tasks inside can communicate with each other
2. A special type of worker
3. A collection of related Workflows
4. The DNS record for the Amazon SWF service
Your company produces customer commissioned one-of-a-kind skiing helmets combining nigh fashion with custom technical enhancements Customers can show oft their Individuality on the ski slopes and have access to head-up-displays. GPS rear-view cams and any other technical innovation they wish to embed in the helmet. The current manufacturing process is data rich and complex including assessments to ensure that the custom electronics and materials used to assemble the helmets are to the highest standards Assessments are a mixture of human and automated assessments you need to add a new set of assessment to model the failure modes of the custom electronics using GPUs with CUD across a cluster of servers with low latency networking. What architecture would allow you to automate the existing process using a hybrid approach and ensure that the architecture can support the evolution of processes over time? [PROFESSIONAL]
1. Use AWS Data Pipeline to manage movement of data & meta-data and assessments. Use an auto-scaling group of G2 instances in a placement group. (Involves mixture of human assessments)
2. Use Amazon Simple Workflow (SWF) to manage assessments, movement of data & meta-data. Use an autoscaling group of G2 instances in a placement group. (Human and automated assessments with GPU and low latency networking)
3. Use Amazon Simple Workflow (SWF) to manage assessments movement of data & meta-data. Use an autoscaling group of C3 instances with SR-IOV (Single Root I/O Virtualization). (C3 and SR-IOV won’t provide GPU as well as Enhanced networking needs to be enabled)
4. Use AWS data Pipeline to manage movement of data & meta-data and assessments use auto-scaling group of C3 with SR-IOV (Single Root I/O virtualization). (Involves mixture of human assessments)
Your startup wants to implement an order fulfillment process for selling a personalized gadget that needs an average of 3-4 days to produce with some orders taking up to 6 months you expect 10 orders per day on your first day. 1000 orders per day after 6 months and 10,000 orders after 12 months. Orders coming in are checked for consistency men dispatched to your manufacturing plant for production quality control packaging shipment and payment processing. If the product does not meet the quality standards at any stage of the process employees may force the process to repeat a step Customers are notified via email about order status and any critical issues with their orders such as payment failure. Your case architecture includes AWS Elastic Beanstalk for your website with an RDS MySQL instance for customer data and orders. How can you implement the order fulfillment process while making sure that the emails are delivered reliably? [PROFESSIONAL]
1. Add a business process management application to your Elastic Beanstalk app servers and re-use the ROS database for tracking order status use one of the Elastic Beanstalk instances to send emails to customers. (Would use a SWF instead of BPM)
2. Use SWF with an Auto Scaling group of activity workers and a decider instance in another Auto Scaling group with min/max=1. Use the decider instance to send emails to customers. (Decider sending emails might not be reliable)
3. Use SWF with an Auto Scaling group of activity workers and a decider instance in another Auto Scaling group with min/max=1. Use SES to send emails to customers.
4. Use an SQS queue to manage all process tasks. Use an Auto Scaling group of EC2 Instances that poll the tasks and execute them. Use SES to send emails to customers. (Does not provide an ability to repeat a step)

Select appropriate use cases for SWF with Amazon EC2? (Choose 2)
1. Video encoding using Amazon S3 and Amazon EC2. In this use case, large videos are uploaded to Amazon S3 in chunks. Application is built as a workflow where each video file is handled as one workflow execution.
2. Processing large product catalogs using Amazon Mechanical Turk. While validating data in large catalogs, the products in the catalog are processed in batches. Different batches can be processed concurrently.
3. Order processing system with Amazon EC2, SQS, and SimpleDB. Use SWF notifications to orchestrate an order processing system running on EC2, where notifications sent over HTTP can trigger real-time processing in related components such as an inventory system or a shipping service.
4. Using as an SQS (Simple Queue Service) endpoint to trigger execution of video transcoding jobs.
When you register an activity in Amazon SWF, you provide the following information, except:
1. a name
2. timeout values
3. a domain
4. version

Regarding Amazon SWF, at times you might want to record information in the workflow history of a workflow execution that is specific to your use case. ____ enable you to record information in the workflow execution history that you can use for any custom or scenario-specific purpose.
1. Markers
2. Tags
3. Hash keys
4. Events
Which of the following statements about SWF are true? Choose 3 answers.
1. SWF tasks are assigned once and never duplicated
2. SWF requires an S3 bucket for workflow storage
3. SWF workflow executions can last up to a year
4. SWF triggers SNS notifications on task assignment
5. SWF uses deciders and workers to complete tasks
6. SWF requires at least 1 EC2 instance per domain