AWS Auto Scaling Policies

EC2 Auto Scaling Policies provide several ways for scaling the Auto Scaling group.

Maintain a Steady Count of Instances

Auto Scaling ensures a steady minimum (or desired if specified) count of Instances will always be running.
If an instance is found unhealthy, Auto Scaling will terminate the Instance and launch a new one.

ASG determines the health state of each instance by periodically checking the results of EC2 instance status checks.
ASG can be associated with an Elastic load balancer enabled to use the Elastic Load Balancing health check, Auto Scaling determines the health status of the instances by checking the results of both EC2 instance status and Elastic Load Balancing instance health.
Auto Scaling marks an instance unhealthy and launches a replacement if
- the instance is in a state other than running,
- the system status is impaired, or
- Elastic Load Balancing reports the instance state as OutOfService.

After an instance has been marked unhealthy as a result of an EC2 or ELB health check, it is almost immediately scheduled for replacement. It never automatically recovers its health.
For an unhealthy instance, the instance’s health check can be changed back to healthy manually but you will encounter an error if the instance is already terminating.
Because the interval between marking an instance unhealthy and its actual termination is so small, attempting to set an instance’s health status back to healthy is probably useful only for a suspended group.

When the instance is terminated, any associated Elastic IP addresses are disassociated and are not automatically associated with the new instance.
Elastic IP addresses must be associated with the new instance manually.
Similarly, when the instance is terminated, its attached EBS volumes are detached and must be attached to the new instance manually.

Manual Scaling

Manual scaling can be performed by
- Changing the desired capacity limit of the ASG
- Attaching/Detaching instances to the ASG
Attaching/Detaching an EC2 instance can be done only if
- Instance is in the running state.
- AMI used to launch the instance must still exist.
- Instance is not a member of another ASG.
- Instance is in the same Availability Zone as the ASG.
- If the ASG is associated with a load balancer, the instance and the load balancer must both be in the same VPC.
Auto Scaling increases the desired capacity of the group by the number of instances being attached. But if the number of instances being attached plus the desired capacity exceeds the maximum size, the request fails.
When Detaching instances, an option to decrement the desired capacity for the ASG by the number of instances being detached is provided. If chosen not to decrement the capacity, Auto Scaling launches new instances to replace the ones that you detached.

If an instance is detached from an ASG that is also registered with a load balancer, the instance is deregistered from the load balancer. If connection draining is enabled for the load balancer, Auto Scaling waits for the in-flight requests to complete.

Scheduled Scaling

Scaling based on a schedule allows you to scale the application in response to predictable load changes for e.g. last day of the month, the last day of a financial year.
Scheduled scaling requires the configuration of Scheduled actions, which tells Auto Scaling to perform a scaling action at a certain time in the future, with the start time at which the scaling action should take effect, and the new minimum, maximum, and desired size of group should have.

Auto Scaling guarantees the order of execution for scheduled actions within the same group, but not for scheduled actions across groups.
Multiple Scheduled Actions can be specified but should have unique time values and they cannot have overlapping times scheduled which will lead to their rejection.
Cooldown periods are not supported.

Dynamic Scaling

Allows automatic scaling in response to the changing demand for e.g. scale-out in case CPU utilization of the instance goes above 70% and scale in when the CPU utilization goes below 30%

ASG uses a combination of alarms & policies to determine when the conditions for scaling are met.
- An alarm is an object that watches over a single metric over a specified time period. When the value of the metric breaches the defined threshold, for the number of specified time periods the alarm performs one or more actions (such as sending messages to Auto Scaling).
- A policy is a set of instructions that tells Auto Scaling how to respond to alarm messages.

Dynamic scaling process works as below
1. CloudWatch monitors the specified metrics for all the instances in the Auto Scaling Group.
2. Changes are reflected in the metrics as the demand grows or shrinks
3. When the change in the metrics breaches the threshold of the CloudWatch alarm, the CloudWatch alarm performs an action. Depending on the breach, the action is a message sent to either the scale-in policy or the scale-out policy
4. After the Auto Scaling policy receives the message, Auto Scaling performs the scaling activity for the ASG.
5. This process continues until you delete either the scaling policies or the ASG.

When a scaling policy is executed, if the capacity calculation produces a number outside of the minimum and maximum size range of the group, EC2 Auto Scaling ensures that the new capacity never goes outside of the minimum and maximum size limits.
When the desired capacity reaches the maximum size limit, scaling out stops. If demand drops and capacity decreases, Auto Scaling can scale out again.

Dynamic Scaling Policy Types

Target tracking scaling

Increase or decrease the current capacity of the group based on a target value for a specific metric.

Auto Scaling Target Tracking Scaling

Step scaling

Increase or decrease the current capacity of the group based on a set of scaling adjustments, known as step adjustments, that vary based on the size of the alarm breach.

Simple scaling

Increase or decrease the current capacity of the group based on a single scaling adjustment.

Multiple Policies

ASG can have more than one scaling policy attached at any given time.

Each ASG would have at least two policies: one to scale the architecture out and another to scale the architecture in.
If an ASG has multiple policies, there is always a chance that both policies can instruct the Auto Scaling to Scale Out or Scale In at the same time.
When these situations occur, Auto Scaling chooses the policy that has the greatest impact i.e. provides the largest capacity for both scale out and scale in on the ASG for e.g. if two policies are triggered at the same time and Policy 1 instructs to scale out the instance by 1 while Policy 2 instructs to scale out the instances by 2, Auto Scaling will use the Policy 2 and scale out the instances by 2 as it has a greater impact.

Predictive Scaling

Predictive scaling can be used to increase the number of EC2 instances in the ASG in advance of daily and weekly patterns in traffic flows.
Predictive scaling is well suited for situations where you have:
- Cyclical traffic, such as high use of resources during regular business hours and low use of resources during evenings and weekends
- Recurring on-and-off workload patterns, such as batch processing, testing, or periodic data analysis
- Applications that take a long time to initialize, causing a noticeable latency impact on application performance during scale-out events
Predictive scaling provides proactive scaling that can help scale faster by launching capacity in advance of forecasted load, compared to using only dynamic scaling, which is reactive in nature.

Predictive scaling uses machine learning to predict capacity requirements based on historical data from CloudWatch. The machine learning algorithm consumes the available historical data and calculates the capacity that best fits the historical load pattern, and then continuously learns based on new data to make future forecasts more accurate.
Predictive scaling supports forecast only mode so that you can evaluate the forecast before you allow predictive scaling to actively scale capacity
When you are ready to start scaling with predictive scaling, switch the policy from forecast only mode to forecast and scale mode.

AWS Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

A user has created a web application with Auto Scaling. The user is regularly monitoring the application and he observed that the traffic is highest on Thursday and Friday between 8 AM to 6 PM. What is the best solution to handle scaling in this case?
1. Add a new instance manually by 8 AM Thursday and terminate the same by 6 PM Friday
2. Schedule Auto Scaling to scale up by 8 AM Thursday and scale down after 6 PM on Friday
3. Schedule a policy which may scale up every day at 8 AM and scales down by 6 PM
4. Configure a batch process to add a instance by 8 AM and remove it by Friday 6 PM
A customer has a website which shows all the deals available across the market. The site experiences a load of 5 large EC2 instances generally. However, a week before Thanksgiving vacation they encounter a load of almost 20 large instances. The load during that period varies over the day based on the office timings. Which of the below mentioned solutions is cost effective as well as help the website achieve better performance?
1. Keep only 10 instances running and manually launch 10 instances every day during office hours.
2. Setup to run 10 instances during the pre-vacation period and only scale up during the office time by launching 10 more instances using the AutoScaling schedule.
3. During the pre-vacation period setup a scenario where the organization has 15 instances running and 5 instances to scale up and down using Auto Scaling based on the network I/O policy.
4. During the pre-vacation period setup 20 instances to run continuously.
A user has setup Auto Scaling with ELB on the EC2 instances. The user wants to configure that whenever the CPU utilization is below 10%, Auto Scaling should remove one instance. How can the user configure this?
1. The user can get an email using SNS when the CPU utilization is less than 10%. The user can use the desired capacity of Auto Scaling to remove the instance
2. Use CloudWatch to monitor the data and Auto Scaling to remove the instances using scheduled actions
3. Configure CloudWatch to send a notification to Auto Scaling Launch configuration when the CPU utilization is less than 10% and configure the Auto Scaling policy to remove the instance
4. Configure CloudWatch to send a notification to the Auto Scaling group when the CPU Utilization is less than 10% and configure the Auto Scaling policy to remove the instance