AWS Auto Scaling Policies
- EC2 Auto Scaling policies provide several ways to scale an Auto Scaling group (ASG).
Maintain a Steady Count of Instances
- Auto Scaling ensures a steady minimum (or desired if specified) count of Instances will always be running.
- If an instance is found unhealthy, Auto Scaling will terminate the Instance and launch a new one.
- ASG determines the health state of each instance by periodically checking the results of EC2 instance status checks.
- If the ASG is associated with an Elastic Load Balancer and is configured to use the Elastic Load Balancing health check, Auto Scaling determines the health status of the instances by checking the results of both the EC2 instance status checks and the Elastic Load Balancing instance health checks.
- Auto Scaling marks an instance unhealthy and launches a replacement if
- the instance is in a state other than running,
- the system status is impaired, or
- Elastic Load Balancing reports the instance state as OutOfService.
- After an instance has been marked unhealthy as a result of an EC2 or ELB health check, it is almost immediately scheduled for replacement. It never automatically recovers its health.
- The health status of an unhealthy instance can be manually set back to healthy, but the request returns an error if the instance is already terminating.
- Because the interval between marking an instance unhealthy and its actual termination is so small, attempting to set an instance’s health status back to healthy is probably useful only for a suspended group.
- When the instance is terminated, any associated Elastic IP addresses are disassociated and are not automatically associated with the new instance.
- Elastic IP addresses must be associated with the new instance manually.
- Similarly, when the instance is terminated, its attached EBS volumes are detached and must be attached to the new instance manually.
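The three replacement conditions listed above can be sketched as a small predicate. This is only an illustrative model, not an AWS API: the `InstanceHealth` class, its field names, and the string values are hypothetical stand-ins for the signals Auto Scaling evaluates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceHealth:
    """Hypothetical snapshot of the health signals described above."""
    instance_state: str              # EC2 instance state, e.g. "running" or "stopped"
    system_status: str               # EC2 system status check, e.g. "ok" or "impaired"
    elb_state: Optional[str] = None  # "InService" / "OutOfService"; None if ELB checks are not used


def is_unhealthy(health: InstanceHealth) -> bool:
    """Return True if any of the three listed conditions applies."""
    if health.instance_state != "running":
        return True
    if health.system_status == "impaired":
        return True
    return health.elb_state == "OutOfService"
```

An instance that passes the EC2 checks is still marked for replacement in this model when the load balancer reports it as `OutOfService`, which mirrors the combined EC2 and ELB health check behaviour.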
Manual Scaling
- Manual scaling can be performed by
- Changing the desired capacity of the ASG
- Attaching/Detaching instances to the ASG
- An EC2 instance can be attached to an ASG only if
- Instance is in the running state.
- AMI used to launch the instance must still exist.
- Instance is not a member of another ASG.
- Instance is in one of the Availability Zones defined in the ASG.
- If the ASG is associated with a load balancer, the instance and the load balancer must both be in the same VPC.
- Auto Scaling increases the desired capacity of the group by the number of instances being attached. But if the number of instances being attached plus the desired capacity exceeds the maximum size, the request fails.
- When detaching instances, you can choose to decrement the desired capacity of the ASG by the number of instances being detached. If you choose not to decrement the capacity, Auto Scaling launches new instances to replace the ones that you detached.
- If an instance is detached from an ASG that is also registered with a load balancer, the instance is deregistered from the load balancer. If connection draining is enabled for the load balancer, Auto Scaling waits for the in-flight requests to complete.
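The capacity arithmetic for attaching and detaching can be illustrated with a minimal sketch. The function names and the `ValueError` are assumptions made for illustration; they model the rules described above rather than any real SDK call.

```python
from typing import Tuple


def attach_instances(desired: int, max_size: int, count: int) -> int:
    """Return the new desired capacity after attaching `count` instances.

    Models the rule above: the request fails if the attached instances
    plus the current desired capacity would exceed the maximum size.
    """
    new_desired = desired + count
    if new_desired > max_size:
        raise ValueError(
            f"attach rejected: {new_desired} exceeds maximum size {max_size}"
        )
    return new_desired


def detach_instances(desired: int, count: int, decrement: bool) -> Tuple[int, int]:
    """Return (new desired capacity, number of replacement launches)."""
    if decrement:
        # Desired capacity shrinks, so nothing is launched to replace them.
        return desired - count, 0
    # Desired capacity is unchanged, so each detached instance is replaced.
    return desired, count
```

For example, detaching 2 instances from a group with a desired capacity of 5 gives `(3, 0)` when decrementing and `(5, 2)` when not.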
Synchronous Instance Launch API (New – Dec 2025)
- EC2 Auto Scaling now offers a LaunchInstances API that allows synchronous launching of instances inside an Auto Scaling group.
- The API provides immediate feedback on capacity availability, returning instance IDs on success or error details on failure.
- Allows precise control over where instances are launched by specifying an override for any Availability Zone and/or subnet in the ASG.
- Unlike the traditional asynchronous scaling approach (where you must monitor scaling activities), this API immediately returns results.
- Use cases include workloads that need deterministic instance placement or immediate confirmation of capacity provisioning.
- Refer: Launching instances with synchronous provisioning
Scheduled Scaling
- Scaling based on a schedule allows you to scale the application in response to predictable load changes, for example, on the last day of the month or the last day of a financial year.
- Scheduled scaling requires configuring scheduled actions, which tell Auto Scaling to perform a scaling action at a specified time in the future. Each action defines the start time at which it takes effect and the new minimum, maximum, and desired sizes for the group.
- Auto Scaling guarantees the order of execution for scheduled actions within the same group, but not for scheduled actions across groups.
- Multiple scheduled actions can be specified, but each must have a unique time value; actions with overlapping scheduled times are rejected.
- Cooldown periods are not supported.
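As a rough sketch, a pair of recurring scheduled actions could be described with the parameters that boto3's `put_scheduled_update_group_action` accepts. The group name, action names, and sizes below are hypothetical, and the actual API call is left as a comment so the snippet runs without AWS credentials.

```python
def scheduled_action(group: str, name: str, recurrence: str,
                     min_size: int, max_size: int, desired: int) -> dict:
    """Build keyword arguments for put_scheduled_update_group_action."""
    return {
        "AutoScalingGroupName": group,
        "ScheduledActionName": name,
        "Recurrence": recurrence,   # cron format: minute hour day-of-month month day-of-week
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
    }


# Hypothetical group "web-asg": scale out Thursday 08:00, scale in Friday 18:00.
scale_out = scheduled_action("web-asg", "thursday-scale-out", "0 8 * * 4", 4, 12, 8)
scale_in = scheduled_action("web-asg", "friday-scale-in", "0 18 * * 5", 2, 12, 2)

# With credentials configured, each dict could be passed on as:
#   boto3.client("autoscaling").put_scheduled_update_group_action(**scale_out)
```

The two actions use different recurrence expressions, so their scheduled times do not overlap. Recurrence times are interpreted in UTC unless a `TimeZone` parameter is also supplied.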
Dynamic Scaling
- Allows automatic scaling in response to changing demand, for example, scaling out when the CPU utilization of the instances goes above 70% and scaling in when it goes below 30%.
- ASG uses a combination of alarms & policies to determine when the conditions for scaling are met.
- An alarm is an object that watches a single metric over a specified time period. When the value of the metric breaches the defined threshold for the specified number of time periods, the alarm performs one or more actions (such as sending messages to Auto Scaling).
- A policy is a set of instructions that tells Auto Scaling how to respond to alarm messages.
- The dynamic scaling process works as follows:
- CloudWatch monitors the specified metrics for all the instances in the Auto Scaling Group.
- Changes are reflected in the metrics as the demand grows or shrinks.
- When the change in the metrics breaches the threshold of the CloudWatch alarm, the CloudWatch alarm performs an action. Depending on the breach, the action is a message sent to either the scale-in policy or the scale-out policy.
- After the Auto Scaling policy receives the message, Auto Scaling performs the scaling activity for the ASG.
- This process continues until you delete either the scaling policies or the ASG.
- When a scaling policy is executed, if the capacity calculation produces a number outside of the minimum and maximum size range of the group, EC2 Auto Scaling ensures that the new capacity never goes outside of the minimum and maximum size limits.
- When the desired capacity reaches the maximum size limit, scaling out stops. If demand drops and capacity decreases, Auto Scaling can scale out again.
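Two pieces of the flow above lend themselves to a short illustration: an alarm that fires only after the threshold is breached for a number of consecutive periods, and the clamping of a calculated capacity to the group's size limits. Both functions are simplified, hypothetical models, not CloudWatch or Auto Scaling APIs.

```python
from typing import Sequence


def alarm_breached(datapoints: Sequence[float], threshold: float, periods: int) -> bool:
    """True when the most recent `periods` datapoints all exceed the threshold."""
    if periods <= 0 or len(datapoints) < periods:
        return False
    return all(value > threshold for value in datapoints[-periods:])


def clamp_capacity(requested: int, min_size: int, max_size: int) -> int:
    """Keep a calculated capacity within the group's minimum and maximum size."""
    return max(min_size, min(requested, max_size))
```

With a 70% threshold and 3 evaluation periods, the series `[65, 72, 75, 81]` breaches the alarm while `[72, 75, 68]` does not. A policy that calculates a capacity of 12 for a group with a maximum size of 10 ends up at 10.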
Dynamic Scaling Policy Types
Target tracking scaling
- Increase or decrease the current capacity of the group based on a target value for a specific metric.
- (Updated Nov 2024) Target Tracking policies now feature highly responsive scaling:
- Self-tuning responsiveness – Target Tracking automatically adapts to the unique usage patterns of individual applications using historical usage data, determining the optimal balance between cost and performance without manual intervention.
- Sub-minute metric support – Can be configured to monitor high-resolution CloudWatch metrics (as low as 10-second intervals) to make more timely scaling decisions.
- Ideal for applications with volatile demand patterns such as client-serving APIs, live streaming services, ecommerce websites, or on-demand data processing.
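A minimal sketch of a target tracking policy definition, using the parameter names accepted by boto3's `put_scaling_policy`. The group name and policy name are hypothetical, and the call itself is shown only as a comment.

```python
def target_tracking_policy(group: str, target_cpu: float) -> dict:
    """Build keyword arguments for put_scaling_policy (target tracking on average CPU)."""
    return {
        "AutoScalingGroupName": group,
        "PolicyName": f"{group}-cpu-target-tracking",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu,
        },
    }


policy = target_tracking_policy("web-asg", 50.0)
# boto3.client("autoscaling").put_scaling_policy(**policy)
```

A single target tracking policy handles both scale out and scale in around the target value, so no separate alarms or paired policies need to be defined.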
Step scaling
- Increase or decrease the current capacity of the group based on a set of scaling adjustments, known as step adjustments, that vary based on the size of the alarm breach.
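How step adjustments vary with the size of the breach can be sketched as follows. This models a scale-out policy only; the `StepAdjustment` class is hypothetical, and the bound handling (lower bound inclusive, upper bound exclusive, both relative to the alarm threshold) is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepAdjustment:
    lower: Optional[float]  # offset above the alarm threshold; None means unbounded
    upper: Optional[float]  # offset above the alarm threshold; None means unbounded
    adjustment: int         # number of instances to add


def pick_adjustment(metric: float, threshold: float, steps: List[StepAdjustment]) -> int:
    """Return the adjustment of the step whose interval contains the breach size."""
    breach = metric - threshold
    for step in steps:
        above_lower = step.lower is None or breach >= step.lower
        below_upper = step.upper is None or breach < step.upper
        if above_lower and below_upper:
            return step.adjustment
    return 0  # metric has not breached the threshold


steps = [
    StepAdjustment(0, 10, 1),     # 70% <= CPU < 80%: add 1 instance
    StepAdjustment(10, 20, 2),    # 80% <= CPU < 90%: add 2 instances
    StepAdjustment(20, None, 4),  # CPU >= 90%: add 4 instances
]
```

With a 70% alarm threshold, a reading of 75% selects the first step, 85% the second, and 95% the third, so larger breaches add more capacity in a single scaling activity.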
Simple scaling
- Increase or decrease the current capacity of the group based on a single scaling adjustment.
- Note: AWS recommends not using simple scaling policies and scaling cooldowns as a best practice. Use target tracking or step scaling instead for more responsive and efficient scaling behavior.
Multiple Policies
- ASG can have more than one scaling policy attached at any given time.
- With step or simple scaling, an ASG would typically have at least two policies: one to scale out and another to scale in.
- If an ASG has multiple policies, more than one of them can instruct Auto Scaling to scale out or scale in at the same time.
- When this happens, Auto Scaling chooses the policy that provides the largest capacity, for both scale out and scale in. For example, if two policies are triggered at the same time and Policy 1 scales out by 1 instance while Policy 2 scales out by 2 instances, Auto Scaling uses Policy 2 and scales out by 2 instances, as it has the greater impact.
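The conflict rule above reduces to taking the largest resulting capacity. A minimal sketch, with a hypothetical function name and adjustments expressed as instance counts (negative for scale in):

```python
from typing import List


def resolve_capacity(current: int, adjustments: List[int]) -> int:
    """Return the capacity Auto Scaling would pick when several policies fire at once."""
    if not adjustments:
        return current
    # The policy that yields the largest capacity wins, for scale out and scale in alike.
    return max(current + adjustment for adjustment in adjustments)
```

With 4 running instances and two scale-out policies of +1 and +2, the result is 6. With 10 instances and two scale-in policies of -1 and -3, the result is 9, because the smaller reduction leaves the larger capacity.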
Predictive Scaling
- Predictive scaling can be used to increase the number of EC2 instances in the ASG in advance of daily and weekly patterns in traffic flows.
- Predictive scaling is well suited for situations where you have:
- Cyclical traffic, such as high use of resources during regular business hours and low use of resources during evenings and weekends
- Recurring on-and-off workload patterns, such as batch processing, testing, or periodic data analysis
- Applications that take a long time to initialize, causing a noticeable latency impact on application performance during scale-out events
- Predictive scaling provides proactive scaling that can help scale faster by launching capacity in advance of forecasted load, compared to using only dynamic scaling, which is reactive in nature.
- Predictive scaling uses machine learning to predict capacity requirements based on historical data from CloudWatch. The machine learning algorithm consumes the available historical data and calculates the capacity that best fits the historical load pattern, and then continuously learns based on new data to make future forecasts more accurate.
- Predictive scaling supports a forecast only mode so that you can evaluate the forecast before you allow predictive scaling to actively scale capacity.
- When you are ready to start scaling with predictive scaling, switch the policy from forecast only mode to forecast and scale mode.
- (Updated Oct 2025) Predictive scaling is now available in additional AWS Regions, expanding its availability to more customers globally.
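The switch between the two modes can be sketched as a policy definition using the parameter names of boto3's `put_scaling_policy`. The group name, policy name, and target value are hypothetical, and the API call is shown only as a comment.

```python
def predictive_policy(group: str, target_cpu: float, scale: bool = False) -> dict:
    """Build keyword arguments for put_scaling_policy (predictive scaling on CPU)."""
    return {
        "AutoScalingGroupName": group,
        "PolicyName": f"{group}-predictive",  # hypothetical naming scheme
        "PolicyType": "PredictiveScaling",
        "PredictiveScalingConfiguration": {
            "MetricSpecifications": [
                {
                    "TargetValue": target_cpu,
                    "PredefinedMetricPairSpecification": {
                        "PredefinedMetricType": "ASGCPUUtilization",
                    },
                }
            ],
            # Start in forecast only mode; flip `scale` once the forecast looks right.
            "Mode": "ForecastAndScale" if scale else "ForecastOnly",
        },
    }


evaluate = predictive_policy("web-asg", 40.0)
active = predictive_policy("web-asg", 40.0, scale=True)
# boto3.client("autoscaling").put_scaling_policy(**evaluate)
```

Because both dictionaries share the same policy name, applying the second one would update the existing policy rather than create a new one.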
Warm Pools
- A warm pool is a pool of pre-initialized EC2 instances that sits alongside the Auto Scaling group, ready to be quickly placed into service when needed.
- Warm pools help decrease latency for applications that have exceptionally long boot times (e.g., instances that need to write large amounts of data to disk or perform lengthy initialization).
- Instances in a warm pool can be in one of the following states: Stopped, Running, or Hibernated.
- When a scale-out event occurs, instances from the warm pool are moved into the ASG, reducing launch latency significantly.
- Lifecycle hooks can be used with warm pools to perform custom actions while instances transition between states.
- (Updated Nov 2025) Warm pools now support Auto Scaling groups with mixed instances policies, allowing customers using multiple instance types and purchase options to benefit from pre-initialized instance pools.
- (Updated Apr 2026) Amazon EKS managed node groups now support EC2 Auto Scaling warm pools, enabling Kubernetes workloads to benefit from faster instance readiness.
- Refer: Warm pools for Amazon EC2 Auto Scaling
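A warm pool request can be sketched with the parameter names accepted by boto3's `put_warm_pool`. The helper function, group name, and sizes are hypothetical; the validation simply reflects the three pool states listed above.

```python
ALLOWED_POOL_STATES = {"Stopped", "Running", "Hibernated"}


def warm_pool_request(group: str, pool_state: str = "Stopped", min_size: int = 0) -> dict:
    """Build keyword arguments for put_warm_pool, validating the pool state."""
    if pool_state not in ALLOWED_POOL_STATES:
        raise ValueError(f"unsupported warm pool state: {pool_state!r}")
    return {
        "AutoScalingGroupName": group,
        "PoolState": pool_state,
        "MinSize": min_size,  # instances to keep pre-initialized in the pool
    }


request = warm_pool_request("web-asg", "Stopped", min_size=2)
# boto3.client("autoscaling").put_warm_pool(**request)
```

Keeping pooled instances in the `Stopped` state avoids paying for running compute while they wait, at the cost of a slightly longer transition into service than `Running` instances.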
Zonal Shift and Zonal Autoshift
- (New – Nov 2024) EC2 Auto Scaling now supports Amazon Application Recovery Controller (ARC) zonal shift and zonal autoshift.
- Zonal shift allows you to rapidly recover from application impairments in a single Availability Zone by shifting traffic and instances away from the affected AZ.
- Zonal autoshift enables AWS to automatically detect AZ impairments and shift traffic away from the affected zone on your behalf.
- Can be initiated from the EC2 Auto Scaling console, Application Recovery Controller console, or via the AWS SDK.
- When a zonal shift is active, Auto Scaling will not launch new instances in the shifted-away AZ and will launch replacement capacity in healthy AZs.
- Refer: Auto Scaling group zonal shift
ASG Deletion Protection
- (New – Jan 2026) EC2 Auto Scaling now provides deletion protection at the group level to safeguard against accidental ASG deletions.
- Multiple protection levels are available:
- No protection – Default behavior, ASG can be deleted normally.
- Prevent force deletion – Blocks force-delete operations (ASG cannot be deleted while it still has running instances).
- Prevent all deletion – Blocks all delete operations on the ASG.
- A new IAM policy condition key autoscaling:ForceDelete can be used with the DeleteAutoScalingGroup action to control whether the ForceDelete parameter can be used during deletion.
- Deletion protection can be set when creating or updating an ASG.
- Combining the condition key with group-level protection provides layered defense against unwanted ASG termination.
- Available in all AWS Regions and AWS GovCloud (US) Regions.
- Refer: Configure deletion protection
Instance Lifecycle Policy
- (New – Nov 2025) EC2 Auto Scaling introduces an instance lifecycle policy to control instance retention when termination lifecycle hooks fail or time out.
- Customers can configure the ASG to retain instances (instead of terminating them) when lifecycle hook actions are abandoned, providing greater confidence in graceful shutdown processes.
- Useful for workloads that require guaranteed completion of cleanup tasks before instance termination.
- Refer: Control instance retention with instance lifecycle policies
Lambda as Lifecycle Hook Target
- (New – Jul 2025) AWS Lambda functions can now be used as direct notification targets for EC2 Auto Scaling lifecycle hooks.
- Previously, lifecycle hooks required EventBridge or SNS/SQS intermediaries to invoke Lambda functions.
- This simplifies the architecture for custom actions when instances enter a wait state (during both launch and termination).
- Common use cases include downloading logs, running configuration scripts, draining connections, or performing data backups before termination.
- Refer: Prepare for lifecycle notifications
Instance Maintenance Policy
- (Introduced Nov 2023) Instance maintenance policy allows you to control how Amazon EC2 Auto Scaling handles instance replacement during events such as instance refresh, health check replacements, and AZ rebalancing.
- Available options:
- Launch before terminating – A new instance must be provisioned first before an existing instance can be terminated (ensures availability but temporarily increases capacity).
- Terminate and launch – An existing instance is terminated first, then a new instance is launched (reduces cost but temporarily decreases capacity).
- Custom – Set min/max healthy percentage to control the capacity range during replacement.
- Helps maintain application availability and performance during routine maintenance operations.
- Refer: Instance maintenance policies
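The effect of the minimum and maximum healthy percentages can be illustrated with a small calculation. The function is a hypothetical model, and the rounding direction (minimum rounded up, maximum rounded down) is an assumption. The two dictionaries use the `MinHealthyPercentage` and `MaxHealthyPercentage` fields of the instance maintenance policy, with example values chosen for illustration.

```python
from typing import Tuple


def healthy_capacity_range(desired: int, min_pct: int, max_pct: int) -> Tuple[int, int]:
    """Return the (lowest, highest) instance count allowed during replacement."""
    lowest = -(-desired * min_pct // 100)  # ceiling division, integer arithmetic only
    highest = desired * max_pct // 100     # floor division
    return lowest, highest


# Example values only: capacity never drops below desired, may rise by 10%.
launch_before_terminating = {"MinHealthyPercentage": 100, "MaxHealthyPercentage": 110}
# Example values only: capacity never exceeds desired, may drop by 10%.
terminate_and_launch = {"MinHealthyPercentage": 90, "MaxHealthyPercentage": 100}
```

For a desired capacity of 10, the first policy allows between 10 and 11 instances during replacement, while the second allows between 9 and 10.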
AWS Certification Exam Practice Questions
- Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
- AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
- AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
- Open to further feedback, discussion and correction.
- A user has created a web application with Auto Scaling. The user is regularly monitoring the application and has observed that the traffic is highest on Thursday and Friday between 8 AM and 6 PM. What is the best solution to handle scaling in this case?
- Add a new instance manually by 8 AM Thursday and terminate the same by 6 PM Friday
- Schedule Auto Scaling to scale up by 8 AM Thursday and scale down after 6 PM on Friday
- Schedule a policy which may scale up every day at 8 AM and scales down by 6 PM
- Configure a batch process to add an instance by 8 AM and remove it by Friday 6 PM
- A customer has a website which shows all the deals available across the market. The site experiences a load of 5 large EC2 instances generally. However, a week before Thanksgiving vacation they encounter a load of almost 20 large instances. The load during that period varies over the day based on the office timings. Which of the below mentioned solutions is cost effective as well as helps the website achieve better performance?
- Keep only 10 instances running and manually launch 10 instances every day during office hours.
- Set up 10 instances to run during the pre-vacation period and only scale up during the office time by launching 10 more instances using the Auto Scaling schedule.
- During the pre-vacation period set up a scenario where the organization has 15 instances running and 5 instances to scale up and down using Auto Scaling based on the network I/O policy.
- During the pre-vacation period set up 20 instances to run continuously.
- A user has set up Auto Scaling with ELB on the EC2 instances. The user wants to configure that whenever the CPU utilization is below 10%, Auto Scaling should remove one instance. How can the user configure this?
- The user can get an email using SNS when the CPU utilization is less than 10%. The user can use the desired capacity of Auto Scaling to remove the instance
- Use CloudWatch to monitor the data and Auto Scaling to remove the instances using scheduled actions
- Configure CloudWatch to send a notification to Auto Scaling Launch configuration when the CPU utilization is less than 10% and configure the Auto Scaling policy to remove the instance
- Configure CloudWatch to send a notification to the Auto Scaling group when the CPU Utilization is less than 10% and configure the Auto Scaling policy to remove the instance
- A company has an application with unpredictable traffic that spikes rapidly. They are using target tracking scaling with a 60-second CloudWatch metric period. Despite scaling policies, users experience latency during sudden traffic bursts. What should they do to improve scaling responsiveness?
- Switch to simple scaling with a lower cooldown period
- Add more instances to the minimum capacity of the ASG
- Configure the target tracking policy to use high-resolution CloudWatch metrics with sub-minute (10-second) evaluation periods
- Replace target tracking with step scaling policies
- A company wants to protect their production Auto Scaling group from accidental deletion. The ASG runs critical workloads and must remain available at all times. What combination of features provides the strongest protection? (Select TWO)
- Enable instance scale-in protection on all instances
- Enable ASG deletion protection with “Prevent all deletion” level
- Set the minimum capacity to match the desired capacity
- Use the autoscaling:ForceDelete IAM condition key to restrict force-delete permissions
- Enable termination protection on individual EC2 instances
- An application running on an Auto Scaling group takes 10 minutes to fully initialize. The application experiences predictable daily traffic spikes at 9 AM. Which approach would minimize user-facing latency during the morning traffic increase?
- Use dynamic target tracking scaling with aggressive scale-out settings
- Use scheduled scaling to add instances at 8:50 AM daily
- Use predictive scaling combined with a warm pool of pre-initialized instances
- Increase the minimum capacity of the ASG to handle peak load
- A company detects degraded performance in one Availability Zone affecting their Auto Scaling group. They need to quickly shift traffic and instances away from the impaired AZ without manual intervention in the future. What should they configure?
- Remove the affected subnet from the ASG configuration
- Use a scheduled action to reduce capacity in the affected AZ
- Enable cross-zone load balancing on the load balancer
- Enable zonal autoshift with Amazon Application Recovery Controller for the ASG