AWS EC2 Monitoring – CloudWatch Metrics & Alarms

EC2 Monitoring

Status Checks

  • Status monitoring helps quickly determine whether EC2 has detected any problems that might prevent instances from running applications.
  • EC2 performs automated checks on every running EC2 instance to identify hardware and software issues.
  • Status checks are performed every minute and each returns a pass or a fail status.
  • If all checks pass, the overall status of the instance is OK.
  • If one or more checks fail, the overall status is Impaired.
  • Status checks are built into EC2, so they cannot be disabled or deleted.
  • There are three types of status checks:
    • System status checks
    • Instance status checks
    • Attached EBS status checks
  • Status check data augments the information that EC2 already provides about the intended state of each instance (such as pending, running, and stopping) as well as the utilization metrics that CloudWatch monitors (CPU utilization, network traffic, and disk activity).
  • Alarms that are triggered by status check results can be created or deleted; for example, an alarm can be created to warn if status checks fail on a specific instance.
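The pass/fail rule above (all checks pass means OK, any failure means Impaired) can be sketched as a small Python function. The check names and the `overall_status` helper are hypothetical; the real EC2 DescribeInstanceStatus API also reports states such as initializing and insufficient-data, which this sketch ignores.

```python
def overall_status(check_results: dict[str, bool]) -> str:
    """Combine individual status check results into an overall status.

    check_results maps a hypothetical check name (for example "system",
    "instance", "attached_ebs") to True if the check passed, False if it failed.
    """
    if all(check_results.values()):
        return "OK"
    return "Impaired"


# A single failed check is enough to mark the instance Impaired.
print(overall_status({"system": True, "instance": True, "attached_ebs": True}))   # OK
print(overall_status({"system": True, "instance": False, "attached_ebs": True}))  # Impaired
```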

System Status Checks

  • monitor the AWS systems on which the instance runs to ensure they are working properly.
  • detect problems with the instance that require AWS involvement to repair.
  • A system status check failure might be due to
    • Loss of network connectivity
    • Loss of system power
    • Software issues on the physical host
    • Hardware issues on the physical host
  • When a system status check fails, one can
    • check the AWS Health Dashboard for any scheduled critical maintenance by AWS to the instance’s host.
    • wait for AWS to fix the issue.
    • resolve it by stopping and starting an EBS-backed instance (which usually moves it to a new host) or by terminating and replacing the instance.

Instance Status Checks

  • monitor the software and network configuration of the individual instance.
  • detect problems that require your involvement to repair.
  • An instance status check failure might be due to
    • Failed system status checks
    • Misconfigured networking or startup configuration
    • Exhausted memory
    • Corrupted file system
    • Incompatible kernel
  • When an instance status check fails, it can be resolved either by rebooting the instance or by modifying the operating system configuration.

Attached EBS Status Checks

  • monitor whether the EBS volumes attached to an instance are reachable and able to complete I/O operations.
  • available for Nitro-based instances only.
  • helps detect issues where the instance cannot communicate with one or more attached EBS volumes.
  • Attached EBS status check failure might be due to
    • Hardware or software issues on the storage subsystem underlying the EBS volume
    • Hardware issues on the physical host impacting reachability to EBS
  • The metric StatusCheckFailed_AttachedEBS is available at a 1-minute frequency at no additional charge.
  • Can be used with CloudWatch alarms and Auto Scaling health checks to replace instances with impaired EBS volumes.

EC2 Instance Recovery

  • Simplified Automatic Recovery
    • enabled by default during instance launch on supported instances.
    • automatically moves the instance from the impaired host to a different host when a system status check failure is detected.
    • recovered instance is identical to the original (instance ID, private IP, Elastic IP, metadata, placement group).
    • does not require a CloudWatch alarm to be configured.
    • works only for system status check failures, not for instance status check failures.
    • available for over 90% of deployed EC2 instances.
  • CloudWatch Action Based Recovery
    • can be configured optionally after instance launch using CloudWatch alarms.
    • provides the ability to set a recovery action on a CloudWatch alarm monitoring the StatusCheckFailed_System metric.
    • provides more granular control over recovery conditions and notification.
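As a minimal sketch of CloudWatch action based recovery, the function below builds the keyword arguments for the CloudWatch PutMetricAlarm call (for example, boto3's `put_metric_alarm`). The alarm name, the choice of two consecutive 1-minute evaluation periods, and the sample instance ID are illustrative assumptions; the `arn:aws:automate:<region>:ec2:recover` action ARN is the form CloudWatch uses for the EC2 recover action.

```python
def build_recover_alarm(instance_id: str, region: str) -> dict:
    """Return PutMetricAlarm parameters that recover an instance on system check failure."""
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,             # status check metrics are published every minute
        "EvaluationPeriods": 2,   # assumption: require two consecutive failed minutes
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


params = build_recover_alarm("i-0123456789abcdef0", "us-east-1")
# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
print(params["AlarmActions"][0])  # arn:aws:automate:us-east-1:ec2:recover
```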

CloudWatch Monitoring

  • CloudWatch collects and processes raw data from EC2 into readable, near real-time metrics.
  • Statistics are recorded for a period of 15 months (older data is kept at progressively coarser granularity) so that historical information can be accessed and used to gain a better perspective on how the application or service is performing.
  • By default, basic monitoring is enabled and EC2 metric data is sent to CloudWatch in 5-minute periods automatically.
  • Detailed monitoring can be enabled on an EC2 instance at an additional charge; it sends data to CloudWatch in 1-minute periods.
  • Organization-wide Detailed Monitoring Enablement (2026)
    • CloudWatch Ingestion enablement rules can automatically enable detailed monitoring for both existing and newly launched EC2 instances matching the rule scope.
    • Ensures consistent 1-minute metrics collection across EC2 instances at the organization or account level.
  • Aggregating Statistics Across Instances/ASG/AMI ID
    • Aggregate statistics are available for the instances that have detailed monitoring (at an additional charge) enabled, which provides data in 1-minute periods
    • Instances that use basic monitoring are not included in the aggregates.
    • CloudWatch does not aggregate data across Regions, so metrics are completely separate between Regions.
    • CloudWatch returns statistics for all dimensions in the AWS/EC2 namespace if no dimension is specified
    • The technique for retrieving all dimensions across an AWS namespace does not work for custom namespaces published to CloudWatch.
    • Statistics include Sum, Average, Minimum, Maximum, and SampleCount (data samples).
    • With custom namespaces, you must specify the complete set of dimensions associated with a data point to retrieve statistics that include that data point.
  • CloudWatch alarms
    • can be created to monitor any one of the EC2 instance’s metrics.
    • can be configured to automatically send you a notification when the metric reaches a specified threshold.
    • can automatically stop, terminate, reboot, or recover EC2 instances
    • can automatically recover an EC2 instance when the instance becomes impaired due to an underlying hardware failure or a problem that requires AWS involvement to repair
    • can automatically stop or terminate instances to save costs (EC2 instances that use an EBS volume as the root device can be stopped or terminated, whereas instances that use the instance store as the root device can only be terminated).
    • can use the EC2ActionsAccess IAM role, which enables AWS to perform stop, terminate, or reboot actions on EC2 instances.
    • If you have read/write permissions for CloudWatch but not for EC2, alarms can still be created but the stop or terminate actions won’t be performed on the EC2 instance
    • Composite Alarms can combine multiple metric alarms into a single alarm for aggregated health, but cannot perform EC2 actions directly.
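The stop/terminate cost-saving action and its root-device restriction can be sketched as PutMetricAlarm parameters. This is a hedged illustration: the 5% threshold, the 30-minute idle window, and the instance ID are assumptions, and `root_device_type` uses the `ebs` / `instance-store` values that EC2 reports as RootDeviceType. As noted above, the action only runs if the caller also has the required EC2 permissions.

```python
def build_idle_alarm(instance_id: str, region: str, root_device_type: str,
                     cpu_threshold: float = 5.0) -> dict:
    """Return PutMetricAlarm parameters that stop or terminate an idle instance."""
    # EBS-backed instances can be stopped; instance store-backed ones can only be terminated.
    if root_device_type == "ebs":
        action = "stop"
    elif root_device_type == "instance-store":
        action = "terminate"
    else:
        raise ValueError(f"unknown root device type: {root_device_type}")
    return {
        "AlarmName": f"idle-{action}-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # matches 5-minute basic monitoring
        "EvaluationPeriods": 6,   # assumption: idle for 30 minutes before acting
        "Threshold": cpu_threshold,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:{action}"],
    }


print(build_idle_alarm("i-0123456789abcdef0", "us-east-1", "ebs")["AlarmActions"])
print(build_idle_alarm("i-0123456789abcdef0", "us-east-1", "instance-store")["AlarmActions"])
```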

CloudWatch Agent

  • The unified CloudWatch agent collects system-level metrics and logs from EC2 instances that are not available through the default hypervisor-level metrics.
  • Key OS-level metrics collected by the agent include:
    • Memory utilization (mem_used_percent)
    • Disk usage (disk_used_percent)
    • Swap usage
    • Process-level metrics (procstat)
  • EC2 does NOT provide memory or disk usage metrics by default — these require the CloudWatch agent.
  • Can be installed and managed via AWS Systems Manager (SSM).
  • Configuration is stored in a JSON file or as an SSM Parameter Store parameter.
  • Metrics collected by the CloudWatch agent are billed as custom metrics.
  • In-Console Agent Management (2025/2026)
    • CloudWatch provides visibility into agent status across the EC2 fleet directly in the console.
    • Automatic detection of supported workloads and recommended monitoring configurations.
    • Visual configuration editor for the agent eliminates the need to hand-edit JSON (April 2026).
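A minimal agent configuration for the memory and disk metrics listed above might look like the JSON produced below. This is a sketch based on the unified agent's `metrics_collected` schema; the 60-second interval and the choice to append only the InstanceId dimension are assumptions. By default the agent publishes these metrics to the `CWAgent` namespace, where they are billed as custom metrics.

```python
import json


def build_agent_config(interval_seconds: int = 60) -> str:
    """Return a CloudWatch agent configuration that collects memory and disk usage."""
    config = {
        "metrics": {
            # Tag every metric with the instance ID so it can be filtered per instance.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
                "disk": {
                    # Published to CloudWatch as disk_used_percent.
                    "measurement": ["used_percent"],
                    "resources": ["*"],  # all mounted file systems
                    "metrics_collection_interval": interval_seconds,
                },
            },
        }
    }
    return json.dumps(config, indent=2)


print(build_agent_config())
```

The resulting JSON can be saved to a file or stored in SSM Parameter Store and loaded with `amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:<path> -s` (or `-c ssm:<parameter-name>`).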

EC2 Monitoring Metrics

Instance Metrics

  • CPUUtilization
    • % of physical CPU time that EC2 uses to run the instance, including time spent running both user code and EC2 code.
    • At a very high level, CPUUtilization is the sum of guest CPUUtilization and hypervisor CPUUtilization.
  • DiskReadOps
    • Completed read operations from all instance store volumes available to the instance in a specified period of time.
    • If there are no instance store volumes, the value is 0 or the metric is not reported.
  • DiskWriteOps
    • Completed write operations to all instance store volumes available to the instance in a specified period of time.
    • If there are no instance store volumes, the value is 0 or the metric is not reported.
  • DiskReadBytes
    • Bytes read from all instance store volumes available to the instance.
    • This metric is used to determine the volume of the data the application reads from the hard disk of the instance.
  • DiskWriteBytes
    • Bytes written to all instance store volumes available to the instance.
    • This metric is used to determine the volume of the data the application writes onto the hard disk of the instance.
  • MetadataNoToken
    • The number of times the Instance Metadata Service (IMDS) was successfully accessed using a method that does not use a token (IMDSv1).
    • Used to determine if there are any processes accessing instance metadata using IMDSv1, which is less secure than IMDSv2.
    • If all requests use token-backed sessions (IMDSv2), the value is 0.
  • MetadataNoTokenRejected
    • The number of times an IMDSv1 call was attempted after IMDSv1 was disabled on the instance.
    • Indicates that software on the instance still attempts IMDSv1 calls and needs updating.
  • NetworkIn
    • The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to an application on a single instance.
  • NetworkOut
    • The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic from a single instance.
  • NetworkPacketsIn
    • The number of packets received on all network interfaces by the instance.
    • This metric is available for basic monitoring only (5-minute periods).
  • NetworkPacketsOut
    • The number of packets sent out on all network interfaces by the instance.
    • This metric is available for basic monitoring only (5-minute periods).

CPU Credit Metrics (Burstable Performance Instances)

  • Applicable to all burstable performance instances (T2, T3, T3a, T4g) — not just T2.
  • CPU Credit metrics are available at a 5-minute frequency only.
  • CPUCreditUsage
    • The number of CPU credits spent by the instance for CPU utilization.
    • One CPU credit equals one vCPU running at 100% utilization for one minute.
  • CPUCreditBalance
    • The number of earned CPU credits that an instance has accrued since it was launched or started.
    • For T2 Standard, also includes the number of launch credits accrued.
    • When a T3/T3a instance stops, the CPUCreditBalance persists for seven days. When a T2 instance stops, credits are lost.
    • Used to determine how long an instance can burst beyond its baseline performance level.
  • CPUSurplusCreditBalance (Unlimited mode only)
    • The number of surplus credits spent when the CPUCreditBalance is zero.
    • Surplus credits are paid down by earned CPU credits.
    • If surplus credits exceed the maximum earnable in a 24-hour period, additional charges apply.
  • CPUSurplusCreditsCharged (Unlimited mode only)
    • The number of surplus credits that are not paid down and incur an additional charge.
    • Charged when surplus credits exceed 24-hour maximum, instance is stopped/terminated, or switched from unlimited to standard mode.
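The credit definition above (one credit equals one vCPU at 100% for one minute) can be turned into a small worked calculation. This sketch covers only the spending side that CPUCreditUsage reports; credits earned per hour depend on the instance size's baseline and are not modeled here.

```python
def credits_spent(vcpus: int, avg_utilization_pct: float, minutes: float) -> float:
    """CPU credits spent, using 1 credit = 1 vCPU at 100% utilization for 1 minute."""
    if not 0 <= avg_utilization_pct <= 100:
        raise ValueError("utilization must be between 0 and 100 percent")
    return vcpus * (avg_utilization_pct / 100) * minutes


# 2 vCPUs at 50% for an hour spend the same credits as 1 vCPU at 100% for an hour.
print(credits_spent(2, 50, 60))   # 60.0
print(credits_spent(1, 100, 60))  # 60.0
# One 5-minute CPUCreditUsage data point for 2 vCPUs at 50%:
print(credits_spent(2, 50, 5))    # 5.0
```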

Amazon EBS Metrics for Nitro-based Instances

  • Available for EBS volumes attached to Nitro-based instances (non-bare-metal).
  • EBSReadOps / EBSWriteOps – Completed read/write operations from all attached EBS volumes.
  • EBSReadBytes / EBSWriteBytes – Bytes read from/written to all attached EBS volumes.
  • EBSIOBalance%
    • Percentage of I/O credits remaining in the burst bucket.
    • Available for basic monitoring only.
    • Available for some *.4xlarge and smaller instance sizes that burst to maximum performance for 30 minutes every 24 hours.
  • EBSByteBalance%
    • Percentage of throughput credits remaining in the burst bucket.
    • Available for basic monitoring only.
    • Available for some *.4xlarge and smaller instance sizes that burst to maximum performance for 30 minutes every 24 hours.
  • InstanceEBSIOPSExceededCheck
    • Reports whether the application attempted to drive IOPS exceeding the maximum EBS IOPS limits for the instance.
    • Values: 0 (not exceeded) or 1 (exceeded).
  • InstanceEBSThroughputExceededCheck
    • Reports whether the application attempted to drive throughput exceeding the maximum EBS throughput limits for the instance.
    • Values: 0 (not exceeded) or 1 (exceeded).
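EBSReadOps/EBSWriteOps and EBSReadBytes/EBSWriteBytes are totals for each period, so a common way to get average IOPS and throughput is to divide the Sum statistic by the period length in seconds. The helper below is a hypothetical sketch of that arithmetic, assuming the 300-second period of basic monitoring unless told otherwise.

```python
MIB = 1024 * 1024


def ebs_rates(read_ops: float, write_ops: float, read_bytes: float,
              write_bytes: float, period_seconds: int = 300) -> tuple[float, float]:
    """Convert per-period Sum statistics into average IOPS and MiB/s."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    iops = (read_ops + write_ops) / period_seconds
    throughput_mib_s = (read_bytes + write_bytes) / period_seconds / MIB
    return iops, throughput_mib_s


# 150,000 operations and 600 MiB transferred in one 5-minute (300 s) period.
print(ebs_rates(90_000, 60_000, 600 * MIB, 0))  # (500.0, 2.0)
```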

Status Check Metrics

  • Available at a 1-minute frequency at no charge by default.
  • StatusCheckFailed
    • Reports whether any of the status checks has failed in the last minute.
    • Values: 0 (passed) or 1 (failed).
  • StatusCheckFailed_Instance
    • Reports whether the instance has passed the EC2 instance status check in the last minute.
    • Values: 0 (passed) or 1 (failed).
  • StatusCheckFailed_System
    • Reports whether the instance has passed the EC2 system status check in the last minute.
    • Values: 0 (passed) or 1 (failed).
  • StatusCheckFailed_AttachedEBS
    • Reports whether the instance has passed the attached EBS status check in the last minute.
    • Values: 0 (passed) or 1 (failed).
    • Available for Nitro-based instances only.

Accelerator Metrics

  • GPUPowerUtilization
    • Active power usage as a percentage of maximum active power.
    • Available for supported accelerated computing instances only.

CloudWatch Network Flow Monitor

  • Launched at re:Invent 2024 as part of CloudWatch Network Monitoring.
  • Provides near real-time visibility into network performance (packet loss and latency) for traffic between EC2 instances, EKS workloads, and AWS services (S3, DynamoDB).
  • Uses fully managed agents installed on EC2 instances to collect TCP-based performance metrics.
  • Agents send aggregated metrics to the backend approximately every 30 seconds.
  • Top contributors feature identifies network flows with the highest retransmissions or latency to help pinpoint impairments.
  • Supports multi-account monitoring via AWS Organizations integration.

EC2 Metric Dimensions

  • InstanceId – Filters data for a specific instance.
  • InstanceType – Filters data for all instances of a specific type (requires Detailed Monitoring).
  • ImageId (AMI ID) – Filters data for all instances running a specific AMI (requires Detailed Monitoring).
  • AutoScalingGroupName – Filters data for all instances in a specified Auto Scaling group.
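Aggregating across instances (as in the "average CPU of instances launched from one AMI" scenario in practice question 6 below) amounts to querying CPUUtilization with one of these dimensions instead of InstanceId. The sketch below builds GetMetricStatistics parameters (for example, for boto3's `get_metric_statistics`); the 14-day window, hourly period, and sample AMI ID are assumptions, and InstanceType/ImageId aggregates only include instances with detailed monitoring.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Dimensions listed in this section.
EC2_DIMENSIONS = {"InstanceId", "InstanceType", "ImageId", "AutoScalingGroupName"}


def build_cpu_average_query(dimension: str, value: str, days: int = 14,
                            period_seconds: int = 3600,
                            end: Optional[datetime] = None) -> dict:
    """Return GetMetricStatistics parameters for average CPU across one dimension."""
    if dimension not in EC2_DIMENSIONS:
        raise ValueError(f"unsupported EC2 dimension: {dimension}")
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": dimension, "Value": value}],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": period_seconds,   # hourly points keep a 2-week response small
        "Statistics": ["Average"],
    }


query = build_cpu_average_query("ImageId", "ami-0123456789abcdef0")
# boto3.client("cloudwatch").get_metric_statistics(**query)
print(query["Dimensions"], query["EndTime"] - query["StartTime"])
```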

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. In the basic monitoring package for EC2, Amazon CloudWatch provides the following metrics:
    1. Web server visible metrics such as number of failed transaction requests
    2. Operating system visible metrics such as memory utilization
    3. Database visible metrics such as number of connections
    4. Hypervisor visible metrics such as CPU utilization
  2. Which of the following requires a custom CloudWatch metric to monitor?
    1. Memory Utilization of an EC2 instance
    2. CPU Utilization of an EC2 instance
    3. Disk usage activity of an EC2 instance
    4. Data transfer of an EC2 instance
  3. A user has configured CloudWatch monitoring on an EBS backed EC2 instance. If the user has not attached any additional device, which of the below mentioned metrics will always show a 0 value?
    1. DiskReadBytes
    2. NetworkIn
    3. NetworkOut
    4. CPUUtilization
  4. A user is running a batch process on EBS backed EC2 instances. The batch process starts a few instances to process Hadoop Map reduce jobs, which can run between 50 – 600 minutes or sometimes for more time. The user wants to configure that the instance gets terminated only when the process is completed. How can the user configure this with CloudWatch?
    1. Setup the CloudWatch action to terminate the instance when the CPU utilization is less than 5%
    2. Setup the CloudWatch with Auto Scaling to terminate all the instances
    3. Setup a job which terminates all instances after 600 minutes
    4. It is not possible to terminate instances automatically
  5. An AWS account owner has setup multiple IAM users. One IAM user only has CloudWatch access. He has setup the alarm action, which stops the EC2 instances when the CPU utilization is below the threshold limit. What will happen in this case?
    1. It is not possible to stop the instance using the CloudWatch alarm
    2. CloudWatch will stop the instance when the action is executed
    3. The user cannot set an alarm on EC2 since he does not have the permission
    4. The user can setup the action but it will not be executed if the user does not have EC2 rights
  6. A user has launched 10 instances from the same AMI ID using Auto Scaling. The user is trying to see the average CPU utilization across all instances of the last 2 weeks under the CloudWatch console. How can the user achieve this?
    1. View the Auto Scaling CPU metrics (Refer AS Instance Monitoring)
    2. Aggregate the data over the instance AMI ID (Works but needs detailed monitoring enabled)
    3. The user has to use the CloudWatch analyser to find the average data across instances
    4. It is not possible to see the average CPU utilization of the same AMI ID since the instance ID is different
  7. Which EC2 status check type monitors whether the EBS volumes attached to a Nitro-based instance are reachable?
    1. System status check
    2. Instance status check
    3. Attached EBS status check
    4. Volume status check
  8. An organization wants to monitor memory utilization of their EC2 instances. Which approach should they use?
    1. Enable detailed monitoring on the instances
    2. Install the unified CloudWatch agent and configure memory metrics
    3. Use the default CloudWatch EC2 metrics
    4. Enable enhanced monitoring on the instances
  9. Which CloudWatch metric can help identify if an EC2 instance is still using the less secure IMDSv1 to access instance metadata?
    1. StatusCheckFailed_Instance
    2. MetadataNoToken
    3. CPUCreditBalance
    4. NetworkPacketsIn
  10. A company wants to ensure all EC2 instances across their AWS Organization have detailed monitoring enabled. What is the most efficient approach? [Select 2]
    1. Manually enable detailed monitoring on each instance
    2. Create CloudWatch Ingestion enablement rules scoped to the organization
    3. Use enablement rules to automatically enable detailed monitoring for existing and new instances
    4. Use AWS Config rules to detect and auto-remediate

CloudWatch Monitoring Supported AWS Services

  • CloudWatch offers either basic or detailed monitoring for supported AWS services.
  • Basic monitoring means that a service sends data points to CloudWatch every five minutes.
  • Detailed monitoring means that a service sends data points to CloudWatch every minute.
  • If an AWS service supports both basic and detailed monitoring, basic monitoring is enabled by default and detailed monitoring must be enabled explicitly to get detailed metrics.
  • High-Resolution Custom Metrics allow publishing data at 1-second resolution using the PutMetricData API with a StorageResolution of 1.
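The PutMetricData call described above can be sketched as a parameter builder; passing the result to boto3's `put_metric_data` would publish it. The namespace, metric name, and unit are hypothetical; `StorageResolution` accepts 1 (high resolution) or 60 (standard), and custom namespaces cannot start with `AWS/`.

```python
from datetime import datetime, timezone
from typing import Optional


def build_high_res_put(namespace: str, metric_name: str, value: float,
                       unit: str = "Count",
                       timestamp: Optional[datetime] = None) -> dict:
    """Return PutMetricData parameters for one 1-second-resolution data point."""
    if namespace.startswith("AWS/"):
        raise ValueError("custom metrics cannot use a namespace starting with 'AWS/'")
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": metric_name,
                "Timestamp": timestamp or datetime.now(timezone.utc),
                "Value": value,
                "Unit": unit,
                "StorageResolution": 1,  # 1 = high resolution, 60 = standard resolution
            }
        ],
    }


params = build_high_res_put("MyApp/Checkout", "OrdersPlaced", 3)
# boto3.client("cloudwatch").put_metric_data(**params)
print(params["MetricData"][0]["StorageResolution"])  # 1
```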

Monitoring Categories

  • Basic Monitoring – Free, default set of metrics published at 5-minute intervals for most services.
  • Detailed Monitoring – Paid, more frequent metrics (typically 1-minute intervals). Must be explicitly enabled.
  • High-Resolution Custom Metrics – Custom metrics published at up to 1-second intervals using PutMetricData API or Embedded Metric Format (EMF).
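The Embedded Metric Format alternative works by writing a structured JSON log line; when it lands in a CloudWatch Logs log group (for example, printed from a Lambda function), CloudWatch extracts the metric. The sketch below builds one such line; the namespace, the `Service` dimension, and the metric name are hypothetical, and the optional `StorageResolution` field in the metric definition is assumed from the EMF specification.

```python
import json
import time
from typing import Optional


def emf_record(namespace: str, service: str, metric_name: str, value: float,
               unit: str = "Milliseconds", high_resolution: bool = True,
               timestamp_ms: Optional[int] = None) -> str:
    """Return one Embedded Metric Format log line carrying a single metric."""
    definition = {"Name": metric_name, "Unit": unit}
    if high_resolution:
        definition["StorageResolution"] = 1
    record = {
        "_aws": {
            "Timestamp": timestamp_ms if timestamp_ms is not None else int(time.time() * 1000),
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [["Service"]],  # dimension values come from top-level keys
                    "Metrics": [definition],
                }
            ],
        },
        "Service": service,
        metric_name: value,  # the metric value is a top-level key named after the metric
    }
    return json.dumps(record)


print(emf_record("MyApp", "checkout", "Latency", 42.0))
```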

Services Offering Detailed Monitoring

The following services officially offer detailed monitoring (paid, more fine-grained metrics):

  • Amazon API Gateway – Additional dimensions for detailed metrics
  • AWS AppSync – Detailed CloudWatch metrics
  • Amazon CloudFront – Additional distribution metrics
  • Amazon EC2 – 1-minute metrics (vs. 5-minute basic)
  • AWS Elastic Beanstalk – Enhanced health reporting and monitoring
  • Amazon Kinesis Data Streams – Enhanced shard-level metrics
  • AWS Lambda – Event source mapping metrics
  • Amazon Managed Streaming for Apache Kafka (MSK) – Per-broker, per-topic metrics
  • Amazon S3 – Request metrics at 1-minute intervals
  • Amazon SES – Detailed monitoring via event publishing

AWS Services with Monitoring Support

  • Auto Scaling
    • By default, basic monitoring is enabled when the launch configuration is created using the AWS Management Console, and detailed monitoring is enabled when the launch configuration is created using the AWS CLI or an API.
    • Auto Scaling sends data to CloudWatch every 5 minutes by default when the launch configuration is created from the console.
    • For an additional charge, you can enable detailed monitoring for Auto Scaling, which sends data to CloudWatch every minute.
  • Amazon CloudFront
    • Amazon CloudFront sends data to CloudWatch every minute by default.
    • Additional distribution metrics (detailed monitoring) can be enabled for more fine-grained visibility.
  • Amazon CloudSearch
    • Amazon CloudSearch sends data to CloudWatch every minute by default.
  • Amazon EventBridge (formerly Amazon CloudWatch Events)
    • Amazon EventBridge sends data to CloudWatch every minute by default.
  • Amazon CloudWatch Logs
    • Amazon CloudWatch Logs sends data to CloudWatch every minute by default.
  • Amazon DynamoDB
    • Amazon DynamoDB sends data to CloudWatch every minute for some metrics and every 5 minutes for other metrics.
    • DynamoDB Contributor Insights provides additional metrics for table and global secondary index access patterns.
  • Amazon Elastic Container Service (Amazon ECS)
    • Amazon ECS sends data to CloudWatch every minute.
    • Container Insights provides additional detailed metrics at the cluster, service, task, and container level including CPU, memory, network, and storage metrics.
  • Amazon ElastiCache
    • Amazon ElastiCache sends data to CloudWatch every minute.
  • Amazon Elastic Block Store (EBS)
    • Amazon EBS sends data to CloudWatch every 5 minutes for gp2, st1, and sc1 volumes.
    • Provisioned IOPS SSD (io1 and io2) volumes automatically send one-minute metrics to CloudWatch.
    • gp3 volumes also send metrics at 1-minute intervals.
  • Amazon Elastic Compute Cloud (EC2)
    • Amazon EC2 sends data to CloudWatch every 5 minutes by default. For an additional charge, you can enable detailed monitoring for Amazon EC2, which sends data to CloudWatch every minute.
  • Elastic Load Balancing
    • Elastic Load Balancing sends data to CloudWatch every minute (applies to ALB, NLB, GWLB, and Classic Load Balancer).
  • Amazon EMR (formerly Amazon Elastic MapReduce)
    • Amazon EMR sends basic data to CloudWatch every 5 minutes by default at no additional cost.
    • Starting with Amazon EMR Release 7.0+, the CloudWatch Agent can publish 34 enhanced metrics every minute (additional charges apply).
    • EMR Serverless sends metrics to CloudWatch every minute.
  • Amazon OpenSearch Service (formerly Amazon Elasticsearch Service)
    • Amazon OpenSearch Service sends data to CloudWatch every minute.
  • Amazon Kinesis Data Streams (formerly Amazon Kinesis Streams)
    • Amazon Kinesis Data Streams sends stream-level data to CloudWatch every minute.
    • Enhanced shard-level metrics (detailed monitoring) provide additional per-shard metrics.
  • Amazon Data Firehose (formerly Amazon Kinesis Data Firehose)
    • Amazon Data Firehose sends data to CloudWatch every minute.
  • AWS Lambda
    • AWS Lambda sends data to CloudWatch every minute.
    • Lambda Insights provides enhanced monitoring with system-level metrics (CPU, memory, network) at 1-minute intervals.
  • Amazon SageMaker AI
    • Amazon SageMaker AI (which replaced the legacy Amazon Machine Learning service) sends training, endpoint, and transform job metrics to CloudWatch every minute.
  • ⚠️ Note: The original Amazon Machine Learning service is no longer accepting new users. AWS recommends using Amazon SageMaker AI for machine learning workloads.
  • Amazon Redshift
    • Amazon Redshift sends data to CloudWatch every minute.
  • Amazon Relational Database Service (RDS)
    • Amazon RDS sends data to CloudWatch every minute.
    • CloudWatch Database Insights (launched Dec 2024) provides comprehensive database observability with fleet-level and instance-level dashboards.
  • Amazon Route 53
    • Amazon Route 53 sends data to CloudWatch every minute.
  • Amazon Simple Notification Service (SNS)
    • Amazon SNS sends data to CloudWatch every 5 minutes.
    • SNS does not support detailed (1-minute) monitoring.
  • Amazon Simple Queue Service (SQS)
    • Amazon SQS sends data to CloudWatch every 5 minutes.
  • Amazon Simple Storage Service (S3)
    • Amazon S3 sends storage metrics (bucket size, object count) to CloudWatch once a day (basic monitoring, free).
    • Request metrics (detailed monitoring) are available at 1-minute intervals and are billed as CloudWatch custom metrics.
    • 1-minute metrics are available at the bucket-level by default when request metrics are enabled.
  • Amazon Simple Workflow Service (SWF)
    • Amazon SWF sends data to CloudWatch every 5 minutes.
    • Note: AWS Step Functions is the recommended alternative for new workflow orchestration workloads.
  • AWS Storage Gateway
    • AWS Storage Gateway sends data to CloudWatch every 5 minutes.
  • AWS WAF
    • AWS WAF sends data to CloudWatch every minute.
  • Amazon WorkSpaces
    • Amazon WorkSpaces sends data to CloudWatch every 5 minutes.

⚠️ AWS OpsWorks – End of Life

AWS OpsWorks reached End of Life (EOL) on May 26, 2024. The service has been disabled for both new and existing customers. The OpsWorks console, API, CLI, and CloudFormation resources are no longer available.

Alternatives: AWS Systems Manager, AWS CodeDeploy, AWS CloudFormation

Additional Services Publishing CloudWatch Metrics (2024-2026)

The following additional AWS services publish metrics to CloudWatch (not in the original list):

  • Amazon API Gateway – Sends metrics every minute
  • AWS AppSync – Sends metrics every minute
  • Amazon EKS – Control plane metrics and Container Insights
  • Amazon Bedrock – Model invocation and throughput metrics
  • AWS Step Functions – Execution metrics every minute
  • Amazon Aurora – Database metrics every minute (with Database Insights)
  • AWS Fargate – Container-level metrics via Container Insights
  • Amazon MSK – Streaming metrics with per-broker/topic detail
  • AWS Network Firewall – Firewall metrics every minute
  • Amazon MemoryDB – Database metrics every minute

CloudWatch Enhanced Observability Features

  • Container Insights – Collects and aggregates metrics and logs from containerized applications on Amazon ECS, Amazon EKS, and Kubernetes. Provides cluster, node, pod, task, and service level metrics.
  • Lambda Insights – Enhanced monitoring for Lambda functions with system-level metrics (CPU, memory, network, disk).
  • Database Insights (Dec 2024) – Comprehensive database observability for Amazon RDS and Aurora with fleet-level health monitoring and instance-level SQL query analysis.
  • Application Signals (June 2024) – Application performance monitoring (APM) with pre-built dashboards showing volume, availability, latency, faults, and errors.
  • Internet Monitor – Near-continuous internet measurements for availability and performance, tailored to your workload footprint on AWS.
  • CloudWatch Investigations – AI-powered investigation of operational issues across services.

AWS Certification Exam Practice Questions

  1. What is the minimum time interval for the data that Amazon CloudWatch receives and aggregates?
    1. One second (High-resolution custom metrics support 1-second resolution)
    2. Five seconds
    3. One minute
    4. Three minutes
    5. Five minutes

    Note: The original answer was “One minute” which was correct for standard metrics. With high-resolution custom metrics (introduced 2017), CloudWatch supports 1-second resolution. Exam questions may still reference 1 minute as the minimum for AWS service metrics.

  2. In the ‘Detailed’ monitoring data available for your Amazon EBS volumes, Provisioned IOPS volumes automatically send _____ minute metrics to Amazon CloudWatch.
    1. 3
    2. 1
    3. 5
    4. 2
  3. Using Amazon CloudWatch’s Free Tier, what is the frequency of metric updates, which you receive?
    1. 5 minutes
    2. 500 milliseconds.
    3. 30 seconds
    4. 1 minute
  4. What is the type of monitoring data (for Amazon EBS volumes) which is available automatically in 5-minute periods at no charge called?
    1. Basic
    2. Primary
    3. Detailed
    4. Local
  5. A user has created an Auto Scaling group using CLI. The user wants to enable CloudWatch detailed monitoring for that group. How can the user configure this?
    1. When the user sets an alarm on the Auto Scaling group, it automatically enables detail monitoring
    2. By default detailed monitoring is enabled for Auto Scaling (Detailed monitoring is enabled when you create the launch configuration using the AWS CLI or an API)
    3. Auto Scaling does not support detailed monitoring
    4. Enable detail monitoring from the AWS console
  6. A user is trying to understand the detailed CloudWatch monitoring concept. Which of the below mentioned services provides detailed monitoring with CloudWatch without charging the user extra?
    1. AWS Auto Scaling
    2. AWS Route 53
    3. AWS EMR
    4. AWS SNS
  7. A user is trying to understand the detailed CloudWatch monitoring concept. Which of the below mentioned services does not provide detailed monitoring with CloudWatch?
    1. AWS EMR (EMR sends basic metrics every 5 minutes by default; enhanced monitoring at 1-minute intervals is available starting with EMR 7.0+ via CloudWatch Agent)
    2. AWS RDS
    3. AWS ELB
    4. AWS Route53
  8. A user has enabled detailed CloudWatch monitoring with the AWS Simple Notification Service. Which of the below mentioned statements helps the user understand detailed monitoring better?
    1. SNS will send data every minute after configuration
    2. There is no need to enable since SNS provides data every minute
    3. AWS CloudWatch does not support monitoring for SNS
    4. SNS cannot provide data every minute
  9. A user has configured an Auto Scaling group with ELB. The user has enabled detailed CloudWatch monitoring on Auto Scaling. Which of the below mentioned statements will help the user understand the functionality better?
    1. It is not possible to setup detailed monitoring for Auto Scaling
    2. In this case, Auto Scaling will send data every minute and will charge the user extra
    3. Detailed monitoring will send data every minute without additional charges
    4. Auto Scaling sends data every minute only and does not charge the user
  10. Which of the following CloudWatch monitoring features provides near real-time visibility into application performance with pre-built dashboards?
    1. CloudWatch Logs Insights
    2. CloudWatch Alarms
    3. CloudWatch Application Signals
    4. CloudWatch Contributor Insights
  11. What is the minimum resolution supported by CloudWatch high-resolution custom metrics?
    1. 5 seconds
    2. 10 seconds
    3. 30 seconds
    4. 1 second
  12. Which CloudWatch feature provides comprehensive database observability with fleet-level health monitoring for Amazon RDS and Aurora?
    1. CloudWatch Logs Insights
    2. Enhanced Monitoring
    3. Performance Insights
    4. CloudWatch Database Insights

AWS ELB Monitoring

  • Elastic Load Balancing publishes data points to Amazon CloudWatch about the load balancers and targets (or back-end instances for Classic Load Balancer).
  • Elastic Load Balancing reports metrics to CloudWatch only when requests are flowing through the load balancer.
    • If there are requests flowing through the load balancer, Elastic Load Balancing measures and sends its metrics in 60-second intervals.
    • If there are no requests flowing through the load balancer or no data for a metric, the metric is not reported.
  • AWS provides four types of load balancers, each with its own monitoring capabilities:
    • Application Load Balancer (ALB) – Layer 7, HTTP/HTTPS/gRPC
    • Network Load Balancer (NLB) – Layer 4, TCP/UDP/TLS
    • Gateway Load Balancer (GWLB) – Layer 3, transparent network gateway
    • Classic Load Balancer (CLB) – Previous generation (Layer 4/7)
  • ELB monitoring options include CloudWatch metrics, access logs, connection logs, health check logs, CloudTrail logs, and CloudWatch Internet Monitor.

CloudWatch Metrics

Classic Load Balancer (CLB) Metrics

  • CLB metrics use the AWS/ELB namespace.
  • HealthyHostCount, UnHealthyHostCount
    • Number of healthy and unhealthy instances registered with the load balancer.
    • Most useful statistics are Average, Min, and Max.
  • RequestCount
    • Number of requests completed or connections made during the specified interval (1 or 5 minutes).
    • Most useful statistic is Sum.
  • Latency
    • Time elapsed, in seconds, after the request leaves the load balancer until the headers of the response are received.
    • Most useful statistic is Average.
  • SurgeQueueLength
    • Total number of requests that are pending routing.
    • Load balancer queues a request if it is unable to establish a connection with a healthy instance in order to route the request.
    • Maximum size of the queue is 1,024. Additional requests are rejected when the queue is full.
    • Most useful statistic is Max, because it represents the peak of queued requests.
  • SpilloverCount
    • The total number of requests that were rejected because the surge queue is full. Should ideally be 0.
    • Most useful statistic is Sum.
  • HTTPCode_ELB_4XX, HTTPCode_ELB_5XX
    • Client and server error codes generated by the load balancer.
    • Most useful statistic is Sum.
  • HTTPCode_Backend_2XX, HTTPCode_Backend_3XX, HTTPCode_Backend_4XX, HTTPCode_Backend_5XX
    • Number of HTTP response codes generated by registered instances.
    • Most useful statistic is Sum.
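Because any non-zero SpilloverCount means the surge queue overflowed and requests were rejected, it is a natural candidate for an alarm. The Python sketch below only builds the keyword arguments for boto3's CloudWatch put_metric_alarm call, so nothing is sent to AWS. The load balancer name, alarm name and SNS topic ARN are hypothetical placeholders. TreatMissingData is set to notBreaching on the assumption that gaps simply mean no traffic, since ELB does not report a metric when no requests flow.

```python
from typing import Any


def clb_spillover_alarm(load_balancer_name: str, sns_topic_arn: str) -> dict[str, Any]:
    """Build PutMetricAlarm keyword arguments for a CLB SpilloverCount alarm (hypothetical names)."""
    return {
        "AlarmName": f"{load_balancer_name}-spillover",
        "Namespace": "AWS/ELB",  # Classic Load Balancer namespace
        "MetricName": "SpilloverCount",
        # CLB metrics are dimensioned by the load balancer's name
        "Dimensions": [{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        "Statistic": "Sum",  # most useful statistic for SpilloverCount
        "Period": 60,  # ELB publishes metrics in 60-second intervals
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",  # any rejected request trips the alarm
        # No requests means no data points; treat those gaps as OK rather than alarming
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }
```

With credentials configured, the result could be passed as boto3.client("cloudwatch").put_metric_alarm(**clb_spillover_alarm("my-clb", topic_arn)).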

Application Load Balancer (ALB) Metrics

  • ALB metrics use the AWS/ApplicationELB namespace.
  • ActiveConnectionCount – Total concurrent TCP connections active from clients to the load balancer and from the load balancer to targets. Useful statistic: Sum.
  • NewConnectionCount – Total new TCP connections established from clients to the load balancer and from the load balancer to targets. Useful statistic: Sum.
  • RejectedConnectionCount – Number of connections rejected because the load balancer reached its maximum number of connections. Useful statistic: Sum.
  • RequestCount – Number of requests processed over IPv4 and IPv6. Useful statistic: Sum.
  • TargetResponseTime – Time elapsed after the request leaves the load balancer until the target starts to send response headers. Useful statistics: Average, pNN.NN (percentiles).
  • HealthyHostCount, UnHealthyHostCount – Number of healthy/unhealthy targets. Useful statistics: Average, Min, Max.
  • HTTPCode_Target_2XX_Count, HTTPCode_Target_3XX_Count, HTTPCode_Target_4XX_Count, HTTPCode_Target_5XX_Count – HTTP response codes generated by targets. Useful statistic: Sum.
  • HTTPCode_ELB_4XX_Count, HTTPCode_ELB_5XX_Count – HTTP error codes generated by the load balancer itself. Useful statistic: Sum.
  • ClientTLSNegotiationErrorCount – TLS connections initiated by clients that did not establish a session with the load balancer. Useful statistic: Sum.
  • TargetConnectionErrorCount – Connections that were not successfully established between the load balancer and target. Useful statistic: Sum.
  • ProcessedBytes – Total bytes processed by the load balancer over IPv4 and IPv6. Useful statistic: Sum.
  • ConsumedLCUs – Number of Load Balancer Capacity Units (LCU) consumed. Used for billing calculations.
  • RuleEvaluations – Number of rules evaluated while processing requests.
  • AnomalousHostCount – Number of targets detected with anomalies (used with Automatic Target Weights). Useful statistics: Min, Max.
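Raw 5XX counts are hard to alert on when traffic varies, so a common pattern is to divide HTTPCode_Target_5XX_Count by RequestCount with CloudWatch metric math. The Python sketch below builds the MetricDataQueries list for a boto3 get_metric_data call. The dimension value app/my-alb/50dc6c495c0c9188 used in the usage line is a hypothetical ALB identifier (the part of the load balancer ARN after loadbalancer/). FILL(errors, 0) treats periods with no 5XX data points as zero errors instead of leaving gaps.

```python
from typing import Any


def alb_5xx_rate_queries(lb_dimension: str, period: int = 60) -> list[dict[str, Any]]:
    """Build GetMetricData queries returning target 5XX responses as a % of requests."""

    def stat(query_id: str, metric_name: str) -> dict[str, Any]:
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": metric_name,
                    # ALB dimension value is the ARN suffix, e.g. app/<name>/<id>
                    "Dimensions": [{"Name": "LoadBalancer", "Value": lb_dimension}],
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,  # inputs only; the computed rate is what we want back
        }

    return [
        stat("requests", "RequestCount"),
        stat("errors", "HTTPCode_Target_5XX_Count"),
        {
            "Id": "rate",
            "Expression": "100 * FILL(errors, 0) / requests",
            "Label": "Target 5XX %",
            "ReturnData": True,
        },
    ]
```

A call such as cloudwatch.get_metric_data(MetricDataQueries=alb_5xx_rate_queries("app/my-alb/50dc6c495c0c9188"), StartTime=start, EndTime=end) would then return only the rate series.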

Network Load Balancer (NLB) Metrics

  • NLB metrics use the AWS/NetworkELB namespace.
  • ActiveFlowCount – Total number of concurrent flows (connections) from clients to targets. Useful statistic: Average.
  • NewFlowCount – Total number of new flows established from clients to targets. Useful statistic: Sum.
  • ProcessedBytes – Total bytes processed by the load balancer (TCP/TLS, UDP). Useful statistic: Sum.
  • TCP_Client_Reset_Count, TCP_Target_Reset_Count, TCP_ELB_Reset_Count – Number of reset (RST) packets sent from a client to a target, sent from a target to a client, and generated by the load balancer, respectively. Useful statistic: Sum.
  • HealthyHostCount, UnHealthyHostCount – Number of healthy/unhealthy targets.
  • ConsumedLCUs – Number of Network Load Balancer Capacity Units consumed.
  • PeakBytesPerSecond – Highest average bytes per second for the load balancer during a period.
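ProcessedBytes (Sum) is a byte total for the whole period, so converting it to an average throughput is simple arithmetic: bytes × 8 ÷ period seconds. For example, 750,000,000 bytes in a 60-second period is 6,000,000,000 bits ÷ 60 = 100,000,000 bits per second, or 100 Mbps. A small Python helper for that conversion, assuming decimal megabits (1 Mbps = 1,000,000 bits per second):

```python
def processed_bytes_to_mbps(bytes_sum: float, period_seconds: int) -> float:
    """Convert a ProcessedBytes Sum for one period into average megabits per second."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    # bytes -> bits, spread over the period, then bits/s -> Mbps
    return bytes_sum * 8 / period_seconds / 1_000_000
```

This yields an average over the period; short bursts are better seen through PeakBytesPerSecond.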

Gateway Load Balancer (GWLB) Metrics

  • GWLB metrics use the AWS/GatewayELB namespace.
  • ActiveFlowCount, NewFlowCount – Concurrent and new flows from clients to targets.
  • ProcessedBytes – Total bytes processed by the GWLB.
  • HealthyHostCount, UnHealthyHostCount – Number of healthy/unhealthy targets.
  • GWLB does NOT generate access logs since it is a transparent Layer 3 load balancer that does not terminate flows.
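Because each load balancer type publishes to its own namespace, and CLB uses a different dimension name (LoadBalancerName instead of LoadBalancer), a small lookup helps when scripting alarms across a mixed fleet. This Python sketch derives the namespace and dimension from a load balancer ARN, assuming the standard ARN layouts (loadbalancer/app/..., loadbalancer/net/... and loadbalancer/gwy/... for ALB, NLB and GWLB, and loadbalancer/<name> for CLB). The example ARNs are made up.

```python
NAMESPACES = {
    "app": "AWS/ApplicationELB",  # Application Load Balancer
    "net": "AWS/NetworkELB",  # Network Load Balancer
    "gwy": "AWS/GatewayELB",  # Gateway Load Balancer
}


def cloudwatch_target(lb_arn: str) -> tuple[str, dict[str, str]]:
    """Return (namespace, dimension) for an Elastic Load Balancing load balancer ARN."""
    parts = lb_arn.split(":", 5)  # arn:partition:service:region:account:resource
    if len(parts) != 6 or not parts[5].startswith("loadbalancer/"):
        raise ValueError(f"not a load balancer ARN: {lb_arn}")
    suffix = parts[5][len("loadbalancer/"):]
    kind = suffix.split("/", 1)[0]
    if kind in NAMESPACES and suffix.count("/") == 2:
        # ALB/NLB/GWLB: the dimension value is "<type>/<name>/<id>"
        return NAMESPACES[kind], {"Name": "LoadBalancer", "Value": suffix}
    # CLB ARNs end in just the load balancer name
    return "AWS/ELB", {"Name": "LoadBalancerName", "Value": suffix}
```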

Elastic Load Balancer Access Logs

  • Elastic Load Balancing provides access logs that capture detailed information about all requests sent to the load balancer.
  • Each log contains information such as the time the request was received, the client’s IP address, latencies, request paths, and server responses.
  • Access logging is disabled by default and can be enabled without any additional charge. You are only charged for S3 storage.
  • Access logs are supported for ALB, NLB, and CLB. GWLB does not generate access logs.

ALB Access Logs

  • ALB publishes a log file for each load balancer node every 5 minutes to Amazon S3.
  • Log entries include: request type, timestamp, ELB name, client:port, target:port, request processing time, target processing time, response processing time, ELB/target status codes, received/sent bytes, request details, user agent, SSL cipher/protocol, target group ARN, trace ID, and more.
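The ALB access log fields are space-separated, with the request and user agent wrapped in double quotes, so Python's shlex.split can tokenize a line. This sketch reads only the first 14 fields listed above and assumes a well-formed line; it is not a complete parser for every field ALB writes. A processing time of -1 indicates the load balancer could not dispatch the request or the connection closed before a response, so the total time is left empty in that case.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlbLogEntry:
    type: str
    time: str
    elb: str
    client: str  # client:port
    target: str  # target:port, or "-"
    request_processing_time: float
    target_processing_time: float
    response_processing_time: float
    elb_status_code: str
    target_status_code: str  # may be "-"
    received_bytes: int
    sent_bytes: int
    request: str
    user_agent: str

    @property
    def total_time(self) -> Optional[float]:
        """Sum of the three processing times, or None if any of them is -1."""
        times = (
            self.request_processing_time,
            self.target_processing_time,
            self.response_processing_time,
        )
        return None if -1 in times else sum(times)


def parse_alb_line(line: str) -> AlbLogEntry:
    """Parse the first 14 fields of one ALB access log line (assumes a well-formed line)."""
    f = shlex.split(line)  # keeps the quoted request and user-agent fields intact
    return AlbLogEntry(
        f[0], f[1], f[2], f[3], f[4],
        float(f[5]), float(f[6]), float(f[7]),
        f[8], f[9],
        int(f[10]), int(f[11]),
        f[12], f[13],
    )
```

Log files arrive gzip-compressed in S3, so in practice each line would come from gzip.open(path, "rt").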

NLB Access Logs

  • NLB access logs are created only if the load balancer has a TLS listener, and they capture information only about TLS requests sent to the load balancer.
  • Logs can be stored in Amazon S3.
  • New (Nov 2025): NLB access logs now support delivery as CloudWatch Vended Logs, enabling direct delivery to CloudWatch Logs, Amazon Data Firehose, and Amazon S3 with Apache Parquet format support. This allows real-time log analysis using CloudWatch Logs Insights and Live Tail.

ALB Connection Logs

  • Connection logs capture detailed information about TLS connections established between clients and the ALB.
  • Useful for troubleshooting TLS client connection issues (e.g., mTLS failures, cipher mismatches).
  • Connection logs are stored in Amazon S3, with a log file published every 5 minutes.
  • This is an optional feature, disabled by default.
  • Log entries include: timestamp, client IP:port, listener port, TLS protocol/cipher, connection status, client certificate details (for mTLS), and more.

ALB Health Check Logs

  • New (Nov 2025): ALB now supports Health Check Logs that send detailed target health check data directly to a designated Amazon S3 bucket.
  • This optional feature captures:
    • Health check status (healthy/unhealthy)
    • Timestamps
    • Target identification data
    • Failure reasons for unhealthy targets
  • Health check logs are published every 5 minutes per load balancer node.
  • Helps troubleshoot intermittent target health check failures without needing to rely solely on CloudWatch metrics.
  • No additional charge; you pay only for S3 storage.

CloudWatch Internet Monitor

  • Amazon CloudWatch Internet Monitor provides internet performance and availability measurements for user traffic to load balancers.
  • Monitors internet traffic patterns and identifies issues that affect internet connectivity between users and AWS.
  • Supported for both ALB and NLB.
  • NLB integration (Sep 2024): You can create or associate a monitor for an NLB directly when creating it in the AWS Management Console.
  • Provides city-level visibility into performance impairments and their geographic scope.

CloudWatch Network Flow Monitor

  • New (Dec 2024, re:Invent): CloudWatch Network Flow Monitor offers network performance monitoring across AWS managed services.
  • Provides near real-time visibility into network performance for traffic between compute resources (EC2, EKS), to AWS services (S3, DynamoDB), and to other AWS Regions.
  • Uses lightweight agents to gather TCP connection performance statistics (packet loss, latency).
  • Can determine if AWS is the cause of a detected network issue for monitored flows.

ALB Automatic Target Weights (ATW)

  • New (Nov 2023): ALB supports Automatic Target Weights (ATW), which uses anomaly detection to optimize traffic routing.
  • ATW detects and mitigates gray failures — situations where a target passes health checks but still returns elevated errors.
  • Anomaly detection is automatically enabled on HTTP/HTTPS target groups with at least three healthy targets.
  • ATW analyzes HTTP return status codes and TCP/TLS errors to identify anomalous targets and reduces traffic to them.
  • Provides the AnomalousHostCount CloudWatch metric to monitor detected anomalies.
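AWS does not publish the exact ATW detection algorithm. Purely to illustrate the idea of spotting a gray failure by comparing each target with its peers, here is a hypothetical Python sketch that flags a target whose error rate is far above the median of the other targets. The function name, the factor of 3, the 1% floor and the peer-median approach are all assumptions, not ALB's implementation.

```python
from statistics import median


def flag_gray_failures(error_rates: dict[str, float], factor: float = 3.0,
                       floor: float = 0.01, min_targets: int = 3) -> set[str]:
    """Hypothetical peer comparison: flag healthy targets whose error rate is far above their peers.

    error_rates maps target id -> fraction of requests that failed (0.0 to 1.0).
    """
    if len(error_rates) < min_targets:
        return set()  # too few peers for a meaningful comparison
    flagged = set()
    for target, rate in error_rates.items():
        peers = [r for t, r in error_rates.items() if t != target]
        # Compare against the median of the *other* targets so one bad target
        # cannot drag the baseline up; the floor ignores tiny absolute blips.
        if rate > max(factor * median(peers), floor):
            flagged.add(target)
    return flagged
```

With rates of 0.2%, 0.1%, 25% and 0.3% across four targets, only the 25% target is flagged, even though all four would still pass a simple health check.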

CloudWatch Anomaly Detection Alarms

  • CloudWatch anomaly detection uses machine learning to model expected metric behavior and automatically creates upper and lower bounds.
  • Can be used with ELB metrics like TargetResponseTime, RequestCount, and HTTPCode_ELB_5XX_Count (ALB) or HTTPCode_ELB_5XX (CLB) to detect unusual patterns.
  • Recommended approach for monitoring ELB performance without manually setting static thresholds.
  • Works with ALB, NLB, CLB, and GWLB metrics.
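An anomaly detection alarm replaces the static Threshold with a ThresholdMetricId that points at an ANOMALY_DETECTION_BAND metric math expression. This Python sketch builds put_metric_alarm keyword arguments for ALB TargetResponseTime. The alarm name, three evaluation periods and band width of 2 standard deviations are illustrative choices, and the dimension value passed in is assumed to be an ALB identifier of the form app/<name>/<id>.

```python
from typing import Any


def alb_latency_anomaly_alarm(lb_dimension: str, band_width: float = 2) -> dict[str, Any]:
    """Build PutMetricAlarm kwargs for an anomaly detection alarm on ALB TargetResponseTime."""
    return {
        "AlarmName": f"{lb_dimension.replace('/', '-')}-latency-anomaly",
        "ComparisonOperator": "GreaterThanUpperThreshold",  # alert only on unusually high latency
        "EvaluationPeriods": 3,
        "ThresholdMetricId": "band",  # the band takes the place of a static Threshold
        "TreatMissingData": "notBreaching",
        "Metrics": [
            {
                "Id": "latency",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/ApplicationELB",
                        "MetricName": "TargetResponseTime",
                        "Dimensions": [{"Name": "LoadBalancer", "Value": lb_dimension}],
                    },
                    "Period": 60,
                    "Stat": "Average",
                },
                "ReturnData": True,
            },
            {
                "Id": "band",
                # Second argument: band width in standard deviations
                "Expression": f"ANOMALY_DETECTION_BAND(latency, {band_width})",
                "ReturnData": True,
            },
        ],
    }
```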

CloudTrail Logs

  • AWS CloudTrail captures all API calls to the Elastic Load Balancing API made by or on behalf of your AWS account.
  • API calls can be made directly, or indirectly through the AWS Management Console, AWS CLI, or SDKs.
  • CloudTrail stores the information as log files in an Amazon S3 bucket.
  • Logs can be used to monitor load balancer activity and determine what API call was made, what source IP address was used, who made the call, when it was made, and so on.
  • Applies to all ELB types (ALB, NLB, GWLB, CLB).
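CloudTrail log files delivered to S3 are gzip-compressed JSON documents with a top-level Records array. This Python sketch takes the already decompressed JSON text and pulls out who made each Elastic Load Balancing call, what it was, when it happened and from which IP address, by filtering on the eventSource value elasticloadbalancing.amazonaws.com. Downloading and decompressing the file is left out, and the function name is hypothetical.

```python
import json
from typing import Any


def elb_api_activity(cloudtrail_file_json: str) -> list[dict[str, Any]]:
    """Summarise ELB API calls (who, what, when, from where) in one CloudTrail log file."""
    records = json.loads(cloudtrail_file_json).get("Records", [])
    return [
        {
            "when": r.get("eventTime"),
            "what": r.get("eventName"),
            "who": r.get("userIdentity", {}).get("arn"),
            "source_ip": r.get("sourceIPAddress"),
        }
        for r in records
        # keep only calls made to the Elastic Load Balancing API
        if r.get("eventSource") == "elasticloadbalancing.amazonaws.com"
    ]
```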

Classic Load Balancer – Migration Recommendation

⚠️ Note: Classic Load Balancer is the previous generation load balancer. AWS strongly recommends migrating to Application Load Balancer (Layer 7) or Network Load Balancer (Layer 4).

EC2-Classic networking was fully retired in August 2023. While CLB continues to function in a VPC, no new features are being added to it. Use the migration wizard in the Amazon EC2 console to move to ALB or NLB.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day, and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. An admin is planning to monitor the ELB. Which of the below-mentioned services does not help the admin capture the monitoring information about the ELB activity?
    1. ELB Access logs
    2. ELB health check
    3. CloudWatch metrics
    4. ELB API calls with CloudTrail
  2. A customer needs to capture all client connection information from their load balancer every five minutes. The company wants to use this data for analyzing traffic patterns and troubleshooting their applications. Which of the following options meets the customer requirements?
    1. Enable AWS CloudTrail for the load balancer.
    2. Enable access logs on the load balancer.
    3. Install the Amazon CloudWatch Logs agent on the load balancer.
    4. Enable Amazon CloudWatch metrics on the load balancer.
  3. Your supervisor has requested a way to analyze traffic patterns for your application. You need to capture all connection information from your load balancer every 10 minutes. Pick a solution from below. Choose the correct answer:
    1. Enable access logs on the load balancer.
    2. Create a custom metric CloudWatch filter on your load balancer.
    3. Use a CloudWatch Logs Agent.
    4. Use AWS CloudTrail with your load balancer.
  4. A company runs a web application behind an Application Load Balancer. Some users are experiencing intermittent 5XX errors but health checks show all targets as healthy. Which ALB feature can automatically detect and mitigate this issue?
    1. Cross-Zone Load Balancing
    2. Automatic Target Weights (ATW)
    3. Connection Draining
    4. Sticky Sessions
  5. A DevOps engineer needs to troubleshoot why targets behind an ALB are intermittently failing health checks. Which recently introduced feature provides detailed health check failure reasons stored in S3?
    1. ALB Access Logs
    2. CloudWatch HealthyHostCount metric
    3. ALB Health Check Logs
    4. AWS CloudTrail
  6. A solutions architect wants to analyze NLB access logs in near real-time using CloudWatch Logs Insights. Which delivery option should they configure?
    1. Enable NLB access logs to S3 and create Athena queries
    2. Configure NLB access logs as CloudWatch Vended Logs
    3. Enable VPC Flow Logs on the NLB
    4. Install CloudWatch Agent on NLB nodes
  7. Which of the following is a metric specific to Classic Load Balancer that indicates the load balancer cannot route requests because the queue is full?
    1. RejectedConnectionCount
    2. TargetConnectionErrorCount
    3. SpilloverCount
    4. HTTPCode_ELB_503
  8. A company wants to identify if AWS infrastructure is causing latency issues for users connecting to their Network Load Balancer from different geographic locations. Which service should they use?
    1. AWS X-Ray
    2. CloudWatch Metrics
    3. Amazon CloudWatch Internet Monitor
    4. VPC Flow Logs

References