AWS EC2 Monitoring

EC2 Monitoring

Status Checks

  • Status monitoring helps quickly determine whether EC2 has detected any problems that might prevent instances from running applications.
  • EC2 performs automated checks on every running EC2 instance to identify hardware and software issues.
  • Status checks are performed every minute and each returns a pass or a fail status.
  • If all checks pass, the overall status of the instance is OK.
  • If one or more checks fail, the overall status is Impaired.
  • Status checks are built into EC2, so they cannot be disabled or deleted.
  • Status checks data augments the information that EC2 already provides about the intended state of each instance (such as pending, running, and stopping) as well as the utilization metrics that CloudWatch monitors (CPU utilization, network traffic, and disk activity).
  • Alarms can be created or deleted, that are triggered based on the result of the status checks. for e.g., an alarm can be created to warn if status checks fail on a specific instance.

System Status Checks

  • monitor the AWS systems, required to use the instance, to ensure they are working properly.
  • detect problems with the instance that require AWS involvement to repair.
  • System status checks failure might due to
    • Loss of network connectivity
    • Loss of system power
    • Software issues on the physical host
    • Hardware issues on the physical host
  • When a system status check fails, one can either
    • check Personal Health Dashboard for any scheduled critical maintenance by AWS to the instance’s host.
    • wait for AWS to fix the issue
    • or resolve it by stopping and restarting or terminating and replacing an instance

Instance Status Checks

  • monitor the software and network configuration of the individual instance
  • checks to detect problems that require involvement to repair.
  • Instance status checks failure might be due to
    • Failed system status checks
    • Misconfigured networking or startup configuration
    • Exhausted memory
    • Corrupted file system
    • Incompatible kernel
  • When an instance status check fails, it can be resolved by either rebooting the instance or by making modifications to the operating system

CloudWatch Monitoring

  • CloudWatch helps monitor EC2 instances, which collects and processes
    raw data from EC2 into readable, near real-time metrics.
  • Statistics are recorded for a period of two weeks so that historical information can be accessed and used to gain a better perspective on how
    the application or service is performing.
  • By default, Basic monitoring is enabled and EC2 metric data is sent to CloudWatch in 5-minute periods automatically
  • Detailed monitoring can be enabled on the EC2 instance, which sends data to CloudWatch in 1-minute periods.
  • Aggregating Statistics Across Instances/ASG/AMI ID
    • Aggregate statistics are available for the instances that have detailed monitoring (at an additional charge) enable, which provides data in 1-minute periods
    • Instances that use basic monitoring are not included in the aggregates.
    • CloudWatch does not aggregate data across Regions. Therefore, metrics are completely separate between regions.
    • CloudWatch returns statistics for all dimensions in the AWS/EC2 namespace if no dimension is specified
    • The technique for retrieving all dimensions across an AWS namespace does not work for custom namespaces published to CloudWatch.
    • Statistics include Sum, Average, Minimum, Maximum, Data Samples
    • With custom namespaces, the complete set of dimensions that are associated with any given data point to retrieve statistics that include the data point must be specified
  • CloudWatch alarms
    • can be created to monitor any one of the EC2 instance’s metrics.
    • can be configured to automatically send you a notification when the metric reaches a specified threshold.
    • can automatically stop, terminate, reboot, or recover EC2 instances
    • can automatically recover an EC2 instance when the instance becomes impaired due to an underlying hardware failure a problem that requires AWS involvement to repair
    • can automatically stop or terminate the instances to save costs (EC2 instances that use an EBS volume as the root device can be stopped
      or terminated, whereas instances that use the instance store as the root device can only be terminated)
    • can use EC2ActionsAccess IAM role, which enables AWS to perform stop, terminate, or reboot actions on EC2 instances
    • If you have read/write permissions for CloudWatch but not for EC2, alarms can still be created but the stop or terminate actions won’t be performed on the EC2 instance

EC2 Monitoring Metrics

  • CPUCreditUsage
    • (Only valid for T2 instances) The number of CPU credits consumed
      during the specified period.
    • This metric identifies the amount of time during which physical CPUs
      were used for processing instructions by virtual CPUs allocated to
      the instance.
    • CPU Credit metrics are available at a 5-minute frequency.
  • CPUCreditBalance
    • (Only valid for T2 instances) The number of CPU credits that an instance has accumulated.
    • This metric is used to determine how long an instance can burst beyond its baseline performance level at a given rate.
    • CPU Credit metrics are available at a 5-minute frequency.
  • CPUUtilization
    • % of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application upon a selected instance.
  • DiskReadOps
    • Completed read operations from all instance store volumes available to the instance in a specified period of time.
  • DiskWriteOps
    • Completed write operations to all instance store volumes available to the instance in a specified period of time.
  • DiskReadBytes
    • Bytes read from all instance store volumes available to the instance.
    • This metric is used to determine the volume of the data the application reads from the hard disk of the instance.
    • This can be used to determine the speed of the application.
  • DiskWriteBytes
    • Bytes written to all instance store volumes available to the instance.
    • This metric is used to determine the volume of the data the application writes onto the hard disk of the instance.
    • This can be used to determine the speed of the application.
  • NetworkIn
    • The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to an application on a single instance.
  • NetworkOut
    • The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic to an application on a single instance.
  • NetworkPacketsIn
    • The number of packets received on all network interfaces by the instance. This metric identifies the volume of incoming traffic in terms of the number of packets on a single instance.
    • This metric is available for basic monitoring only
  • NetworkPacketsOut
    • The number of packets sent out on all network interfaces by the instance. This metric identifies the volume of outgoing traffic in terms of the number of packets on a single instance.
    • This metric is available for basic monitoring only.
  • StatusCheckFailed
    • Reports if either of the status checks, StatusCheckFailed_Instance and StatusCheckFailed_System that has failed.
    • Values for this metric are either 0 (zero) or 1 (one.) A zero indicates that the status checks passed. A one indicates a status check failure.
    • Status check metrics are available at a 1-minute frequency
  • StatusCheckFailed_Instance
    • Reports whether the instance has passed the EC2 instance status check in the last minute.
    • Values for this metric are either 0 (zero) or 1 (one.) A zero indicates that the status checks passed. A one indicates a status check failure.
    • Status check metrics are available at a 1-minute frequency
  • StatusCheckFailed_System
    • Reports whether the instance has passed the EC2 system status check in the last minute.
    • Values for this metric are either 0 (zero) or 1 (one.) A zero indicates that the status checks passed. A one indicates a status check failure.
    • Status check metrics are available at a 1-minute frequency

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. In the basic monitoring package for EC2, Amazon CloudWatch provides the following metrics:
    1. Web server visible metrics such as number failed transaction requests
    2. Operating system visible metrics such as memory utilization
    3. Database visible metrics such as number of connections
    4. Hypervisor visible metrics such as CPU utilization
  2. Which of the following requires a custom CloudWatch metric to monitor?
    1. Memory Utilization of an EC2 instance
    2. CPU Utilization of an EC2 instance
    3. Disk usage activity of an EC2 instance
    4. Data transfer of an EC2 instance
  3. A user has configured CloudWatch monitoring on an EBS backed EC2 instance. If the user has not attached any additional device, which of the below mentioned metrics will always show a 0 value?
    1. DiskReadBytes
    2. NetworkIn
    3. NetworkOut
    4. CPUUtilization
  4. A user is running a batch process on EBS backed EC2 instances. The batch process starts a few instances to process Hadoop Map reduce jobs, which can run between 50 – 600 minutes or sometimes for more time. The user wants to configure that the instance gets terminated only when the process is completed. How can the user configure this with CloudWatch?
    1. Setup the CloudWatch action to terminate the instance when the CPU utilization is less than 5%
    2. Setup the CloudWatch with Auto Scaling to terminate all the instances
    3. Setup a job which terminates all instances after 600 minutes
    4. It is not possible to terminate instances automatically
  5. An AWS account owner has setup multiple IAM users. One IAM user only has CloudWatch access. He has setup the alarm action, which stops the EC2 instances when the CPU utilization is below the threshold limit. What will happen in this case?
    1. It is not possible to stop the instance using the CloudWatch alarm
    2. CloudWatch will stop the instance when the action is executed
    3. The user cannot set an alarm on EC2 since he does not have the permission
    4. The user can setup the action but it will not be executed if the user does not have EC2 rights
  6. A user has launched 10 instances from the same AMI ID using Auto Scaling. The user is trying to see the average CPU utilization across all instances of the last 2 weeks under the CloudWatch console. How can the user achieve this?
    1. View the Auto Scaling CPU metrics (Refer AS Instance Monitoring)
    2. Aggregate the data over the instance AMI ID (Works but needs detailed monitoring enabled)
    3. The user has to use the CloudWatchanalyser to find the average data across instances
    4. It is not possible to see the average CPU utilization of the same AMI ID since the instance ID is different

References

EC2_Monitoring

CloudWatch Monitoring Supported AWS Services

CloudWatch Monitoring Supported AWS Services

  • CloudWatch offers either basic or detailed monitoring for supported AWS services.
  • Basic monitoring means that a service sends data points to CloudWatch every five minutes.
  • Detailed monitoring means that a service sends data points to CloudWatch every minute.
  • If the AWS service supports both basic and detailed monitoring, the basic would be enabled by default and the detailed monitoring needs to be enabled for details metrics

AWS Services with Monitoring support

  • Auto Scaling
    • By default, basic monitoring is enabled when the launch configuration is created using the AWS Management Console, and detailed monitoring is enabled when the launch configuration is created using the AWS CLI or an API
    • Auto Scaling sends data to CloudWatch every 5 minutes by default when created from Console.
    • For an additional charge, you can enable detailed monitoring for Auto Scaling, which sends data to CloudWatch every minute.
  • Amazon CloudFront
    • Amazon CloudFront sends data to CloudWatch every minute by default.
  • Amazon CloudSearch
    • Amazon CloudSearch sends data to CloudWatch every minute by default.
  • Amazon CloudWatch Events
    • Amazon CloudWatch Events sends data to CloudWatch every minute by default.
  • Amazon CloudWatch Logs
    • Amazon CloudWatch Logs sends data to CloudWatch every minute by default.
  • Amazon DynamoDB
    • Amazon DynamoDB sends data to CloudWatch every minute for some metrics and every 5 minutes for other metrics.
  • Amazon EC2 Container Service
    • Amazon EC2 Container Service sends data to CloudWatch every minute.
  • Amazon ElastiCache
    • Amazon ElastiCache sends data to CloudWatch every minute.
  • Amazon Elastic Block Store
    • Amazon Elastic Block Store sends data to CloudWatch every 5 minutes.
    • Provisioned IOPS SSD (io1) volumes automatically send one-minute metrics to CloudWatch.
  • Amazon Elastic Compute Cloud
    • Amazon EC2 sends data to CloudWatch every 5 minutes by default. For an additional charge, you can enable detailed monitoring for Amazon EC2, which sends data to CloudWatch every minute.
  • Elastic Load Balancing
    • Elastic Load Balancing sends data to CloudWatch every minute.
  • Amazon Elastic MapReduce
    • Amazon Elastic MapReduce sends data to CloudWatch every 5 minutes.
  • Amazon Elasticsearch Service
    • Amazon Elasticsearch Service sends data to CloudWatch every minute.
  • Amazon Kinesis Streams
    • Amazon Kinesis Streams sends data to CloudWatch every minute.
  • Amazon Kinesis Firehose
    • Amazon Kinesis Firehose sends data to CloudWatch every minute.
  • AWS Lambda
    • AWS Lambda sends data to CloudWatch every minute.
  • Amazon Machine Learning
    • Amazon Machine Learning sends data to CloudWatch every 5 minutes.
  • AWS OpsWorks
    • AWS OpsWorks sends data to CloudWatch every minute.
  • Amazon Redshift
    • Amazon Redshift sends data to CloudWatch every minute.
  • Amazon Relational Database Service
    • Amazon Relational Database Service sends data to CloudWatch every minute.
  • Amazon Route 53
    • Amazon Route 53 sends data to CloudWatch every minute.
  • Amazon Simple Notification Service
    • Amazon Simple Notification Service sends data to CloudWatch every 5 minutes.
  • Amazon Simple Queue Service
    • Amazon Simple Queue Service sends data to CloudWatch every 5 minutes.
  • Amazon Simple Storage Service
    • Amazon Simple Storage Service sends data to CloudWatch once a day.
  • Amazon Simple Workflow Service
    • Amazon Simple Workflow Service sends data to CloudWatch every 5 minutes.
  • AWS Storage Gateway
    • AWS Storage Gateway sends data to CloudWatch every 5 minutes.
  • AWS WAF
    • AWS WAF sends data to CloudWatch every minute.
  • Amazon WorkSpaces
    • Amazon WorkSpaces sends data to CloudWatch every 5 minutes.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. What is the minimum time Interval for the data that Amazon CloudWatch receives and aggregates?
    1. One second
    2. Five seconds
    3. One minute
    4. Three minutes
    5. Five minutes
  2. In the ‘Detailed’ monitoring data available for your Amazon EBS volumes, Provisioned IOPS volumes automatically send _____ minute metrics to Amazon CloudWatch.
    1. 3
    2. 1
    3. 5
    4. 2
  3. Using Amazon CloudWatch’s Free Tier, what is the frequency of metric updates, which you receive?
    1. 5 minutes
    2. 500 milliseconds.
    3. 30 seconds
    4. 1 minute
  4. What is the type of monitoring data (for Amazon EBS volumes) which is available automatically in 5-minute periods at no charge called?
    1. Basic
    2. Primary
    3. Detailed
    4. Local
  5. A user has created an Auto Scaling group using CLI. The user wants to enable CloudWatch detailed monitoring for that group. How can the user configure this?
    1. When the user sets an alarm on the Auto Scaling group, it automatically enables detail monitoring
    2. By default detailed monitoring is enabled for Auto Scaling (Detailed monitoring is enabled when you create the launch configuration using the AWS CLI or an API)
    3. Auto Scaling does not support detailed monitoring
    4. Enable detail monitoring from the AWS console
  6. A user is trying to understand the detailed CloudWatch monitoring concept. Which of the below mentioned services provides detailed monitoring with CloudWatch without charging the user extra?
    1. AWS Auto Scaling
    2. AWS Route 53
    3. AWS EMR
    4. AWS SNS
  7. A user is trying to understand the detailed CloudWatch monitoring concept. Which of the below mentioned services does not provide detailed monitoring with CloudWatch?
    1. AWS EMR
    2. AWS RDS
    3. AWS ELB
    4. AWS Route53
  8. A user has enabled detailed CloudWatch monitoring with the AWS Simple Notification Service. Which of the below mentioned statements helps the user understand detailed monitoring better?
    1. SNS will send data every minute after configuration
    2. There is no need to enable since SNS provides data every minute
    3. AWS CloudWatch does not support monitoring for SNS
    4. SNS cannot provide data every minute
  9. A user has configured an Auto Scaling group with ELB. The user has enabled detailed CloudWatch monitoring on Auto Scaling. Which of the below mentioned statements will help the user understand the functionality better?
    1. It is not possible to setup detailed monitoring for Auto Scaling
    2. In this case, Auto Scaling will send data every minute and will charge the user extra
    3. Detailed monitoring will send data every minute without additional charges
    4. Auto Scaling sends data every minute only and does not charge the user

References

AWS ELB Monitoring

AWS ELB Monitoring

  • Elastic Load Balancing publishes data points to CloudWatch about the load balancers and back-end instances
  • Elastic Load Balancing reports metrics to CloudWatch only when requests are flowing through the load balancer.
    • If there are requests flowing through the load balancer, Elastic Load Balancing measures and sends its metrics in 60-second intervals.
    • If there are no requests flowing through the load balancer or no data for a metric, the metric is not reported.

CloudWatch Metrics

  • HealthyHostCount, UnHealthyHostCount
    • Number of healthy and unhealthy instances registered with the load balancer.
    • Most useful statistics are average, min, and max
  • RequestCount
    • Number of requests completed or connections made during the specified interval (1 or 5 minutes).
    • Most useful statistic is sum
  • Latency
    • Time elapsed, in seconds, after the request leaves the load balancer until the headers of the response are received.
    • Most useful statistic is average
  • SurgeQueueLength
    • Total number of requests that are pending routing.
    • Load balancer queues a request if it is unable to establish a connection with a healthy instance in order to route the request.
    • Maximum size of the queue is 1,024. Additional requests are rejected when the queue is full.
    • Most useful statistic is max, because it represents the peak of queued requests.
  • SpilloverCount
    • The total number of requests that were rejected because the surge queue is full. Should ideally be 0
    • Most useful statistic is sum.
  • HTTPCode_ELB_4XX, HTTPCode_ELB_5XX
    • Client and Server error code generated by the load balancer
    • Most useful statistic is sum.
  • HTTPCode_Backend_2XX, HTTPCode_Backend_3XX, HTTPCode_Backend_4XX, HTTPCode_Backend_5XX
    • Number of HTTP response codes generated by registered instances
    • Most useful statistic is sum.

Elastic Load Balancer Access Logs

  • Elastic Load Balancing provides access logs that capture detailed information about all requests sent to your load balancer.
  • Each log contains information such as the time the request was received, the client’s IP address, latencies, request paths, and server responses.
  • Elastic Load Balancing captures the logs and stores them in the Amazon S3 bucket
  • Access logging is disabled by default and can be enabled without any additional charge. You are only charged for S3 storage

CloudTrail Logs

  • AWS CloudTrail can be used to capture all calls to the Elastic Load Balancing API made by or on behalf of your AWS account and either made using Elastic Load Balancing API directly or indirectly through the AWS Management Console or AWS CLI
  • CloudTrail stores the information as log files in an Amazon S3 bucket that you specify.
  • Logs collected by CloudTrail can be used to monitor the activity of your load balancers and determine what API call was made, what source IP address was used, who made the call, when it was made, and so on

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. An admin is planning to monitor the ELB. Which of the below mentioned services does not help the admin capture the monitoring information about the ELB activity
    1. ELB Access logs
    2. ELB health check
    3. CloudWatch metrics
    4. ELB API calls with CloudTrail
  2. A customer needs to capture all client connection information from their load balancer every five minutes. The company wants to use this data for analyzing traffic patterns and troubleshooting their applications. Which of the following options meets the customer requirements?
    1. Enable AWS CloudTrail for the load balancer.
    2. Enable access logs on the load balancer.
    3. Install the Amazon CloudWatch Logs agent on the load balancer.
    4. Enable Amazon CloudWatch metrics on the load balancer
  3. Your supervisor has requested a way to analyze traffic patterns for your application. You need to capture all connection information from your load balancer every 10 minutes. Pick a solution from below. Choose the correct answer:
    1. Enable access logs on the load balancer
    2. Create a custom metric CloudWatch filter on your load balancer
    3. Use a CloudWatch Logs Agent
    4. Use AWS CloudTrail with your load balancer

References

Elastic Load Balance developer guide