- CloudWatch namespaces are containers for metrics.
- Metrics in different namespaces are isolated from each other, so that metrics from different applications are not mistakenly aggregated into the same statistics.
- AWS namespaces all follow the convention AWS/<service>, e.g. AWS/EC2 and AWS/ELB
- Namespace names must be fewer than 256 characters in length.
- There is no default namespace. Each data element put into CloudWatch must specify a namespace.
Metrics
- Metric is the fundamental concept in CloudWatch.
- Uniquely defined by a name, a namespace, and zero or more dimensions.
- Represents a time-ordered set of data points published to CloudWatch.
- Each data point has a time stamp, and (optionally) a unit of measure.
- Data points can belong to custom metrics or to metrics published by other AWS services.
- Statistics can be retrieved about those data points as an ordered set of time-series data that occur within a specified time window.
- When the statistics are requested, the returned data stream is identified by namespace, metric name, dimension, and (optionally) the unit.
- Metrics exist only in the region in which they are created.
- CloudWatch retains metric data for up to 15 months, rolling older data points up into longer periods as described in the retention list below.
- Metrics cannot be deleted, but they automatically expire after 15 months, if no new data is published to them.
- Metric retention is as follows
  - Data points with a period of less than 60 seconds are available for 3 hours. These data points are high-resolution custom metrics.
  - Data points with a 60-second (1-minute) period are available for 15 days
  - Data points with a 300-second (5-minute) period are available for 63 days
  - Data points with a 3600-second (1-hour) period are available for 455 days (15 months)
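The retention tiers above decide how finely old data can still be queried. The following is a minimal sketch, assuming the tiers exactly as listed and treating 1 second as the finest high-resolution period; the function name `finest_available_period` is hypothetical and not part of any AWS API.

```python
from datetime import timedelta
from typing import Optional

# Retention tiers as listed above: (maximum age of a data point, finest period in seconds).
# The 1-second tier only exists for high-resolution custom metrics.
RETENTION_TIERS = [
    (timedelta(hours=3), 1),      # high-resolution data (period < 60 s)
    (timedelta(days=15), 60),     # 1-minute data
    (timedelta(days=63), 300),    # 5-minute data
    (timedelta(days=455), 3600),  # 1-hour data (about 15 months)
]


def finest_available_period(age: timedelta, high_resolution: bool = False) -> Optional[int]:
    """Return the smallest period (seconds) still queryable for a data point of this age.

    Returns None once the data point is older than the longest retention tier.
    """
    for max_age, period in RETENTION_TIERS:
        if period < 60 and not high_resolution:
            continue  # standard-resolution metrics never had sub-minute data
        if age <= max_age:
            return period
    return None
```

For example, data from three weeks ago is still available, but only at a 5-minute (300-second) period or coarser.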
Dimensions
- A dimension is a name/value pair that uniquely identifies a metric.
- Every metric has specific characteristics that describe it, and you can think of dimensions as categories for those characteristics.
- Dimensions help design a structure for the statistics plan.
- Dimensions are part of the unique identifier for a metric; whenever a unique name/value pair is added to one of the metrics, a new metric is created.
- Dimensions can be used to filter result sets that CloudWatch query returns.
- A metric can have up to 30 dimensions assigned to it (the earlier limit was 10).
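Because dimensions are part of a metric's identity, the same metric name in the same namespace with a different set of name/value pairs is a different metric. A minimal sketch of that identity rule, assuming a hypothetical namespace `MyApp` and hypothetical dimension names; `metric_key` and `filter_by_dimension` are illustrative helpers, not AWS APIs.

```python
from typing import Dict, FrozenSet, List, Tuple

# Identity of a metric: namespace + metric name + the full set of dimension name/value pairs.
MetricKey = Tuple[str, str, FrozenSet[Tuple[str, str]]]


def metric_key(namespace: str, name: str, dimensions: Dict[str, str]) -> MetricKey:
    """Hashable identity; the order of dimensions does not matter, but every pair does."""
    return (namespace, name, frozenset(dimensions.items()))


def filter_by_dimension(keys: List[MetricKey], dim_name: str, dim_value: str) -> List[MetricKey]:
    """Return the metrics whose dimensions include the given name/value pair."""
    return [k for k in keys if (dim_name, dim_value) in k[2]]


a = metric_key("MyApp", "Latency", {"Service": "checkout"})
b = metric_key("MyApp", "Latency", {"Service": "checkout", "Stage": "prod"})  # a new metric
c = metric_key("MyApp", "Latency", {"Service": "checkout"})                   # same metric as a
```

Here `a` and `c` refer to the same metric, while adding the `Stage` pair in `b` creates a separate one.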
Time Stamps
- Each metric data point must be marked with a time stamp to identify the data point on a time series.
- Timestamp can be up to two weeks in the past and up to two hours into the future.
- If no timestamp is provided, a time stamp based on the time the data element was received is created.
- All times reflect the UTC time zone when statistics are retrieved
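The time stamp rules above can be expressed as a small validation step. This is a minimal sketch, assuming the two-weeks-past / two-hours-future window stated above; `resolve_timestamp` is a hypothetical helper, and rejecting time-zone-naive datetimes is a design choice of this sketch, not a CloudWatch rule.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_PAST = timedelta(weeks=2)    # time stamps may be up to two weeks in the past
MAX_FUTURE = timedelta(hours=2)  # ... and up to two hours into the future


def resolve_timestamp(ts: Optional[datetime], received_at: Optional[datetime] = None) -> datetime:
    """Return the UTC time stamp a data point would be stored under, or raise if out of range."""
    now = received_at or datetime.now(timezone.utc)
    if ts is None:
        return now  # no time stamp supplied: fall back to the time the data point was received
    if ts.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid ambiguity")
    ts = ts.astimezone(timezone.utc)  # statistics are reported in UTC
    if ts < now - MAX_PAST or ts > now + MAX_FUTURE:
        raise ValueError(f"time stamp {ts.isoformat()} is outside the accepted window")
    return ts
```

Under these rules a data point stamped 90 minutes in the future is accepted, while one stamped three hours ahead is rejected.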
Resolution
- Each metric is one of the following:
  - Standard resolution, with data having a one-minute granularity
  - High resolution, with data at a granularity of one second
Units
- Units represent the statistic’s unit of measure e.g. count, bytes, %, etc
Statistics
- Statistics are metric data aggregations over specified periods of time
- Aggregations are made using the namespace, metric name, dimensions, and the data point unit of measure, within the specified time period
Periods
- Period is the length of time associated with a specific statistic.
- Each statistic represents an aggregation of the metrics data collected for a specified period of time.
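- Periods are expressed in seconds. For standard-resolution metrics the minimum period is 60 seconds (one minute); for high-resolution metrics, a period of 1, 5, 10, or 30 seconds, or any multiple of 60 seconds, can be specified.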
Aggregation
- CloudWatch aggregates statistics according to the period length specified in calls to GetMetricStatistics.
- Multiple data points can be published with the same or similar time stamps. CloudWatch aggregates them by period length when the statistics about those data points are requested.
- Aggregated statistics are only available when using detailed monitoring.
- Instances that use basic monitoring are not included in the aggregates
- CloudWatch does not aggregate data across regions.
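The aggregation described above groups data points by period and computes statistics per bucket. The following is a minimal sketch of that idea, assuming period-aligned buckets (whole minutes for a 60-second period); `aggregate` is a hypothetical helper illustrating the behavior, not how CloudWatch is implemented internally.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def aggregate(points: Iterable[Tuple[datetime, float]], period: int = 60) -> Dict[datetime, dict]:
    """Group (timestamp, value) pairs into period-aligned buckets and compute statistics."""
    buckets = defaultdict(list)
    for ts, value in points:
        epoch = int(ts.timestamp())
        start = epoch - epoch % period  # align to the start of the period
        buckets[datetime.fromtimestamp(start, tz=timezone.utc)].append(value)

    stats = {}
    for start, values in sorted(buckets.items()):
        total = sum(values)
        stats[start] = {
            "SampleCount": len(values),
            "Sum": total,
            "Minimum": min(values),
            "Maximum": max(values),
            "Average": total / len(values),
        }
    return stats
```

Multiple data points with the same or nearby time stamps simply land in the same bucket, which is why publishing several values within one minute still yields one set of statistics for that minute.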
Alarms
- Alarms can automatically initiate actions on behalf of the user, based on specified parameters.
- Alarm watches a single metric over a specified time period, and performs one or more actions based on the value of the metric relative to a given threshold over a number of time periods.
- Alarms invoke actions for sustained state changes only i.e. the state must have changed and been maintained for a specified number of periods.
- An alarm action can be one of the following:
  - SNS notification
  - Auto Scaling policy
  - EC2 action – stop, terminate, reboot, or recover EC2 instances
  - Lambda function invocation (same or cross-account)
  - Systems Manager OpsItem or Incident Manager incident
- After an alarm invokes an action due to a change in state, its subsequent behavior depends on the type of action associated with the alarm.
  - For Auto Scaling policy notifications, the alarm continues to invoke the action for every period that the alarm remains in the new state.
  - For SNS notifications, no additional actions are invoked.
- An alarm has three possible states:
  - OK – The metric is within the defined threshold
  - ALARM – The metric is outside of the defined threshold
  - INSUFFICIENT_DATA – The alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state
- Alarms exist only in the region in which they are created.
- Alarm actions must reside in the same region as the alarm
- Alarm history is available for the last 14 days.
- An alarm can be tested by setting it to any state using the SetAlarmState API (mon-set-alarm-state command in the legacy CloudWatch CLI, aws cloudwatch set-alarm-state in the AWS CLI). This temporary state change lasts only until the next alarm comparison occurs.
- Alarm actions can be disabled and enabled using the DisableAlarmActions and EnableAlarmActions APIs (mon-disable-alarm-actions and mon-enable-alarm-actions commands).
- CloudWatch now allows alarming on data up to 7 days old (increased from the previous 24-hour limit), with a period of at least 3,600 seconds (1 hour).
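The "sustained state change" rule means a single spike does not trip an alarm; the threshold must be breached for the configured number of periods. A simplified sketch, assuming a greater-than-threshold comparison and treating any missing period as insufficient data; real alarms offer more options (for example how missing data is treated), so `evaluate_alarm` is a hypothetical model, not the CloudWatch algorithm.

```python
from typing import List, Optional


def evaluate_alarm(datapoints: List[Optional[float]], threshold: float, evaluation_periods: int) -> str:
    """Simplified alarm state for a 'greater than threshold' comparison.

    datapoints holds one value per period, oldest first; None marks a period with no data.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(v is None for v in recent):
        return "INSUFFICIENT_DATA"
    if all(v > threshold for v in recent):
        return "ALARM"  # the breach was sustained for every evaluated period
    return "OK"
```

With a 75% CPU threshold over 3 periods, `[80, 90, 95]` yields ALARM, but `[80, 70, 95]` stays OK because the breach was not sustained.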
Composite Alarms
- Composite alarms determine their state by monitoring the states of other alarms.
- Rules are defined to combine the status of monitored alarms using Boolean logic (AND, OR, NOT).
- Composite alarms help reduce alarm noise by taking actions only at an aggregated level.
- A composite alarm’s rule expression can include as many as 100 underlying alarms.
- Any single alarm can be included in the rule expressions of as many as 150 composite alarms.
- Composite alarms can:
  - Send Amazon SNS notifications when they change state
  - Create Systems Manager OpsItems or incidents when they go into ALARM state
  - Start CloudWatch Investigations for root cause analysis
- Composite alarms cannot perform EC2 actions or Auto Scaling actions.
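A composite alarm's rule combines the states of other alarms with AND, OR, and NOT. The sketch below, assuming hypothetical alarm names, represents a rule as nested tuples, renders it in the ALARM("name") / OK("name") style used by composite alarm rule expressions, and evaluates it locally. The rendering is intended to mirror that syntax; check the generated string against the AlarmRule documentation before relying on it.

```python
from typing import Dict

STATES = {"ALARM", "OK", "INSUFFICIENT_DATA"}

# A rule is (STATE, alarm_name), ("NOT", rule), or ("AND" | "OR", rule, rule, ...).


def to_expression(rule: tuple) -> str:
    """Render a nested rule as a CloudWatch-style rule expression string."""
    op = rule[0]
    if op in STATES:
        return f'{op}("{rule[1]}")'
    if op == "NOT":
        return f"NOT ({to_expression(rule[1])})"
    if op in ("AND", "OR"):
        return "(" + f" {op} ".join(to_expression(r) for r in rule[1:]) + ")"
    raise ValueError(f"unknown operator {op!r}")


def evaluate(rule: tuple, states: Dict[str, str]) -> bool:
    """True when the rule holds for the current states of the underlying alarms."""
    op = rule[0]
    if op in STATES:
        return states.get(rule[1]) == op
    if op == "NOT":
        return not evaluate(rule[1], states)
    if op == "AND":
        return all(evaluate(r, states) for r in rule[1:])
    if op == "OR":
        return any(evaluate(r, states) for r in rule[1:])
    raise ValueError(f"unknown operator {op!r}")


rule = ("AND", ("ALARM", "web-5xx-high"), ("OR", ("ALARM", "latency-high"), ("NOT", ("OK", "db-cpu"))))
```

Only when the whole expression is true does the composite alarm go into ALARM, so the individual alarms can stay action-free and the team receives a single aggregated notification.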
Anomaly Detection
- CloudWatch anomaly detection analyzes past metric data and creates a model of expected values.
- Alarms can be created based on anomaly detection to alert when metric values are outside the expected band.
- The model adjusts for time-of-day patterns, day-of-week patterns, and trends over time.
- Useful for metrics without a fixed threshold, such as CPU utilization with seasonal patterns.
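To illustrate the idea of an "expected band", the sketch below swaps in a deliberately simple technique: a per-hour-of-day mean plus or minus k standard deviations. This is not CloudWatch's model, which is trained by the service itself (in CloudWatch the band is exposed through the ANOMALY_DETECTION_BAND metric math function); `build_band` and `is_anomalous` are hypothetical helpers that only show why a value can be normal at one time of day and anomalous at another.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Band = Dict[int, Tuple[float, float]]


def build_band(history: Iterable[Tuple[int, float]], k: float = 2.0) -> Band:
    """history holds (hour_of_day, value) pairs; returns hour -> (lower, upper) band."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    band = {}
    for hour, values in by_hour.items():
        mean = statistics.fmean(values)
        spread = statistics.pstdev(values)
        band[hour] = (mean - k * spread, mean + k * spread)
    return band


def is_anomalous(band: Band, hour: int, value: float) -> bool:
    """True when the value falls outside the expected band for that hour."""
    lower, upper = band[hour]
    return value < lower or value > upper


band = build_band([(9, 40), (9, 50), (9, 60), (3, 5), (3, 7)])
```

A CPU reading of 30% is unremarkable at 09:00 in this history but falls far outside the 03:00 band, which is the kind of seasonal pattern a fixed threshold cannot capture.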
Regions
- CloudWatch does not aggregate data across regions. Therefore, metrics are completely separate between regions.
CloudWatch Metrics Insights
- CloudWatch Metrics Insights is a high-performance SQL-based query engine to query metrics at scale.
- Enables identification of trends and patterns across all CloudWatch metrics in real time.
- Provides access to up to two weeks of historical data for trend analysis.
- Supports functions including AVG, COUNT, MAX, MIN, and SUM.
- Can be combined with CloudWatch Alarms to set up dynamic alarms that consistently monitor fast-moving environments.
- Supports natural language query generation powered by generative AI – users can ask questions in plain English to generate metric queries.
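Metrics Insights queries are SQL-like strings. A minimal sketch, assuming the documented `SELECT ... FROM SCHEMA(...) GROUP BY ... ORDER BY ... LIMIT` form, that wraps a "top EC2 instances by CPU" query in a GetMetricData query entry; `top_cpu_query` and the `topcpu` Id are hypothetical.

```python
def top_cpu_query(limit: int = 10, period: int = 300) -> dict:
    """Build a GetMetricData query entry holding a Metrics Insights SQL expression."""
    sql = (
        'SELECT AVG(CPUUtilization) FROM SCHEMA("AWS/EC2", InstanceId) '
        "GROUP BY InstanceId ORDER BY AVG() DESC "
        f"LIMIT {limit}"
    )
    # The Id must be unique within the request and start with a lowercase letter.
    return {"Id": "topcpu", "Expression": sql, "Period": period}
```

The resulting entry could be passed in the `MetricDataQueries` list of boto3's `get_metric_data` call, together with a `StartTime` and `EndTime`.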
Custom Metrics
- CloudWatch allows publishing custom metrics with the put-metric-data CLI command (or its Query API equivalent PutMetricData).
- CloudWatch creates a new metric if put-metric-data is called with a new metric name; otherwise it associates the data with the specified existing metric.
- Used with --metric-name and --value, the put-metric-data command publishes one data point per call; a single PutMetricData request can also carry multiple data points (up to 1,000 metrics per request).
- CloudWatch stores data about a metric as a series of data points, and each data point has an associated time stamp.
- After a new metric is created with the put-metric-data command, it can take up to two minutes before statistics can be retrieved on it using the get-metric-statistics command, and up to fifteen minutes before it appears in the list of metrics returned by the list-metrics command.
- CloudWatch allows publishing
  - Single data point
    - Data points can be published with time stamps as granular as one-thousandth of a second, but CloudWatch aggregates standard-resolution data to a minimum granularity of one minute
    - CloudWatch records the average (sum of all items divided by number of items) of the values received for every 1-minute period, as well as the number of samples, maximum value, and minimum value for the same time period
    - CloudWatch uses one-minute boundaries when aggregating data points
  - Aggregated set of data points, called a statistic set
    - Data can also be aggregated before being published to CloudWatch
    - Aggregating data minimizes the number of calls, reducing them to a single call per minute carrying the statistic set
    - A statistic set carries Sum, Minimum, Maximum, and SampleCount; Average is derived as Sum / SampleCount
  - Value zero
    - If the application produces data that is sporadic and has periods with no associated data, either the value zero (0) or no value at all can be published
    - However, it can be helpful to publish zero instead of no value
      - to monitor the health of the application, e.g. an alarm can be configured to notify you if no metrics are published every 5 minutes
      - to track the total number of data points
      - to have statistics such as minimum and average include data points with the value 0.
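The three publishing styles above map onto the `MetricData` entries accepted by PutMetricData: a single `Value`, a `StatisticValues` statistic set, or an explicit zero. A minimal sketch, assuming hypothetical metric names, dimensions, and a default unit of `Count`; the helper functions are illustrative, only the dictionary keys follow the PutMetricData request shape.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def _dims(dimensions: Dict[str, str]) -> List[dict]:
    # PutMetricData expects dimensions as a list of {"Name": ..., "Value": ...} pairs.
    return [{"Name": k, "Value": v} for k, v in dimensions.items()]


def single_point(name: str, value: float, dimensions: Dict[str, str], unit: str = "Count",
                 high_resolution: bool = False, ts: Optional[datetime] = None) -> dict:
    """One MetricData entry carrying a single value."""
    return {
        "MetricName": name,
        "Dimensions": _dims(dimensions),
        "Timestamp": ts or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
        # 1 = high resolution (one-second granularity); 60 = standard resolution.
        "StorageResolution": 1 if high_resolution else 60,
    }


def statistic_set(name: str, values: List[float], dimensions: Dict[str, str],
                  unit: str = "Count", ts: Optional[datetime] = None) -> dict:
    """Pre-aggregate raw values client-side into one statistic-set entry."""
    if not values:
        # Nothing happened in this interval: publish an explicit zero instead of no value.
        return single_point(name, 0.0, dimensions, unit=unit, ts=ts)
    return {
        "MetricName": name,
        "Dimensions": _dims(dimensions),
        "Timestamp": ts or datetime.now(timezone.utc),
        "StatisticValues": {
            "SampleCount": float(len(values)),
            "Sum": float(sum(values)),
            "Minimum": float(min(values)),
            "Maximum": float(max(values)),
        },
        "Unit": unit,
    }
```

The entries could then be sent with `boto3.client("cloudwatch").put_metric_data(Namespace="MyApp", MetricData=[...])`, where `MyApp` is a hypothetical namespace. Sending one statistic set per minute instead of every raw value is what keeps the number of calls low.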
CloudWatch Dashboards
- CloudWatch dashboards are customizable home pages in the CloudWatch console used to monitor the resources in a single view, even those resources that are spread across different Regions.
- Dashboards can be used to create customized views of the metrics and alarms for the AWS resources.
- Dashboards can help to create
  - A single view for selected metrics and alarms to help assess the health of the resources and applications across one or more Regions.
  - An operational playbook that provides guidance for team members during operational events about how to respond to specific incidents.
  - A common view of critical resource and application measurements that can be shared by team members for faster communication flow during operational events.
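A dashboard is stored as a JSON body of widgets, and each metric widget names its own Region, which is how one dashboard can show resources spread across Regions. A minimal sketch, assuming hypothetical instance IDs and titles; `metric_widget` is an illustrative helper that builds the widget layout used in dashboard bodies.

```python
import json


def metric_widget(region: str, title: str, metric: list, x: int, y: int) -> dict:
    """One graph widget; the per-widget "region" lets a dashboard mix Regions."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": 12, "height": 6,  # dashboards use a 24-column grid
        "properties": {
            "metrics": [metric],  # [namespace, metric name, dimension name, dimension value]
            "region": region,
            "stat": "Average",
            "period": 300,
            "title": title,
        },
    }


body = {
    "widgets": [
        metric_widget("us-east-1", "CPU (us-east-1)",
                      ["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"], 0, 0),
        metric_widget("eu-west-1", "CPU (eu-west-1)",
                      ["AWS/EC2", "CPUUtilization", "InstanceId", "i-0fedcba9876543210"], 12, 0),
    ]
}
dashboard_body = json.dumps(body)
```

The serialized string is what boto3's `put_dashboard` expects in its `DashboardBody` parameter.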
CloudWatch Cross-Account Observability
- CloudWatch cross-account observability helps monitor and troubleshoot applications that span multiple accounts within a Region.
- Uses Observability Access Manager (OAM) to create and manage links between source and monitoring accounts.
- Cross-account observability involves monitoring and source accounts
  - A monitoring account is a central AWS account that can view and interact with observability data generated from source accounts.
  - A source account is an individual AWS account that generates observability data for the resources that reside in it.
- Source accounts share their observability data with the monitoring account, which can include the following types of telemetry:
  - Metrics in CloudWatch
  - Log groups in CloudWatch Logs
  - Traces in AWS X-Ray
  - Application Signals services and SLOs
- Available in all commercial AWS Regions and AWS GovCloud (US) Regions at no extra cost for logs and metrics.
- Cross-Account and Cross-Region Log Centralization (2025) allows copying log data from multiple accounts and regions into a single destination account for consolidated analysis.
CloudWatch Agent
- CloudWatch Agent helps collect metrics and logs from EC2 instances and on-premises servers and push them to CloudWatch.
- Logs collected by the unified agent are processed and stored in CloudWatch Logs.
- Supports collecting metrics and traces via OpenTelemetry Protocol (OTLP).
- Can collect from AWS, on-premises, or multicloud environments.
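The agent is driven by a JSON configuration file with separate metrics and logs sections. The sketch below generates a minimal configuration, assuming memory and root-disk metrics plus one application log file; the file path and log group name are hypothetical placeholders, and the key names should be checked against the agent configuration reference before use.

```python
import json

# Hypothetical minimal CloudWatch agent configuration.
agent_config = {
    "agent": {"metrics_collection_interval": 60},  # seconds between metric samples
    "metrics": {
        "namespace": "CWAgent",
        "metrics_collected": {
            # Memory and disk usage are not published by EC2 itself, hence the agent.
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        },
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/myapp/app.log",  # hypothetical path
                        "log_group_name": "myapp-app-log",      # hypothetical log group
                        "log_stream_name": "{instance_id}",     # one stream per instance
                    }
                ]
            }
        }
    },
}

config_json = json.dumps(agent_config, indent=2)
```

Keeping the configuration in code like this makes it easy to generate per-environment variants, but the agent itself only reads the resulting JSON file.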
CloudWatch Logs
Refer blog post @ CloudWatch Logs
CloudWatch Application Signals
- CloudWatch Application Signals (GA June 2024) provides application performance monitoring (APM) capabilities.
- Automatically instruments applications on AWS to track application performance against business objectives.
- Provides pre-built, standardized dashboards showing key metrics: volume, availability, latency, faults, and errors.
- Supports Service Level Objectives (SLOs) to define and track application performance targets.
- Application map automatically discovers and visualizes application topology without requiring instrumentation.
- Supports runtime metrics for Java, Python, and .NET applications.
- Supports serverless applications on AWS Lambda.
- Works with cross-account observability for centralized monitoring.
CloudWatch Internet Monitor
- Internet Monitor provides near-continuous internet measurements for internet traffic tailored to your workload footprint on AWS.
- Provides availability and performance metrics (performance score, availability score, round-trip time).
- Alerts when internet issues affect application users.
- Helps troubleshoot and mitigate network problems before they impact customers.
- Creates aggregated CloudWatch metrics for traffic to your application, by AWS Region and edge location.
CloudWatch Investigations
- CloudWatch Investigations (GA June 2025) is an AI-powered root cause analysis feature.
- Powered by Amazon Q Developer, it analyzes metrics, logs, traces, and deployment events.
- Helps find root cause without manually querying multiple sources of telemetry.
- Identifies anomalies, surfaces related signals, suggests remediation steps.
- Can be triggered automatically from a CloudWatch alarm or created manually.
- Supports integration with Slack for DevOps troubleshooting workflows.
- Significantly reduces Mean Time to Resolution (MTTR).
CloudWatch Pipelines
- CloudWatch Pipelines is a fully managed service that ingests, transforms, and routes log data to CloudWatch without managing infrastructure.
- Supports log data from AWS services, third-party applications, and custom sources.
- Provides processors for parsing, transforming, and enriching log data.
- Supports conditional processing and drop event processing for fine-grained control.
- Includes compliance and governance capabilities for data integrity and access control.
- Supports AI-assisted configuration of processors via natural language descriptions.
- Supports OCSF (Open Cybersecurity Schema Framework) and OpenTelemetry formats.
CloudWatch OpenTelemetry Support
- CloudWatch now supports native OpenTelemetry (OTel) metrics ingestion (preview, April 2026).
- Enables sending metrics directly using OTLP without custom conversion logic or additional tooling.
- Supports high-cardinality metrics with up to 150 labels per metric.
- Supports metric types: gauge, sum, histogram, and exponential histogram.
- Metrics can be queried using PromQL (Prometheus Query Language).
- Automatic AWS vended metric enrichment adds AWS context to ingested metrics.
- Completes OTel support across all three observability pillars (traces, logs, metrics).
CloudWatch Database Insights
- CloudWatch Database Insights (launched December 2024) provides comprehensive database observability from fleet-level to instance-level.
- Supports cross-account and cross-region database fleet monitoring.
- Offers two service tiers:
  - Standard mode – 7 days of data retention at no additional cost
  - Advanced mode – Extended 15-month retention with enhanced capabilities
- Amazon RDS Performance Insights is being transitioned into CloudWatch Database Insights.
CloudWatch Evidently (Deprecated)
⚠️ CloudWatch Evidently reached End of Life on October 17, 2025.
Alternative: Use AWS Systems Manager AppConfig for feature flag management.
CloudWatch Events → Amazon EventBridge
- CloudWatch Events has been superseded by Amazon EventBridge.
- EventBridge is the evolution of CloudWatch Events with additional capabilities including partner events, Schema Registry, and EventBridge Pipes.
- New features are added to EventBridge only and not to CloudWatch Events.
- The underlying API and service remain the same – changes in either console appear in both.
- AWS recommends using EventBridge for all new event-driven architectures.
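EventBridge rules react to AWS API calls by matching CloudTrail-delivered events against an event pattern. The sketch below shows such a pattern, assuming a hypothetical interest in two IAM calls, together with a deliberately simplified matcher; real EventBridge matching supports more operators (prefix, numeric, exists, and others), so `matches` is only an illustration of exact-value matching.

```python
import json

# Pattern for IAM API calls recorded by CloudTrail (the event names are an example choice).
pattern = {
    "source": ["aws.iam"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {"eventName": ["CreateAccessKey", "DeleteUser"]},
}
event_pattern_json = json.dumps(pattern)  # the form passed as EventPattern when creating a rule


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matching: every pattern key must be present in the event, nested
    objects are matched recursively, and a list means 'any of these exact values'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True
```

Fields that the pattern does not mention (such as `eventSource`) are ignored, which is why a pattern can stay short while events carry the full CloudTrail record.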
CloudWatch Supported Services
Refer blog post @ CloudWatch Supported Services
Accessing CloudWatch
- CloudWatch can be accessed using
  - Amazon CloudWatch console
  - CloudWatch CLI
  - AWS CLI
  - CloudWatch API
  - AWS SDKs
  - AWS Console Mobile App (supports interactive alarm graphs, AI log summaries, and natural language log search)
CloudWatch Key Updates (2024-2026)
- Unified Data Management (Dec 2025) – CloudWatch provides unified management and analytics for operational, security, and compliance data across AWS and third-party sources.
- Organization-wide EC2 Detailed Monitoring (Mar 2026) – Enablement rules automatically enable detailed monitoring for existing and new EC2 instances matching the rule scope.
- Natural Language Query Generation (GA June 2024) – Powered by generative AI for both Logs Insights and Metrics Insights, supporting CloudWatch Logs Insights QL, OpenSearch PPL, and OpenSearch SQL.
- Integrated Analytics with OpenSearch (Dec 2024) – Supports SQL and PPL query languages for log analytics with JOIN, sub-queries, and SQL functions.
- Cross-Region Telemetry Auditing (Apr 2026) – Enable telemetry auditing for account/organization across all regions with enablement rules.
- Lambda Tiered Pricing for Logs (May 2025) – Reduces Lambda CloudWatch logging costs with tiered pricing and additional logging destinations.
AWS Certification Exam Practice Questions
- Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
- AWS services are updated every day, and both the answers and questions might be outdated soon, so research accordingly.
- AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated
- Open to further feedback, discussion and correction.
- A company needs to monitor the read and write IOPS metrics for their Amazon RDS for MySQL instance and send real-time alerts to their operations team. Which AWS services can accomplish this? Choose 2 answers
  - Amazon Simple Email Service (Cannot be integrated with CloudWatch directly)
  - Amazon CloudWatch
  - Amazon Simple Queue Service
  - Amazon Route 53
  - Amazon Simple Notification Service
- A customer needs to capture all client connection information from their load balancer every five minutes. The company wants to use this data for analyzing traffic patterns and troubleshooting their applications. Which of the following options meets the customer requirements?
  - Enable AWS CloudTrail for the load balancer.
  - Enable access logs on the load balancer. (Refer link)
  - Install the Amazon CloudWatch Logs agent on the load balancer.
  - Enable Amazon CloudWatch metrics on the load balancer (does not provide client connection information)
- A user is running a batch process on EBS-backed EC2 instances. The batch process starts a few instances to process Hadoop MapReduce jobs, which can run between 50 and 600 minutes or sometimes longer. The user wants each instance to be terminated only when its process is completed. How can the user configure this with CloudWatch?
  - Setup the CloudWatch action to terminate the instance when the CPU utilization is less than 5%
  - Setup the CloudWatch with Auto Scaling to terminate all the instances
  - Setup a job which terminates all instances after 600 minutes
  - It is not possible to terminate instances automatically
- A user has two EC2 instances running in two separate regions. The user is running an internal memory management tool, which captures the data and sends it to CloudWatch in US East, using a CLI with the same namespace and metric. Which of the below mentioned options is true with respect to the above statement?
  - The setup will not work as CloudWatch cannot receive data across regions
  - CloudWatch will receive and aggregate the data based on the namespace and metric
  - CloudWatch will give an error since the data will conflict due to two sources
  - CloudWatch will take the data of the server, which sends the data first
- A user is sending the data to CloudWatch using the CloudWatch API. The user is sending data 90 minutes in the future. What will CloudWatch do in this case?
  - CloudWatch will accept the data
  - It is not possible to send data of the future
  - It is not possible to send the data manually to CloudWatch
  - The user cannot send data for more than 60 minutes in the future
- A user has data generated randomly by a certain event and wants to upload that data to CloudWatch. Because of this randomness, the event may not generate data for some periods. Which of the below mentioned options is recommended for this case?
  - For the period when there is no data, the user should not send the data at all
  - For the period when there is no data, the user should send a blank value
  - For the period when there is no data, the user should send the value as 0 (Refer User Guide)
  - The user must upload the data to CloudWatch, as having no data for some period will cause an error in CloudWatch monitoring
- A user has a weighing plant. The user measures the weight of some goods every 5 minutes and sends data to AWS CloudWatch for monitoring and tracking. Which of the below mentioned parameters is mandatory for the user to include in the request list?
  - Value
  - Namespace (refer put-metric-data request)
  - Metric Name
  - Timezone
- A user has a refrigerator plant. The user is measuring the temperature of the plant every 15 minutes. If the user wants to send the data to CloudWatch to view the data visually, which of the below mentioned statements is true with respect to the information given above?
  - The user needs to use AWS CLI or API to upload the data
  - The user can use the AWS Import Export facility to import data to CloudWatch
  - The user will upload data from the AWS console
  - The user cannot upload data to CloudWatch since it is not an AWS service metric
- A user has launched an EC2 instance. The user is planning to setup the CloudWatch alarm. Which of the below mentioned actions is not supported by the CloudWatch alarm?
  - Notify the Auto Scaling launch config to scale up
  - Send an SMS using SNS
  - Notify the Auto Scaling group to scale down
  - Stop the EC2 instance
- A user is trying to aggregate all the CloudWatch metric data of the last week. Which of the below mentioned statistics is not available for the user as a part of data aggregation?
  - Aggregate
  - Sum
  - Sample data
  - Average
- A user has set up a CloudWatch alarm on an EC2 action when the CPU utilization is above 75%. The alarm sends a notification to SNS on the alarm state. If the user wants to simulate the alarm action, how can he achieve this?
  - Run activities on the CPU such that its utilization reaches above 75%
  - From the AWS console change the state to ‘Alarm’
  - The user can set the alarm state to ‘Alarm’ using CLI
  - Run the SNS action manually
- A user is publishing custom metrics to CloudWatch. Which of the below mentioned statements will help the user understand the functionality better?
  - The user can use the CloudWatch Import tool
  - The user should be able to see the data in the console after around 15 minutes
  - If the user is uploading the custom data, the user must supply the namespace, timezone, and metric name as part of the command
  - The user can view as well as upload data using the console, CLI and APIs
- An application that you are managing has EC2 instances and DynamoDB tables deployed to several AWS Regions. In order to monitor the performance of the application globally, you would like to see two graphs 1) Avg CPU Utilization across all EC2 instances and 2) Number of Throttled Requests for all DynamoDB tables. How can you accomplish this? [PROFESSIONAL]
  - Tag your resources with the application name, and select the tag name as the dimension in the CloudWatch Management console to view the respective graphs (CloudWatch metrics are regional)
  - Use the CloudWatch CLI tools to pull the respective metrics from each regional endpoint. Aggregate the data offline & store it for graphing in CloudWatch. (Note: CloudWatch cross-account cross-region dashboards and log centralization are now available but this question tests the concept that metrics are regional)
  - Add SNMP traps to each instance and DynamoDB table. Leverage a central monitoring server to capture data from each instance and table. Put the aggregate data into CloudWatch for graphing (Can’t add SNMP traps to DynamoDB as it is a managed service)
  - Add a CloudWatch agent to each instance and attach one to each DynamoDB table. When configuring the agent set the appropriate application name & view the graphs in CloudWatch. (Can’t add agents to DynamoDB as it is a managed service)
- You have set up individual AWS accounts for each project. You have been asked to make sure your AWS infrastructure costs do not exceed the budget set per project for each month. Which of the following approaches can help ensure that you do not exceed the budget each month? [PROFESSIONAL]
  - Consolidate your accounts so you have a single bill for all accounts and projects (Consolidation will not help limit per account)
  - Set up auto scaling with CloudWatch alarms using SNS to notify you when you are running too many instances in a given account (the number of instances does not map directly to cost and would not give the exact cost)
  - Set up CloudWatch billing alerts for all AWS resources used by each project, with a notification occurring when the amount for each resource tagged to a particular project matches the budget allocated to the project. (As each project already has an account, there is no need for resource tagging)
  - Set up CloudWatch billing alerts for all AWS resources used by each account, with email notifications when it hits 50%, 80% and 90% of its budgeted monthly spend
- You meet once per month with your operations team to review the past month’s data. During the meeting, you realize that 3 weeks ago, your monitoring system which pings over HTTP from outside AWS recorded a large spike in latency on your 3-tier web service API. You use DynamoDB for the database layer, ELB, EBS, and EC2 for the business logic tier, and SQS, ELB, and EC2 for the presentation layer. Which of the following techniques will NOT help you figure out what happened?
  - Check your CloudTrail log history around the spike’s time for any API calls that caused slowness.
  - Review CloudWatch Metrics graphs to determine which component(s) slowed the system down. (Note: CloudWatch metric data is retained for 15 months. Data at 1-min resolution is available for 15 days, 5-min for 63 days, and 1-hour for 455 days. The 3-week old data would still be available at 5-min or 1-hour resolution.)
  - Review your ELB access logs in S3 to see if any ELBs in your system saw the latency.
  - Analyze your logs to detect bursts in traffic at that time.
- You have a high security requirement for your AWS accounts. What is the most rapid and sophisticated setup you can use to react to AWS API calls to your account?
  - Subscription to AWS Config via an SNS Topic. Use a Lambda Function to perform in-flight analysis and reactivity to changes as they occur.
  - Global AWS CloudTrail setup delivering to S3 with an SNS subscription to the delivery notifications, pushing into a Lambda, which inserts records into an ELK stack for analysis.
  - Use a CloudWatch Rule ScheduleExpression to periodically analyze IAM credential logs. Push the deltas for events into an ELK stack and perform ad hoc analysis there.
  - Amazon EventBridge Rules (formerly CloudWatch Events Rules), which trigger based on all AWS API calls, submitting all events to an AWS Kinesis Stream for arbitrary downstream analysis. (EventBridge (evolution of CloudWatch Events) allows subscription to AWS API calls, and direction of these events into Kinesis Streams. This allows a unified, near real-time stream for all API calls. Refer link)
- To monitor API calls against our AWS account by different users and entities, we can use ____ to create a history of calls in bulk for later review, and use ____ for reacting to AWS API calls in real-time.
  - AWS Config; AWS Inspector
  - AWS CloudTrail; AWS Config
  - AWS CloudTrail; Amazon EventBridge (formerly CloudWatch Events) (CloudTrail is a batch API call collection service, EventBridge enables real-time monitoring of calls through the Rules object interface. Refer link)
  - AWS Config; AWS Lambda
- You are hired as the new head of operations for a SaaS company. Your CTO has asked you to make debugging any part of your entire operation simpler and as fast as possible. She complains that she has no idea what is going on in the complex, service-oriented architecture, because the developers just log to disk, and it’s very hard to find errors in logs on so many services. How can you best meet this requirement and satisfy your CTO? [PROFESSIONAL]
  - Copy all log files into AWS S3 using a cron job on each instance. Use an S3 Notification Configuration on the PutBucket event and publish events to AWS Lambda. Use the Lambda to analyze logs as soon as they come in and flag issues. (is not fast in search and introduces delay)
  - Begin using CloudWatch Logs on every service. Stream all Log Groups into S3 objects. Use AWS EMR cluster jobs to perform ad hoc MapReduce analysis and write new queries when needed. (is not fast in search and introduces delay)
  - Copy all log files into AWS S3 using a cron job on each instance. Use an S3 Notification Configuration on the PutBucket event and publish events to AWS Kinesis. Use Apache Spark on AWS EMR to perform at-scale stream processing queries on the log chunks and flag issues. (is not fast in search and introduces delay)
  - Begin using CloudWatch Logs on every service. Stream all Log Groups into an Amazon OpenSearch Service Domain running Kibana and perform log analysis on a search cluster. (OpenSearch (formerly Elasticsearch) with Kibana/OpenSearch Dashboards is designed for real-time, ad hoc log analysis and aggregation. Note: CloudWatch now also offers integrated analytics with OpenSearch Service directly.)
- Your EC2-Based Multi-tier application includes a monitoring instance that periodically makes application-level read only requests of various application components and if any of those fail more than three times 30 seconds calls CloudWatch to fire an alarm, and the alarm notifies your operations team by email and SMS of a possible application health problem. However, you also need to watch the watcher -the monitoring instance itself – and be notified if it becomes unhealthy. Which of the following is a simple way to achieve that goal? [PROFESSIONAL]
- Run another monitoring instance that pings the monitoring instance and fires a CloudWatch alarm that notifies your operations team should the primary monitoring instance become unhealthy.
- Set a CloudWatch alarm based on EC2 system and instance status checks and have the alarm notify your operations team of any detected problem with the monitoring instance.
- Set a CloudWatch alarm based on the CPU utilization of the monitoring instance and have the alarm notify your operations team if the CPU usage exceeds 50% for more than one minute: then have your monitoring application go into a CPU-bound loop should it detect any application problems.
- Have the monitoring instances post messages to an SQS queue and then dequeue those messages on another instance. Should the queue cease to have new messages, the second instance should first terminate the original monitoring instance, start another backup monitoring instance, assume the role of the previous monitoring instance, and begin adding messages to the SQS queue.
- A company wants to reduce alarm noise in their monitoring system. They have multiple alarms for different components of a web application. Which CloudWatch feature should they use to consolidate multiple alarms into a single notification?
- CloudWatch Metrics Insights
- CloudWatch Composite Alarms (Composite alarms combine multiple alarm states using Boolean logic to reduce noise and send notifications only at an aggregated level)
- CloudWatch Anomaly Detection
- CloudWatch Dashboards
- A DevOps team wants to automatically identify root causes when CloudWatch alarms fire, correlating metrics, logs, and deployment events without manual investigation. Which CloudWatch feature should they use?
- CloudWatch Logs Insights
- CloudWatch Contributor Insights
- CloudWatch Investigations (Powered by Amazon Q Developer, Investigations uses AI to analyze multiple telemetry sources and suggest root causes and remediation steps)
- CloudWatch Application Signals
- An organization wants to send application metrics to CloudWatch using open standards without vendor-specific conversion logic. They need support for high-cardinality metrics with many labels. Which approach should they use?
- Use CloudWatch PutMetricData API with custom metrics
- Use CloudWatch Metric Streams to export to a third-party tool
- Send metrics directly using the OpenTelemetry Protocol (OTLP) to CloudWatch’s native OTel endpoint (CloudWatch supports native OTLP metrics ingestion with up to 150 labels per metric, queryable via PromQL)
- Use the CloudWatch Agent with StatsD protocol
- A company wants to ingest, transform, and route log data from multiple sources to CloudWatch without managing any infrastructure. They need to parse complex log formats and enrich the data during ingestion. Which service should they use?
- CloudWatch Logs Subscription Filters
- Amazon Kinesis Data Firehose
- CloudWatch Pipelines (CloudWatch Pipelines is a fully managed service that ingests, transforms, and routes log data with support for parsing, conditional processing, and AI-assisted processor configuration)
- CloudWatch Log Transformers only
- A team needs to monitor their application’s availability, latency, and error rate with pre-built dashboards and set up Service Level Objectives (SLOs) without building custom monitoring. Which CloudWatch capability provides this?
- CloudWatch Synthetics
- CloudWatch Container Insights
- CloudWatch Internet Monitor
- CloudWatch Application Signals (Application Signals provides automatic instrumentation, pre-built APM dashboards with golden metrics, and SLO tracking without custom setup)
Amazon CloudWatch
- CloudWatch monitors AWS resources and applications in real time.
- CloudWatch can be used to collect and track metrics, which are the variables to be measured for resources and applications.
- CloudWatch is essentially a metrics repository: metrics are published into it, and statistics are retrieved based on those metrics.
- In addition to monitoring the built-in metrics that come with AWS, custom metrics can also be monitored
- CloudWatch provides system-wide visibility into resource utilization, application performance, and operational health.
- By default, CloudWatch Logs stores log data indefinitely, and the retention period can be changed for each log group at any time.
- CloudWatch alarms can be configured
- to send notifications or
- to automatically make changes to the resources based on defined rules
- CloudWatch dashboards are customizable home pages in the CloudWatch console used to monitor the resources in a single view, even those resources that are spread across different Regions.
- CloudWatch Agent helps collect metrics and logs from EC2 instances and on-premises servers and push them to CloudWatch.
- CloudWatch now supports native OpenTelemetry (OTel) metrics, enabling metrics to be sent directly using the OpenTelemetry Protocol (OTLP) without custom conversion logic.
- CloudWatch supports PromQL (Prometheus Query Language) for querying metrics including OTel-ingested and AWS vended metrics.
- CloudWatch Investigations uses AI-powered root cause analysis to help identify anomalies, surface related signals, and suggest remediation steps.
CloudWatch Architecture

- CloudWatch collects metrics from a variety of AWS resources and applications
- These metrics, as statistics, are available to the user through the console, CLI, and API
- CloudWatch allows the creation of alarms with defined rules
- to perform Auto Scaling actions or stop, terminate, reboot, or recover EC2 instances
- to send notifications using SNS actions on your behalf
CloudWatch Concepts
Namespaces
- CloudWatch namespaces are containers for metrics.
- Metrics in different namespaces are isolated from each other, so that metrics from different applications are not mistakenly aggregated into the same statistics.
- AWS namespaces all follow the convention AWS/<service>, e.g., AWS/EC2 and AWS/ELB
- Namespace names must be fewer than 256 characters in length.
- There is no default namespace. Each data element put into CloudWatch must specify a namespace.
Metrics
- A metric is the fundamental concept in CloudWatch.
- Uniquely defined by a name, a namespace, and zero or more dimensions.
- Represents a time-ordered set of data points published to CloudWatch.
- Each data point has a time stamp, and (optionally) a unit of measure.
- Data points can be either custom metrics or metrics from other services in AWS.
- Statistics can be retrieved about those data points as an ordered set of time-series data that occur within a specified time window.
- When the statistics are requested, the returned data stream is identified by namespace, metric name, dimension, and (optionally) the unit.
- Metrics exist only in the region in which they are created.
- CloudWatch retains metric data for up to 15 months, aggregating older data points into coarser periods for long-term storage
- Metrics cannot be deleted, but they automatically expire after 15 months if no new data is published to them.
- Metric retention is as follows:
- Data points with a period of less than 60 seconds are available for 3 hours. These data points are high-resolution custom metrics.
- Data points with a 60 secs (1 min) period are available for 15 days
- Data points with a 300 secs (5 min) period are available for 63 days
- Data points with a 3600 secs (1 hour) period are available for 455 days (15 months)
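The retention tiers above can be expressed as a small lookup. This is a minimal Python sketch, assuming exactly the tier boundaries listed (3 hours, 15 days, 63 days, 455 days); the function name `finest_available_period` is hypothetical and not a CloudWatch API. Given how old a data point is, it returns the finest period at which that data can still be retrieved.

```python
from datetime import timedelta

# Retention tiers from the list above: (period in seconds, how long data at that period is kept).
# Sub-minute data exists only for high-resolution custom metrics; 1 s stands in for "< 60 s".
RETENTION_TIERS = [
    (1, timedelta(hours=3)),      # high-resolution (< 60 s) data points
    (60, timedelta(days=15)),     # 1-minute data points
    (300, timedelta(days=63)),    # 5-minute data points
    (3600, timedelta(days=455)),  # 1-hour data points (15 months)
]


def finest_available_period(age: timedelta, high_resolution: bool = False):
    """Return the finest period (seconds) still retrievable for data of the given age,
    or None if the data has expired entirely (hypothetical helper)."""
    for period, retention in RETENTION_TIERS:
        if period < 60 and not high_resolution:
            continue  # standard-resolution metrics never had sub-minute data
        if age <= retention:
            return period
    return None


# Data from three weeks ago (as in the latency-spike practice question) is still
# available, but only at 5-minute granularity.
print(finest_available_period(timedelta(weeks=3)))
```

The same lookup explains why a spike from three weeks ago can still be investigated in CloudWatch graphs, just not at 1-minute resolution.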
Dimensions
- A dimension is a name/value pair that is part of the identity of a metric.
- Every metric has specific characteristics that describe it, and you can think of dimensions as categories for those characteristics.
- Dimensions help design a structure for the statistics plan.
- Dimensions are part of the unique identifier for a metric; whenever a unique name/value pair is added to a metric, a new metric is created.
- Dimensions can be used to filter the result sets that CloudWatch queries return.
- A metric can be assigned up to 30 dimensions (the limit was previously 10).
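Because dimensions are part of a metric's identity, the same metric name published with different dimension sets yields separate metrics. A minimal sketch, assuming a metric can be modeled as the tuple (namespace, metric name, set of dimension name/value pairs); `MetricId` and `metric_id` are hypothetical helpers, not CloudWatch APIs, and the namespace and dimension values are made up for illustration.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple


class MetricId(NamedTuple):
    """Hypothetical model of what makes a CloudWatch metric unique."""
    namespace: str
    name: str
    dimensions: FrozenSet[Tuple[str, str]]


def metric_id(namespace: str, name: str, dimensions: Dict[str, str]) -> MetricId:
    # A frozenset makes dimension order irrelevant; only the name/value pairs matter.
    return MetricId(namespace, name, frozenset(dimensions.items()))


a = metric_id("MyApp", "Latency", {"Service": "checkout", "Env": "prod"})
b = metric_id("MyApp", "Latency", {"Env": "prod", "Service": "checkout"})  # same pairs, different order
c = metric_id("MyApp", "Latency", {"Service": "checkout", "Env": "prod", "Host": "web-1"})  # extra pair

print(a == b)          # same metric
print(a == c)          # adding a dimension creates a new metric
print(len({a, b, c}))  # number of distinct metrics
```

This is also why statistics for `c` are not rolled into `a` automatically: CloudWatch treats them as unrelated time series.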
Time Stamps
- Each metric data point must be marked with a time stamp to identify the data point on a time series.
- Timestamp can be up to two weeks in the past and up to two hours into the future.
- If no timestamp is provided, a time stamp based on the time the data element was received is created.
- All times reflect the UTC time zone when statistics are retrieved
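The accepted timestamp window can be checked client-side before publishing. A minimal sketch, assuming the two-weeks-past and two-hours-future limits stated above; `timestamp_accepted` is a hypothetical helper, not part of any AWS SDK. It works only with timezone-aware UTC datetimes, matching CloudWatch's use of UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_PAST = timedelta(weeks=2)    # data points may be up to two weeks old
MAX_FUTURE = timedelta(hours=2)  # and up to two hours in the future


def timestamp_accepted(ts: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if ts falls inside the window CloudWatch accepts for a data point."""
    if ts.tzinfo is None:
        raise ValueError("use timezone-aware datetimes (CloudWatch works in UTC)")
    now = now or datetime.now(timezone.utc)
    return now - MAX_PAST <= ts <= now + MAX_FUTURE


now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(timestamp_accepted(now + timedelta(minutes=90), now))  # 90 minutes ahead: within the window
print(timestamp_accepted(now + timedelta(hours=3), now))     # more than two hours ahead
print(timestamp_accepted(now - timedelta(days=15), now))     # older than two weeks
```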
Resolution
- Each metric is one of the following:
- Standard resolution, with data having a one-minute granularity
- High resolution, with data at a granularity of one second
Units
- Units represent the statistic’s unit of measure, e.g., Count, Bytes, Percent
Statistics
- Statistics are metric data aggregations over specified periods of time
- Aggregations are made using the namespace, metric name, dimensions, and the data point unit of measure, within the specified time period
Periods
- Period is the length of time associated with a specific statistic.
- Each statistic represents an aggregation of the metrics data collected for a specified period of time.
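- Periods are expressed in seconds. For standard-resolution metrics, the minimum period is one minute (60 seconds) and the period must be a multiple of 60; for high-resolution metrics, the period can be 1, 5, 10, 30, or any multiple of 60 seconds.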
Aggregation
- CloudWatch aggregates statistics according to the period length specified in calls to GetMetricStatistics.
- Multiple data points can be published with the same or similar time stamps. CloudWatch aggregates them by period length when the statistics about those data points are requested.
- For EC2, statistics aggregated across instances (e.g., by AMI ID or instance type) are only available when using detailed monitoring.
- Instances that use basic monitoring are not included in these aggregates
- CloudWatch does not aggregate data across regions.
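Period-based aggregation can be illustrated locally. A minimal sketch, assuming raw data points arrive as (UTC timestamp, value) pairs and that buckets are aligned to multiples of the period since the Unix epoch (a simplification for illustration); `aggregate` is a hypothetical helper, not the service's implementation. It computes the five standard statistics per bucket.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def aggregate(points: Iterable[Tuple[datetime, float]], period: int) -> Dict[datetime, Dict[str, float]]:
    """Group (timestamp, value) points into period-length buckets and compute
    SampleCount, Sum, Average, Minimum, and Maximum for each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        epoch = int(ts.timestamp())
        start = epoch - epoch % period  # align to a period boundary (epoch-aligned assumption)
        buckets[datetime.fromtimestamp(start, tz=timezone.utc)].append(value)

    stats = {}
    for start, values in sorted(buckets.items()):
        total = sum(values)
        stats[start] = {
            "SampleCount": len(values),
            "Sum": total,
            "Average": total / len(values),
            "Minimum": min(values),
            "Maximum": max(values),
        }
    return stats


t0 = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
raw = [
    (t0.replace(second=5), 10.0),
    (t0.replace(second=40), 30.0),           # same minute as the point above
    (t0.replace(minute=1, second=0), 20.0),  # next minute
]
for start, s in aggregate(raw, period=60).items():
    print(start.isoformat(), s)
```

Requesting the same data with `period=300` collapses all three points into a single bucket, which is the behavior described for GetMetricStatistics above: the period chosen at query time decides how data points are grouped.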
Alarms
- Alarms can automatically initiate actions on behalf of the user, based on specified parameters.
- Alarm watches a single metric over a specified time period, and performs one or more actions based on the value of the metric relative to a given threshold over a number of time periods.
- Alarms invoke actions for sustained state changes only i.e. the state must have changed and been maintained for a specified number of periods.
- An action can be one of the following:
- SNS notification
- Auto Scaling policies
- EC2 action (stop, terminate, reboot, or recover an EC2 instance)
- Lambda function invocation (same or cross-account)
- Systems Manager OpsItems or Incident Manager incidents
- After an alarm invokes an action due to a change in state, its subsequent behavior depends on the type of action associated with the alarm.
- For Auto Scaling policy notifications, the alarm continues to invoke the action for every period that the alarm remains in the new state.
- For SNS notifications, no additional actions are invoked.
- An alarm has three possible states:
- OK—The metric is within the defined threshold
- ALARM—The metric is outside of the defined threshold
- INSUFFICIENT_DATA—Alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state
- Alarms exist only in the region in which they are created.
- Alarm actions must reside in the same region as the alarm
- Alarm history is available for the last 14 days.
- An alarm can be tested by setting it to any state using the SetAlarmState API (legacy mon-set-alarm-state command; aws cloudwatch set-alarm-state in the AWS CLI). This temporary state change lasts only until the next alarm comparison occurs.
- Alarm actions can be disabled and enabled using the DisableAlarmActions and EnableAlarmActions APIs (legacy mon-disable-alarm-actions and mon-enable-alarm-actions commands).
- CloudWatch now allows alarming on data up to 7 days old (increased from the previous 24-hour limit), with a period of at least 3,600 seconds (1 hour).
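The "sustained state change" rule can be sketched as a simple evaluator. A minimal sketch, assuming a static "greater than threshold" comparison where the alarm enters ALARM only when every one of the last N periods breaches the threshold; real CloudWatch alarms also support "M out of N" evaluation (DatapointsToAlarm) and configurable missing-data handling (TreatMissingData), both omitted here. `evaluate_alarm` is hypothetical.

```python
from typing import List, Optional


def evaluate_alarm(values: List[Optional[float]], threshold: float, evaluation_periods: int) -> str:
    """Return OK, ALARM, or INSUFFICIENT_DATA for the most recent evaluation window.

    values holds one aggregated statistic per period (oldest first); None means no data.
    """
    window = values[-evaluation_periods:]
    if len(window) < evaluation_periods or any(v is None for v in window):
        return "INSUFFICIENT_DATA"  # simplified missing-data handling
    if all(v > threshold for v in window):  # every period in the window must breach
        return "ALARM"
    return "OK"


cpu = [40.0, 80.0, 82.0, 79.0]
print(evaluate_alarm(cpu, threshold=75.0, evaluation_periods=3))           # last 3 periods all above 75
print(evaluate_alarm(cpu, threshold=75.0, evaluation_periods=4))           # 40.0 did not breach
print(evaluate_alarm([None, 90.0], threshold=75.0, evaluation_periods=2))  # missing data point
```

A single spike above the threshold does not trigger the action; the breach has to be sustained across the configured number of periods.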
Composite Alarms
- Composite alarms determine their state by monitoring the states of other alarms.
- Rules are defined to combine the status of monitored alarms using Boolean logic (AND, OR, NOT).
- Composite alarms help reduce alarm noise by taking actions only at an aggregated level.
- A composite alarm’s rule expression can include as many as 100 underlying alarms.
- Any single alarm can be included in the rule expressions of as many as 150 composite alarms.
- Composite alarms can:
- Send Amazon SNS notifications when they change state
- Create Systems Manager OpsItems or incidents when they go into ALARM state
- Start CloudWatch Investigations for root cause analysis
- Composite alarms cannot perform EC2 actions or Auto Scaling actions.
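A composite alarm's rule expression, such as `(ALARM("web-5xx") OR ALARM("web-latency")) AND NOT ALARM("maintenance-window")`, combines child alarm states with Boolean logic. A minimal sketch, assuming the child alarm states are available as a dict; the rule is written as a Python function rather than parsed from CloudWatch's rule syntax, the state mapping is simplified to ALARM/OK, and all alarm names are hypothetical.

```python
from typing import Callable, Dict

# Child alarm name -> "OK" | "ALARM" | "INSUFFICIENT_DATA"
States = Dict[str, str]


def in_alarm(states: States, name: str) -> bool:
    """Mirrors ALARM("name") in a rule expression."""
    return states.get(name) == "ALARM"


def web_rule(s: States) -> bool:
    # (ALARM("web-5xx") OR ALARM("web-latency")) AND NOT ALARM("maintenance-window")
    return (in_alarm(s, "web-5xx") or in_alarm(s, "web-latency")) and not in_alarm(s, "maintenance-window")


def composite_state(rule: Callable[[States], bool], states: States) -> str:
    """Simplified: ALARM when the rule is true, otherwise OK."""
    return "ALARM" if rule(states) else "OK"


# Both symptoms firing produce ONE composite ALARM instead of two separate notifications.
print(composite_state(web_rule, {"web-5xx": "ALARM", "web-latency": "ALARM", "maintenance-window": "OK"}))
# During planned maintenance the same symptoms are suppressed.
print(composite_state(web_rule, {"web-5xx": "ALARM", "web-latency": "ALARM", "maintenance-window": "ALARM"}))
```

In practice the child alarms would typically have no actions of their own, so that only the composite alarm notifies; that is where the noise reduction comes from.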
Anomaly Detection
- CloudWatch anomaly detection analyzes past metric data and creates a model of expected values.
- Alarms can be created based on anomaly detection to alert when metric values are outside the expected band.
- The model adjusts for time-of-day patterns, day-of-week patterns, and trends over time.
- Useful for metrics without a fixed threshold, such as CPU utilization with seasonal patterns.
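CloudWatch builds its anomaly detection model with machine learning, and the band width is expressed as a number of standard deviations (as in the `ANOMALY_DETECTION_BAND(m1, 2)` metric math function). The sketch below is a deliberately simplified stand-in, not CloudWatch's algorithm: it assumes the expected value for each hour of the day is the historical mean for that hour, and the band is mean +/- k standard deviations. Function names and the sample history are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple

Model = Dict[int, Tuple[float, float]]  # hour of day -> (mean, population std dev)


def build_hourly_model(history: Iterable[Tuple[int, float]]) -> Model:
    """history: (hour_of_day, value) pairs from past data."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def is_anomalous(model: Model, hour: int, value: float, k: float = 2.0) -> bool:
    """True if value falls outside mean +/- k * stddev for that hour."""
    mu, sigma = model[hour]
    return not (mu - k * sigma <= value <= mu + k * sigma)


# Hypothetical CPU history: busy at 14:00, quiet at 03:00.
history = [(14, 70.0), (14, 74.0), (14, 72.0), (3, 10.0), (3, 12.0), (3, 11.0)]
model = build_hourly_model(history)
print(is_anomalous(model, 14, 73.0))  # normal for mid-afternoon
print(is_anomalous(model, 3, 73.0))   # the same value is unusual at 03:00
```

The point of the example is the one made above: a fixed threshold of, say, 70% would either fire every afternoon or miss the 03:00 anomaly, while an expected-value band handles both.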
Regions
- CloudWatch does not aggregate data across regions. Therefore, metrics are completely separate between regions.
CloudWatch Metrics Insights
- CloudWatch Metrics Insights is a high-performance SQL-based query engine to query metrics at scale.
- Enables identification of trends and patterns across all CloudWatch metrics in real time.
- Provides access to up to two weeks of historical data for trend analysis.
- Supports functions including AVG, COUNT, MAX, MIN, and SUM.
- Can be combined with CloudWatch Alarms to set up dynamic alarms that consistently monitor fast-moving environments.
- Supports natural language query generation powered by generative AI – users can ask questions in plain English to generate metric queries.
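A typical Metrics Insights query looks like `SELECT AVG(CPUUtilization) FROM SCHEMA("AWS/EC2", InstanceId) GROUP BY InstanceId ORDER BY AVG() DESC LIMIT 5`. To show what the GROUP BY, ORDER BY, and LIMIT clauses do, the sketch below emulates that query over an in-memory list of data points. It illustrates the semantics only, under simplifying assumptions (one flat list of (instance ID, value) samples, hypothetical instance IDs); it is not a client for the Metrics Insights engine.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def top_by_average(points: List[Tuple[str, float]], limit: int) -> List[Tuple[str, float]]:
    """Emulates: SELECT AVG(CPUUtilization) ... GROUP BY InstanceId ORDER BY AVG() DESC LIMIT n."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for instance_id, value in points:  # GROUP BY InstanceId
        groups[instance_id].append(value)
    averaged = [(iid, mean(vals)) for iid, vals in groups.items()]  # AVG(CPUUtilization)
    averaged.sort(key=lambda item: item[1], reverse=True)          # ORDER BY AVG() DESC
    return averaged[:limit]                                         # LIMIT n


points = [("i-aaa", 90.0), ("i-aaa", 70.0), ("i-bbb", 20.0), ("i-ccc", 55.0), ("i-ccc", 65.0)]
print(top_by_average(points, limit=2))  # [('i-aaa', 80.0), ('i-ccc', 60.0)]
```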
Custom Metrics
- CloudWatch allows publishing custom metrics with the put-metric-data CLI command (or its Query API equivalent, PutMetricData)
- CloudWatch creates a new metric if put-metric-data is called with a new metric name; otherwise, it associates the data with the specified existing metric
- When called with --metric-name and --value, put-metric-data publishes a single data point per call; multiple data points can be sent in one call using --metric-data
- CloudWatch stores data about a metric as a series of data points, and each data point has an associated time stamp
- After a new metric is created with the put-metric-data command, it can take up to two minutes before statistics can be retrieved on it using the get-metric-statistics command, and up to fifteen minutes before it appears in the list of metrics retrieved using the list-metrics command.
- CloudWatch allows publishing
- Single data point
- Data points can be published with time stamps as granular as one-thousandth of a second; for standard-resolution metrics, CloudWatch aggregates the data to a minimum granularity of one minute
- CloudWatch records the average (sum of all items divided by number of items) of the values received for every 1-minute period, as well as the number of samples, maximum value, and minimum value for the same time period
- CloudWatch uses one-minute boundaries when aggregating data points
- Aggregated set of data points called a statistic set
- Data can also be aggregated before being published to CloudWatch
- Aggregating data minimizes the number of calls, reducing it to a single call per minute with the statistic set of data
- A statistic set contains Sum, Minimum, Maximum, and SampleCount; Average is derived as Sum / SampleCount
- If the application produces sporadic data, with periods that have no associated data, either the value zero (0) or no value at all can be published
- However, it can be helpful to publish zero instead of no value
- to monitor the health of your application, e.g., an alarm can be configured to notify if no metric is published for 5 minutes
- to track the total number of data points
- to have statistics such as minimum and average to include data points with the value 0.
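The publishing options above map onto the request shape of the PutMetricData API (in boto3, `cloudwatch.put_metric_data(Namespace=..., MetricData=[...])`). A minimal sketch, assuming a hypothetical namespace `MyApp`, hypothetical metric names, and a hypothetical `Service` dimension; it only builds the request dictionaries (one with a pre-aggregated statistic set in `StatisticValues`, one publishing an explicit zero), so it runs without AWS credentials.

```python
from datetime import datetime, timezone
from typing import Dict, List

NAMESPACE = "MyApp"  # hypothetical custom namespace (custom namespaces cannot start with "AWS/")


def single_point(name: str, value: float, unit: str = "Count") -> Dict:
    """Request for one raw data point."""
    return {
        "Namespace": NAMESPACE,
        "MetricData": [{
            "MetricName": name,
            "Dimensions": [{"Name": "Service", "Value": "checkout"}],  # hypothetical dimension
            "Timestamp": datetime.now(timezone.utc),
            "Value": value,
            "Unit": unit,
        }],
    }


def statistic_set(name: str, samples: List[float], unit: str = "Milliseconds") -> Dict:
    """Pre-aggregate samples client-side into one StatisticValues entry."""
    if not samples:
        raise ValueError("a statistic set needs at least one sample")
    return {
        "Namespace": NAMESPACE,
        "MetricData": [{
            "MetricName": name,
            "Timestamp": datetime.now(timezone.utc),
            "StatisticValues": {
                "SampleCount": float(len(samples)),
                "Sum": float(sum(samples)),
                "Minimum": float(min(samples)),
                "Maximum": float(max(samples)),
            },
            "Unit": unit,
        }],
    }


latency = statistic_set("Latency", [120.0, 80.0, 100.0])
stats = latency["MetricData"][0]["StatisticValues"]
print(stats["Sum"] / stats["SampleCount"])  # Average is derived, not sent

# Publish an explicit zero for a quiet period so alarms and SampleCount still see a data point.
quiet = single_point("OrdersPlaced", 0.0)
```

To actually send a request, each dictionary could be passed as `boto3.client("cloudwatch").put_metric_data(**latency)`; a statistic set turns many samples into one call, which is the cost and throttling benefit described above.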
CloudWatch Dashboards
- CloudWatch dashboards are customizable home pages in the CloudWatch console used to monitor the resources in a single view, even those resources that are spread across different Regions.
- Dashboards can be used to create customized views of the metrics and alarms for the AWS resources.
- Dashboards can help to create
- A single view for selected metrics and alarms to help assess the health of the resources and applications across one or more Regions.
- An operational playbook that provides guidance for team members during operational events about how to respond to specific incidents.
- A common view of critical resource and application measurements that can be shared by team members for faster communication flow during operational events.
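A dashboard is defined by a JSON dashboard body, and each metric widget carries its own `region`, which is how one dashboard can show resources from several Regions. A minimal sketch, assuming hypothetical instance IDs, titles, and Regions; it builds the body with Python's `json` module. The resulting string is what the PutDashboard API expects (boto3: `put_dashboard(DashboardName=..., DashboardBody=...)`).

```python
import json
from typing import Dict, List


def cpu_widget(region: str, instance_id: str, x: int) -> Dict:
    """One metric widget; 'region' is set per widget, enabling cross-Region dashboards."""
    return {
        "type": "metric",
        "x": x, "y": 0, "width": 12, "height": 6,
        "properties": {
            "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]],
            "stat": "Average",
            "period": 300,
            "region": region,
            "title": f"CPU {instance_id} ({region})",
        },
    }


widgets: List[Dict] = [
    cpu_widget("us-east-1", "i-0123456789abcdef0", x=0),  # hypothetical instance IDs
    cpu_widget("eu-west-1", "i-0fedcba9876543210", x=12),
]
dashboard_body = json.dumps({"widgets": widgets})
print(sorted({w["properties"]["region"] for w in json.loads(dashboard_body)["widgets"]}))
```

Note that the dashboard only places per-Region graphs side by side; the underlying metrics are still not aggregated across Regions.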
CloudWatch Cross-Account Observability
- CloudWatch cross-account observability helps monitor and troubleshoot applications that span multiple accounts within a Region.
- Uses Observability Access Manager (OAM) to create and manage links between source and monitoring accounts.
- Cross-account observability includes monitoring and source accounts
- A monitoring account is a central AWS account that can view and interact with observability data generated from source accounts.
- A source account is an individual AWS account that generates observability data for the resources that reside in it.
- Source accounts share their observability data with the monitoring account which can include the following types of telemetry:
- Metrics in CloudWatch
- Log groups in CloudWatch Logs
- Traces in AWS X-Ray
- Application Signals services and SLOs
- Available in all commercial AWS Regions and AWS GovCloud (US) Regions at no extra cost for logs and metrics.
- Cross-Account and Cross-Region Log Centralization (2025) allows copying log data from multiple accounts and regions into a single destination account for consolidated analysis.
CloudWatch Agent
- CloudWatch Agent helps collect metrics and logs from EC2 instances and on-premises servers and push them to CloudWatch.
- Logs collected by the unified agent are processed and stored in CloudWatch Logs.
- Supports collecting metrics and traces via OpenTelemetry Protocol (OTLP).
- Can collect from AWS, on-premises, or multicloud environments.
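The agent is driven by a JSON configuration file. A minimal sketch of such a file, assuming a Linux host, a hypothetical log path `/var/log/myapp/app.log`, and a hypothetical log group `myapp`; it collects memory and disk utilization (which EC2 does not publish by default) into the agent's default `CWAgent` namespace and ships one log file to CloudWatch Logs. The dictionary is generated with Python; verify the key names against the agent's configuration reference before relying on it.

```python
import json

agent_config = {
    "agent": {"metrics_collection_interval": 60},  # seconds
    "metrics": {
        "namespace": "CWAgent",
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        },
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [{
                    "file_path": "/var/log/myapp/app.log",  # hypothetical path
                    "log_group_name": "myapp",              # hypothetical log group
                    "log_stream_name": "{instance_id}",     # one stream per instance
                }]
            }
        }
    },
}

print(json.dumps(agent_config, indent=2))
```

Written to a file on the instance, a configuration like this is typically loaded with `amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:<path> -s`.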
CloudWatch Logs
Refer blog post @ CloudWatch Logs
CloudWatch Application Signals
- CloudWatch Application Signals (GA June 2024) provides application performance monitoring (APM) capabilities.
- Automatically instruments applications on AWS to track application performance against business objectives.
- Provides pre-built, standardized dashboards showing key metrics: volume, availability, latency, faults, and errors.
- Supports Service Level Objectives (SLOs) to define and track application performance targets.
- Application map automatically discovers and visualizes application topology without requiring instrumentation.
- Supports runtime metrics for Java, Python, and .NET applications.
- Supports serverless applications on AWS Lambda.
- Works with cross-account observability for centralized monitoring.
CloudWatch Internet Monitor
- Internet Monitor provides near-continuous internet measurements for internet traffic tailored to your workload footprint on AWS.
- Provides availability and performance metrics (performance score, availability score, round-trip time).
- Alerts when internet issues affect application users.
- Helps troubleshoot and mitigate network problems before they impact customers.
- Creates aggregated CloudWatch metrics for traffic to your application, by AWS Region and edge location.
CloudWatch Investigations
- CloudWatch Investigations (GA June 2025) is an AI-powered root cause analysis feature.
- Powered by Amazon Q Developer, it analyzes metrics, logs, traces, and deployment events.
- Helps find root cause without manually querying multiple sources of telemetry.
- Identifies anomalies, surfaces related signals, suggests remediation steps.
- Can be triggered automatically from a CloudWatch alarm or created manually.
- Supports integration with Slack for DevOps troubleshooting workflows.
- Significantly reduces Mean Time to Resolution (MTTR).
CloudWatch Pipelines
- CloudWatch Pipelines is a fully managed service that ingests, transforms, and routes log data to CloudWatch without managing infrastructure.
- Supports log data from AWS services, third-party applications, and custom sources.
- Provides processors for parsing, transforming, and enriching log data.
- Supports conditional processing and drop event processing for fine-grained control.
- Includes compliance and governance capabilities for data integrity and access control.
- Supports AI-assisted configuration of processors via natural language descriptions.
- Supports OCSF (Open Cybersecurity Schema Framework) and OpenTelemetry formats.
CloudWatch OpenTelemetry Support
- CloudWatch now supports native OpenTelemetry (OTel) metrics ingestion (preview, April 2026).
- Enables sending metrics directly using OTLP without custom conversion logic or additional tooling.
- Supports high-cardinality metrics with up to 150 labels per metric.
- Supports metric types: gauge, sum, histogram, and exponential histogram.
- Metrics can be queried using PromQL (Prometheus Query Language).
- Automatic AWS vended metric enrichment adds AWS context to ingested metrics.
- Completes OTel support across all three observability pillars (traces, logs, metrics).
CloudWatch Database Insights
- CloudWatch Database Insights (launched December 2024) provides comprehensive database observability from fleet-level to instance-level.
- Supports cross-account and cross-region database fleet monitoring.
- Offers two service tiers:
- Standard mode – 7 days of data retention at no additional cost
- Advanced mode – Extended 15-month retention with enhanced capabilities
- AWS is transitioning from RDS Performance Insights to Database Insights.
CloudWatch Evidently (Deprecated)
⚠️ CloudWatch Evidently reached End of Life on October 17, 2025.
Alternative: Use AWS Systems Manager AppConfig for feature flag management.
CloudWatch Events → Amazon EventBridge
- CloudWatch Events has been superseded by Amazon EventBridge.
- EventBridge is the evolution of CloudWatch Events with additional capabilities including partner events, Schema Registry, and EventBridge Pipes.
- New features are added to EventBridge only and not to CloudWatch Events.
- The underlying API and service remain the same – changes in either console appear in both.
- AWS recommends using EventBridge for all new event-driven architectures.
CloudWatch Supported Services
Refer blog post @ CloudWatch Supported Services
Accessing CloudWatch
- CloudWatch can be accessed using
- AWS CloudWatch console
- CloudWatch CLI
- AWS CLI
- CloudWatch API
- AWS SDKs
- AWS Console Mobile App (supports interactive alarm graphs, AI log summaries, and natural language log search)
CloudWatch Key Updates (2024-2026)
- Unified Data Management (Dec 2025) – CloudWatch provides unified management and analytics for operational, security, and compliance data across AWS and third-party sources.
- Organization-wide EC2 Detailed Monitoring (Mar 2026) – Enablement rules automatically enable detailed monitoring for existing and new EC2 instances matching the rule scope.
- Natural Language Query Generation (GA June 2024) – Powered by generative AI for both Logs Insights and Metrics Insights, supporting CloudWatch Logs Insights QL, OpenSearch PPL, and OpenSearch SQL.
- Integrated Analytics with OpenSearch (Dec 2024) – Supports SQL and PPL query languages for log analytics with JOIN, sub-queries, and SQL functions.
- Cross-Region Telemetry Auditing (Apr 2026) – Enable telemetry auditing for account/organization across all regions with enablement rules.
- Lambda Tiered Pricing for Logs (May 2025) – Reduces Lambda CloudWatch logging costs with tiered pricing and additional logging destinations.
AWS Certification Exam Practice Questions
- Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
- AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
- AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
- Open to further feedback, discussion and correction.
- A company needs to monitor the read and write IOPS metrics for their AWS MySQL RDS instance and send real-time alerts to their operations team. Which AWS services can accomplish this? Choose 2 answers
- Amazon Simple Email Service (Cannot be integrated with CloudWatch directly)
- Amazon CloudWatch
- Amazon Simple Queue Service
- Amazon Route 53
- Amazon Simple Notification Service
- A customer needs to capture all client connection information from their load balancer every five minutes. The company wants to use this data for analyzing traffic patterns and troubleshooting their applications. Which of the following options meets the customer requirements?
- Enable AWS CloudTrail for the load balancer.
- Enable access logs on the load balancer. (Refer link)
- Install the Amazon CloudWatch Logs agent on the load balancer.
- Enable Amazon CloudWatch metrics on the load balancer (does not provide Client connection information)
- A user is running a batch process on EBS-backed EC2 instances. The batch process starts a few instances to process Hadoop MapReduce jobs, which can run between 50 and 600 minutes or sometimes longer. The user wants each instance to be terminated only when the process is completed. How can the user configure this with CloudWatch?
- Setup the CloudWatch action to terminate the instance when the CPU utilization is less than 5%
- Setup the CloudWatch with Auto Scaling to terminate all the instances
- Setup a job which terminates all instances after 600 minutes
- It is not possible to terminate instances automatically
- A user has two EC2 instances running in two separate regions. The user is running an internal memory management tool, which captures the data and sends it to CloudWatch in US East, using a CLI with the same namespace and metric. Which of the below mentioned options is true with respect to the above statement?
- The setup will not work as CloudWatch cannot receive data across regions
- CloudWatch will receive and aggregate the data based on the namespace and metric
- CloudWatch will give an error since the data will conflict due to two sources
- CloudWatch will take the data of the server, which sends the data first
- A user is sending the data to CloudWatch using the CloudWatch API. The user is sending data 90 minutes in the future. What will CloudWatch do in this case?
- CloudWatch will accept the data
- It is not possible to send data of the future
- It is not possible to send the data manually to CloudWatch
- The user cannot send data for more than 60 minutes in the future
- A user has data generated randomly based on a certain event. The user wants to upload that data to CloudWatch. Due to this randomness, the event may not generate data for some periods. Which of the below mentioned options is the recommended option for this case?
- For the period when there is no data, the user should not send the data at all
- For the period when there is no data the user should send a blank value
- For the period when there is no data the user should send the value as 0 (Refer User Guide)
- The user must upload the data to CloudWatch as having no data for some period will cause an error at CloudWatch monitoring
- A user has a weighing plant. The user measures the weight of some goods every 5 minutes and sends data to AWS CloudWatch for monitoring and tracking. Which of the below mentioned parameters is mandatory for the user to include in the request list?
- Value
- Namespace (refer put-metric request)
- Metric Name
- Timezone
- A user has a refrigerator plant. The user is measuring the temperature of the plant every 15 minutes. If the user wants to send the data to CloudWatch to view the data visually, which of the below mentioned statements is true with respect to the information given above?
- The user needs to use AWS CLI or API to upload the data
- The user can use the AWS Import Export facility to import data to CloudWatch
- The user will upload data from the AWS console
- The user cannot upload data to CloudWatch since it is not an AWS service metric
- A user has launched an EC2 instance. The user is planning to setup the CloudWatch alarm. Which of the below mentioned actions is not supported by the CloudWatch alarm?
- Notify the Auto Scaling launch config to scale up
- Send an SMS using SNS
- Notify the Auto Scaling group to scale down
- Stop the EC2 instance
- A user is trying to aggregate all the CloudWatch metric data of the last 1 week. Which of the below mentioned statistics is not available for the user as a part of data aggregation?
- Aggregate
- Sum
- Sample data
- Average
- A user has setup a CloudWatch alarm on an EC2 action when the CPU utilization is above 75%. The alarm sends a notification to SNS on the alarm state. If the user wants to simulate the alarm action how can he achieve this?
- Run activities on the CPU such that its utilization reaches above 75%
- From the AWS console change the state to ‘Alarm’
- The user can set the alarm state to ‘Alarm’ using CLI
- Run the SNS action manually
- A user is publishing custom metrics to CloudWatch. Which of the below mentioned statements will help the user understand the functionality better?
- The user can use the CloudWatch Import tool
- The user should be able to see the data in the console after around 15 minutes
- If the user is uploading the custom data, the user must supply the namespace, timezone, and metric name as part of the command
- The user can view as well as upload data using the console, CLI and APIs
- An application that you are managing has EC2 instances and DynamoDB tables deployed to several AWS Regions. In order to monitor the performance of the application globally, you would like to see two graphs 1) Avg CPU Utilization across all EC2 instances and 2) Number of Throttled Requests for all DynamoDB tables. How can you accomplish this? [PROFESSIONAL]
- Tag your resources with the application name, and select the tag name as the dimension in the CloudWatch Management console to view the respective graphs (CloudWatch metrics are regional)
- Use the CloudWatch CLI tools to pull the respective metrics from each regional endpoint. Aggregate the data offline & store it for graphing in CloudWatch. (Note: CloudWatch cross-account cross-region dashboards and log centralization are now available but this question tests the concept that metrics are regional)
- Add SNMP traps to each instance and DynamoDB table. Leverage a central monitoring server to capture data from each instance and table. Put the aggregate data into CloudWatch for graphing (Can’t add SNMP traps to DynamoDB as it is a managed service)
- Add a CloudWatch agent to each instance and attach one to each DynamoDB table. When configuring the agent set the appropriate application name & view the graphs in CloudWatch. (Can’t add agents to DynamoDB as it is a managed service)
- You have set up individual AWS accounts for each project. You have been asked to make sure your AWS infrastructure costs do not exceed the budget set per project for each month. Which of the following approaches can help ensure that you do not exceed the budget each month? [PROFESSIONAL]
- Consolidate your accounts so you have a single bill for all accounts and projects (Consolidation produces a single bill but does not limit spend per account)
- Set up auto scaling with CloudWatch alarms using SNS to notify you when you are running too many instances in a given account (Instance count does not map directly to cost, so it cannot track spend accurately)
- Set up CloudWatch billing alerts for all AWS resources used by each project, with a notification occurring when the amount for each resource tagged to a particular project matches the budget allocated to the project. (As each project already has its own account, resource tagging is unnecessary)
- Set up CloudWatch billing alerts for all AWS resources used by each account, with email notifications when it hits 50%, 80%, and 90% of its budgeted monthly spend
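A minimal sketch of the per-account alarms from the last option, building one request per threshold for the `EstimatedCharges` metric in the `AWS/Billing` namespace. It assumes billing alerts are enabled, that the dicts are passed to `put_metric_alarm` on a CloudWatch client in us-east-1 (where billing metric data is stored), a USD `Currency` dimension, and a hypothetical SNS topic ARN and alarm naming scheme.

```python
from typing import Dict, List, Sequence


def billing_alarm_params(
    account_label: str,
    monthly_budget: float,
    sns_topic_arn: str,
    fractions: Sequence[float] = (0.5, 0.8, 0.9),
) -> List[Dict]:
    """Return one put_metric_alarm request (as kwargs) per budget threshold."""
    requests = []
    for fraction in fractions:
        requests.append(
            {
                # Hypothetical naming scheme, e.g. "project-a-billing-80pct".
                "AlarmName": f"{account_label}-billing-{round(fraction * 100)}pct",
                "Namespace": "AWS/Billing",
                "MetricName": "EstimatedCharges",
                "Dimensions": [{"Name": "Currency", "Value": "USD"}],
                "Statistic": "Maximum",
                "Period": 21600,  # 6 hours; estimated charges are sent several times a day
                "EvaluationPeriods": 1,
                "Threshold": round(monthly_budget * fraction, 2),
                "ComparisonOperator": "GreaterThanOrEqualToThreshold",
                "AlarmActions": [sns_topic_arn],  # SNS topic with an email subscription
            }
        )
    return requests
```

Each dict would be sent with `cloudwatch.put_metric_alarm(**params)`; running this once in each project account keeps the alerts scoped to that account's bill.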
- You meet once per month with your operations team to review the past month’s data. During the meeting, you realize that 3 weeks ago, your monitoring system which pings over HTTP from outside AWS recorded a large spike in latency on your 3-tier web service API. You use DynamoDB for the database layer, ELB, EBS, and EC2 for the business logic tier, and SQS, ELB, and EC2 for the presentation layer. Which of the following techniques will NOT help you figure out what happened?
- Check your CloudTrail log history around the spike’s time for any API calls that caused slowness.
- Review CloudWatch Metrics graphs to determine which component(s) slowed the system down. (Note: CloudWatch metric data is retained for 15 months. Data at 1-min resolution is available for 15 days, 5-min for 63 days, and 1-hour for 455 days. The 3-week old data would still be available at 5-min or 1-hour resolution.)
- Review your ELB access logs in S3 to see if any ELBs in your system saw the latency.
- Analyze your logs to detect bursts in traffic at that time.
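The note on the CloudWatch Metrics option rests on the retention schedule. A small sketch that encodes that schedule and answers "what resolution can I still query for data this old?"; the function and table names are hypothetical, and returning 1 second for high-resolution data simply stands for "sub-minute".

```python
from datetime import timedelta
from typing import Optional

# (maximum data age, finest period in seconds), from the retention schedule.
RETENTION_TIERS = [
    (timedelta(days=15), 60),
    (timedelta(days=63), 300),
    (timedelta(days=455), 3600),
]


def finest_available_period(age: timedelta, high_resolution: bool = False) -> Optional[int]:
    """Finest period (seconds) still queryable for data of the given age.

    Returns None once the data is older than 455 days (15 months).
    """
    # Sub-minute periods exist only for high-resolution custom metrics, for 3 hours.
    if high_resolution and age <= timedelta(hours=3):
        return 1
    for max_age, period in RETENTION_TIERS:
        if age <= max_age:
            return period
    return None


print(finest_available_period(timedelta(weeks=3)))  # 300
```

For the 3-week-old spike in the question, the answer is 5-minute datapoints, which is usually enough to see which tier's latency rose.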
- You have a high security requirement for your AWS accounts. What is the most rapid and sophisticated setup you can use to react to AWS API calls to your account?
- Subscription to AWS Config via an SNS Topic. Use a Lambda Function to perform in-flight analysis and reactivity to changes as they occur.
- Global AWS CloudTrail setup delivering to S3 with an SNS subscription to deliver notifications, pushing into a Lambda, which inserts records into an ELK stack for analysis.
- Use a CloudWatch Rule ScheduleExpression to periodically analyze IAM credential logs. Push the deltas for events into an ELK stack and perform ad-hoc analysis there.
- Amazon EventBridge Rules (formerly CloudWatch Events Rules), which trigger based on all AWS API calls, submitting all events to an Amazon Kinesis data stream for arbitrary downstream analysis. (EventBridge, the evolution of CloudWatch Events, allows subscription to AWS API calls and routing of these events into Kinesis data streams. This gives a unified, near real-time stream of all API calls.)
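EventBridge rules select events with an event pattern. The sketch below shows a pattern that matches API-call events recorded by CloudTrail (`detail-type` of `AWS API Call via CloudTrail`) plus a deliberately simplified matcher that illustrates how such a pattern is applied; it models only exact-value lists and nested objects, not EventBridge's prefix, numeric, or anything-but filters. It assumes CloudTrail is recording the calls; you would pass the JSON string as `EventPattern` to `put_rule` and attach the Kinesis stream with `put_targets`.

```python
from __future__ import annotations

import json
from collections.abc import Mapping
from typing import Any

# Event pattern matching every AWS API call event CloudTrail delivers to EventBridge.
API_CALL_PATTERN = {"detail-type": ["AWS API Call via CloudTrail"]}


def matches(pattern: Mapping[str, Any], event: Mapping[str, Any]) -> bool:
    """Simplified matching: each list holds allowed exact values, and nested
    dicts are matched recursively. Array-valued event fields and advanced
    content filters are not modelled."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, Mapping):
            if not isinstance(actual, Mapping) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# The JSON string you would pass as EventPattern to put_rule.
EVENT_PATTERN_JSON = json.dumps(API_CALL_PATTERN)
```

Narrowing the rule to, say, IAM calls only would add `"source": ["aws.iam"]` to the same pattern.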
- To monitor API calls against our AWS account by different users and entities, we can use ____ to create a history of calls in bulk for later review, and use ____ for reacting to AWS API calls in real-time.
- AWS Config; AWS Inspector
- AWS CloudTrail; AWS Config
- AWS CloudTrail; Amazon EventBridge (formerly CloudWatch Events) (CloudTrail is a batch API call collection service; EventBridge enables real-time monitoring of calls through its Rules.)
- AWS Config; AWS Lambda
- You are hired as the new head of operations for a SaaS company. Your CTO has asked you to make debugging any part of your entire operation simpler and as fast as possible. She complains that she has no idea what is going on in the complex, service-oriented architecture, because the developers just log to disk, and it’s very hard to find errors in logs on so many services. How can you best meet this requirement and satisfy your CTO? [PROFESSIONAL]
- Copy all log files into AWS S3 using a cron job on each instance. Use an S3 Notification Configuration on the PutBucket event and publish events to AWS Lambda. Use the Lambda to analyze logs as soon as they come in and flag issues. (Not fast to search, and introduces delay)
- Begin using CloudWatch Logs on every service. Stream all Log Groups into S3 objects. Use AWS EMR cluster jobs to perform ad hoc MapReduce analysis and write new queries when needed. (Not fast to search, and introduces delay)
- Copy all log files into AWS S3 using a cron job on each instance. Use an S3 Notification Configuration on the PutBucket event and publish events to AWS Kinesis. Use Apache Spark on AWS EMR to perform at-scale stream processing queries on the log chunks and flag issues. (Not fast to search, and introduces delay)
- Begin using CloudWatch Logs on every service. Stream all Log Groups into an Amazon OpenSearch Service Domain running Kibana and perform log analysis on a search cluster. (OpenSearch (formerly Elasticsearch) with Kibana/OpenSearch Dashboards is designed for real-time, ad-hoc log analysis and aggregation. Note: CloudWatch now also offers integrated analytics with OpenSearch Service directly.)
- Your EC2-based multi-tier application includes a monitoring instance that periodically makes application-level, read-only requests of various application components; if any of those fail more than three times in 30 seconds, it calls CloudWatch to fire an alarm, and the alarm notifies your operations team by email and SMS of a possible application health problem. However, you also need to watch the watcher (the monitoring instance itself) and be notified if it becomes unhealthy. Which of the following is a simple way to achieve that goal? [PROFESSIONAL]
- Run another monitoring instance that pings the monitoring instance and fires a CloudWatch alarm that notifies your operations team should the primary monitoring instance become unhealthy.
- Set a CloudWatch alarm based on EC2 system and instance status checks and have the alarm notify your operations team of any detected problem with the monitoring instance.
- Set a CloudWatch alarm based on the CPU utilization of the monitoring instance and have the alarm notify your operations team if the CPU usage exceeds 50% for more than one minute; then have your monitoring application go into a CPU-bound loop should it detect any application problems.
- Have the monitoring instances post messages to an SQS queue and then dequeue those messages on another instance. Should the queue cease to have new messages, the second instance should first terminate the original monitoring instance, start another backup monitoring instance, assume the role of the previous monitoring instance, and begin adding messages to the SQS queue.
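A sketch of the status-check alarm from the second option, expressed as the keyword arguments for `put_metric_alarm`. `StatusCheckFailed` in the `AWS/EC2` namespace reports 1 when either the system or the instance status check fails; the alarm name, the SNS topic ARN, the evaluation settings, and the choice to treat missing data as breaching (so a stopped or silent monitoring instance also alerts) are assumptions.

```python
from typing import Dict


def status_check_alarm_params(instance_id: str, sns_topic_arn: str) -> Dict:
    """put_metric_alarm kwargs that alarm when either EC2 status check fails."""
    return {
        "AlarmName": f"monitoring-instance-status-{instance_id}",  # hypothetical name
        "Namespace": "AWS/EC2",
        # 0 = both checks passed in the last minute, 1 = at least one failed.
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Missing datapoints count as failures, so a vanished watcher still alerts.
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],  # topic with email/SMS subscriptions
    }
```

This needs no second monitoring host: CloudWatch itself becomes the watcher of the watcher.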
- A company wants to reduce alarm noise in their monitoring system. They have multiple alarms for different components of a web application. Which CloudWatch feature should they use to consolidate multiple alarms into a single notification?
- CloudWatch Metrics Insights
- CloudWatch Composite Alarms (Composite alarms combine multiple alarm states using Boolean logic to reduce noise and send notifications only at an aggregated level)
- CloudWatch Anomaly Detection
- CloudWatch Dashboards
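Composite alarms take an `AlarmRule` expression over child alarm states. The sketch builds such a rule with the `ALARM("name")` function and, separately, evaluates a simplified local model of the result; the child alarm names are hypothetical and the local evaluator ignores states such as INSUFFICIENT_DATA. A common noise-reduction pattern is to leave the child alarms without actions and attach notifications only to the composite, created with `put_composite_alarm(AlarmName=..., AlarmRule=...)`.

```python
from typing import Iterable, Mapping

VALID_OPERATORS = ("AND", "OR")


def build_alarm_rule(child_alarms: Iterable[str], operator: str = "OR") -> str:
    """Join child alarms into an AlarmRule such as 'ALARM("a") OR ALARM("b")'."""
    if operator not in VALID_OPERATORS:
        raise ValueError(f"operator must be one of {VALID_OPERATORS}")
    names = list(child_alarms)
    if not names:
        raise ValueError("need at least one child alarm")
    return f" {operator} ".join(f'ALARM("{name}")' for name in names)


def local_composite_state(states: Mapping[str, str], child_alarms: Iterable[str], operator: str = "OR") -> str:
    """Simplified model: ALARM if any (OR) or all (AND) children are in ALARM."""
    in_alarm = [states.get(name) == "ALARM" for name in child_alarms]
    fired = any(in_alarm) if operator == "OR" else all(in_alarm)
    return "ALARM" if fired else "OK"


children = ["web-5xx-high", "db-latency-high"]  # hypothetical child alarm names
print(build_alarm_rule(children))
```

With `AND`, the team is paged only when both symptoms appear together, which is where most of the noise reduction comes from.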
- A DevOps team wants to automatically identify root causes when CloudWatch alarms fire, correlating metrics, logs, and deployment events without manual investigation. Which CloudWatch feature should they use?
- CloudWatch Logs Insights
- CloudWatch Contributor Insights
- CloudWatch Investigations (Powered by Amazon Q Developer, Investigations uses AI to analyze multiple telemetry sources and suggest root causes and remediation steps)
- CloudWatch Application Signals
- An organization wants to send application metrics to CloudWatch using open standards without vendor-specific conversion logic. They need support for high-cardinality metrics with many labels. Which approach should they use?
- Use CloudWatch PutMetricData API with custom metrics
- Use CloudWatch Metric Streams to export to a third-party tool
- Send metrics directly using the OpenTelemetry Protocol (OTLP) to CloudWatch’s native OTel endpoint (CloudWatch supports native OTLP metrics ingestion with up to 150 labels per metric, queryable via PromQL)
- Use the CloudWatch Agent with StatsD protocol
- A company wants to ingest, transform, and route log data from multiple sources to CloudWatch without managing any infrastructure. They need to parse complex log formats and enrich the data during ingestion. Which service should they use?
- CloudWatch Logs Subscription Filters
- Amazon Kinesis Data Firehose
- CloudWatch Pipelines (CloudWatch Pipelines is a fully managed service that ingests, transforms, and routes log data with support for parsing, conditional processing, and AI-assisted processor configuration)
- CloudWatch Log Transformers only
- A team needs to monitor their application’s availability, latency, and error rate with pre-built dashboards and set up Service Level Objectives (SLOs) without building custom monitoring. Which CloudWatch capability provides this?
- CloudWatch Synthetics
- CloudWatch Container Insights
- CloudWatch Internet Monitor
- CloudWatch Application Signals (Application Signals provides automatic instrumentation, pre-built APM dashboards with golden metrics, and SLO tracking without custom setup)