AWS RDS Monitoring & Notification

  • RDS integrates with CloudWatch and provides metrics for monitoring.
  • A CloudWatch alarm can be created on a single metric and sends an SNS message when the alarm changes state.
  • RDS also provides SNS notifications whenever an RDS event occurs.
  • RDS events are also delivered natively to Amazon EventBridge, enabling advanced event-driven automation and routing to multiple targets beyond SNS.
  • RDS Performance Insights is a database performance tuning and monitoring feature that helps illustrate the database’s performance and analyze any issues that affect it.
  • CloudWatch Database Insights is the successor to Performance Insights, providing comprehensive database observability with fleet-wide monitoring, on-demand analysis, and advanced diagnostics.
  • RDS Recommendations provides automated recommendations for database resources.
  • Amazon DevOps Guru for RDS uses machine learning to detect anomalous database behaviors and provide proactive insights.
  • AWS Compute Optimizer for RDS provides rightsizing recommendations for RDS DB instances.

RDS CloudWatch Monitoring

  • RDS DB instance can be monitored using CloudWatch, which collects and processes raw data from RDS into readable, near real-time metrics.
  • Statistics are recorded so that you can access historical information and gain a better perspective on how the service is performing.
  • By default, RDS metric data is automatically sent to CloudWatch in 1-minute periods.
  • CloudWatch RDS Metrics
    • BinLogDiskUsage – Amount of disk space occupied by binary logs on the master. Applies to MySQL read replicas.
    • CPUUtilization – Percentage of CPU utilization.
    • CPUCreditBalance – Number of CPU credits available (for burstable instance types like db.t3, db.t4g).
    • CPUCreditUsage – Number of CPU credits consumed (for burstable instance types).
    • DatabaseConnections – Number of database connections in use.
    • DiskQueueDepth – The number of outstanding IOs (read/write requests) waiting to access the disk.
    • EBSIOBalance% – Percentage of I/O credits remaining in the burst bucket (for instances with burst I/O capability).
    • EBSByteBalance% – Percentage of throughput credits remaining in the burst bucket.
    • FreeableMemory – Amount of available random access memory.
    • FreeStorageSpace – Amount of available storage space.
    • ReplicaLag – Amount of time a Read Replica DB instance lags behind the source DB instance.
    • SwapUsage – Amount of swap space used on the DB instance.
    • ReadIOPS – Average number of disk read I/O operations per second.
    • WriteIOPS – Average number of disk write I/O operations per second.
    • ReadLatency – Average amount of time taken per disk read I/O operation.
    • WriteLatency – Average amount of time taken per disk write I/O operation.
    • ReadThroughput – Average number of bytes read from disk per second.
    • WriteThroughput – Average number of bytes written to disk per second.
    • NetworkReceiveThroughput – Incoming (Receive) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.
    • NetworkTransmitThroughput – Outgoing (Transmit) network traffic on the DB instance, including both customer database traffic and Amazon RDS traffic used for monitoring and replication.
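
As a minimal sketch of alarming on one of the metrics above, the helper below builds the keyword arguments for the CloudWatch `put_metric_alarm` call. The helper name `build_rds_alarm`, the instance identifier `mydb`, the topic ARN, and the 10 GiB threshold are illustrative assumptions, not values from these notes.

```python
def build_rds_alarm(db_instance_id, metric_name, threshold, comparison,
                    sns_topic_arn, period_seconds=60, evaluation_periods=5,
                    statistic="Average"):
    """Return keyword arguments for CloudWatch put_metric_alarm on a single RDS metric."""
    return {
        "AlarmName": f"{db_instance_id}-{metric_name}",
        "Namespace": "AWS/RDS",  # namespace used by RDS metrics
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": statistic,
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": comparison,
        # The SNS topic receives a message when the alarm enters the ALARM state.
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical values: alarm when free storage drops below 10 GiB (metric is in bytes).
alarm = build_rds_alarm(
    db_instance_id="mydb",
    metric_name="FreeStorageSpace",
    threshold=10 * 1024**3,
    comparison="LessThanThreshold",
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:rds-alerts",
)

# With boto3 installed and credentials configured, the alarm could be created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Keeping the request as plain data makes it easy to review or unit test before anything is sent to AWS.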

RDS Enhanced Monitoring

  • RDS provides metrics in real-time for the operating system (OS) that the DB instance runs on.
  • Enhanced Monitoring uses an agent on the instance to collect OS-level metrics with granularity as fine as 1 second (options: 1, 5, 10, 15, 30, or 60 seconds).
  • By default, Enhanced Monitoring metrics are stored for 30 days in CloudWatch Logs, which is separate from regular CloudWatch metrics.
  • Enhanced Monitoring metrics can be consumed from CloudWatch Logs and imported into CloudWatch as custom metrics for alarming and dashboarding.
  • Enhanced Monitoring is disabled by default; it can be enabled when creating or modifying a DB instance.
  • Enhanced Monitoring requires an IAM role to publish metrics to CloudWatch Logs.
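
The settings above map onto two parameters of the RDS `ModifyDBInstance` API, `MonitoringInterval` and `MonitoringRoleArn`. The sketch below only assembles and validates those arguments; the helper name, instance identifier, and role ARN are hypothetical.

```python
# Granularities listed above, plus 0, which turns Enhanced Monitoring off.
VALID_INTERVALS = {0, 1, 5, 10, 15, 30, 60}


def enhanced_monitoring_kwargs(db_instance_id, interval_seconds, role_arn=None):
    """Return keyword arguments for modify_db_instance to set Enhanced Monitoring."""
    if interval_seconds not in VALID_INTERVALS:
        raise ValueError(f"interval must be one of {sorted(VALID_INTERVALS)}")
    kwargs = {
        "DBInstanceIdentifier": db_instance_id,
        "MonitoringInterval": interval_seconds,
    }
    if interval_seconds != 0:
        # An IAM role is needed so metrics can be published to CloudWatch Logs.
        if not role_arn:
            raise ValueError("role_arn is required when enabling Enhanced Monitoring")
        kwargs["MonitoringRoleArn"] = role_arn
    return kwargs


# Hypothetical instance and role names.
params = enhanced_monitoring_kwargs(
    "mydb", 5, "arn:aws:iam::123456789012:role/rds-monitoring-role"
)

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("rds").modify_db_instance(**params)
```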

CloudWatch vs Enhanced Monitoring Metrics

  • CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB instance, and Enhanced Monitoring gathers its metrics from an agent on the instance.
  • Enhanced Monitoring metrics are useful to understand how different processes or threads on a DB instance use the CPU.
  • There might be differences between the measurements because the hypervisor layer performs a small amount of work. The differences can be greater if the DB instances use smaller instance classes because then there are likely more virtual machines (VMs) that are managed by the hypervisor layer on a single physical instance.
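
To illustrate the per-process view, the sketch below ranks processes by CPU share from one Enhanced Monitoring log message. The payload is an abbreviated, made-up sample; the `processList`, `name`, and `cpuUsedPc` field names are assumed to match the JSON that Enhanced Monitoring writes to CloudWatch Logs and should be checked against a real log event.

```python
import json

# Abbreviated, made-up sample of one Enhanced Monitoring log message.
sample_message = json.dumps({
    "instanceID": "mydb",
    "cpuUtilization": {"total": 51.8, "user": 40.2, "system": 9.1, "idle": 48.2},
    "processList": [
        {"name": "mysqld", "cpuUsedPc": 42.5, "memoryUsedPc": 61.0},
        {"name": "OS processes", "cpuUsedPc": 3.1, "memoryUsedPc": 2.4},
        {"name": "RDS processes", "cpuUsedPc": 6.2, "memoryUsedPc": 5.7},
    ],
})


def top_cpu_processes(message, limit=3):
    """Return (process name, CPU percent) pairs sorted by CPU use, highest first."""
    payload = json.loads(message)
    processes = sorted(
        payload.get("processList", []),
        key=lambda proc: proc.get("cpuUsedPc", 0.0),
        reverse=True,
    )
    return [(proc.get("name"), proc.get("cpuUsedPc", 0.0)) for proc in processes[:limit]]


top_two = top_cpu_processes(sample_message, limit=2)
```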

RDS Performance Insights

⚠️ End-of-Life Notice: AWS has announced Performance Insights will reach End of Life on July 31, 2026. After this date, the Performance Insights console experience, flexible retention periods (1-24 months), and their associated pricing will no longer be available. The Performance Insights API will continue to exist with no pricing changes.

Migration: Users should transition to CloudWatch Database Insights. If you don’t upgrade, DB instances using Performance Insights will default to the Standard mode of Database Insights.

  • Performance Insights is a database performance tuning and monitoring feature that helps check the database’s performance and helps analyze any issues that affect it.
  • Database load is measured using a metric called Average Active Sessions or AAS which is calculated by sampling memory to determine the state of each active database connection.
  • AAS is the total number of sessions divided by the total number of samples for a specific time period.
  • Performance Insights helps visualize the database load and filter the load by waits, SQL statements, hosts, or users.
  • Supported on Amazon Aurora (MySQL and PostgreSQL), RDS for MySQL, RDS for PostgreSQL, RDS for Oracle, RDS for SQL Server, and RDS for MariaDB.
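
The AAS definition above can be worked through with a small example. Each sample below lists what every active session was doing at one sampling instant; the wait labels are placeholders rather than real engine wait events.

```python
from collections import Counter

# One entry per sample; each entry lists the state of every active session
# observed at that instant. Labels are illustrative placeholders.
samples = [
    ["CPU", "CPU", "io/table"],
    ["CPU"],
    ["CPU", "lock", "lock", "io/table"],
    [],  # no active sessions at this instant
]


def average_active_sessions(samples):
    """AAS = total active sessions observed / number of samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(len(sample) for sample in samples) / len(samples)


def load_by_wait(samples):
    """Split AAS by wait label, similar to filtering DB load by waits."""
    counts = Counter(wait for sample in samples for wait in sample)
    return {wait: count / len(samples) for wait, count in counts.items()}


aas = average_active_sessions(samples)   # 8 sessions / 4 samples = 2.0
by_wait = load_by_wait(samples)          # CPU 1.0, io/table 0.5, lock 0.5
```

The per-wait values sum to the overall AAS, which is why the load chart can be read as a stacked breakdown.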

CloudWatch Database Insights

  • CloudWatch Database Insights is the next-generation database monitoring service that replaces and extends Performance Insights capabilities.
  • Provides comprehensive database observability for Amazon Aurora and Amazon RDS databases at scale.
  • Database Insights has two modes:
    • Standard Mode (default) – Analyze top contributors to DB load by dimension, query/graph/set alarms on metrics with up to 7 days retention, and define fine-grained access control policies.
    • Advanced Mode – Adds fleet-wide monitoring dashboards, SQL lock analysis (15 months retention), execution plan analysis, per-query statistics, slow SQL query analysis, on-demand performance analysis with ML-powered insights, viewing RDS events in CloudWatch, and cross-account cross-region monitoring.
  • Advanced mode retains 15 months of all metrics collected by Database Insights automatically.
  • On-demand analysis uses machine learning to compare a selected time period against normal baseline performance, identify anomalies, and provide specific remediation advice.
  • Fleet Health Dashboard enables monitoring databases simultaneously across hundreds of instances.
  • Supports cross-account and cross-region monitoring for centralized observability.
  • Integrates with CloudWatch Application Signals to view calling services.

RDS CloudTrail Logs

  • CloudTrail provides a record of actions taken by a user, role, or an AWS service in RDS.
  • CloudTrail captures all API calls for RDS as events, including calls from the console and from code calls to RDS API operations.
  • CloudTrail can help determine the request that was made to RDS, the IP address from which the request was made, who made the request, when it was made, and additional details.
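
As a rough sketch of pulling those details out of a CloudTrail record, the function below reads the standard `eventTime`, `eventName`, `sourceIPAddress`, and `userIdentity` fields. The record shown is a made-up, abbreviated example.

```python
# Made-up, abbreviated CloudTrail record for an RDS API call.
record = {
    "eventTime": "2024-01-15T10:30:00Z",
    "eventSource": "rds.amazonaws.com",
    "eventName": "ModifyDBInstance",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {
        "type": "IAMUser",
        "arn": "arn:aws:iam::123456789012:user/alice",
    },
    "requestParameters": {"dBInstanceIdentifier": "mydb"},
}


def summarize_rds_call(record):
    """Return who/what/when/where for an RDS CloudTrail record, or None for other services."""
    if record.get("eventSource") != "rds.amazonaws.com":
        return None
    return {
        "who": record.get("userIdentity", {}).get("arn"),
        "what": record.get("eventName"),
        "when": record.get("eventTime"),
        "from_ip": record.get("sourceIPAddress"),
        "request": record.get("requestParameters"),
    }


summary = summarize_rds_call(record)
```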

RDS Database Activity Streams

  • Database Activity Streams provide a near real-time stream of database activity for monitoring and auditing purposes.
  • Activity data is collected and transmitted to Amazon Kinesis Data Streams.
  • From Kinesis, you can configure services such as Amazon Data Firehose and AWS Lambda to consume the stream and store the data.
  • Provides a protection mechanism for compliance and auditing, independent of the database itself (DBA cannot tamper with the audit logs).
  • Supports two modes:
    • Asynchronous mode – prioritizes database performance; activity stream events may be lost if the Kinesis stream becomes unavailable.
    • Synchronous mode – prioritizes accuracy of activity stream; database session may block until the event is written to the stream.
  • Uses AWS KMS for encryption of the activity stream.
  • Supported for RDS for Oracle, RDS for SQL Server (Multi-AZ), and Amazon Aurora.
  • Integrates with third-party database activity monitoring (DAM) tools for compliance.

RDS Recommendations

  • RDS provides automated recommendations for database resources.
  • The recommendations provide best practice guidance by analyzing DB instance configuration, usage, and performance data.
  • Recommendations cover areas such as:
    • DB instance class rightsizing
    • DB parameter group settings
    • Security best practices
    • Engine version upgrades
    • Backup and recovery configuration
    • Multi-AZ deployment enablement
  • Recommendations can be automated with notifications using EventBridge and Lambda.

Amazon DevOps Guru for RDS

  • Amazon DevOps Guru for RDS is an ML-powered capability that detects, diagnoses, and remediates database performance issues.
  • Uses data collected by Performance Insights to detect anomalous behaviors.
  • Provides both Reactive Insights (when issues are occurring) and Proactive Insights (before issues impact performance).
  • Proactive Insights detect potential issues that can lead to degraded database health in the future, such as:
    • Connections approaching configured limits
    • Memory nearing exhaustion
    • Idle transactions consuming resources
  • Provides detailed analysis of wait events and recommendations for remediation.
  • Requires Performance Insights to be enabled with a paid tier retention period.
  • Supported for Amazon Aurora (PostgreSQL and MySQL) and RDS for PostgreSQL.

AWS Compute Optimizer for RDS

  • AWS Compute Optimizer analyzes RDS database instance utilization metrics and provides rightsizing recommendations.
  • Helps identify idle RDS instances and choose the optimal DB instance class and provisioned IOPS settings.
  • Recommendations help reduce costs for over-provisioned workloads and increase performance for under-provisioned workloads.
  • Supports Amazon Aurora, RDS for MySQL, RDS for PostgreSQL, RDS for Oracle, RDS for SQL Server, and RDS for MariaDB.
  • Evaluates Graviton-based instance classes for improved price-performance ratios.
  • Analyzes the last 14 days of CloudWatch metrics to generate recommendations.

RDS Event Notification

  • RDS uses Amazon SNS to provide notification when an RDS event occurs.
  • RDS groups events into categories, which can be subscribed to so that a notification is sent when an event in that category occurs.
  • Event categories can be subscribed to for a DB instance, DB cluster, DB snapshot, DB cluster snapshot, DB security group, or DB parameter group.
  • Event notifications are sent to the email addresses provided during subscription creation.
  • Subscriptions can be easily turned off without deleting a subscription by setting the Enabled radio button to No in the RDS console or by setting the Enabled parameter to false using the CLI or RDS API.
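
A minimal sketch of the API side of these points: the helpers below build arguments for the RDS `create_event_subscription` and `modify_event_subscription` calls. The subscription name, topic ARN, and chosen categories are hypothetical, and the source-type check covers only the types listed above.

```python
# Source types from the list above, in the form the RDS API expects.
SOURCE_TYPES = {
    "db-instance", "db-cluster", "db-snapshot",
    "db-cluster-snapshot", "db-security-group", "db-parameter-group",
}


def subscription_kwargs(name, sns_topic_arn, source_type, categories, enabled=True):
    """Return keyword arguments for create_event_subscription."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unsupported source type: {source_type}")
    return {
        "SubscriptionName": name,
        "SnsTopicArn": sns_topic_arn,
        "SourceType": source_type,
        "EventCategories": list(categories),
        "Enabled": enabled,
    }


def pause_kwargs(name):
    """Return keyword arguments for modify_event_subscription that turn a subscription off."""
    return {"SubscriptionName": name, "Enabled": False}


# Hypothetical subscription for failover and low-storage events on DB instances.
create_args = subscription_kwargs(
    "prod-db-events",
    "arn:aws:sns:us-east-1:123456789012:rds-events",
    "db-instance",
    ["failover", "low storage"],
)
pause_args = pause_kwargs("prod-db-events")

# With boto3 installed and credentials configured:
#   import boto3
#   rds = boto3.client("rds")
#   rds.create_event_subscription(**create_args)
#   rds.modify_event_subscription(**pause_args)  # pause without deleting
```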

RDS Events with Amazon EventBridge

  • Amazon RDS sends service events directly to Amazon EventBridge in near real time.
  • EventBridge provides more flexible event routing compared to traditional SNS-based event subscriptions.
  • EventBridge rules can be used to react to RDS events and trigger automated workflows, such as:
    • Lambda functions for custom notification formatting
    • Step Functions for complex remediation workflows
    • SNS topics for multi-channel alerting
    • SQS queues for event buffering and processing
  • Supports event patterns for filtering specific RDS event types (e.g., failovers, reboots, maintenance).
  • Can be combined with RDS native event notifications for comprehensive event management.
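
For illustration, an event pattern that selects DB instance failover events might look like the one below. The `matches` function is a deliberately simplified stand-in for EventBridge matching (exact-value matching only), included so the pattern can be tried locally; the sample events are made up.

```python
import json

# Pattern for DB instance failover events.
failover_pattern = {
    "source": ["aws.rds"],
    "detail-type": ["RDS DB Instance Event"],
    "detail": {"EventCategories": ["failover"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must be present with an allowed value.

    This is an approximation for local experiments, not the EventBridge implementation.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # An array field in the event matches if any element is an allowed value.
            actual_values = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in actual_values):
                return False
    return True


# Made-up sample events.
failover_event = {
    "source": "aws.rds",
    "detail-type": "RDS DB Instance Event",
    "detail": {"EventCategories": ["failover"], "SourceIdentifier": "mydb"},
}
reboot_event = {
    "source": "aws.rds",
    "detail-type": "RDS DB Instance Event",
    "detail": {"EventCategories": ["availability"], "SourceIdentifier": "mydb"},
}

pattern_json = json.dumps(failover_pattern)  # string form used when defining a rule
```

A rule with this pattern could then target a Lambda function, Step Functions state machine, SNS topic, or SQS queue as listed above.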

RDS Trusted Advisor

  • Trusted Advisor inspects the AWS environment and then makes recommendations when opportunities exist to save money, improve system availability and performance, or help close security gaps.
  • Trusted Advisor now evaluates across six categories: cost optimization, performance, resilience, security, operational excellence, and service limits.
  • Trusted Advisor has the following RDS-related checks:
    • RDS Idle DB Instances
    • RDS Security Group Access Risk
    • RDS Backups
    • RDS Multi-AZ
    • RDS Idle DB Connections
    • RDS Overutilized DB Instances
    • RDS Continuous Backup Not Enabled

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day, and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. You run a web application with the following components: Elastic Load Balancer (ELB), 3 Web/Application servers, 1 MySQL RDS database with read replicas, and Amazon Simple Storage Service (Amazon S3) for static content. Average response time for users is increasing slowly. What three CloudWatch RDS metrics will allow you to identify if the database is the bottleneck? Choose 3 answers
    1. The number of outstanding IOs waiting to access the disk
    2. The amount of write latency
    3. The amount of disk space occupied by binary logs on the master.
    4. The amount of time a Read Replica DB Instance lags behind the source DB Instance
    5. The average number of disk I/O operations per second.
  2. Typically, you want your application to check whether a request generated an error before you spend any time processing results. The easiest way to find out if an error occurred is to look for an __________ node in the response from the Amazon RDS API.
    1. Incorrect
    2. Error
    3. FALSE
  3. In Amazon CloudWatch, which metric should you check to ensure that your DB instance has enough free storage space?
    1. FreeStorage
    2. FreeStorageSpace
    3. FreeStorageVolume
    4. FreeDBStorageSpace
  4. A user is receiving a notification from the RDS DB whenever there is a change in the DB security group. The user wants to stop receiving these notifications for only a month, so he does not want to delete the notification. How can the user configure this?
    1. Change the Disable button for notification to “Yes” in the RDS console
    2. Set the send mail flag to false in the DB event notification console
    3. The only option is to delete the notification from the console
    4. Change the Enable button for notification to “No” in the RDS console
  5. A sys admin is planning to subscribe to the RDS event notifications. For which of the below mentioned source categories the subscription cannot be configured?
    1. DB security group
    2. DB snapshot
    3. DB options group
    4. DB parameter group
  6. A user is planning to setup notifications on the RDS DB for a snapshot. Which of the below mentioned event categories is not supported by RDS for this snapshot source type?
    1. Backup (Refer link)
    2. Creation
    3. Deletion
    4. Restoration
  7. A system admin is planning to setup event notifications on RDS. Which of the below mentioned services will help the admin setup notifications?
    1. AWS SES
    2. AWS CloudTrail
    3. AWS CloudWatch
    4. AWS SNS
  8. A user has setup an RDS DB with Oracle. The user wants to get notifications when someone modifies the security group of that DB. How can the user configure that?
    1. It is not possible to get the notifications on a change in the security group
    2. Configure SNS to monitor security group changes
    3. Configure event notification on the DB security group
    4. Configure the CloudWatch alarm on the DB for a change in the security group
  9. It is advised that you watch the Amazon CloudWatch “_____” metric (available via the AWS Management Console or Amazon CloudWatch APIs) carefully and recreate the Read Replica should it fall behind due to replication errors.
    1. Write Lag
    2. Read Replica
    3. Replica Lag
    4. Single Replica
  10. A company wants to monitor its RDS database for performance anomalies using machine learning without setting up complex monitoring rules. Which AWS service provides ML-powered anomaly detection specifically for RDS databases?
    1. Amazon CloudWatch Anomaly Detection
    2. AWS Trusted Advisor
    3. Amazon DevOps Guru for RDS
    4. Amazon Inspector
  11. A database administrator needs to audit all SQL activities on an Amazon RDS for Oracle database for compliance requirements. The audit logs must be tamper-proof and cannot be modified by database administrators. Which feature should be used?
    1. Enhanced Monitoring
    2. CloudTrail Logs
    3. Performance Insights
    4. Database Activity Streams
  12. An organization is transitioning from RDS Performance Insights to the new monitoring solution. Which AWS service is the designated successor providing fleet-wide monitoring, on-demand ML-powered analysis, and lock diagnostics for RDS databases?
    1. Amazon DevOps Guru for RDS
    2. AWS Compute Optimizer
    3. Amazon CloudWatch Database Insights
    4. Amazon Managed Grafana
  13. A company needs to receive RDS events and trigger automated remediation workflows using Step Functions when a failover occurs. Which service should be used to capture RDS events and route them to the Step Function?
    1. Amazon SNS
    2. Amazon CloudWatch Alarms
    3. Amazon EventBridge
    4. AWS CloudTrail
  14. Which of the following is true about the difference between CloudWatch metrics and Enhanced Monitoring for RDS? (Choose 2)
    1. CloudWatch collects metrics from the hypervisor while Enhanced Monitoring collects from an agent on the instance
    2. Enhanced Monitoring provides metrics at 5-minute intervals only
    3. Enhanced Monitoring is useful for understanding how different processes or threads use the CPU
    4. CloudWatch provides more granular OS-level metrics than Enhanced Monitoring