AWS RDS Best Practices

AWS RDS Best Practices

AWS recommends RDS best practices in terms of Monitoring, Performance, Security, and Operational Excellence.

RDS Basic Operational Guidelines

  • Monitoring
    • Memory, CPU, replica lag, and storage usage should be monitored.
    • CloudWatch can be setup for notifications when usage patterns change or when the capacity of deployment is approached, so that system performance and availability can be maintained.
  • Scaling
    • Scale up the DB instance when approaching storage capacity limits.
    • There should be some buffer in storage and memory to accommodate unforeseen increases in demand from the applications.
    • Enable RDS Storage Autoscaling to automatically scale storage when free storage space is low, helping avoid storage-full issues without manual intervention.
  • Backups
    • Enable Automatic Backups and set the backup window to occur during the daily low in WriteIOPS.
    • Use Multi-AZ to reduce the impact of backups on the primary DB instance.
  • Storage
    • Use General Purpose SSD (gp3) or Provisioned IOPS SSD (io2 Block Express) storage for all new instances.
    • Magnetic (standard) storage is deprecated as of May 1, 2026. Migrate existing magnetic storage instances to gp3 or io2 Block Express.
    • gp3 provides baseline performance of 3,000 IOPS and 125 MiB/s throughput, with the ability to independently scale IOPS up to 64,000 and throughput up to 4,000 MiB/s.
    • For latency-sensitive production workloads, use io2 Block Express which delivers sub-millisecond latency and up to 256,000 IOPS.
  • On a MySQL DB instance,
    • Do not create more than 10,000 tables using Provisioned IOPS or standard storage. Large numbers of tables will significantly increase database recovery time after a failover or database crash.
    • Avoid tables in the database growing too large. Provisioned storage limits restrict the maximum size of a MySQL table file to 16 TiB. Partition large tables so that file sizes are well under the 16 TiB limit. This can also improve performance and recovery time.
  • Performance
    • If the database workload requires more I/O than provisioned, recovery after a failover or database failure will be slow.
    • To increase the I/O capacity of a DB instance,
      • Migrate to a DB instance class with High I/O capacity.
      • Convert from magnetic storage to either General Purpose (gp3) or Provisioned IOPS (io2 Block Express) storage, depending on how much of an increase is needed.
      • If using Provisioned IOPS storage, provision additional throughput capacity.
    • Enable RDS Optimized Reads to achieve up to 2x faster complex query processing by leveraging local NVMe-based SSD storage for temporary tables.
    • Enable RDS Optimized Writes for MySQL and MariaDB to achieve up to 2x improvement in write transaction throughput at no additional cost.
  • Multi-AZ & Failover
    • Deploy applications in all Availability Zones, so if an AZ goes down, applications in other AZs will still be available.
    • Use RDS DB events to monitor failovers.
    • Set a TTL of less than 30 seconds, if the client application is caching the DNS data of the DB instances. As the underlying IP address of a DB instance can change after a failover, caching the DNS data for an extended time can lead to connection failures if the application tries to connect to an IP address that no longer is in service.
    • Multi-AZ requires the transaction logging feature to be enabled. Do not use features like Simple recovery mode, offline mode or Read-only mode which turn off transaction logging.
    • To shorten failover time
      • Ensure that sufficient Provisioned IOPS are allocated for your workload. Inadequate I/O can lengthen failover times. Database recovery requires I/O.
      • Use smaller transactions. Database recovery relies on transactions, so break up large transactions into multiple smaller transactions to shorten failover time.
      • Consider using RDS Proxy to reduce failover time by up to 79% for Aurora MySQL and 32% for RDS MySQL by maintaining application connections during failover.
    • Test failover for your DB instance to understand how long the process takes for your use case and to ensure that the application that accesses your DB instance can automatically connect to the new DB instance after failover.
    • Consider using Multi-AZ DB Cluster deployment for faster failovers (typically under 35 seconds) and readable standby instances.
  • Database Engine Versions
    • Regularly upgrade database engine versions to maintain security, performance, and compliance.
    • Enable automatic minor version upgrades for easier patching.
    • Schedule major version upgrades with proper testing in staging environments.
    • Be aware of RDS Extended Support charges that apply automatically after a major version reaches its end of standard support date. Plan upgrades before end-of-standard-support to avoid additional costs.
  • AWS Database Drivers
    • Use the AWS JDBC Driver, AWS Python Driver, or other AWS suite of drivers for faster switchover and failover times (single-digit seconds vs. tens of seconds for open-source drivers).
    • AWS drivers provide built-in support for authentication with AWS Secrets Manager, IAM, and Federated Identity.

Multi-AZ Deployment Options

  • RDS provides two Multi-AZ deployment options:
    • Multi-AZ DB Instance – One primary and one standby DB instance with synchronous replication. Failover time is typically 60–120 seconds.
    • Multi-AZ DB Cluster – One writer and two readable standby DB instances across three AZs with semi-synchronous replication. Failover time is typically under 35 seconds.
  • Multi-AZ DB Cluster Advantages:
    • Up to 2x faster transaction commit latency compared to Multi-AZ DB instance.
    • Two readable standby instances that can serve read traffic.
    • Faster automated failovers (typically under 35 seconds).
    • Supports RDS Optimized Writes and RDS Optimized Reads.
    • Supports gp3, io1, and io2 Block Express storage types.
    • IAM database authentication support.
  • Multi-AZ DB Cluster is supported for MySQL, PostgreSQL, and MariaDB engines.

DB Instance RAM Recommendations

  • An RDS performance best practice is to allocate enough RAM so that the working set resides almost completely in memory.
  • The working set is the data and indexes that are frequently in use on the instance.
  • Value of ReadIOPS should be small and stable.
  • ReadIOPS metric can be checked, using AWS CloudWatch while the DB instance is under load, to tell if the working set is almost all in memory.
  • If scaling up the DB instance class with more RAM results in a dramatic drop in ReadIOPS, the working set was not almost completely in memory.
  • Continue to scale up until ReadIOPS no longer drops dramatically after a scaling operation, or ReadIOPS is reduced to a very small amount.

RDS Security Best Practices

  • Do not use AWS root credentials to manage RDS resources; create IAM users for everyone.
  • Grant each user the minimum set of permissions required to perform his or her duties.
  • Use IAM groups to effectively manage permissions for multiple users.
  • Rotate your IAM credentials regularly.
  • Configure AWS Secrets Manager to automatically rotate the secrets for Amazon RDS, including the master user password.
  • Use IAM Database Authentication to authenticate to DB instances using IAM roles instead of database passwords. This eliminates the need to store credentials in the database and provides short-lived authentication tokens.
  • Use SSL/TLS connections to encrypt data in transit between applications and DB instances.
  • Enable encryption at rest using AWS KMS for database storage and backups.
  • Use AWS Security Hub to monitor RDS usage as it relates to security best practices.
  • Change the master user password using the AWS Management Console, CLI, or RDS API only (not SQL clients, which may unintentionally revoke privileges).

RDS Proxy

  • Amazon RDS Proxy is a fully managed, highly available database proxy that makes applications more scalable, more resilient to database failures, and more secure.
  • Connection Pooling: RDS Proxy establishes a pool of database connections and reuses them, reducing the stress on database compute and memory resources.
  • Improved Failover: RDS Proxy automatically connects to a standby DB instance while preserving application connections, reducing failover times significantly.
  • Security: RDS Proxy enforces IAM authentication and can retrieve credentials from AWS Secrets Manager, eliminating the need for database credentials in application code.
  • Best Use Cases:
    • Serverless and event-driven applications (e.g., AWS Lambda) with many short-lived connections.
    • Applications that open and close database connections rapidly.
    • Applications that require high availability with fast failover.
    • Applications that need to enforce IAM-based access to databases.
  • Supports MySQL, PostgreSQL, MariaDB, and SQL Server engines.

Blue/Green Deployments

  • Amazon RDS Blue/Green Deployments create a fully managed staging (green) environment that mirrors the production (blue) environment for safer database changes.
  • Key Benefits:
    • Perform major version upgrades, schema changes, and parameter group modifications with minimal downtime.
    • Test changes in the green environment while production continues to run on the blue environment.
    • Switchover typically completes in under a minute with built-in safeguards.
    • Automatic rollback if the switchover encounters issues.
  • Best Practices:
    • Thoroughly test the green environment before switching over.
    • Keep databases in the green environment read-only to avoid replication conflicts.
    • Make only replication-compatible schema changes.
    • Ensure data loading (lazy loading) is complete before switching over.
  • Supported for RDS for MySQL, MariaDB, and PostgreSQL.

CloudWatch Database Insights

  • Amazon CloudWatch Database Insights (successor to Performance Insights) provides database monitoring and analysis capabilities.
  • Note: Performance Insights is being replaced by CloudWatch Database Insights. The Performance Insights console experience reaches end-of-life on June 30, 2026. Migrate to the Advanced mode of Database Insights before that date.
  • Key Features:
    • Monitors database load using the DB Load metric based on active sessions.
    • On-demand analysis uses machine learning to identify performance bottlenecks and provide remediation advice.
    • Available for all RDS engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Db2.
  • Modes:
    • Standard mode – Included at no additional cost; provides 7 days of performance data history.
    • Advanced mode – Provides extended retention (up to 24 months), execution plans, and on-demand analysis features.

Using Enhanced Monitoring to Identify Operating System Issues

  • RDS provides metrics in real time for the operating system (OS) that your DB instance runs on.
  • Enhanced Monitoring provides granularity at 1, 5, 10, 15, 30, or 60 second intervals.
  • Enhanced Monitoring is available for all current generation DB instance classes.
  • Note: Previous generation instances (db.t1.micro, db.m1.small) have been deprecated since February 2023.

Using Metrics to Identify Performance Issues

  • To identify performance issues caused by insufficient resources and other common bottlenecks, you can monitor the metrics available for your Amazon RDS DB instance.
  • Performance metrics should be monitored on a regular basis to benchmark the average, maximum, and minimum values for a variety of time ranges to help identify performance degradation.
  • CloudWatch alarms can be set for particular metric thresholds to be alerted when they are reached or breached.
  • A DB instance has a number of different categories of metrics which includes CPU, memory, disk space, IOPS, db connections and network traffic, and how to determine acceptable values depends on the metric.
  • One of the best ways to improve DB instance performance is to tune the most commonly used and most resource-intensive queries to make them less expensive to run.
  • For Multi-AZ DB clusters, monitor replica lag – the time difference between the latest transaction on the writer DB instance and the latest applied transaction on a reader DB instance.

Recovery

  • MySQL
    • InnoDB is the recommended and supported storage engine for MySQL DB instances on Amazon RDS.
    • However, MyISAM performs better than InnoDB if you require intense, full-text search capability.
    • Point-In-Time Restore and snapshot restore features of Amazon RDS for MySQL require a crash-recoverable storage engine and are supported for the InnoDB storage engine only.
    • Although MySQL supports multiple storage engines with varying capabilities, not all of them are optimized for crash recovery and data durability.
    • MyISAM storage engine does not support reliable crash recovery and might prevent a Point-In-Time Restore or snapshot restore from working as intended which might result in lost or corrupt data when MySQL is restarted after a crash.
    • InnoDB instances can be migrated to Aurora, while MyISAM instances cannot.
    • MySQL table file size limit is 16 TiB (updated from 6 TB). Partition large tables well under this limit.
  • MariaDB
    • InnoDB is the recommended and supported storage engine for MariaDB DB instances on Amazon RDS.
    • Point-In-Time Restore and snapshot restore features of Amazon RDS for MariaDB require a crash-recoverable storage engine and are supported for the InnoDB storage engine only.
    • Although MariaDB supports multiple storage engines with varying capabilities, not all of them are optimized for crash recovery and data durability.
    • For e.g., although Aria is a crash-safe replacement for MyISAM, it might still prevent a Point-In-Time Restore or snapshot restore from working as intended. This might result in lost or corrupt data when MariaDB is restarted after a crash.

RDS Extended Support

  • Amazon RDS Extended Support provides up to three additional years of critical security and bug fixes beyond a major version’s end of standard support date.
  • Extended Support charges apply automatically when running a major version past its end of standard support date.
  • Best Practice: Plan and execute major version upgrades before the end-of-standard-support date to avoid Extended Support charges.
  • Use Blue/Green Deployments to perform major version upgrades with minimal downtime.
  • Currently supported MySQL versions: 8.0 and 8.4 (MySQL 5.7 under Extended Support).

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. You are running a database on an EC2 instance, with the data stored on Elastic Block Store (EBS) for persistence At times throughout the day, you are seeing large variance in the response times of the database queries Looking into the instance with the isolate command you see a lot of wait time on the disk volume that the database’s data is stored on. What two ways can you improve the performance of the database’s storage while maintaining the current persistence of the data? Choose 2 answers
    1. Move to an SSD backed instance
    2. Move the database to an EBS-Optimized Instance
    3. Use Provisioned IOPs EBS
    4. Use the ephemeral storage on an m2.4xLarge Instance Instead
  2. Amazon RDS automated backups and DB Snapshots are currently supported for only the __________ storage engine
    1. InnoDB
    2. MyISAM
  3. A company wants to reduce the failover time for their Amazon RDS Multi-AZ deployment from approximately 60-120 seconds. Which approach would provide the fastest failover? Choose the correct answer.
    1. Use RDS Proxy in front of the Multi-AZ DB instance
    2. Migrate to a Multi-AZ DB Cluster deployment which provides failover typically under 35 seconds
    3. Increase the Provisioned IOPS on the DB instance
    4. Enable Enhanced Monitoring to detect failures faster
  4. A serverless application using AWS Lambda functions experiences database connection exhaustion on its Amazon RDS MySQL instance during peak traffic. What is the recommended solution?
    1. Increase the max_connections parameter in the DB parameter group
    2. Scale up to a larger DB instance class
    3. Deploy Amazon RDS Proxy to manage connection pooling and multiplexing
    4. Add a read replica to distribute the load
  5. A company wants to perform a major version upgrade of their production Amazon RDS for PostgreSQL database with minimal downtime. Which approach is recommended?
    1. Take a snapshot, restore to a new instance with the new version, and switch DNS
    2. Use the modify DB instance option to upgrade in-place during a maintenance window
    3. Use Amazon RDS Blue/Green Deployments to create a staging environment, test the upgrade, and switch over
    4. Create a read replica with the new version and promote it
  6. Which of the following are advantages of Amazon RDS Multi-AZ DB Cluster deployment over Multi-AZ DB Instance deployment? (Choose 3)
    1. Readable standby instances that can serve read traffic
    2. Faster automated failover (typically under 35 seconds)
    3. Support for all RDS database engines
    4. Up to 2x faster transaction commit latency
    5. Zero-downtime failover
  7. A company is running Amazon RDS for MySQL 5.7 which has reached end of standard support. What happens if they take no action? Choose the correct answer.
    1. The database will be automatically upgraded to MySQL 8.0
    2. The database will be terminated
    3. AWS will automatically charge RDS Extended Support fees for continued security patches
    4. The database will stop receiving any updates but continue running at the same cost
  8. Which storage type is AWS deprecating for Amazon RDS as of May 2026?
    1. General Purpose SSD (gp2)
    2. Magnetic (standard) storage
    3. Provisioned IOPS (io1)
    4. General Purpose SSD (gp3)

References