Kubernetes Overview

  • Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services that facilitates both declarative configuration and automation.
  • The name Kubernetes originates from Greek, meaning helmsman or pilot.
  • Kubernetes provides an orchestration framework to run distributed systems resiliently. It takes care of scaling and failover for the application, provides deployment patterns, and more.

Container Deployment Model

Deployment evolution

  • Containers are similar to VMs, but they have relaxed isolation properties to share the Operating System (OS) among the applications.
  • Containers are lightweight and have their own filesystem, share of CPU, memory, process space, and more.
  • Containers are decoupled from the underlying infrastructure, so they are portable across clouds and OS distributions.
  • Containers provide the following benefits:
    • Agile application creation and deployment
    • Continuous development, integration, and deployment
    • Dev and Ops separation of concerns
    • Observability
    • Environmental consistency across development, testing, and production
    • Cloud and OS distribution portability
    • Application-centric management
    • Loosely coupled, distributed, elastic, liberated micro-services
    • Resource isolation & utilization

Kubernetes Features

  • Service discovery and load balancing
    • Kubernetes can expose a container using a DNS name or its own IP address.
    • If traffic to a container is high, Kubernetes is able to load balance and distribute the network traffic so that the deployment is stable.
  • Storage orchestration
    • Kubernetes allows you to automatically mount a storage system of your choice, such as local storage, public cloud providers, and more.
  • Automated rollouts and rollbacks
    • Kubernetes can change the actual state of the deployed containers to the desired state at a controlled rate, which helps avoid downtime during a rollout.
  • Automatic bin packing
    • Kubernetes can fit containers onto the available nodes to make the best use of the resources as per the specified container specification.
  • Self-healing & High Availability
    • Kubernetes restarts containers that fail, replaces containers, kills containers that don’t respond to the user-defined health check, and doesn’t advertise them to clients until they are ready to serve.
  • Scalability
    • Kubernetes can help scale the application as per the load.
  • Secret and configuration management
    • Kubernetes helps store and manage sensitive information, such as passwords, OAuth tokens, and SSH keys.
    • Secrets and application configuration can be deployed without rebuilding the container images, and without exposing secrets in the stack configuration.
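
The "automated rollouts" idea above (moving the actual state to the desired state at a controlled rate) can be sketched in a few lines. This is a simplified illustration, not how Kubernetes implements rollouts: the function name `rolling_update` and the `max_unavailable` parameter are assumptions made for the example, and pods are modelled as plain version strings.

```python
def rolling_update(pods, new_version, max_unavailable=1):
    """Yield the pod list after each batch while moving every pod to new_version.

    pods is a list of version strings (a stand-in for real pods).
    At most max_unavailable pods are replaced per step.
    """
    pods = list(pods)
    # Indexes of pods whose actual state differs from the desired state.
    pending = [i for i, version in enumerate(pods) if version != new_version]
    while pending:
        batch, pending = pending[:max_unavailable], pending[max_unavailable:]
        for i in batch:
            pods[i] = new_version
        yield list(pods)


steps = list(rolling_update(["v1", "v1", "v1"], "v2", max_unavailable=1))
for step in steps:
    print(step)
```

With three pods and `max_unavailable=1`, the update takes three steps and only one pod changes per step, so the remaining pods can keep serving traffic.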

Kubernetes Architecture

Refer to detailed blog post @ Kubernetes Architecture

Master components

  • Master components provide the cluster’s control plane.
  • Master components make global decisions about the cluster (for example, scheduling), and they detect and respond to cluster events (for example, starting a new pod when a deployment’s replicas field is unsatisfied).
  • Master components include
    • Kube-API server – Exposes the API.
    • Etcd – Key-value store that holds all cluster data. (Can be run on the same server as a master node or on a dedicated cluster.)
    • Kube-scheduler – Schedules new pods on worker nodes.
    • Kube-controller-manager – Runs the controllers.
    • Cloud-controller-manager – Talks to cloud providers.

Node components

  • Node components run on every node, maintaining running pods and providing the Kubernetes runtime environment.
    • Kubelet – Agent that ensures containers in a pod are running.
    • Kube-proxy – Maintains network rules and performs forwarding.
    • Container runtime – Runs containers.
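
To make the scheduler's role (placing new pods on worker nodes) and the bin-packing feature more concrete, here is a minimal best-fit placement sketch. It is an assumption-laden toy: the real kube-scheduler filters and scores nodes on many dimensions, whereas this example only considers a single CPU number, and the names `schedule`, `pods`, and `nodes` are hypothetical.

```python
def schedule(pods, nodes):
    """Assign each pod to the node that leaves the least free CPU (best fit).

    pods:  {pod_name: cpu_request}
    nodes: {node_name: cpu_capacity}
    Returns {pod_name: node_name or None if nothing fits}.
    """
    free = dict(nodes)
    placement = {}
    # Place the largest requests first so big pods are not starved of space.
    for pod, cpu in sorted(pods.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [name for name, capacity in free.items() if capacity >= cpu]
        if not candidates:
            placement[pod] = None  # unschedulable with the current nodes
            continue
        node = min(candidates, key=lambda name: (free[name] - cpu, name))
        free[node] -= cpu
        placement[pod] = node
    return placement


placement = schedule({"a": 2, "b": 1, "c": 3}, {"n1": 4, "n2": 3})
print(placement)
```

Here pod `c` fills node `n2` exactly, and pods `a` and `b` share `n1`.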

Kubernetes Components

Refer to blog post @ Kubernetes Components

Kubernetes Security

Refer to blog post @ Kubernetes Security

AWS ElastiCache

🆕 Major Updates (2024-2026)

  • Valkey is now the recommended engine (open-source Redis fork, BSD licensed, stewarded by Linux Foundation)
  • ElastiCache Serverless (GA Nov 2023) – zero infrastructure management with instant scaling
  • Vector Search (GA Oct 2025) – microsecond-latency similarity search with 99% recall
  • Full-Text & Hybrid Search (Valkey 9.0, May 2026) – real-time search without separate service
  • Durability (June 2026) – Multi-AZ transactional log with zero data loss option
  • ElastiCache now supports three engines: Valkey, Memcached, and Redis OSS
  • AWS ElastiCache is a managed web service that makes it easy to deploy and run Valkey (recommended), Memcached, or Redis OSS protocol-compliant cache clusters in the cloud.
  • ElastiCache
    • simplifies and offloads the management, monitoring, and operation of in-memory cache environments, enabling engineering resources to focus on developing applications.
    • automates common administrative tasks required to operate a distributed cache environment.
    • improves the performance of web applications by allowing retrieval of information from a fast, managed, in-memory caching system, instead of relying entirely on slower disk-based databases.
    • improves load and response times for user actions and queries, and reduces the cost associated with scaling web applications.
    • automatically detects and replaces failed cache nodes, providing a resilient system that mitigates the risk of overloaded databases, which can slow website and application load times.
    • provides enhanced visibility into key performance metrics associated with the cache nodes through integration with CloudWatch.
    • is protocol-compliant with Memcached, Redis OSS, and Valkey, so code, applications, and popular tools that already use these environments work without changes.
  • ElastiCache provides in-memory caching which can
    • significantly lower latency and improve throughput for many
      • read-heavy application workloads e.g. social networking, gaming, media sharing, and Q&A portals.
      • compute-intensive workloads such as a recommendation engine.
    • improve application performance by storing critical pieces of data in memory for low-latency access.
    • be used to cache the results of I/O-intensive database queries or the results of computationally-intensive calculations.
  • ElastiCache currently allows access only from within a VPC. It can be accessed from EC2 instances, Lambda functions, or other services within the same VPC, or via VPN/Direct Connect from on-premises networks.
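
Caching the results of expensive database queries, as described above, is commonly done with a cache-aside (lazy loading) pattern. The sketch below shows the idea with an in-process stand-in for the cache so it runs without any AWS resources. `TTLCache`, `get_user`, and `load_from_db` are hypothetical names; in a real deployment the cache object would be a Valkey, Redis OSS, or Memcached client pointed at the ElastiCache endpoint.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a remote cache, with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)


def get_user(user_id, cache, load_from_db):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)  # slow path, only on a cache miss
    cache.set(key, value)
    return value
```

Only the first read of a key reaches the database; later reads are served from memory until the entry expires, which is what takes read load off the backing store.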

ElastiCache Engine Options

  • ElastiCache supports three engines:
    • Valkey – Recommended engine. Open-source, BSD-licensed, high-performance key-value datastore stewarded by the Linux Foundation. Drop-in replacement for Redis OSS with 230% higher throughput and 20% better memory efficiency.
    • Redis OSS – Open-source key-value store (versions up to 7.2 under BSD license). Redis 7.4+ changed to SSPL/RSALv2, and Redis 8.0+ moved to AGPLv3. ElastiCache continues to support Redis OSS 7.x.
    • Memcached – Simple, high-performance in-memory key-value store for small chunks of arbitrary data.
  • ElastiCache offers two deployment options:
    • Serverless – Zero infrastructure management, instant scaling, create a cache in under a minute. Pay-per-use based on data stored and requests executed.
    • Self-designed (Node-based) – Traditional cluster deployment with control over node types, shard count, and replica configuration.

Valkey (Recommended Engine)

  • Valkey is an open-source, high-performance key-value datastore stewarded by the Linux Foundation, backed by 40+ companies including AWS, Google Cloud, and Oracle.
  • Valkey was forked from Redis OSS 7.2.4 (the last BSD-licensed release) in March 2024, after Redis Ltd. changed its license to SSPL/RSALv2.
  • ElastiCache for Valkey provides:
    • 230% higher throughput compared to Redis OSS
    • 20% better memory efficiency
    • 33% lower pricing on Serverless compared to other engines
    • 20% lower pricing on self-designed (node-based) clusters
    • Full wire-compatibility with Redis OSS – existing code works without changes
  • Valkey version history on ElastiCache:
    • Valkey 7.2 (Oct 2024) – Initial release, drop-in Redis OSS replacement
    • Valkey 8.0 (Nov 2024) – Faster scaling for Serverless, improved memory efficiency
    • Valkey 8.1 (Jul 2025) – Vector search, Bloom filters, performance improvements (8% more ops/sec, 22% lower P99 latency)
    • Valkey 9.0 (May 2026) – Full-text search, hybrid search, aggregation pipelines, durability

Valkey Key Features

  • All Redis OSS features (replication, Multi-AZ, backup/restore, cluster mode, Global Datastore)
  • Vector Search (GA Oct 2025) – Index, search, and update billions of high-dimensional vectors with microsecond latency and up to 99% recall. Supports HNSW and FLAT algorithms with Euclidean, cosine, and inner product distance metrics.
  • Full-Text Search (May 2026) – Real-time full-text, exact-match, and numeric range search directly in cache. Search terabytes of data with microsecond latency and millions of search ops/sec.
  • Hybrid Search (May 2026) – Combine vector similarity with full-text search, tag filters, and numeric filters in a single query for optimized relevance.
  • Durability (Jun 2026) – Multi-AZ transactional log prevents data loss during failures:
    • Synchronous writes: Data persisted across 2+ AZs before responding. Zero data loss at single-digit millisecond write latency.
    • Asynchronous writes: Data persisted after responding. Microsecond write latency at no extra cost, with up to 10 seconds of possible data loss in rare failures.
  • Bloom Filters (Jul 2025) – Space-efficient probabilistic data structure to quickly check set membership.
  • Semantic Caching for AI – Use vector search to cache and retrieve semantically similar queries for GenAI/LLM applications, reducing API costs and latency.
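
As a rough illustration of what a FLAT (exact, brute-force) vector index with the cosine metric does, the sketch below scores every stored vector against the query and returns the top matches. This is plain Python for explanation only, not the ElastiCache or Valkey search API; `flat_search` and the sample keys are hypothetical. HNSW trades some recall for speed by searching a graph instead of scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flat_search(index, query, k=2):
    """Exact nearest-neighbour search: compare the query with every vector."""
    scored = [(cosine_similarity(vec, query), key) for key, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


index = {
    "doc:a": [1.0, 0.0],
    "doc:b": [0.9, 0.1],
    "doc:c": [0.0, 1.0],
}
print(flat_search(index, [1.0, 0.0], k=2))
```

The same function can back a simple semantic cache: embed the incoming question, search the cached questions, and reuse the stored answer when the best similarity is above a chosen threshold.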

ElastiCache Valkey/Redis vs Memcached

AWS ElastiCache Redis vs Memcached

ElastiCache Serverless

  • ElastiCache Serverless (GA November 2023) provides a serverless option that eliminates infrastructure management and capacity planning.
  • Key capabilities:
    • Create a cache in under a minute by providing just a name
    • Automatically scales capacity based on application traffic patterns
    • Monitors memory, CPU, and network utilization continuously
    • Provides a simple endpoint experience abstracting cluster topology
    • Data automatically replicated across multiple AZs with up to 99.99% availability SLA
    • Zero downtime maintenance
  • Supported engines for Serverless:
    • Valkey 7.2 and above (recommended, 33% lower pricing)
    • Memcached 1.6 and above
    • Redis OSS 7.0 and above
  • Pricing: Pay-per-use based on data stored (per GB-hour) and ElastiCache Processing Units (ECPUs) consumed
  • Serverless for Valkey 8.0 can scale from zero to 5M requests per second in under 13 minutes with consistent sub-millisecond p50 read latency
  • Ideal for:
    • Variable or unpredictable workloads
    • New applications where traffic patterns are unknown
    • Development and testing environments
    • Applications with spiky traffic that want to avoid over-provisioning
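
The Serverless pricing model above has two parts, data stored (GB-hours) and ECPUs consumed, so a bill can be estimated with simple arithmetic. The rates used below are placeholders chosen for the example, not real AWS prices; check the current ElastiCache pricing page for actual values. The function name `monthly_cost` is also hypothetical.

```python
def monthly_cost(avg_gb_stored, hours, ecpus, price_per_gb_hour, price_per_million_ecpus):
    """Estimate a Serverless bill: storage charge plus request (ECPU) charge."""
    storage = avg_gb_stored * hours * price_per_gb_hour
    requests = ecpus / 1_000_000 * price_per_million_ecpus
    return round(storage + requests, 2)


# Placeholder rates (assumptions, not AWS list prices).
estimate = monthly_cost(
    avg_gb_stored=2,
    hours=730,
    ecpus=500_000_000,
    price_per_gb_hour=0.10,
    price_per_million_ecpus=0.003,
)
print(estimate)
```

With these made-up rates the storage term dominates the request term, which is a reminder that for small, busy caches the amount of data kept in memory is usually the main cost lever.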

Redis OSS

  • Redis is an open source key-value cache & store. Note: Redis 7.4+ changed to the SSPL/RSALv2 license (announced March 2024), and Redis 8.0 added AGPLv3 as a licensing option (May 2025).
  • ElastiCache for Redis OSS continues to support versions up to Redis OSS 7.x. AWS recommends migrating to ElastiCache for Valkey for better performance, lower cost, and continued open-source (BSD) licensing.
  • Redis OSS versions 4 and 5 reached community End of Life. Standard support for ElastiCache versions 4 and 5 ended January 31, 2026, after which clusters are enrolled in Extended Support.
  • ElastiCache for Redis OSS can be used as a primary in-memory key-value data store, providing fast, sub-millisecond data performance, high availability and scalability up to 16 nodes plus up to 5 read replicas, each of up to 3.55 TiB of in-memory data.
  • ElastiCache for Redis OSS supports (similar to RDS features)
    • Redis Master/Slave replication.
    • Multi-AZ operation by creating read replicas in another AZ
    • Backup and Restore feature for persistence using snapshots
  • ElastiCache for Redis OSS can be scaled vertically by selecting a larger node type, or horizontally by adding shards (with cluster mode enabled).
  • A parameter group, which acts as a “container” for configuration values that can be applied to one or more primary clusters, can be specified for Redis OSS during installation.
  • Append Only File – AOF
    • provides persistence and can be enabled for recovery scenarios.
    • if a node restarts or service crashes, Redis will replay the updates from an AOF file, thereby recovering the data lost due to the restart or crash.
    • cannot protect against all failure scenarios, because if the underlying hardware fails, a new server would be provisioned and the AOF file will no longer be available to recover the data.
  • ElastiCache for Redis OSS doesn’t support the AOF feature but you can achieve persistence by snapshotting the Redis data using the Backup and Restore feature.
  • Enabling Redis Multi-AZ is a better approach to fault tolerance, as failing over to a read replica is much faster than rebuilding the primary from an AOF file.
  • Note: For new deployments, AWS recommends using ElastiCache for Valkey with the new Durability feature (Multi-AZ transactional log) instead of AOF for data persistence.
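
The AOF idea described above (log every write, then replay the log after a restart) can be shown with a toy append-only log. This is a conceptual sketch only: it does not use the real Redis AOF file format, and `append_command` and `replay` are hypothetical helpers.

```python
import json


def append_command(log_lines, op, key, value=None):
    """Record one write operation at the end of the log."""
    log_lines.append(json.dumps({"op": op, "key": key, "value": value}))


def replay(log_lines):
    """Rebuild the in-memory dataset by applying every logged write in order."""
    data = {}
    for line in log_lines:
        entry = json.loads(line)
        if entry["op"] == "SET":
            data[entry["key"]] = entry["value"]
        elif entry["op"] == "DEL":
            data.pop(entry["key"], None)
    return data


log = []
append_command(log, "SET", "a", 1)
append_command(log, "SET", "b", 2)
append_command(log, "DEL", "a")
print(replay(log))
```

The sketch also makes the limitations visible: replay time grows with the length of the log, and if the log itself is lost with the hardware there is nothing to replay.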

Redis OSS / Valkey Features

  • High Availability, Fault Tolerance & Auto Recovery
    • Multi-AZ with automatic failover from a failed primary to a read replica, in Valkey/Redis OSS clusters that support replication.
    • Fault Tolerance – Flexible AZ placement of nodes and clusters
    • High Availability – A primary node with asynchronously replicated read replicas to fail over to when problems occur. You can also use read replicas to increase read scaling.
    • Auto-Recovery – Automatic detection of and recovery from cache node failures.
    • Backup & Restore – Automated backups or manual snapshots can be performed. Restore process works reliably and efficiently.
  • Performance
    • Data Partitioning – Cluster mode supports partitioning the data across up to 500 shards.
    • Data Tiering – Provides a price-performance option by utilizing lower-cost solid state drives (SSDs) in each cluster node in addition to storing data in memory. It is ideal for workloads that access up to 20% of their overall dataset regularly, and for applications that can tolerate additional latency when accessing data on SSD.
    • Auto Scaling – Automatically adjusts the number of shards or replicas in response to changes in demand (not supported for Global Datastores, Outposts, or Local Zones).
  • Security
    • Encryption – Supports encryption in transit and encryption at rest. This support helps you build HIPAA-compliant applications.
    • Access Control – Control access using AWS IAM to define users and permissions.
    • Supports Redis AUTH or Managed Role-Based Access Control (RBAC).
    • AWS PrivateLink – Privately access ElastiCache APIs from within a VPC without exposing traffic to the public internet.
  • Administration
    • Low Administration – Manages backups, software patching, automatic failure detection, and recovery.
    • Integration with other AWS services such as EC2, CloudWatch, CloudTrail, and SNS.
    • Global Datastore provides fully managed, fast, reliable, and secure replication across AWS Regions. Cross-Region read replica clusters can be created to enable low-latency reads and disaster recovery across AWS Regions.

Read Replica (Valkey/Redis OSS)

  • Read replicas help with read scaling and failure handling
  • Read Replicas are kept in sync with the Primary node using asynchronous replication technology
  • Read replicas provide
    • Horizontal scaling beyond the compute or I/O capacity of a single primary node for read-heavy workloads.
    • Serving read traffic while the primary is unavailable, whether due to failure or maintenance
    • Data protection scenarios to promote a Read Replica as the primary node, in case the primary node or the AZ of the primary node fails.
  • ElastiCache supports initiated or forced failover where it flips the DNS record for the primary node to point at the read replica, which is in turn promoted to become the new primary.
  • A read replica cannot span Regions and may only be provisioned in the same or a different AZ in the same Region as the primary node. (Use Global Datastore for cross-region replication.)

Multi-AZ (Valkey/Redis OSS)

  • An ElastiCache for Valkey/Redis OSS shard consists of a primary and up to 5 read replicas
  • Data is asynchronously replicated from the primary node to the read replicas
  • Multi-AZ mode
    • provides enhanced availability and reduces administration, as node failover is automatic.
    • limits the impact on the ability to read from or write to the primary to the time it takes for automatic failover to complete.
    • removes the need to monitor nodes and manually initiate recovery in the event of a primary node disruption.
  • During certain types of planned maintenance, or in the unlikely event of node failure or AZ failure, ElastiCache
    • automatically detects the failure,
    • selects the read replica with the smallest replication lag to the primary and promotes it to become the new primary node,
    • propagates the DNS changes so that the primary endpoint remains the same.
  • If Multi-AZ is not enabled,
    • ElastiCache monitors the primary node.
    • in case the node becomes unavailable or unresponsive, it will repair the node by acquiring new service resources.
    • it propagates the DNS endpoint changes to redirect the node’s existing DNS name to point to the new service resources.
    • if the primary node cannot be healed, you can choose to promote one of the read replicas to be the new primary.
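
The replica-selection step during automatic failover (promote the replica with the smallest replication lag) can be sketched as below. The data shape and the name `choose_new_primary` are assumptions made for illustration; ElastiCache performs this selection internally and does not expose such a function.

```python
def choose_new_primary(replicas):
    """Return the name of the healthy replica with the least replication lag.

    replicas is a list of dicts such as
    {"name": "replica-1", "az": "us-east-1b", "lag_ms": 12, "healthy": True}.
    Returns None when no healthy replica is available.
    """
    healthy = [r for r in replicas if r["healthy"]]
    if not healthy:
        return None
    return min(healthy, key=lambda r: r["lag_ms"])["name"]


replicas = [
    {"name": "replica-1", "az": "us-east-1b", "lag_ms": 40, "healthy": True},
    {"name": "replica-2", "az": "us-east-1c", "lag_ms": 5, "healthy": True},
    {"name": "replica-3", "az": "us-east-1b", "lag_ms": 1, "healthy": False},
]
print(choose_new_primary(replicas))
```

Because replication is asynchronous, the promoted replica may still be missing the last few writes; picking the smallest lag minimizes that window but does not remove it.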

Backup & Restore (Valkey/Redis OSS)

  • Backup and Restore allow users to create snapshots of clusters.
  • Snapshots can be used for recovery, restoration, archiving purposes, or to warm-start a cluster with preloaded data
  • Snapshots can be created on a cluster basis using the native mechanism to create and store an RDB file as the snapshot.
  • Taking a snapshot might briefly increase latency at the node, so it is recommended to take snapshots from a read replica to minimize the performance impact
  • Snapshots can be created either automatically (if configured) or manually
  • When a cluster is deleted, automatic snapshots are removed. However, manual snapshots are retained.

Cluster Mode (Valkey/Redis OSS)

ElastiCache provides the ability to create distinct types of clusters:

  • A cluster mode disabled cluster
    • always has a single shard with up to 5 read replica nodes.
  • A cluster mode enabled cluster
    • has up to 500 shards with 1 to 5 read replica nodes in each.

ElastiCache Redis Cluster Mode

  • Scaling vs Partitioning
    • Cluster mode disabled supports Horizontal scaling for read capacity by adding or deleting replica nodes, or vertical scaling by scaling up to a larger node type.
    • Cluster mode enabled supports partitioning the data across up to 500 node groups. The number of shards can be changed dynamically as the demand changes. It also helps spread the load over a greater number of endpoints, which reduces access bottlenecks during peak demand.
  • Node Size vs Number of Nodes
    • Cluster mode disabled has only one shard and the node type must be large enough to accommodate all the cluster’s data plus necessary overhead.
    • Cluster mode enabled can have smaller node types as the data can be spread across partitions.
  • Reads vs Writes
    • Cluster mode disabled can be scaled for reads by adding more read replicas (5 max)
    • Cluster mode enabled can be scaled for both reads and writes by adding read replicas and multiple shards.
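
Partitioning in cluster mode works by mapping every key to one of 16384 hash slots (CRC16 of the key, modulo 16384, per the Valkey/Redis OSS cluster specification), with each shard owning a set of slots. The sketch below computes the slot for a key, including the `{hash tag}` rule that lets related keys land on the same shard. The `shard_for_slot` helper assumes slots are split into equal contiguous ranges, which is a simplification; real clusters can assign slots to shards arbitrarily.

```python
import binascii

SLOT_COUNT = 16384


def key_hash_slot(key: str) -> int:
    """Hash slot for a key: CRC16 (XMODEM variant) modulo 16384."""
    data = key.encode()
    # If the key contains a non-empty {tag}, only the tag is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


def shard_for_slot(slot: int, shard_count: int) -> int:
    """Assumed layout: slots divided into equal contiguous ranges per shard."""
    return slot * shard_count // SLOT_COUNT


for key in ("{user1}.profile", "{user1}.cart", "order:42"):
    slot = key_hash_slot(key)
    print(key, slot, shard_for_slot(slot, 3))
```

Keys that share a hash tag always map to the same slot, which is what allows multi-key operations on them in a cluster mode enabled cluster.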

Memcached

  • Memcached is an in-memory key-value store for small chunks of arbitrary data.
  • ElastiCache for Memcached can be used to cache a variety of objects
    • content from persistent data stores such as RDS, DynamoDB, or self-managed databases hosted on EC2
    • dynamically generated web pages e.g. with Nginx
    • transient session data that may not require a persistent backing store
  • ElastiCache for Memcached
    • can be scaled Vertically by increasing the node type size
    • can be scaled Horizontally by adding and removing nodes
    • does not support the persistence of data
    • does not support replication, Multi-AZ, or backups
  • ElastiCache for Memcached cluster can have
    • nodes that can span across multiple AZs within the same region
    • maximum of 20 nodes per cluster with a maximum of 100 nodes per region (soft limit and can be extended).
  • ElastiCache for Memcached supports auto-discovery, which enables the automatic discovery of cache nodes by clients when they are added to or removed from an ElastiCache cluster.

ElastiCache Mitigating Failures

  • ElastiCache deployments should be planned so that failures have a minimal impact on the application and data.
  • Mitigating Failures when Running Memcached
    • Mitigating Node Failures
      • spread the cached data over more nodes
      • as Memcached does not support replication, a node failure will always result in some data loss from the cluster
      • having more nodes will reduce the proportion of cache data lost
    • Mitigating Availability Zone Failures
      • locate the nodes in as many availability zones as possible, so that if an AZ fails only the data cached in that AZ is lost, not the data cached in the other AZs
  • Mitigating Failures when Running Valkey/Redis OSS
    • Mitigating Cluster Failures
      • Durability (Valkey 9.0+, Recommended)
        • Uses Multi-AZ transactional log to prevent data loss during failures
        • Synchronous writes: zero data loss, single-digit millisecond write latency
        • Asynchronous writes: microsecond write latency, up to 10 seconds of potential data loss
        • Both options maintain microsecond read latency
        • Replaces the need for AOF-based recovery
      • Redis Append Only Files (AOF) (Legacy approach)
        • enable AOF so whenever data is written to the cluster, a corresponding transaction record is written to a Redis AOF.
        • when the Redis process restarts, ElastiCache creates and provisions a replacement cluster and repopulates it with data from the AOF.
        • It is time-consuming
        • AOF can get big.
        • Using AOF cannot protect you from all failure scenarios.
      • Replication Groups
        • A replication group is comprised of a single primary cluster which the application can both read from and write to, and from 1 to 5 read-only replica clusters.
        • Data written to the primary cluster is also asynchronously updated on the read replica clusters.
        • When a Read Replica fails, ElastiCache detects the failure, replaces the instance in the same AZ, and synchronizes with the Primary Cluster.
        • Multi-AZ with Automatic Failover: ElastiCache detects Primary cluster failure and promotes a read replica with the least replication lag to primary.
        • Multi-AZ with Auto Failover disabled: ElastiCache detects Primary cluster failure, creates a new one and syncs the new Primary with one of the existing replicas.
    • Mitigating Availability Zone Failures
      • locate the clusters in as many availability zones as possible
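
The Memcached advice above (more nodes means a smaller share of the cache is lost when one node fails) can be checked with a small simulation. It assumes the client spreads keys with a simple hash-modulo scheme, which is a simplification of what real Memcached clients do, and the helper names are hypothetical.

```python
import hashlib


def node_for_key(key, node_count):
    """Pick a node for a key using a stable hash (assumed client behaviour)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def fraction_lost(keys, node_count, failed_node=0):
    """Share of keys that lived on the failed node."""
    lost = sum(1 for key in keys if node_for_key(key, node_count) == failed_node)
    return lost / len(keys)


keys = [f"session:{i}" for i in range(10_000)]
for node_count in (2, 5, 10):
    print(node_count, "nodes:", fraction_lost(keys, node_count))
```

The lost share comes out close to 1 divided by the number of nodes, so going from 2 to 10 nodes cuts the expected loss from about half of the cache to about a tenth.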

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. What does Amazon ElastiCache provide?
    1. A service by this name doesn’t exist. Perhaps you mean Amazon CloudCache.
    2. A virtual server with a huge amount of memory.
    3. A managed In-memory cache service
    4. An Amazon EC2 instance with the Memcached software already pre-installed.
  2. You are developing a highly available web application using stateless web servers. Which services are suitable for storing session state data? Choose 3 answers.
    1. Elastic Load Balancing
    2. Amazon Relational Database Service (RDS)
    3. Amazon CloudWatch
    4. Amazon ElastiCache
    5. Amazon DynamoDB
    6. AWS Storage Gateway
  3. Which statement best describes ElastiCache?
    1. Reduces the latency by splitting the workload across multiple AZs
    2. A simple web services interface to create and store multiple data sets, query your data easily, and return the results
    3. Offload the read traffic from your database in order to reduce latency caused by read-heavy workload
    4. Managed service that makes it easy to set up, operate and scale a relational database in the cloud
  4. Our company is getting ready to do a major public announcement of a social media site on AWS. The website is running on EC2 instances deployed across multiple Availability Zones with a Multi-AZ RDS MySQL Extra Large DB Instance. The site performs a high number of small reads and writes per second and relies on an eventual consistency model. After comprehensive tests you discover that there is read contention on RDS MySQL. Which are the best approaches to meet these requirements? (Choose 2 answers)
    1. Deploy ElastiCache in-memory cache running in each availability zone
    2. Implement sharding to distribute load to multiple RDS MySQL instances
    3. Increase the RDS MySQL Instance size and Implement provisioned IOPS
    4. Add an RDS MySQL read replica in each availability zone
  5. You are using ElastiCache Memcached to store session state and cache database queries in your infrastructure. You notice in CloudWatch that Evictions and Get Misses are both very high. What two actions could you take to rectify this? Choose 2 answers
    1. Increase the number of nodes in your cluster
    2. Tweak the max_item_size parameter
    3. Shrink the number of nodes in your cluster
    4. Increase the size of the nodes in the cluster
  6. You have been tasked with moving an ecommerce web application from a customer’s datacenter into a VPC. The application must be fault tolerant as well as highly scalable. Moreover, the customer is adamant that service interruptions not affect the user experience. As you near launch, you discover that the application currently uses multicast to share session state between web servers. In order to handle session state within the VPC, you choose to:
    1. Store session state in Amazon ElastiCache for Valkey/Redis (scalable and makes the web applications stateless)
    2. Create a mesh VPN between instances and allow multicast on it
    3. Store session state in Amazon Relational Database Service (RDS solution not highly scalable)
    4. Enable session stickiness via Elastic Load Balancing (affects user experience if the instance goes down)
  7. When you are designing to support a 24-hour flash sale, which one of the following methods best describes a strategy to lower the latency while keeping up with unusually heavy traffic?
    1. Launch enhanced networking instances in a placement group to support the heavy traffic (only improves internal communication)
    2. Apply Service Oriented Architecture (SOA) principles instead of a 3-tier architecture (just simplifies architecture)
    3. Use Elastic Beanstalk to enable blue-green deployment (only minimizes downtime for applications and eases rollback)
    4. Use ElastiCache as in-memory storage on top of DynamoDB to store user sessions (scalable, faster read/writes and in memory storage)
  8. You are configuring your company’s application to use Auto Scaling and need to move user state information. Which of the following AWS services provides a shared data store with durability and low latency?
    1. AWS ElastiCache Memcached (does not provide durability as if the node is gone the data is gone)
    2. Amazon Simple Storage Service
    3. Amazon EC2 instance storage
    4. Amazon DynamoDB
  9. Your application is using an ELB in front of an Auto Scaling group of web/application servers deployed across two AZs and a Multi-AZ RDS Instance for data persistence. The database CPU is often above 80% usage and 90% of I/O operations on the database are reads. To improve performance you recently added a single-node Memcached ElastiCache Cluster to cache frequent DB query results. In the next weeks the overall workload is expected to grow by 30%. Do you need to change anything in the architecture to maintain the high availability for the application with the anticipated additional load and Why?
    1. You should deploy two Memcached ElastiCache Clusters in different AZs because the RDS Instance will not be able to handle the load if the cache node fails.
    2. If the cache node fails the automated ElastiCache node recovery feature will prevent any availability impact. (does not provide high availability, as data is lost if the node is lost)
    3. Yes you should deploy the Memcached ElastiCache Cluster with two nodes in the same AZ as the RDS DB master instance to handle the load if one cache node fails. (Single AZ affects availability as DB is Multi AZ and would be overloaded if the AZ goes down)
    4. No if the cache node fails you can always get the same data from the DB without having any availability impact. (Will overload the database affecting availability)
  10. A read only news reporting site with a combined web and application tier and a database tier that receives large and unpredictable traffic demands must be able to respond to these traffic fluctuations automatically. What AWS services should be used to meet these requirements?
    1. Stateless instances for the web and application tier synchronized using ElastiCache Memcached in an autoscaling group monitored with CloudWatch and RDS with read replicas.
    2. Stateful instances for the web and application tier in an autoscaling group monitored with CloudWatch and RDS with read replicas (Stateful instances will not allow for scaling)
    3. Stateful instances for the web and application tier in an autoscaling group monitored with CloudWatch and multi-AZ RDS (Stateful instances will not allow for scaling & multi-AZ is for high availability and not scaling)
    4. Stateless instances for the web and application tier synchronized using ElastiCache Memcached in an autoscaling group monitored with CloudWatch and multi-AZ RDS (multi-AZ is for high availability and not scaling)
  11. You have written an application that uses the Elastic Load Balancing service to spread traffic to several web servers. Your users complain that they are sometimes forced to login again in the middle of using your application, after they have already logged in. This is not behavior you have designed. What is a possible solution to prevent this happening?
    1. Use instance memory to save session state.
    2. Use instance storage to save session state.
    3. Use EBS to save session state.
    4. Use ElastiCache to save session state.
    5. Use Glacier to save session state.
  12. A company wants to build a real-time recommendation engine for their e-commerce platform. The system needs to perform vector similarity searches against millions of product embeddings with sub-millisecond latency. Which AWS service and feature combination is most appropriate?
    1. Amazon OpenSearch Service with k-NN plugin
    2. Amazon RDS for PostgreSQL with pgvector extension
    3. Amazon ElastiCache for Valkey with vector search (provides microsecond-latency vector search with up to 99% recall, ideal for real-time use cases)
    4. Amazon Neptune with vector similarity
  13. A startup is launching a new application with unpredictable traffic patterns. They need a caching solution that requires minimal management and can scale automatically. They want to minimize costs during low-traffic periods. Which ElastiCache deployment option should they choose?
    1. ElastiCache for Redis OSS with cluster mode enabled
    2. ElastiCache Serverless for Valkey (zero infrastructure management, instant auto-scaling, pay-per-use, and Valkey offers 33% lower Serverless pricing)
    3. ElastiCache for Memcached with Auto Discovery
    4. ElastiCache for Redis OSS with data tiering
  14. An organization is migrating from ElastiCache for Redis OSS to ElastiCache for Valkey. Which statements about this migration are correct? (Choose 2 answers)
    1. Valkey is wire-compatible with Redis OSS, requiring no application code changes
    2. Valkey requires a different client library than Redis
    3. Valkey does not support cluster mode
    4. Valkey provides up to 230% higher throughput and 20% better memory efficiency compared to Redis OSS
  15. A financial services company needs an in-memory data store for payment tokenization that cannot tolerate any data loss, while maintaining microsecond read latency. Which ElastiCache configuration meets these requirements?
    1. ElastiCache for Redis OSS with AOF enabled
    2. ElastiCache for Memcached with Multi-AZ nodes
    3. ElastiCache for Valkey 9.0 with synchronous durability (Multi-AZ transactional log with synchronous writes ensures zero data loss while maintaining microsecond read latency)
    4. ElastiCache for Valkey with asynchronous durability

AWS DynamoDB Secondary Indexes

  • DynamoDB provides fast access to items in a table by specifying primary key values
  • DynamoDB Secondary indexes on a table allow efficient access to data with attributes other than the primary key.
  • DynamoDB Secondary indexes
    • is a data structure that contains a subset of attributes from a table.
    • is associated with exactly one table, from which it obtains its data.
    • requires an alternate key for the index partition key and sort key.
    • additionally can define projected attributes that are copied from the base table into the index along with the primary key attributes.
    • is automatically maintained by DynamoDB.
    • is updated automatically for any addition, modification, or deletion of items in the base table.
    • helps reduce the size of the data as compared to the main table, depending upon the projected attributes, and hence helps improve provisioned throughput performance
    • are automatically maintained as sparse objects. An item appears in an index only if it exists in the table on which the index is defined and contains the index key attributes, which makes queries on an index very efficient
    • use the same table class and capacity mode (provisioned or on-demand) as the base table they are associated with.
  • DynamoDB Secondary indexes support two types
    • Global secondary index – an index with a partition key and a sort key that can be different from those on the base table.
    • Local secondary index – an index that has the same partition key as the base table, but a different sort key.
  • DynamoDB supports up to 20 global secondary indexes (default quota, can request increase) and up to 5 local secondary indexes per table.
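
As a rough illustration of the two index types, the sketch below builds a request dictionary in the shape accepted by the DynamoDB CreateTable API. The table name `GameScores`, the attribute names, and the index names are hypothetical, and the helper functions are not part of any AWS SDK. The dictionary is only constructed and inspected here; nothing is sent to AWS.

```python
# Hypothetical table: partition key UserId, sort key GameTitle.
create_table_request = {
    "TableName": "GameScores",
    "BillingMode": "PAY_PER_REQUEST",  # on-demand; the indexes use the same mode
    "AttributeDefinitions": [
        {"AttributeName": "UserId", "AttributeType": "S"},
        {"AttributeName": "GameTitle", "AttributeType": "S"},
        {"AttributeName": "TopScore", "AttributeType": "N"},
    ],
    "KeySchema": [
        {"AttributeName": "UserId", "KeyType": "HASH"},      # partition key
        {"AttributeName": "GameTitle", "KeyType": "RANGE"},  # sort key
    ],
    # LSI: same partition key as the table, different sort key.
    "LocalSecondaryIndexes": [{
        "IndexName": "UserTopScoreIndex",
        "KeySchema": [
            {"AttributeName": "UserId", "KeyType": "HASH"},
            {"AttributeName": "TopScore", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }],
    # GSI: partition key and sort key may both differ from the table's.
    "GlobalSecondaryIndexes": [{
        "IndexName": "GameTitleIndex",
        "KeySchema": [
            {"AttributeName": "GameTitle", "KeyType": "HASH"},
            {"AttributeName": "TopScore", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["Wins"]},
    }],
}


def partition_key(key_schema):
    """Return the attribute name that acts as the partition (HASH) key."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


def lsi_keys_are_valid(request):
    """Check that every LSI reuses the base table's partition key."""
    table_pk = partition_key(request["KeySchema"])
    return all(
        partition_key(index["KeySchema"]) == table_pk
        for index in request.get("LocalSecondaryIndexes", [])
    )
```

With boto3 installed and credentials configured, a dictionary of this shape could be passed as `client.create_table(**create_table_request)`.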

Global Secondary Indexes – GSI

  • DynamoDB creates and maintains indexes for the primary key attributes for efficient access to data in the table, which allows applications to quickly retrieve data by specifying primary key values.
  • Global Secondary Indexes – GSI are indexes that contain partition or composite partition-and-sort keys that can be different from the keys in the table on which the index is based.
  • Global secondary index is considered “global” because queries on the index can span all items in a table, across all partitions.
  • Multiple secondary indexes can be created on a table, and queries issued against these indexes.
  • Applications benefit from having one or more secondary keys available to allow efficient access to data with attributes other than the primary key.
  • GSIs support non-unique attributes, which increases query flexibility by enabling queries against any non-key attribute in the table
  • GSIs support eventual consistency only. DynamoDB automatically handles item additions, updates, and deletes in a GSI when corresponding changes are made to the table asynchronously. Strongly consistent reads are NOT supported on GSIs.
  • Data in a secondary index consists of GSI alternate key, primary key and attributes that are projected, or copied, from the table into the index.
  • Attributes that are part of an item in a table, but not part of the GSI key, the primary key of the table, or projected attributes are not returned on querying the GSI index.
  • GSIs can be created at the same time as the table, or added to an existing table. GSIs can also be deleted from an existing table. However, you cannot modify an existing GSI — you must delete and recreate it.
  • GSIs inherit the read/write capacity mode from the base table.
    • Provisioned Mode: GSIs manage throughput independently of the table they are based on. The provisioned throughput for the table and each associated GSI needs to be specified at the creation time.
      • Read provisioned throughput
        • one Read Capacity Unit provides two eventually consistent reads per second for items up to 4 KB in size.
      • Write provisioned throughput
        • consumes 1 write capacity unit if,
          • a new item is inserted into the table that defines an indexed attribute
          • existing item is deleted from the table
          • existing items are updated for projected attributes
        • consumes 2 write capacity units if
          • existing item is updated for key attributes, which results in deletion and addition of the new item into the index
    • On-Demand Mode: GSIs automatically scale with the table’s traffic. Configurable maximum throughput can optionally limit on-demand capacity for both tables and their associated secondary indexes.
  • Throttling on a GSI affects the base table depending on whether the throttling is for read or write activity:
    • When a GSI has insufficient read capacity, the base table isn’t affected.
    • When a GSI has insufficient write capacity, write operations won’t succeed on the base table or any of its GSIs (back-pressure).
  • GSIs use the same table class as the base table (DynamoDB Standard or DynamoDB Standard-IA). When the table class is updated, all associated GSIs are updated as well.
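
The provisioned-throughput rules listed above can be turned into a small back-of-the-envelope calculator. This is a simplified sketch, not an official pricing formula: the function names are made up for illustration, and the write model assumes each index entry is small enough to fit in a single write capacity unit.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one read capacity unit covers an item of up to 4 KB


def gsi_read_units(item_bytes, reads_per_second):
    """Read capacity units needed for eventually consistent GSI reads.

    One unit provides two eventually consistent reads per second
    for each 4 KB block of the item.
    """
    blocks_per_item = math.ceil(item_bytes / READ_UNIT_BYTES)
    return math.ceil(blocks_per_item * reads_per_second / 2)


def gsi_write_units(change):
    """Write capacity units consumed on the index for one base-table change.

    Assumption: the index entry fits in a single write unit.
    """
    costs = {
        "insert": 1,            # new item that defines an indexed attribute
        "delete": 1,            # existing item removed from the table
        "update_projected": 1,  # only projected attributes change
        "update_key": 2,        # index key changes: delete old entry + add new one
    }
    return costs[change]
```

For example, ten eventually consistent reads per second of 3,000-byte items work out to `gsi_read_units(3000, 10)`, which is 5 units, while the same read rate for 6,000-byte items needs 10.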

Multi-Attribute Composite Keys (November 2025)

  • GSIs now support multi-attribute composite keys, allowing partition keys and sort keys to be composed of multiple attributes.
  • Partition key can be composed of up to 4 attributes and sort key can be composed of up to 4 attributes, for a total of up to 8 attributes per key schema.
  • Eliminates the need to manually concatenate values into synthetic keys (e.g., no more “TOURNAMENT#WINTER2024#REGION#NA-EAST” patterns).
  • Multi-attribute partition keys improve data distribution and uniqueness.
  • Multi-attribute sort keys enable flexible querying by letting you specify conditions on sort key attributes from left to right (hierarchical querying).
  • When querying, all partition key attributes must be specified using equality conditions. Sort key attributes can be queried left-to-right; inequality conditions must be the last condition.
  • Each attribute in a multi-attribute key can have its own data type: String (S), Number (N), or Binary (B).
  • Multi-attribute keys work particularly well when creating GSIs on existing tables — no need to backfill synthetic keys across data.
  • Available at no additional charge in all AWS Regions where DynamoDB is available.
  • Note: The base table primary key still uses the traditional structure of a single partition key + optional single sort key. Multi-attribute keys are only for GSIs.
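
The querying rules for multi-attribute keys can be sketched as a small validator. This is an illustrative model only, not DynamoDB's actual implementation: the function, the attribute names, and the representation of a query as an attribute-to-operator mapping are all assumptions, and any operator other than "=" is treated as a range (inequality) condition.

```python
def validate_query(pk_attrs, sk_attrs, conditions):
    """Check a query against the multi-attribute key rules described above.

    pk_attrs / sk_attrs: ordered lists of key attribute names.
    conditions: dict mapping attribute name -> operator string, e.g. "=" or ">".
    Returns (ok, reason).
    """
    # Rule 1: every partition key attribute needs an equality condition.
    for attr in pk_attrs:
        if conditions.get(attr) != "=":
            return False, f"partition key attribute {attr!r} needs an equality condition"

    # Rule 2: sort key attributes are used left to right, without gaps.
    used = [attr for attr in sk_attrs if attr in conditions]
    if used != sk_attrs[:len(used)]:
        return False, "sort key attributes must be specified left to right without gaps"

    # Rule 3: a non-equality condition is only allowed on the last attribute used.
    for attr in used[:-1]:
        if conditions[attr] != "=":
            return False, f"the range condition on {attr!r} must be the last condition"

    return True, "ok"
```

With a hypothetical index keyed on (`tournament`, `region`) and sorted by (`round`, `bracket`, `match`), a query that fixes both partition attributes, fixes `round`, and applies `>` to `bracket` passes, whereas one that skips `round` and filters on `bracket` alone is rejected.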

Warm Throughput (November 2024)

  • DynamoDB introduced warm throughput for tables and indexes, providing visibility into read and write operations a table or GSI can immediately support.
  • Warm throughput values grow automatically as usage increases.
  • Tables and GSIs can be pre-warmed by proactively setting higher warm throughput values to prepare for anticipated traffic spikes.
  • Warm throughput values are available for all provisioned and on-demand tables and indexes at no cost.
  • Pre-warming incurs a charge (see DynamoDB Pricing).
  • Default warm throughput for a GSI is 12,000 read units and 4,000 write units.

Local Secondary Indexes (LSI)

  • Local secondary indexes are indexes that have the same partition key as the table, but a different sort key.
  • Local secondary index is “local” because every partition of a local secondary index is scoped to a table partition that has the same partition key.
  • LSI allows search using a secondary index in place of the sort key, thus expanding the number of attributes that can be used for efficient queries
  • LSI is updated automatically when the primary index is updated, and reads support both strongly consistent and eventually consistent options.
  • LSIs can only be queried via the Query API
  • LSIs cannot be added to existing tables — they can only be created at the same time as the table
  • LSIs cannot be modified once created
  • LSIs cannot be removed from a table once they are created
  • For tables with local secondary indexes, there is a 10 GB size limit per partition key value (item collection size limit). This includes all items in the base table and all items in the LSIs that have the same partition key value.
  • LSI consumes provisioned throughput capacity as part of the table with which it is associated
    • Read Provisioned throughput
      • if data read is indexed and projected attributes
        • one Read Capacity Unit provides one strongly consistent read (or two eventually consistent reads) per second for items up to 4 KB
        • data size includes the index and projected attributes only
      • if data read is indexed and a non-projected attribute
        • consumes double the read capacity, with one to read from the index and one to read (fetch) from the table with the entire data and not just the non-projected attribute
    • Write provisioned throughput
      • consumes 1 write capacity unit if,
        • a new item is inserted into the table
        • existing item is deleted from the table
        • existing items are updated for projected attributes
      • consumes 2 write capacity units if
        • existing item is updated for key attributes, which results in deletion and addition of the new item into the index

Global Secondary Index vs Local Secondary Index

Characteristic | Global Secondary Index (GSI) | Local Secondary Index (LSI)
Key Schema | Simple (partition key) or composite (partition + sort key). Supports multi-attribute composite keys (up to 4+4 attributes). | Must be composite (partition key + sort key). Single attribute only.
Key Attributes | Partition key and sort key can be any base table attributes of type String, Number, or Binary. | Partition key must be same as base table. Sort key can be any base table attribute of type String, Number, or Binary.
Size Restrictions | No size restrictions per partition key value. | 10 GB limit per partition key value (item collection).
Online Index Operations | Can be created with the table or added/deleted on an existing table. | Can only be created at table creation time. Cannot be added, modified, or deleted later.
Queries and Partitions | Queries span all items across all partitions (global). | Queries scoped to a single partition key value (local).
Read Consistency | Eventual consistency only. | Supports both eventual and strong consistency.
Provisioned Throughput | Has its own independent provisioned throughput settings. Queries consume from the index. | Consumes read/write capacity from the base table.
Projected Attributes | Can only request projected attributes. DynamoDB does NOT fetch from base table. | Can request non-projected attributes. DynamoDB automatically fetches from base table (at higher cost).
Maximum Count | Up to 20 per table (default quota, can request increase). | Up to 5 per table.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. In DynamoDB, a secondary index is a data structure that contains a subset of attributes from a table, along with an alternate key to support ____ operations.
    1. None of the above
    2. Both
    3. Query
    4. Scan
  2. In regard to DynamoDB, what is the Global secondary index?
    1. An index with a partition and sort key that can be different from those on the table
    2. An index that has the same sort key as the table, but a different partition key
    3. An index that has the same partition key and sort key as the table
    4. An index that has the same partition key as the table, but a different sort key
  3. In regard to DynamoDB, can I modify the index once it is created?
    1. Yes, if it is a primary hash key index
    2. Yes, if it is a Global secondary index (GSIs can be added or deleted on existing tables, but cannot be modified in place — you must delete and recreate)
    3. No
    4. Yes, if it is a local secondary index
  4. When thinking of DynamoDB, what is true of Global Secondary Key properties?
    1. Both the partition key and sort key can be different from the table.
    2. Only the partition key can be different from the table.
    3. Either the partition key or the sort key can be different from the table, but not both.
    4. Only the sort key can be different from the table.
  5. A team needs to query a DynamoDB table using an attribute that is not part of the primary key. They need strongly consistent reads. Which type of secondary index should they use?
    1. Global Secondary Index with ConsistentRead set to true
    2. Local Secondary Index
    3. Either GSI or LSI supports strongly consistent reads
    4. Neither GSI nor LSI supports strongly consistent reads
  6. A DynamoDB table uses on-demand capacity mode. Which of the following statements about secondary indexes on this table is correct?
    1. Secondary indexes must use provisioned capacity mode
    2. GSIs can use a different capacity mode than the base table
    3. Secondary indexes inherit the capacity mode from the base table
    4. Only LSIs can use on-demand capacity mode
  7. With the multi-attribute composite key feature for DynamoDB GSIs, what is the maximum number of attributes that can compose a GSI key schema?
    1. 2 (one partition key, one sort key)
    2. 4 (two partition keys, two sort keys)
    3. 8 (up to four partition key attributes, up to four sort key attributes)
    4. 16 (up to eight partition keys, up to eight sort keys)
  8. What happens when a Global Secondary Index has insufficient write capacity?
    1. Only reads on the GSI are throttled
    2. The GSI becomes temporarily unavailable
    3. Write operations on the base table and all its GSIs are throttled (back-pressure)
    4. DynamoDB automatically increases the GSI write capacity

AWS IAM Access Management

IAM Access Management

  • IAM Access Management is all about Permissions and Policies.
  • Permissions help define who has access and what actions they can perform.
  • IAM Policy helps to fine-tune the permissions granted to the policy owner
  • IAM Policy is a document that formally states one or more permissions.
  • Most restrictive Policy always wins
  • IAM Policy is defined in the JSON (JavaScript Object Notation) format

IAM policy basically states “Principal A is allowed or denied (Effect) to perform Action B on Resource C given Conditions D are satisfied”
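
The sentence above maps directly onto the elements of a policy statement. A minimal sketch, written as a Python dictionary and serialized to JSON, might look like the following; the account ID, user name, bucket name, and IP range are placeholder values chosen for illustration. Because it names a Principal, this particular statement would be used as a resource-based policy.

```python
import json

# "Principal A is allowed (Effect) to perform Action B on Resource C
#  given Conditions D are satisfied"
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",                                            # allowed or denied
        "Principal": {"AWS": "arn:aws:iam::111122223333:user/alice"},  # Principal A
        "Action": "s3:GetObject",                                     # Action B
        "Resource": "arn:aws:s3:::example-bucket/*",                  # Resource C
        "Condition": {                                                # Conditions D
            "IpAddress": {"aws:SourceIp": "203.0.113.0/24"}
        },
    }],
}

# IAM policies are stored and exchanged as JSON documents.
policy_json = json.dumps(policy, indent=2)
```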

IAM Access Policies

  • An Entity can be associated with Multiple Policies and a Policy can have multiple statements where each statement in a policy refers to a single permission.
  • If the policy includes multiple statements, a logical OR is applied across the statements at evaluation time. Similarly, if multiple policies are applicable to a request, a logical OR is applied across the policies at evaluation time.
  • For resource-based policies, the principal is specified within the policy; for identity-based policies, the principal is the user, group, or role to which the policy is attached.

IAM Policy Types

  • AWS supports nine types of policies: identity-based policies, resource-based policies, VPC endpoint policies, permissions boundaries, AWS Organizations service control policies (SCPs), AWS Organizations resource control policies (RCPs), access control lists (ACLs), AWS RAM resource shares, and session policies.
  • IAM policies define permissions for an action regardless of the method used to perform the operation (Console, CLI, or API).

Identity-Based vs Resource-Based Permissions

Identity-based, or IAM permissions

  • Identity-based or IAM permissions are attached to an IAM user, group, or role and specify what the user, group, or role can do.
  • User, group, or the role itself acts as a Principal.
  • IAM permissions can be applied to almost all the AWS services.
  • IAM Policies can either be inline or managed (AWS or Customer).
  • IAM Policy’s current version is 2012-10-17.

Resource-based permissions

  • Resource-based permissions are attached to a resource, for example an S3 bucket or an SNS topic
  • Resource-based permissions specify both who has access to the resource (Principal) and what actions they can perform on it (Actions)
  • Resource-based policies are inline only, not managed.
  • Resource-based permissions are supported only by some AWS services
  • Resource-based policies can be defined with version 2012-10-17 or 2008-10-17
  • Within the same account, if either the identity-based policy or the resource-based policy allows the request and the other doesn’t, the request is still allowed (union of permissions).

VPC Endpoint Policies

  • VPC endpoint policies are resource-based policies attached to a VPC endpoint to control which principals can use the endpoint and which resources can be accessed through it.
  • VPC endpoint policies act as an additional access boundary scoped to traffic that traverses the endpoint.
  • VPC endpoint policies do not override or replace identity-based policies or resource-based policies.
  • If no custom endpoint policy is attached, AWS attaches a default policy that allows full access.

Session Policies

  • Session policies are advanced inline policies passed as a parameter when programmatically creating a temporary session for a role or federated user.
  • Session policies limit the permissions that the role or user’s identity-based policies grant to the session.
  • The resulting session’s permissions are the intersection of the identity-based policies and the session policies.
  • Session policies do not grant permissions on their own; they can only restrict.
  • Can be passed using AssumeRole, AssumeRoleWithSAML, AssumeRoleWithWebIdentity, or GetFederationToken API operations.
  • You can pass up to 10 managed session policies using the PolicyArns parameter.
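
As a sketch of how session policies are passed, the helper below assembles the parameters for an STS AssumeRole call. The helper name and its arguments are hypothetical; only the parameter names `RoleArn`, `RoleSessionName`, `Policy`, and `PolicyArns` come from the AssumeRole API. No call to AWS is made here.

```python
import json

MAX_SESSION_POLICY_ARNS = 10  # limit on managed session policies noted above


def build_assume_role_request(role_arn, session_name,
                              inline_policy=None, managed_policy_arns=()):
    """Build keyword arguments for an AssumeRole call with session policies."""
    if len(managed_policy_arns) > MAX_SESSION_POLICY_ARNS:
        raise ValueError("at most 10 managed session policies can be passed")

    request = {"RoleArn": role_arn, "RoleSessionName": session_name}
    if inline_policy is not None:
        # The inline session policy is passed as a JSON string.
        request["Policy"] = json.dumps(inline_policy)
    if managed_policy_arns:
        # Managed session policies are passed as a list of {"arn": ...} entries.
        request["PolicyArns"] = [{"arn": arn} for arn in managed_policy_arns]
    return request
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("sts").assume_role(**request)`. The resulting session can then do only what both the role's identity-based policies and the session policies allow.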

Access Control Lists (ACLs)

  • ACLs control which principals in other accounts can access the resource to which the ACL is attached.
  • ACLs cannot be used to control access for a principal within the same account.
  • ACLs are the only policy type that does not use the JSON policy document format.
  • Amazon S3, AWS WAF, and Amazon VPC are examples of services that support ACLs.

Managed Policies and Inline Policies

  • Managed policies
    • Managed policies are Standalone policies that can be attached to multiple users, groups, and roles in an AWS account.
    • Managed policies apply only to identities (users, groups, and roles) but not to resources.
    • Managed policies allow reusability
    • Managed policy changes are implemented as versions (limited to 5); each change to an existing policy creates a new version, which is useful for comparing changes and reverting if needed
    • Managed policies have their own ARN
    • Two types of managed policies:
      • AWS managed policies
        • Managed policies that are created and managed by AWS.
        • AWS maintains and can upgrade these policies, for example if a new service is introduced; the changes automatically affect all the existing principals attached to the policy
        • AWS takes care not to break existing policies, for example by adding a restriction or removing a permission
        • AWS managed policies cannot be modified
      • Customer managed policies
        • Customer managed policies are standalone custom policies created and administered by you.
        • Customer managed policies allow more precise control over the policies than when using AWS managed policies.
  • Inline policies
    • Inline policies are created and managed by you, and are embedded directly into a single user, group, or role.
    • Deletion of the Entity (User, Group or Role) or Resource deletes the inline policy as well

ABAC – Attribute-Based Access Control

  • ABAC – Attribute-based access control is an authorization strategy that defines permissions based on attributes called tags.
  • ABAC policies can be designed to allow operations when the principal’s tag matches the resource tag.
  • ABAC is helpful in environments that are growing rapidly and helps in situations where policy management becomes cumbersome.
  • ABAC policies are easier to manage as different policies for different job functions need not be created.
  • ABAC complements RBAC for granular permissions: RBAC allows access only to specific resources, whereas ABAC can allow actions on all resources, but only if the resource tag matches the principal’s tag.
  • ABAC can help use employee attributes from the corporate directory with federation where attributes are applied to their resulting principal.
  • Amazon S3 now supports ABAC for general purpose buckets (launched Nov 2025), allowing tag-based access control on S3 resources directly.
  • ABAC support continues to expand across AWS services including OpenSearch Serverless, SageMaker Lakehouse, RDS, and Aurora.
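
A tag-matching ABAC policy can be sketched as follows. The tag key `project` and the chosen EC2 actions are illustrative assumptions; the `aws:ResourceTag/...` and `aws:PrincipalTag/...` condition keys are the standard way to compare resource and principal tags. The `tags_match` helper is a toy model of that comparison, not how IAM evaluates it internally.

```python
# Allow starting and stopping instances only when the instance's "project" tag
# equals the calling principal's "project" tag.
abac_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["ec2:StartInstances", "ec2:StopInstances"],
        "Resource": "*",
        "Condition": {
            "StringEquals": {
                "aws:ResourceTag/project": "${aws:PrincipalTag/project}"
            }
        },
    }],
}


def tags_match(principal_tags, resource_tags, key="project"):
    """Toy version of the condition above: both sides must carry the same tag value."""
    return key in principal_tags and principal_tags[key] == resource_tags.get(key)
```

One policy of this kind serves every team: a principal tagged `project=blue` can act on resources tagged `project=blue` and nothing else, with no per-team policy to maintain.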

IAM Permissions Boundaries

  • Permissions boundary allows using a managed policy to set the maximum permissions that an identity-based policy can grant to an IAM entity.
  • An entity with a permissions boundary can perform only the actions that are allowed by both its identity-based policies and its permissions boundary.
  • Permissions boundary supports both the AWS-managed policy and the customer-managed policy to set the boundary for an IAM entity.
  • Permissions boundary can be applied to an IAM entity (user or role) but is not supported for IAM Group.
  • Permissions boundary does not grant permissions on its own.
  • If a resource-based policy specifies a role session or user in the principal element, an explicit allow in the permission boundary is not required. However, if the resource-based policy specifies the role ARN, a permission boundary allow is required.
  • An explicit deny in the permissions boundary always takes effect regardless of other policies.
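
The effect of a permissions boundary can be modeled as a set intersection. The sketch below is deliberately simplified: actions are treated as exact strings (no wildcards, resources, or conditions), and the function name is hypothetical.

```python
def effective_permissions(identity_allows, boundary_allows=None,
                          explicit_denies=frozenset()):
    """Actions an entity can perform under an optional permissions boundary.

    identity_allows: actions allowed by identity-based policies.
    boundary_allows: actions allowed by the boundary, or None if no boundary is set.
    explicit_denies: actions explicitly denied by any policy.
    """
    allowed = set(identity_allows)
    if boundary_allows is not None:
        # The boundary never grants anything; it only caps what is granted.
        allowed &= set(boundary_allows)
    # An explicit deny always takes effect.
    return allowed - set(explicit_denies)
```

For instance, an identity policy allowing `s3:GetObject` and `iam:CreateUser` combined with a boundary allowing `s3:GetObject` and `ec2:RunInstances` leaves only `s3:GetObject`: `iam:CreateUser` is outside the boundary, and `ec2:RunInstances` was never granted by the identity policy.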

Service Control Policies (SCPs)

  • Service Control Policies (SCPs) are AWS Organizations policies that define the maximum permissions for IAM users and IAM roles within accounts in an organization or organizational unit (OU).
  • SCPs limit permissions that identity-based policies or resource-based policies grant to entities within the account.
  • SCPs do not grant permissions on their own; they only restrict.
  • SCPs affect all IAM users and roles in the member accounts, including the account root user.
  • SCPs do not affect the management account.
  • SCPs now support full IAM policy language (announced Sep 2025), including conditions, individual resource ARNs, and the NotAction element with Allow statements.
  • An explicit deny in an SCP overrides any allow in identity-based or resource-based policies.

Resource Control Policies (RCPs)

  • Resource Control Policies (RCPs) are a new authorization policy type in AWS Organizations launched at re:Invent 2024 (November 2024).
  • RCPs provide centralized preventative controls on AWS resources across the organization.
  • RCPs set the maximum available permissions for resources in member accounts, complementing SCPs which control permissions for principals.
  • RCPs help restrict external access to resources at scale and implement data perimeters.
  • RCPs do not grant permissions on their own.
  • RCPs affect resources in member accounts only, not the management account.
  • RCPs apply regardless of whether the principals belong to the organization.
  • At launch, RCPs support: Amazon S3, AWS STS, AWS KMS, Amazon SQS, and AWS Secrets Manager.
  • RCPs work alongside SCPs to provide comprehensive permission guardrails: SCPs control what principals can do, RCPs control what can be done to resources.
  • An explicit deny in an RCP overrides allows in other policies.

Declarative Policies

  • Declarative policies are a new AWS Organizations capability launched at re:Invent 2024 (December 2024).
  • Declarative policies allow centrally declaring and enforcing desired configuration for a given AWS service at scale across an organization.
  • Once attached, the configuration is always maintained even when services add new APIs or features.
  • Declarative policies are designed to prevent actions that are non-compliant with the policy.
  • Unlike SCPs which require predicting and denying specific API calls, declarative policies express the desired end state.
  • Can be attached to organization root, OUs, or individual accounts.
  • Supported for EC2, VPC, and other services at launch.
  • Example: A declarative policy can disallow public sharing of AMIs organization-wide.

IAM Access Analyzer

  • IAM Access Analyzer helps identify resources that are shared with external entities and validates policies for best practices.
  • IAM Access Analyzer provides multiple types of analysis:
    • External Access Analysis – Identifies resources shared with external principals outside the zone of trust (organization or account).
    • Internal Access Analysis (launched June 2025) – Identifies who within the organization has access to critical resources, providing 360-degree visibility.
    • Unused Access Analysis (launched re:Invent 2023) – Identifies unused roles, unused access keys, unused console passwords, and unused service/action-level permissions.
  • IAM Access Analyzer supports Custom Policy Checks using automated reasoning:
    • Check Access Not Granted – Verifies policies don’t grant access to specific critical actions.
    • Check No Public Access (July 2024) – Determines if a resource policy grants public access to a specified resource type.
    • Check No New Access – Compares updated policy against reference to ensure no new access is granted.
    • Custom policy checks can be integrated into CI/CD pipelines for proactive policy validation.
  • Guided Revocation (June 2024) – Provides actionable recommendations to revoke unused permissions, including refined policy suggestions tailored to actual access activity.
  • Policy Generation – Reviews CloudTrail logs and generates IAM policies based on actual access activity for a specified time frame.
  • Policy Validation – Provides over 100 policy checks including security warnings, errors, general warnings, and best practice suggestions.
  • Unused access analysis is a paid feature, charged per IAM role or user per month.
  • Unused access analysis scope can be customized (Jan 2025) to exclude specific accounts, roles, or users using account IDs or tags.

IAM Policy Simulator

  • IAM Policy Simulator helps test and troubleshoot IAM and resource-based policies
  • IAM Policy Simulator can be used in the following ways:
    • Test IAM based policies. If multiple policies are attached, you can test all the policies or select individual policies to test. You can test which actions are allowed or denied by the selected policies for specific resources.
    • Test Resource based policies. However, Resource-based policies cannot be tested standalone and have to be attached to the Resource
    • Test new IAM policies that are not yet attached to a user, group, or role by typing or copying them into the simulator. These are used only in the simulation and are not saved.
    • Test the policies with selected services, actions, and resources
    • Simulate real-world scenarios by providing context keys, such as an IP address or date, that are included in Condition elements in the policies being tested.
    • Identify which specific statement in a policy results in allowing or denying access to a particular resource or action.
  • IAM Policy Simulator does not make an actual AWS service request and hence does not make unwanted changes to the AWS live environment
  • IAM Policy Simulator just reports the result Allowed or Denied
  • IAM Policy Simulator allows you to modify the policy and test. These changes are not propagated to the actual policies attached to the entities
  • Policy Simulator results can differ from the live AWS environment; always verify against the live environment.
  • Policy Simulator can also be accessed programmatically using SimulateCustomPolicy and SimulatePrincipalPolicy API operations.
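
A programmatic simulation might be sketched as below. The wrapper function and the stub client are illustrative assumptions; `simulate_custom_policy` with its `PolicyInputList`, `ActionNames`, and `ResourceArns` parameters corresponds to the SimulateCustomPolicy API as exposed by the boto3 IAM client. The stub stands in for a real client so the sketch runs without AWS credentials.

```python
import json


def simulate_actions(iam_client, policy_document, actions, resource_arns):
    """Simulate a not-yet-attached policy and return {action: decision}."""
    response = iam_client.simulate_custom_policy(
        PolicyInputList=[json.dumps(policy_document)],  # policies as JSON strings
        ActionNames=list(actions),
        ResourceArns=list(resource_arns),
    )
    return {
        result["EvalActionName"]: result["EvalDecision"]
        for result in response["EvaluationResults"]
    }


class StubIamClient:
    """Offline stand-in for a real IAM client; it answers implicitDeny for everything."""

    def simulate_custom_policy(self, PolicyInputList, ActionNames, ResourceArns):
        return {"EvaluationResults": [
            {"EvalActionName": name, "EvalDecision": "implicitDeny"}
            for name in ActionNames
        ]}
```

With boto3 installed and credentials configured, `boto3.client("iam")` could be passed in place of `StubIamClient()`; the real service reports each decision as `allowed`, `explicitDeny`, or `implicitDeny`.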

IAM Policy Evaluation

When determining if permission is allowed, AWS evaluates all applicable policy types in the following order:

IAM Permission Policy Evaluation

Updated Policy Evaluation Logic (Single Account)

  1. Default Deny – Decision starts with Deny. All permissions are implicitly denied by default.
  2. Explicit Deny Check – IAM checks all applicable policies for an explicit deny. An explicit deny in ANY policy overrides everything and access is denied.
  3. SCPs – If the account is in an AWS Organization with SCPs enabled, the action must be allowed (not denied) by the SCP. If the SCP doesn’t allow it, access is denied.
  4. RCPs – If Resource Control Policies are enabled, they are evaluated. If the RCP denies access to the resource, access is denied.
  5. Resource-Based Policies – If the resource has a resource-based policy that allows the principal, this can grant access (within the same account, union of identity-based and resource-based).
  6. Identity-Based Policies – The identity-based policies attached to the principal are evaluated for an explicit allow.
  7. Permissions Boundaries – If a permissions boundary is set, the action must be allowed by both the identity-based policy AND the permissions boundary.
  8. Session Policies – For federated users or assumed roles with session policies, the session policy further restricts permissions.

Key Evaluation Rules

  • Explicit Deny – An explicit deny in any policy always wins. It overrides any allow.
  • Explicit Allow – Permission must be explicitly allowed. For granting the User any permission, the permission must be explicitly allowed by applicable policy types.
  • Implicit Deny – If neither an explicit deny nor explicit allow policy exists, it reverts to the default: implicit deny.
  • Same Account Union – Within the same account, identity-based and resource-based policies form a union. Either one allowing is sufficient (unless explicitly denied).
  • Cross-Account Intersection – For cross-account access, both the identity-based policy in the source account AND the resource-based policy on the target resource must allow the action.
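The core of the rules above (explicit deny wins, otherwise an explicit allow is needed, otherwise implicit deny) can be sketched in a few lines. This is a deliberately simplified illustration, not the AWS implementation: the `Statement` class and `evaluate` function are hypothetical names, each statement matches exactly one action, and SCPs, RCPs, permissions boundaries, session policies, wildcards, and conditions are all ignored.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Statement:
    effect: str  # "Allow" or "Deny"
    action: str  # a single action name, e.g. "s3:GetObject" (no wildcards in this sketch)


def evaluate(action: str, statements: Iterable[Statement]) -> str:
    """Return "ExplicitDeny", "Allow", or "ImplicitDeny" for one requested action."""
    matching = [s for s in statements if s.action == action]
    # An explicit deny in any matching statement always wins.
    if any(s.effect == "Deny" for s in matching):
        return "ExplicitDeny"
    # Otherwise at least one explicit allow is required.
    if any(s.effect == "Allow" for s in matching):
        return "Allow"
    # Nothing matched: fall back to the default.
    return "ImplicitDeny"
```

Passing the combined statements of the identity-based and resource-based policies models the same-account union: an allow in either one is enough unless some statement denies the action.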

IAM Policy Variables

  • Policy variables provide a feature to specify placeholders in a policy.
  • When the policy is evaluated, the policy variables are replaced with values that come from the request itself.
  • Policy variables allow a single policy to be applied to a group of users to control access, e.g. giving each user access only to the S3 bucket folder that matches their user name.
  • A policy variable is marked using a $ prefix followed by a pair of curly braces ({ }). Inside the ${ } characters, include the name of the value from the request that you want to use in the policy.
  • Policy variables work only with policies defined with Version 2012-10-17
  • Policy variables can only be used in the Resource element and in string comparisons in the Condition element
  • Policy variables include variables like aws:username, aws:userid, aws:SourceIp, aws:CurrentTime etc.
  • Context key names are NOT case-sensitive. For example, including aws:SourceIP context key is equivalent to testing for AWS:SourceIp. However, context key values may be case-sensitive depending on the condition operator used.
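As an illustration of the per-user folder case mentioned above, the following sketch builds a group-level policy that uses the ${aws:username} variable in the Resource element. The function name, the bucket name, and the `home/` prefix are assumptions made for the example.

```python
import json


def home_folder_policy(bucket: str) -> dict:
    """Build one policy that gives each IAM user access to their own folder only."""
    return {
        # Policy variables require this policy language version.
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                # "${{" and "}}" are f-string escapes; the emitted text is ${aws:username}.
                "Resource": f"arn:aws:s3:::{bucket}/home/${{aws:username}}/*",
            }
        ],
    }


policy_json = json.dumps(home_folder_policy("example-bucket"), indent=2)
```

When the policy is evaluated for a user named alice, the variable resolves so that the statement covers only objects under home/alice/.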

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. IAM’s Policy Evaluation Logic always starts with a default ____________ for every request, except for those that use the AWS account’s root security credentials.
    1. Permit
    2. Deny
    3. Cancel
  2. An organization has created 10 IAM users. The organization wants each of the IAM users to have access to a separate DynamoDB table. All the users are added to the same group and the organization wants to set up a group-level policy for this. How can the organization achieve this?
    1. Define the group policy and add a condition which allows the access based on the IAM name
    2. Create a DynamoDB table with the same name as the IAM user name and define the policy rule which grants access based on the DynamoDB ARN using a variable
    3. Create a separate DynamoDB database for each user and configure a policy in the group based on the DB variable
    4. It is not possible to have a group level policy which allows different IAM users to different DynamoDB Tables
  3. An organization has set up multiple IAM users. The organization wants each IAM user to access the IAM console only from within the organization and not from outside. How can it achieve this?
    1. Create an IAM policy with the security group and use that security group for AWS console login
    2. Create an IAM policy with a condition which denies access when the IP address range is not from the organization
    3. Configure the EC2 instance security group which allows traffic only from the organization’s IP range
    4. Create an IAM policy with VPC and allow a secure gateway between the organization and AWS Console
  4. Can I attach more than one policy to a particular entity?
    1. Yes always
    2. Only if within GovCloud
    3. No
    4. Only if within VPC
  5. A __________ is a document that provides a formal statement of one or more permissions.
    1. policy
    2. permission
    3. Role
    4. resource
  6. A __________ is the concept of allowing (or disallowing) an entity such as a user, group, or role some type of access to one or more resources.
    1. user
    2. AWS Account
    3. resource
    4. permission
  7. True or False: When using IAM to control access to your RDS resources, the key names that can be used are case sensitive. For example, aws:CurrentTime is NOT equivalent to AWS:currenttime.
    1. TRUE
    2. FALSE (Context key names are NOT case-sensitive. aws:CurrentTime IS equivalent to AWS:currenttime. However, context key values may be case-sensitive depending on the condition operator. Refer IAM Condition documentation)
  8. A user has set an IAM policy that allows all requests if the request comes from IP 10.10.10.1/32. Another policy allows all requests between 5 PM and 7 PM. What will happen when a user requests access from IP 10.10.10.1/32 at 6 PM?
    1. IAM will throw an error for policy conflict
    2. It is not possible to set a policy based on the time or IP
    3. It will deny access
    4. It will allow access
  9. Which of the following are correct statements with policy evaluation logic in AWS Identity and Access Management? Choose 2 answers.
    1. By default, all requests are denied
    2. An explicit allow overrides an explicit deny
    3. An explicit allow overrides default deny
    4. An explicit deny does not override an explicit allow
    5. By default, all requests are allowed
  10. A web design company currently runs several FTP servers that their 250 customers use to upload and download large graphic files. They wish to move this system to AWS to make it more scalable, but they wish to maintain customer privacy and keep costs to a minimum. What AWS architecture would you recommend? [PROFESSIONAL]
    1. Ask their customers to use an S3 client instead of an FTP client. Create a single S3 bucket. Create an IAM user for each customer. Put the IAM Users in a Group that has an IAM policy that permits access to subdirectories within the bucket via use of the ‘username’ Policy variable.
    2. Create a single S3 bucket with Reduced Redundancy Storage turned on and ask their customers to use an S3 client instead of an FTP client. Create a bucket for each customer with a Bucket Policy that permits access only to that one customer. (Creating a bucket for each user is not a scalable model; also, 100 buckets was the earlier default limit, which has since changed)
    3. Create an auto-scaling group of FTP servers with a scaling policy to automatically scale-in when minimum network traffic on the auto-scaling group is below a given threshold. Load a central list of FTP users from S3 as part of the user data startup script on each instance (Expensive)
    4. Create a single S3 bucket with Requester Pays turned on and ask their customers to use an S3 client instead of an FTP client. Create a bucket for each customer with a Bucket Policy that permits access only to that one customer. (Creating a bucket for each user is not a scalable model; also, 100 buckets was the earlier default limit, which has since changed)

New Practice Questions – Updated 2025

  1. Which AWS Organizations policy type is used to centrally restrict external access to AWS resources across an organization?
    1. Service Control Policy (SCP)
    2. Resource Control Policy (RCP)
    3. Permissions Boundary
    4. Declarative Policy

    RCPs provide centralized preventative controls on AWS resources, restricting external access at scale. SCPs control what principals can do, while RCPs control what can be done to resources.

  2. A security team wants to ensure that no AMIs can be publicly shared across their entire AWS Organization, even when new APIs are added. Which policy type should they use?
    1. Service Control Policy (SCP)
    2. Resource Control Policy (RCP)
    3. Declarative Policy
    4. Identity-based Policy

    Declarative policies express desired end state and are maintained even when services add new APIs or features, making them ideal for enforcing configurations like disallowing public AMI sharing.

  3. Which IAM Access Analyzer feature uses automated reasoning to detect policies that grant public access to a resource before deployment?
    1. External Access Analysis
    2. Unused Access Analysis
    3. Custom Policy Checks – Check No Public Access
    4. Policy Generation

    Custom Policy Checks (including Check No Public Access, launched July 2024) use automated reasoning and can be integrated into CI/CD pipelines for proactive policy validation.

  4. In the IAM policy evaluation logic, what is the relationship between identity-based policies and resource-based policies within the same account?
    1. Both must allow the action (intersection)
    2. Either one allowing is sufficient (union)
    3. Resource-based policy always takes precedence
    4. Identity-based policy always takes precedence

    Within the same account, identity-based and resource-based policies form a union. If either one allows the request, access is granted (unless explicitly denied by any policy).

  5. How many policy types does AWS currently support? (Select the correct answer)
    1. 5
    2. 6
    3. 7
    4. 9 (identity-based, resource-based, VPC endpoint policies, permissions boundaries, SCPs, RCPs, ACLs, RAM resource shares, and session policies)
  6. What is the key difference between SCPs and RCPs in AWS Organizations?
    1. SCPs apply to resources while RCPs apply to principals
    2. SCPs control maximum permissions for principals while RCPs control maximum permissions for resources
    3. RCPs can grant permissions while SCPs cannot
    4. SCPs apply only to the management account while RCPs apply to member accounts

    SCPs define maximum permissions for IAM users and roles (principals), while RCPs define maximum permissions for resources. Neither grants permissions. Both apply to member accounts only.

  7. A security administrator wants to identify IAM roles that have permissions to access S3 but haven’t used those permissions in 90 days. Which feature should they use?
    1. IAM Access Analyzer – External Access Analysis
    2. IAM Access Analyzer – Unused Access Analysis
    3. IAM Access Analyzer – Custom Policy Checks
    4. IAM Policy Simulator

    Unused Access Analysis identifies unused roles, unused access keys, unused passwords, and unused service/action-level permissions by analyzing CloudTrail activity.

AWS DynamoDB Advanced Features

  • DynamoDB Secondary indexes on a table allow efficient access to data with attributes other than the primary key.
  • DynamoDB Time to Live – TTL enables a per-item timestamp to determine when an item is no longer needed.
  • DynamoDB Global Tables is a fully managed, multi-active, cross-region replication capability of DynamoDB to support data access locality and regional fault tolerance for database workloads.
  • DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table.
  • DynamoDB Triggers (just like database triggers) are a feature that allows the execution of custom actions based on item-level updates on a table.
  • DynamoDB Accelerator – DAX is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from ms to µs – even at millions of requests per second.
  • DynamoDB Zero-ETL Integrations provide seamless data replication to analytics services like Amazon Redshift, Amazon OpenSearch Service, and Amazon SageMaker Lakehouse without building ETL pipelines.
  • VPC Gateway Endpoints provide private access to DynamoDB from within a VPC without the need for an internet gateway or NAT gateway.
  • DynamoDB Warm Throughput provides visibility into the throughput your tables and indexes can instantly support and allows pre-warming for anticipated traffic spikes.

DynamoDB Secondary Indexes

  • DynamoDB Secondary indexes on a table allow efficient access to data with attributes other than the primary key.
  • Global secondary index – an index with a partition key and a sort key that can be different from those on the base table.
  • Local secondary index – an index that has the same partition key as the base table, but a different sort key.

DynamoDB TTL

  • DynamoDB Time to Live (TTL) enables a per-item timestamp to determine when an item is no longer needed.
  • After the date and time of the specified timestamp, DynamoDB deletes the item from the table without consuming any write throughput.
  • DynamoDB TTL is provided at no extra cost and can help reduce data storage by retaining only required data.
  • Items that are deleted from the table are also removed from any local secondary index and global secondary index in the same way as a DeleteItem operation.
  • DynamoDB typically deletes expired items within a few days of their expiration. Items with valid, expired TTL attributes may still be updated, including changing or removing their TTL attributes, while pending deletion.
  • DynamoDB Stream tracks the delete operation as a system delete and not a regular delete.
  • TTL is useful if the stored items lose relevance after a specific time, for example:
    • Remove user or sensor data after a year of inactivity in an application
    • Archive expired items to an S3 data lake via DynamoDB Streams and AWS Lambda.
    • Retain sensitive data for a certain amount of time according to contractual or regulatory obligations.
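A minimal sketch of preparing an item for TTL is shown below. The TTL attribute holds a Number with the expiry time in Unix epoch seconds; the attribute name `expires_at`, the key name `session_id`, and the one-year retention are assumptions for the example, and TTL would need to be enabled on the table for that attribute.

```python
from datetime import datetime, timedelta, timezone


def ttl_epoch(created_at: datetime, keep_for: timedelta) -> int:
    """Expiry time as whole seconds since the Unix epoch.

    created_at should be timezone-aware so the conversion is unambiguous.
    """
    return int((created_at + keep_for).timestamp())


def session_item(session_id: str, created_at: datetime) -> dict:
    """Item in DynamoDB JSON form with a hypothetical TTL attribute."""
    return {
        "session_id": {"S": session_id},
        # Numbers are sent as strings in DynamoDB JSON.
        "expires_at": {"N": str(ttl_epoch(created_at, timedelta(days=365)))},
    }


item = session_item("abc123", datetime.now(timezone.utc))
```

Because expired items are typically removed within a few days rather than instantly, readers that must never see expired data may also want to filter on the same attribute at query time.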

DynamoDB Global Tables

  • DynamoDB Global Tables is a fully managed, serverless, multi-active, cross-region replication capability of DynamoDB to support data access locality and regional fault tolerance for database workloads.
  • Applications can perform reads and writes to DynamoDB in AWS regions around the world, with changes in any region propagated to every region where a table is replicated.
  • Global Tables help in building applications to take advantage of data locality to reduce overall latency.
  • Global Tables provides up to 99.999% availability and increased application resiliency.
  • Global Tables uses the Last Write Wins approach for conflict resolution.
  • Global Tables requires DynamoDB streams enabled with New and Old image settings.
  • Global Tables supports both same-account and multi-account replication models (multi-account GA Feb 2026).
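The last-write-wins idea can be pictured with a small sketch. It only illustrates the concept: the `ReplicaWrite` type and its `written_at` timestamp are hypothetical, and how DynamoDB timestamps and reconciles concurrent writes internally is not covered here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ReplicaWrite:
    region: str
    written_at: float  # hypothetical write timestamp, seconds since the epoch
    value: str


def last_writer_wins(writes: Iterable[ReplicaWrite]) -> ReplicaWrite:
    """Pick the write with the latest timestamp; earlier conflicting writes are discarded."""
    return max(writes, key=lambda w: w.written_at)


conflict = [
    ReplicaWrite("us-east-1", 1000.0, "blue"),
    ReplicaWrite("eu-west-1", 1000.5, "green"),
]
winner = last_writer_wins(conflict)
```

The practical consequence is that a concurrent update to the same item in another Region can be silently overwritten, so applications that cannot tolerate this need a different design.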

Global Tables – Multi-Region Strong Consistency (MRSC)

  • DynamoDB Global Tables now supports Multi-Region Strong Consistency (MRSC), generally available as of June 2025.
  • MRSC enables applications to always read the latest data from any Region, achieving zero Recovery Point Objective (RPO).
  • Provides the highest level of application resilience, removing the need to manage strongly consistent replication manually.
  • Ideal for global applications with strict consistency requirements such as user profile management, inventory tracking, and financial transaction processing.
  • Available in: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland, London, Paris, Frankfurt), Asia Pacific (Tokyo, Seoul, Osaka).
  • Note: Global tables configured for MRSC do not support the multi-account model.

Global Tables – Multi-Region Eventual Consistency (MREC)

  • Default replication mode providing eventual consistency for cross-region reads.
  • Supports strong consistency for same-region reads.
  • Supports both same-account and multi-account replication models.

Global Tables – Multi-Account Replication

  • DynamoDB Global Tables now supports replication across multiple AWS accounts (GA Feb 2026).
  • Adds account-level isolation for stronger resiliency and limits the impact of misconfigurations, security incidents, or account-level issues.
  • Multi-account global tables replicate data across AWS Regions and accounts, providing the same active-active functionality as same-account global tables.
  • Both models support multi-Region writes, asynchronous replication, last-writer-wins conflict resolution, and the same billing model.
  • They differ in how accounts, permissions, encryption, and table governance are managed.
  • Multi-account global tables support only Multi-Region Eventual Consistency (MREC), not MRSC.

Global Tables – Pricing (Nov 2024 Update)

  • Effective November 1, 2024, DynamoDB reduced global tables pricing by up to 67% for on-demand tables (replicated write pricing).
  • For provisioned capacity tables, replicated write pricing was reduced by 33%.
  • After the price reduction, replicated write cost (rWCU/rWRU) is now priced identically to standard single-region WCU/WRU.

Global Tables – AWS FIS Integration

  • DynamoDB supports an AWS Fault Injection Service (FIS) action to pause global table replication (April 2024).
  • Enables simulation and observation of application response to Regional replication pauses.
  • Helps fine-tune monitoring and recovery processes for improved resiliency and availability.

DynamoDB Streams

  • DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table.
  • DynamoDB Streams stores the data for the last 24 hours, after which it is erased.
  • DynamoDB Streams maintains an ordered sequence of events per item; however, ordering across items is not maintained.
  • Example
    • Suppose that you have a DynamoDB table tracking high scores for a game and that each item in the table represents an individual player. If you make the following three updates in this order:
      • Update 1: Change Player 1’s high score to 100 points
      • Update 2: Change Player 2’s high score to 50 points
      • Update 3: Change Player 1’s high score to 125 points
    • DynamoDB Streams will maintain the order of the Player 1 score events. However, it does not maintain order across players, so the Player 2 score event is not guaranteed to appear between the two Player 1 events.
  • DynamoDB Streams APIs help developers consume updates and receive the item-level data before and after items are changed.
  • DynamoDB Streams allow reads at up to twice the rate of the provisioned write capacity of the DynamoDB table.
  • DynamoDB Streams have to be enabled on a per-table basis.
  • DynamoDB streams support Encryption at rest to encrypt the data.
  • DynamoDB Streams is designed for No Duplicates so that every update made to the table will be represented exactly once in the stream.
  • DynamoDB Streams writes stream records in near-real time so that applications can consume these streams and take action based on the contents.
  • DynamoDB streams can be used for multi-region replication to keep other data stores up-to-date with the latest changes to DynamoDB or to take actions based on the changes made to the table
  • DynamoDB stream records can be processed using Kinesis Data Streams, Lambda, KCL application, or Amazon Managed Service for Apache Flink.
  • DynamoDB Streams now supports resource-based policies (March 2024), enabling cross-account stream access without complex IAM role configurations.
  • DynamoDB Streams supports AWS PrivateLink interface endpoints (December 2024), enabling private access to streams over private IP addresses within a VPC.
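To make the per-item ordering guarantee concrete, the sketch below replays the high-score example with plain dictionaries standing in for stream records. The record shape and the function name are simplifications invented for the illustration; real stream records carry more fields.

```python
from collections import defaultdict

# Simplified stand-ins for stream records, in the order a consumer happened to read them.
records = [
    {"player": "Player 1", "score": 100},
    {"player": "Player 2", "score": 50},
    {"player": "Player 1", "score": 125},
]


def per_item_history(stream_records):
    """Group changes by item key, preserving the order seen for each key."""
    history = defaultdict(list)
    for record in stream_records:
        history[record["player"]].append(record["score"])
    return dict(history)


history = per_item_history(records)
```

A consumer can rely on Player 1's history reading 100 then 125, but should not draw conclusions from where the Player 2 record appears relative to them.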

DynamoDB Streams vs Kinesis Data Streams for DynamoDB

  • DynamoDB offers two streaming models for change data capture (CDC):
    • DynamoDB Streams – Built-in, 24-hour retention, tightly integrated with DynamoDB, ideal for Lambda triggers and event-driven architectures.
    • Kinesis Data Streams for DynamoDB – More flexible retention (up to 365 days), higher throughput, supports multiple consumers, ideal for complex downstream processing pipelines.
  • Kinesis Data Streams captures item-level modifications and replicates them to a Kinesis data stream, allowing continuous capture and storage of terabytes of data per hour.
  • Choose DynamoDB Streams for simpler use cases (Lambda triggers, Global Tables). Choose Kinesis Data Streams for higher throughput, longer retention, or multiple consumers.

DynamoDB Triggers

  • DynamoDB Triggers (just like database triggers) are a feature that allows the execution of custom actions based on item-level updates on a table.
  • DynamoDB triggers can be used in scenarios like sending notifications, updating an aggregate table, and connecting DynamoDB tables to other data sources.
  • DynamoDB Trigger flow
    • Custom logic for a DynamoDB trigger is stored in an AWS Lambda function as code.
    • A trigger for a given table can be created by associating an AWS Lambda function to the stream (via DynamoDB Streams) on a table.
    • When the table is updated, the updates are published to DynamoDB Streams.
    • In turn, AWS Lambda reads the updates from the associated stream and executes the code in the function.
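The trigger flow above might look like the following on the Lambda side. This is a sketch under assumptions: the handler name and the counting logic are made up for the example, while `Records`, `eventName`, and `userIdentity` are fields of the stream event that Lambda receives. Deletes performed by TTL are marked with the DynamoDB service principal, which is how the system deletes mentioned in the TTL section can be told apart from regular deletes.

```python
def handler(event, context):
    """Hypothetical Lambda entry point for a DynamoDB Streams trigger."""
    summary = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0, "ttl_deletes": 0}
    for record in event.get("Records", []):
        name = record["eventName"]  # INSERT, MODIFY, or REMOVE
        summary[name] += 1
        # TTL deletions carry a userIdentity identifying the DynamoDB service.
        identity = record.get("userIdentity") or {}
        if name == "REMOVE" and identity.get("principalId") == "dynamodb.amazonaws.com":
            summary["ttl_deletes"] += 1
    return summary
```

A real function would act on each record's keys and images, for example by sending a notification or updating an aggregate table, instead of only counting events.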

DynamoDB Backup and Restore

  • DynamoDB on-demand backup helps create full backups of the tables for long-term retention, and archiving for regulatory compliance needs.
  • Backup and restore actions run with no impact on table performance or availability.
  • Backups are preserved regardless of table deletion and retained until they are explicitly deleted.
  • On-demand backups are cataloged, and discoverable.
  • On-demand backups can be created using
    • DynamoDB
      • DynamoDB on-demand backups cannot be copied to a different account or Region.
    • AWS Backup (Recommended)
      • is a fully managed data protection service that makes it easy to centralize and automate backups across AWS services, in the cloud, and on-premises
      • provides enhanced backup features
      • can configure backup schedule, policies and monitor activity for the AWS resources and on-premises workloads in one place.
      • can copy on-demand backups across AWS accounts and Regions.
      • can encrypt backups using an AWS KMS key that is independent of the DynamoDB table encryption key.
      • can apply a write-once-read-many (WORM) setting to backups using the AWS Backup Vault Lock policy.
      • can add cost allocation tags to on-demand backups.
      • can transition on-demand backups to cold storage for lower costs.

DynamoDB PITR – Point-In-Time Recovery

  • DynamoDB point-in-time recovery – PITR enables automatic, continuous, incremental backup of the table with per-second granularity.
  • PITR helps protect against accidental writes and deletes.
  • PITR can back up tables with hundreds of terabytes of data with no impact on the performance or availability of the production applications.
  • PITR-enabled tables that are deleted can be recovered within 35 days of deletion and restored to their state just before they were deleted.
  • Configurable Recovery Period (Jan 2025): PITR now supports configurable recovery periods. You can set the PITR period for each table between 1 and 35 days (default remains 35 days). This helps meet data compliance and regulatory requirements that need shorter retention periods.
  • Shortening the RecoveryPeriodInDays has no impact on PITR pricing because the price is based on the size of the table and its local secondary indexes.
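The configurable recovery period can be sketched as a small helper that builds the request parameters and enforces the 1 to 35 day range. The helper name and the table name are assumptions for the example; the parameter names follow the UpdateContinuousBackups API as I understand it, so check them against the SDK version in use.

```python
def pitr_request(table_name: str, recovery_days: int = 35) -> dict:
    """Build parameters that enable PITR with a chosen recovery period."""
    if not 1 <= recovery_days <= 35:
        raise ValueError("recovery period must be between 1 and 35 days")
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {
            "PointInTimeRecoveryEnabled": True,
            "RecoveryPeriodInDays": recovery_days,
        },
    }


params = pitr_request("orders", recovery_days=7)
```

With boto3, these parameters would typically be passed as `client.update_continuous_backups(**params)`, assuming a release recent enough to know about RecoveryPeriodInDays.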

DynamoDB Table Deletion Protection

  • DynamoDB supports table deletion protection (March 2023) to prevent accidental deletion during regular maintenance operations.
  • When deletion protection is enabled, the table cannot be deleted via the AWS Management Console, AWS CLI, or API calls unless the feature is explicitly disabled first.
  • Authorized administrators can set the deletion protection property when creating new tables or managing existing tables.
  • Complements other protection strategies like IAM policies, CloudFormation deletion policies, and PITR.

DynamoDB Import and Export

Export to S3

  • DynamoDB supports full and incremental exports to Amazon S3 from tables with PITR enabled.
  • Full Export: Exports the complete table data at any point in time within the PITR recovery window.
  • Incremental Export (Sep 2023): Exports only data that was inserted, updated, or deleted between two specified points in time. Enables efficient CDC pipelines without full table exports.
  • Exports do not affect the read capacity or availability of the table.
  • Data can be exported in DynamoDB JSON or Amazon Ion format.
  • Exports support per-second granularity for any point in time within the last 35 days (or within the configured PITR recovery period).

Import from S3

  • DynamoDB Import allows importing data from an Amazon S3 bucket to a new DynamoDB table.
  • Supports up to 50,000 S3 objects in a single bulk import (increased from previous limits in March 2024).
  • Removes the need to consolidate S3 objects prior to running a bulk import.

DynamoDB Zero-ETL Integrations

  • DynamoDB offers zero-ETL integrations that seamlessly replicate data to analytics services without building or managing ETL pipelines.

Zero-ETL with Amazon Redshift (GA Oct 2024)

  • Automatically replicates DynamoDB tables into Amazon Redshift within minutes of data being written.
  • Enables SQL queries and analytics on DynamoDB data without complex ETL processes.
  • Supports Amazon Redshift Serverless workgroups or provisioned clusters using RA3 instance types.
  • Data replication begins within a few minutes of changes being written to DynamoDB.

Zero-ETL with Amazon OpenSearch Service (GA Jul 2024)

  • Provides near real-time data replication from DynamoDB to OpenSearch Service using the DynamoDB plugin for OpenSearch Ingestion.
  • Uses DynamoDB export to S3 for initial snapshot loading, then DynamoDB Streams for real-time change replication.
  • Enables powerful full-text search, vector search, and complex analytics on DynamoDB data.
  • Fully managed, code-free solution for seamless data synchronization.

Zero-ETL with Amazon SageMaker Lakehouse (Dec 2024)

  • Automates extracting and loading data from DynamoDB tables into SageMaker Lakehouse.
  • Enables analytics and ML workloads using integrated access control and Apache Iceberg for data interoperability.

Zero-ETL with Amazon S3 Tables (Jul 2025)

  • AWS Glue supports zero-ETL integrations from DynamoDB to S3 Table-backed data lakes.
  • Efficiently extracts and loads data for analysis in S3 Tables.

DynamoDB Warm Throughput

  • DynamoDB warm throughput (November 2024) provides visibility into the number of read and write operations your tables and indexes can readily handle.
  • Pre-warming allows proactively increasing the warm throughput value to meet anticipated future traffic demands.
  • Warm throughput values are available for all provisioned and on-demand tables and indexes at no cost.
  • Pre-warming your table’s throughput incurs a charge.
  • Warm throughput is not a maximum limit; it represents a minimum throughput the table can handle instantly.
  • DynamoDB dynamically increases warm throughput as applications grow, offering consistent performance at any scale.
  • Ideal for anticipated traffic spikes such as product launches, flash sales, or planned events.
  • Pre-warming is an asynchronous operation; you can carry out other table updates while pre-warming is in progress.

DynamoDB Configurable Maximum Throughput

  • DynamoDB supports configurable maximum throughput for on-demand tables (May 2024).
  • Allows optionally setting maximum read or write (or both) throughput for individual on-demand tables and associated secondary indexes.
  • Requests exceeding the maximum throughput are automatically throttled.
  • Provides predictable cost management and protection against accidental surging in consumed resources.
  • Safeguards downstream services with fixed capacity from potential overloading.
  • Maximum throughput values can be modified as needed based on application requirements.

DynamoDB Accelerator – DAX

  • DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second.
  • DAX is intended for high-performance read applications. As a write-through cache, DAX writes data through to DynamoDB so that the writes are immediately reflected in the item cache.
  • DAX, as a managed service, handles cache invalidation, data population, and cluster management.
  • DAX is API-compatible with DynamoDB. Therefore, it requires only minimal functional changes to use with an existing application.
  • DAX saves costs by reducing the read load (RCU) on DynamoDB.
  • DAX helps prevent hot partitions.
  • DAX only supports eventually consistent reads; strongly consistent read requests are passed through to DynamoDB.
  • DAX is fault-tolerant and scalable.
  • A DAX cluster has a primary node and zero or more read-replica nodes. If the primary node fails, DAX automatically fails over and elects a new primary. For scaling, add or remove read replicas.
  • DAX supports server-side encryption.
  • DAX also supports encryption in transit, ensuring that all requests and responses between the application and the cluster are encrypted by TLS, and connections to the cluster can be authenticated by verification of a cluster x509 certificate.
  • DAX now supports R7i instances (April 2025), powered by 4th Gen Intel Xeon Scalable processors, with instance sizes up to 24xlarge and DDR5 memory.
  • DAX now supports AWS PrivateLink (October 2025), enabling secure access to DAX management APIs (CreateCluster, DescribeClusters, DeleteCluster) over private IP addresses within a VPC.
  • DAX SDK for JavaScript version 3 is now available (March 2025).
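The write-through behaviour described above can be illustrated with a toy cache in front of a dictionary that stands in for the table. This is not how DAX is implemented and it is not the DAX client API; the class and attribute names are invented for the sketch.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a dict-backed 'table'."""

    def __init__(self, table: dict):
        self.table = table
        self.items = {}       # the item cache
        self.table_reads = 0  # how many reads had to go to the table

    def put(self, key, value):
        self.table[key] = value  # the write goes to the backing store
        self.items[key] = value  # and is reflected in the item cache right away

    def get(self, key):
        if key in self.items:  # cache hit: no table read consumed
            return self.items[key]
        self.table_reads += 1  # cache miss: read from the table and populate the cache
        value = self.table.get(key)
        if value is not None:
            self.items[key] = value
        return value
```

Repeated reads of a hot key are served from memory, which is the mechanism behind the reduced read load on the table and the relief for hot partitions noted above.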

DynamoDB Security Features

Resource-Based Policies (March 2024)

  • DynamoDB supports resource-based policies for tables, indexes, and streams.
  • Allows specifying IAM principals and their permitted actions directly on DynamoDB resources.
  • Simplifies cross-account access control without requiring complex IAM role assumptions.
  • Integrates with AWS IAM Access Analyzer and Block Public Access capabilities.
  • Available in all AWS commercial Regions and GovCloud at no additional cost.
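A resource-based policy for cross-account reads could be assembled as in the sketch below. The function name, the account ID, the table ARN, and the chosen actions are placeholders picked for illustration.

```python
import json


def table_read_policy(table_arn: str, reader_account_id: str) -> str:
    """Resource-based policy letting another account read from one table."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The principal is named directly on the resource; no role assumption is needed.
                "Principal": {"AWS": f"arn:aws:iam::{reader_account_id}:root"},
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
            }
        ],
    }
    return json.dumps(policy)


document = table_read_policy(
    "arn:aws:dynamodb:us-east-1:111122223333:table/Orders", "444455556666"
)
```

With boto3, the document would typically be attached using `put_resource_policy(ResourceArn=..., Policy=document)`. For cross-account access, the caller's identity-based policy in the other account must also allow the same actions.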

Attribute-Based Access Control – ABAC (Nov 2024 GA)

  • DynamoDB supports ABAC for tables and indexes.
  • ABAC defines access permissions based on tags attached to users, roles, and AWS resources.
  • Uses tag-based conditions in IAM policies to allow or deny specific actions.
  • Automatically applies tag-based permissions to new employees and changing resource structures without rewriting policies.

AWS PrivateLink (March 2024)

  • DynamoDB supports AWS PrivateLink (Interface VPC Endpoints) for private connectivity without public IP addresses.
  • Compatible with AWS Direct Connect and AWS VPN for end-to-end private network connectivity.
  • Eliminates the need for internet gateway or firewall rule configuration for DynamoDB access from on-premises.
  • Supports FIPS 140-3 compliant interface VPC endpoints and Streams endpoints (Dec 2024).

VPC Endpoints

  • DynamoDB supports both Gateway endpoints and Interface endpoints (PrivateLink):
    • Gateway Endpoints: Free, adds route table entries to direct traffic to DynamoDB. Ideal for VPC-to-DynamoDB access with no additional cost.
    • Interface Endpoints (PrivateLink): Creates an ENI with private IP. Supports Direct Connect and VPN. Has per-hour and per-GB costs. Ideal for on-premises-to-DynamoDB access.
  • VPC Gateway endpoints for DynamoDB improve privacy and security, especially for sensitive workloads with compliance and audit requirements, by enabling private access to DynamoDB from within a VPC without the need for an internet gateway or NAT gateway.
  • VPC endpoints for DynamoDB support IAM policies to simplify DynamoDB access control, where access can be restricted to a specific VPC endpoint.
  • VPC endpoints can be created only for Amazon DynamoDB tables in the same AWS Region as the VPC.
  • DynamoDB Streams can be accessed using Interface endpoints (PrivateLink) only, not Gateway endpoints.
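Restricting DynamoDB access to a specific VPC endpoint is commonly expressed as a deny statement keyed on the endpoint ID. The sketch below shows one such IAM policy; the function name and the endpoint ID are placeholders.

```python
def restrict_to_endpoint(vpce_id: str) -> dict:
    """IAM policy that denies DynamoDB calls not arriving through the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideEndpoint",
                "Effect": "Deny",
                "Action": "dynamodb:*",
                "Resource": "*",
                # The deny applies whenever the request did not come through this endpoint.
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


endpoint_policy = restrict_to_endpoint("vpce-0123456789abcdef0")
```

Because an explicit deny overrides any allow, this statement blocks requests from outside the endpoint, including console access, even for principals that otherwise have DynamoDB permissions.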

DynamoDB Pricing Updates (Nov 2024)

  • Effective November 1, 2024, DynamoDB reduced on-demand throughput pricing by 50%.
  • Global tables pricing reduced by up to 67% for on-demand and 33% for provisioned.
  • DynamoDB offers two table classes:
    • DynamoDB Standard: Default table class, optimized for balanced throughput and storage costs.
    • DynamoDB Standard-IA: Reduces storage costs by up to 60% ($0.10/GB vs $0.25/GB) for infrequently accessed data. Higher read/write costs (~25% higher).
  • Standard-IA is ideal when storage is the dominant cost and access patterns are infrequent.
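
To see when storage becomes the "dominant cost", the figures quoted above can be put into a small comparison. This is a rough sketch: it uses the per-GB prices and the approximate 25% read/write premium from the text, ignores every other charge, and takes the monthly throughput spend as an input the reader supplies.

```python
STANDARD_STORAGE_USD_PER_GB = 0.25  # per GB-month, figure quoted in the text
IA_STORAGE_USD_PER_GB = 0.10        # per GB-month, figure quoted in the text
IA_THROUGHPUT_PREMIUM = 0.25        # reads/writes roughly 25% more expensive


def monthly_cost(storage_gb: float, standard_throughput_usd: float, table_class: str) -> float:
    """Estimate monthly cost from storage size and what throughput would cost on Standard."""
    if table_class == "STANDARD":
        return storage_gb * STANDARD_STORAGE_USD_PER_GB + standard_throughput_usd
    if table_class == "STANDARD_INFREQUENT_ACCESS":
        return (storage_gb * IA_STORAGE_USD_PER_GB
                + standard_throughput_usd * (1 + IA_THROUGHPUT_PREMIUM))
    raise ValueError(f"unknown table class: {table_class}")


def cheaper_class(storage_gb: float, standard_throughput_usd: float) -> str:
    """Return the table class with the lower estimated monthly cost."""
    standard = monthly_cost(storage_gb, standard_throughput_usd, "STANDARD")
    ia = monthly_cost(storage_gb, standard_throughput_usd, "STANDARD_INFREQUENT_ACCESS")
    return "STANDARD_INFREQUENT_ACCESS" if ia < standard else "STANDARD"


print(cheaper_class(storage_gb=1000, standard_throughput_usd=100))
print(cheaper_class(storage_gb=100, standard_throughput_usd=200))
```

With these numbers, Standard-IA wins once the storage saving (0.15 USD per GB) exceeds 25% of the throughput bill.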

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. What are the services supported by VPC endpoints, using Gateway endpoint type? Choose 2 answers
    1. Amazon S3
    2. Amazon EFS
    3. Amazon DynamoDB
    4. Amazon Glacier
    5. Amazon SQS
  2. A company has set up an application in AWS that interacts with DynamoDB. DynamoDB is currently responding in milliseconds, but the application response guidelines require it to respond within microseconds. How can the performance of DynamoDB be further improved? [SAA-C01]
    1. Use ElastiCache in front of DynamoDB
    2. Use DynamoDB inbuilt caching
    3. Use DynamoDB Accelerator
    4. Use RDS with ElastiCache instead
  3. A company runs a global application that requires strong consistency for reads across all regions. Which DynamoDB feature should be used?
    1. DynamoDB Streams with Lambda replication
    2. DynamoDB Global Tables with eventual consistency
    3. DynamoDB Global Tables with Multi-Region Strong Consistency (MRSC)
    4. DynamoDB with ElastiCache in each region
  4. A company needs to run analytics on DynamoDB data using SQL queries without building ETL pipelines. Which solution requires the least operational overhead?
    1. Export DynamoDB to S3 and query with Athena
    2. Use DynamoDB Streams to replicate to Aurora
    3. Use DynamoDB zero-ETL integration with Amazon Redshift
    4. Use AWS Glue to copy data to Redshift nightly
  5. A company anticipates a major traffic spike during a product launch and wants to ensure their DynamoDB on-demand table can handle the increased load immediately. What feature should they use?
    1. Switch to provisioned capacity mode
    2. Enable DynamoDB Auto Scaling
    3. Pre-warm the table using warm throughput
    4. Add a DAX cluster
  6. A company needs to grant a partner account access to specific DynamoDB tables without creating IAM roles in the partner account. What is the most efficient approach?
    1. Create a cross-account IAM role
    2. Use DynamoDB resource-based policies
    3. Share tables using AWS RAM
    4. Replicate data to the partner account
  7. A company wants to configure DynamoDB PITR with a 7-day recovery window to comply with data minimization regulations. Is this possible?
    1. No, PITR always retains 35 days of backups
    2. Yes, PITR now supports configurable recovery periods between 1-35 days
    3. No, you must use on-demand backups for shorter retention
    4. Yes, but only with AWS Backup
  8. Which DynamoDB streaming option provides retention of up to 365 days and supports multiple consumers? [SAA-C03]
    1. DynamoDB Streams
    2. Kinesis Data Streams for DynamoDB
    3. DynamoDB Triggers
    4. Amazon EventBridge

AWS RDS DB Maintenance & Upgrades

RDS DB Maintenance and Upgrades

  • Changes to a DB instance can occur when a DB instance is manually modified, e.g. when the DB engine version is upgraded, or when RDS performs maintenance on an instance.

RDS Maintenance

  • RDS performs periodic maintenance on RDS resources, such as DB instances, and most often involves updates to the DB instance’s operating system (OS).
  • Maintenance items can either
    • be applied manually on a DB instance at one’s convenience
    • or wait for the automatic maintenance process initiated by RDS during the defined weekly maintenance window.
  • Maintenance window only determines when pending operations start but does not limit the total execution time of these operations.
  • Maintenance operations are not guaranteed to finish before the maintenance window ends and can continue beyond the specified end time.
  • Maintenance update availability can be checked both on the RDS console and by using the RDS API. If an update is available, one can:
    • Defer the maintenance items.
    • Apply the maintenance items immediately.
    • Schedule them to start during the next defined maintenance window
  • Maintenance items marked as
    • Required cannot be deferred indefinitely. If deferred, AWS sends a notification stating when the update will be performed next.
    • Available can be deferred indefinitely, in which case the update is not applied to the DB instance.
  • Required patching is automatically scheduled only for patches that are related to security and instance reliability. Such patching occurs infrequently (typically once every few months) and seldom requires more than a fraction of your maintenance window.
  • Some maintenance items require that RDS take the DB instance offline for a short time. These include scale compute operations, which generally take only a few minutes from start to finish, and required operating system or database patching.
  • Multi-AZ deployment for the DB instance reduces the impact of a maintenance event by following these steps:
    • Perform maintenance on standby.
    • Promote the standby to primary.
    • Perform maintenance on the old primary, which becomes the new standby.
  • When the database engine for the DB instance is modified in a Multi-AZ deployment, RDS upgrades both the primary and secondary DB instances at the same time. In this case, the database engine for the entire Multi-AZ deployment is shut down during the upgrade.
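
The defer / apply immediately / next-window choices above map onto the RDS API roughly as follows. This is a sketch, assuming a boto3-style RDS client is passed in; it uses the `describe_pending_maintenance_actions` and `apply_pending_maintenance_action` operations, and the helper name `schedule_pending_maintenance` is made up for illustration.

```python
ALLOWED_OPT_IN_TYPES = {"immediate", "next-maintenance", "undo-opt-in"}


def schedule_pending_maintenance(rds_client, resource_arn, opt_in_type="next-maintenance"):
    """Opt every pending maintenance action on one resource into the chosen schedule.

    rds_client is expected to behave like boto3.client("rds").
    Returns the list of action names that were scheduled.
    """
    if opt_in_type not in ALLOWED_OPT_IN_TYPES:
        raise ValueError(f"opt_in_type must be one of {sorted(ALLOWED_OPT_IN_TYPES)}")

    response = rds_client.describe_pending_maintenance_actions(
        ResourceIdentifier=resource_arn
    )
    scheduled = []
    for resource in response.get("PendingMaintenanceActions", []):
        for detail in resource.get("PendingMaintenanceActionDetails", []):
            rds_client.apply_pending_maintenance_action(
                ResourceIdentifier=resource["ResourceIdentifier"],
                ApplyAction=detail["Action"],
                OptInType=opt_in_type,
            )
            scheduled.append(detail["Action"])
    return scheduled
```

Passing `"immediate"` applies the items right away, `"next-maintenance"` schedules them for the next window, and `"undo-opt-in"` cancels an earlier next-window request. Doing nothing defers an item.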

Multi-AZ DB Cluster Maintenance (Two Readable Standbys)

  • RDS Multi-AZ DB clusters (with two readable standbys) provide significantly reduced downtime during maintenance compared to traditional Multi-AZ DB instances.
  • Minor version upgrades and system maintenance updates can be completed with typically 35 seconds or less of downtime using Multi-AZ DB clusters.
  • Downtime can be further reduced to typically 1 second or less when combined with:
    • Amazon RDS Proxy
    • AWS Advanced JDBC Wrapper Driver
    • PgBouncer (for PostgreSQL)
    • ProxySQL (for MySQL)
  • During maintenance on a Multi-AZ DB cluster:
    • Maintenance is applied to the reader instances first
    • One of the reader instances is promoted to writer
    • Maintenance is then applied to the former writer
    • The process results in an automatic failover with minimal downtime

Operating System Updates

  • Upgrades to the operating system are most often for security issues and should be done as soon as possible.
  • OS updates on a DB instance can be applied at one’s convenience, or can wait for the maintenance process initiated by RDS to apply the update during the defined maintenance window.
  • A DB instance is not automatically backed up when an OS update is applied, so it should be backed up before the update is applied.

Database Engine Version Upgrade

  • DB instance engine version can be upgraded when a new DB engine version is supported by RDS.
  • Database version upgrades consist of major and minor version upgrades.
    • Major database version upgrades
      • can contain changes that are not backward-compatible
      • RDS doesn’t apply major version upgrades automatically
      • DB instance should be manually modified and thoroughly tested before applying it to the production instances.
      • RDS Blue/Green Deployments can be used to perform major version upgrades with minimal downtime (typically under 5 seconds with direct endpoint connections, or under 2 seconds with AWS Advanced JDBC Driver).
    • Minor version upgrades
      • Each DB engine handles minor version upgrades slightly differently,
        e.g. RDS automatically applies minor version upgrades to a DB instance running PostgreSQL, but they must be manually applied to a DB instance running Oracle.
      • Auto Minor Version Upgrade (AmVU) setting controls whether RDS automatically applies minor version upgrades during the maintenance window.
      • In cases of critical security issues or when a version reaches its end-of-support date, RDS might apply a minor version upgrade even if Auto Minor Version Upgrade is disabled.
  • Amazon posts an announcement to the forums announcement page and sends a customer e-mail notification before upgrading a DB instance.
  • Amazon schedules the upgrades at specific times through the year to help customers plan around them, because downtime is required to upgrade a DB engine version, even for Multi-AZ instances.
  • RDS takes two DB snapshots during the upgrade process.
    • First DB snapshot is of the DB instance before any upgrade changes have been made. If the upgrade fails, it can be restored from the snapshot to create a DB instance running the old version.
    • Second DB snapshot is taken when the upgrade completes. After the upgrade is complete, database engine can’t be reverted to the previous version. For returning to the previous version, restore the first DB snapshot taken to create a new DB instance.
  • If the DB instance is using read replication, all of the Read Replicas must be upgraded before upgrading the source instance.
  • If the DB instance is in a Multi-AZ deployment, both the primary and standby replicas are upgraded at the same time and would result in an outage. The time for the outage varies based on your database engine, version, and the size of your DB instance.
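
A manual engine upgrade is requested through the ModifyDBInstance API. The sketch below only builds the request parameters (`DBInstanceIdentifier`, `EngineVersion`, `AllowMajorVersionUpgrade`, `ApplyImmediately`); the helper name, the instance identifier and the version string are illustrative assumptions, and whether an upgrade counts as major is left to the caller because the rule differs by engine.

```python
def engine_upgrade_request(instance_id, target_version, is_major, apply_immediately=False):
    """Build keyword arguments for a ModifyDBInstance engine version upgrade."""
    request = {
        "DBInstanceIdentifier": instance_id,
        "EngineVersion": target_version,
        # False leaves the change pending until the next maintenance window.
        "ApplyImmediately": apply_immediately,
    }
    if is_major:
        # Major upgrades are never applied automatically and must be allowed explicitly.
        request["AllowMajorVersionUpgrade"] = True
    return request


# Hypothetical identifier and version, for illustration only.
params = engine_upgrade_request("example-db", "16.4", is_major=True)
print(params)
```

With boto3 the dictionary would be passed as `boto3.client("rds").modify_db_instance(**params)`, ideally after the same upgrade has been tested on a restored snapshot.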

AWS Organizations Upgrade Rollout Policy

  • AWS Organizations now supports upgrade rollout policies to centrally manage and stagger automatic minor version upgrades across multiple database resources and AWS accounts.
  • Allows defining upgrade sequences using simple orders (first, second, last) applied through account-level policies or resource tags.
  • When new minor versions become eligible for automatic upgrade, the policy ensures upgrades start with development environments first, allowing validation before proceeding to production.
  • Key features:
    • AWS Health notifications between phases for monitoring progress
    • Built-in validation periods to ensure stability
    • Ability to disable automatic progression at any time if issues are detected
  • Eliminates operational overhead of coordinating upgrades manually or through custom tools across hundreds of resources and accounts.
  • Available in all AWS commercial Regions and AWS GovCloud (US) Regions.
  • Manageable via AWS Management Console, CLI, SDKs, CloudFormation, or CDK.

RDS Maintenance Window

  • Every DB instance has a weekly maintenance window defined during which any system changes are applied.
  • Maintenance window is an opportunity to control when DB instance modifications and software patching occur, in the event either are requested or required.
  • If a maintenance event is scheduled for a given week, it will be initiated during the 30-minute maintenance window as defined.
  • Maintenance events mostly complete during the 30-minute maintenance window, although larger maintenance events may take more time.
  • The default 30-minute maintenance window is selected at random from an 8-hour block of time per region. If you don’t specify a preferred maintenance window when you create the DB instance, Amazon RDS assigns a 30-minute maintenance window on a randomly selected day of the week.
  • RDS will consume some of the resources on the DB instance while maintenance is being applied, minimally affecting performance.
  • For some maintenance events, a Multi-AZ failover may be required for a maintenance update to be complete.
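
A preferred maintenance window is passed to the RDS API as a UTC string in the form `ddd:hh24:mi-ddd:hh24:mi`, with a minimum length of 30 minutes. A small helper for building that string might look like this; the function name is an assumption and the helper only enforces the minimum length.

```python
from datetime import datetime, timedelta

DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def maintenance_window(day: str, start_hhmm: str, minutes: int = 30) -> str:
    """Format a weekly window such as 'mon:03:00-mon:03:30' (times are UTC)."""
    if day not in DAYS:
        raise ValueError(f"day must be one of {DAYS}")
    if minutes < 30:
        raise ValueError("the window must be at least 30 minutes long")
    start = datetime.strptime(start_hhmm, "%H:%M")
    end = start + timedelta(minutes=minutes)
    # Roll the day forward if the window crosses midnight.
    day_offset = (end.date() - start.date()).days
    end_day = DAYS[(DAYS.index(day) + day_offset) % 7]
    return f"{day}:{start:%H:%M}-{end_day}:{end:%H:%M}"


print(maintenance_window("mon", "03:00"))
print(maintenance_window("sun", "23:45"))
```

The resulting string is what the `PreferredMaintenanceWindow` parameter of CreateDBInstance or ModifyDBInstance expects.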

RDS Blue/Green Deployments for Upgrades and Maintenance

  • RDS Blue/Green Deployments provide a safer and faster method for performing database upgrades and maintenance with minimal downtime.
  • A Blue/Green Deployment creates a staging environment (Green) that mirrors the production environment (Blue) and keeps them synchronized.
  • Supported use cases include:
    • Major version database engine upgrades
    • Minor version upgrades
    • Maintenance updates and OS patching
    • Instance scaling (compute changes)
    • Storage volume shrink (since November 2024) – the only native way to reduce RDS allocated storage
    • Parameter group changes
  • Switchover downtime is typically under 5 seconds for single-Region configurations (as of January 2026).
  • Applications using the AWS Advanced JDBC Driver typically see under 2 seconds of downtime due to eliminated DNS propagation delays.
  • No application endpoint changes are required after switchover.
  • Supported for Amazon Aurora and Amazon RDS engines including PostgreSQL, MySQL, and MariaDB in all AWS regions.
  • For PostgreSQL:
    • Uses physical replication by default for minor version upgrades (faster, lower lag)
    • Uses logical replication when performing major version upgrades
  • Blue/Green Deployments also support Aurora Global Databases (as of November 2025), including primary and all secondary regions.
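
The create-then-switchover flow can be sketched against a boto3-style RDS client using the `create_blue_green_deployment`, `describe_blue_green_deployments` and `switchover_blue_green_deployment` operations. The two helper names are assumptions for this example, and error handling and polling are left out.

```python
def start_blue_green_upgrade(rds_client, name, source_arn, target_engine_version):
    """Create the green (staging) environment on the target engine version."""
    response = rds_client.create_blue_green_deployment(
        BlueGreenDeploymentName=name,
        Source=source_arn,
        TargetEngineVersion=target_engine_version,
    )
    return response["BlueGreenDeployment"]["BlueGreenDeploymentIdentifier"]


def switchover_if_available(rds_client, deployment_id, timeout_seconds=300):
    """Switch production traffic to green, but only once the deployment is AVAILABLE."""
    described = rds_client.describe_blue_green_deployments(
        BlueGreenDeploymentIdentifier=deployment_id
    )
    status = described["BlueGreenDeployments"][0]["Status"]
    if status != "AVAILABLE":
        return False  # still provisioning or already switched; caller can retry later
    rds_client.switchover_blue_green_deployment(
        BlueGreenDeploymentIdentifier=deployment_id,
        SwitchoverTimeout=timeout_seconds,
    )
    return True
```

In practice the green environment would be validated between the two calls, since the switchover is the only step that affects production traffic.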

RDS Extended Support

  • Amazon RDS Extended Support provides up to three additional years of critical security and bug fixes beyond a major version’s end of standard support date.
  • Available for Amazon Aurora MySQL, Aurora PostgreSQL, RDS for MySQL, and RDS for PostgreSQL.
  • Gives flexibility to upgrade, migrate, and transform databases at your own pace while still receiving security fixes from AWS.
  • Key characteristics:
    • Charged as a per vCPU-hour fee in addition to regular RDS instance charges.
    • Pricing increases over time: Year 1 and Year 2 are at the same rate, Year 3 pricing is doubled (approximately).
    • Charges apply to all instances including replicas and standbys in Multi-AZ configurations.
    • Reserved Instance discounts do NOT apply to Extended Support charges.
    • No charge for DB snapshots, but restoring a snapshot to an EOL version will trigger Extended Support enrollment and charges.
  • Automatic enrollment: After a major engine version reaches end of standard support, instances running that version are automatically enrolled in Extended Support and charges begin.
  • Opting out: You can prevent enrollment by specifying open-source-rds-extended-support-disabled for the --engine-lifecycle-support CLI option when creating or restoring instances.
  • Ending enrollment: Upgrade to a newer engine version still under standard support to stop Extended Support charges.
  • Extended Support charges appear as a separate line item on your bill (Usage Type: ‘ExtendedSupport’).
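
The pricing shape described above (per vCPU-hour, same rate in years 1 and 2, roughly double in year 3, charged on every instance including standbys and replicas) can be turned into a quick estimate. The rate used below is a made-up placeholder, not an AWS price. Real rates vary by engine and Region.

```python
def extended_support_monthly_cost(vcpus_per_instance, instance_count, year,
                                  year1_rate_usd_per_vcpu_hour, hours_per_month=730):
    """Estimate the monthly Extended Support charge for a group of identical instances."""
    if year not in (1, 2, 3):
        raise ValueError("Extended Support covers years 1 to 3")
    # Years 1 and 2 share a rate; year 3 is approximately doubled.
    multiplier = 2 if year == 3 else 1
    vcpu_hours = vcpus_per_instance * instance_count * hours_per_month
    return vcpu_hours * year1_rate_usd_per_vcpu_hour * multiplier


# Multi-AZ instance = 2 billable instances (primary + standby); the rate is hypothetical.
year1 = extended_support_monthly_cost(4, 2, year=1, year1_rate_usd_per_vcpu_hour=0.10)
year3 = extended_support_monthly_cost(4, 2, year=3, year1_rate_usd_per_vcpu_hour=0.10)
print(round(year1, 2), round(year3, 2))
```

The estimate comes on top of the regular instance charges, and Reserved Instance discounts do not reduce it.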

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. A user has launched an RDS MySQL DB with the Multi AZ feature. The user has scheduled the scaling of instance storage during maintenance window. What is the correct order of events during maintenance window? 1. Perform maintenance on standby 2. Promote standby to primary 3. Perform maintenance on original primary 4. Promote original master back as primary
    1. 1, 2, 3, 4
    2. 1, 2, 3
    3. 2, 3, 4, 1
  2. Can I control if and when MySQL based RDS Instance is upgraded to new supported versions?
    1. No
    2. Only in VPC
    3. Yes
  3. A user has scheduled the maintenance window of an RDS DB on Monday at 3 AM. Which of the below mentioned events may force to take the DB instance offline during the maintenance window?
    1. Enabling Read Replica
    2. Making the DB Multi AZ
    3. DB password change
    4. Security patching
  4. A user has launched an RDS postgreSQL DB with AWS. The user did not specify the maintenance window during creation. The user has configured RDS to update the DB instance type from micro to large. If the user wants to have it during the maintenance window, what will AWS do?
    1. AWS will not allow to update the DB until the maintenance window is configured
    2. AWS will select the default maintenance window if the user has not provided it
    3. AWS will ask the user to specify the maintenance window during the update
    4. It is not possible to change the DB size from micro to large with RDS
  5. Can I test my DB Instance against a new version before upgrading?
    1. No
    2. Yes
    3. Only in VPC
  6. A company wants to perform a major version upgrade on their production RDS PostgreSQL database with minimal downtime. Which approach provides the least downtime?
    1. Perform an in-place major version upgrade during the maintenance window
    2. Use RDS Blue/Green Deployments to create a staging environment with the new version and switchover
    3. Create a read replica, promote it, and point the application to it
    4. Take a snapshot, restore it with the new engine version, and switch DNS
  7. An organization is managing hundreds of RDS instances across multiple AWS accounts. They want automatic minor version upgrades to roll out to development environments first before production. Which feature should they use?
    1. AWS Systems Manager Maintenance Windows
    2. RDS Blue/Green Deployments
    3. AWS Organizations Upgrade Rollout Policy
    4. AWS Config remediation rules
  8. A company is running RDS for PostgreSQL 11 which has reached end of standard support. They are not ready to upgrade to a newer version. What happens if they take no action?
    1. The database will be terminated after 90 days
    2. The database will be automatically upgraded to the latest version
    3. The database will be enrolled in RDS Extended Support with additional per-vCPU-hour charges
    4. The database will continue running with no changes or charges
  9. A company uses RDS Multi-AZ DB clusters with two readable standbys and wants to minimize downtime during minor version upgrades. What is the expected downtime?
    1. 5-10 minutes
    2. 1-2 minutes
    3. Typically 35 seconds or less, reducible to under 1 second with RDS Proxy
    4. Zero downtime
  10. Which of the following can be achieved using RDS Blue/Green Deployments? (Select THREE)
    1. Major version engine upgrades
    2. Storage volume shrink
    3. Cross-region replication setup
    4. Instance class scaling
    5. Enabling Multi-AZ on an existing single-AZ instance

AWS RDS Storage

AWS RDS Storage

  • RDS storage uses Elastic Block Store – EBS volumes for database and log storage.
  • RDS automatically stripes across multiple EBS volumes to enhance performance, depending on the amount of storage requested and the database engine.
  • For MariaDB, MySQL, PostgreSQL, and Db2 with 400 GiB or more storage, RDS stripes across 4 volumes. For Oracle, this threshold is 200 GiB. SQL Server does not support volume striping.

RDS Storage Types

  • RDS provides three storage types: Provisioned IOPS SSD (io2 Block Express and io1), General Purpose SSD (gp3 and gp2), and Magnetic (legacy, deprecated).
  • These storage types differ in performance characteristics and price, which allows tailoring of storage performance and cost to the database needs.
  • Db2, MySQL, MariaDB, PostgreSQL RDS DB instances can be created with up to 64 TiB of storage.
  • Oracle and SQL Server RDS DB instances support up to 256 TiB of storage with additional storage volumes (up to 64 TiB per volume, with up to 3 additional volumes).
  • RDS for Db2 doesn’t support the gp2 and magnetic storage types.

⚠️ Magnetic Storage Deprecated

Amazon RDS is deprecating magnetic storage on April 30, 2026.

AWS recommends upgrading magnetic storage volumes to gp3 or io2 before April 29, 2026. After April 29, 2026, Amazon RDS will begin forced migration of magnetic storage volumes to gp3 storage volumes.

The default storage type when restoring snapshots of magnetic volumes will be changed to gp3 by June 1, 2026.

Magnetic (Standard) – Legacy, Deprecated

  • Magnetic storage, also called standard storage, is a legacy storage type maintained only for backward compatibility.
  • AWS recommends using General Purpose SSD (gp3) or Provisioned IOPS SSD (io2) for all new storage needs.
  • Limited to a maximum size of 3 TiB and approximately 1,000 IOPS.
  • Does not support storage autoscaling, elastic volumes, or zero-ETL integrations with Amazon Redshift.
  • Does not allow storage type conversion or scaling when using the SQL Server database engine.
  • Magnetic storage is not reserved for a single DB instance, so performance can vary greatly depending on the demands placed on shared resources by other customers.

General Purpose SSD (gp3 – Recommended)

  • General Purpose SSD storage offers cost-effective storage ideal for a broad range of workloads running on medium-sized DB instances.
  • Amazon RDS offers two types of General Purpose storage: gp3 (recommended) and gp2 (previous generation).

gp3 Storage (Recommended)

  • gp3 allows you to customize storage performance independently of storage capacity.
  • Provides a baseline performance of 3,000 IOPS and 125 MiB/s at any volume size.
  • When storage size reaches the striping threshold (400 GiB for MySQL/MariaDB/PostgreSQL/Db2, 200 GiB for Oracle), baseline increases to 12,000 IOPS and 500 MiB/s due to volume striping across 4 volumes.
  • Can provision additional IOPS up to 64,000 IOPS and throughput up to 4,000 MiB/s (for non-SQL Server engines).
  • For SQL Server: up to 16,000 IOPS and 1,000 MiB/s (up to 80,000 IOPS and 2,000 MiB/s with June 2026 update).
  • Storage size ranges from 20 GiB to 65,536 GiB (16,384 GiB for SQL Server per volume).
  • Delivers single-digit millisecond latency consistently 99% of the time.
  • General Purpose gp3 is excellent for development/testing environments and most production workloads that are not latency-sensitive.
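
The gp3 baseline rule above (3,000 IOPS and 125 MiB/s, rising to 12,000 IOPS and 500 MiB/s at the striping threshold) fits in a small lookup. The engine labels below are illustrative keys chosen for this sketch, not official API values.

```python
# Striping thresholds in GiB, as stated in the text; SQL Server has no striping.
STRIPING_THRESHOLD_GIB = {
    "mysql": 400,
    "mariadb": 400,
    "postgres": 400,
    "db2": 400,
    "oracle": 200,
}


def gp3_baseline(engine: str, allocated_gib: int) -> tuple:
    """Return (baseline IOPS, baseline throughput in MiB/s) for a gp3 volume."""
    threshold = STRIPING_THRESHOLD_GIB.get(engine)  # None when the engine does not stripe
    if threshold is not None and allocated_gib >= threshold:
        return (12_000, 500)
    return (3_000, 125)


print(gp3_baseline("postgres", 100))
print(gp3_baseline("postgres", 400))
print(gp3_baseline("oracle", 200))
```

Anything beyond the returned baseline has to be provisioned, and paid for, separately.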

gp2 Storage (Previous Generation)

  • gp2 is the previous generation General Purpose SSD storage.
  • Baseline I/O performance of 3 IOPS per GiB, with a minimum of 100 IOPS.
  • Volumes below 1,000 GiB can burst to 3,000 IOPS using I/O credit balance.
  • Maximum of 64,000 IOPS for volumes 4,000 GiB and larger (with striping).
  • Cannot provision IOPS directly – IOPS varies with the allocated storage size.
  • AWS recommends using gp3 for new workloads as it provides predictable baseline performance without relying on burst credits.

Provisioned IOPS SSD (io2 Block Express – Recommended)

  • Provisioned IOPS storage is designed to meet the needs of I/O-intensive workloads, particularly database workloads, that are sensitive to storage performance and consistency in random access I/O throughput.
  • Amazon RDS offers two types of Provisioned IOPS storage: io2 Block Express (recommended) and io1 (previous generation).
  • For any production application that requires fast and consistent I/O performance, Amazon recommends Provisioned IOPS storage.
  • Provisioned IOPS storage is optimized for I/O intensive, online transaction processing (OLTP) workloads that have consistent performance requirements.

io2 Block Express (Recommended)

  • io2 Block Express provides the highest performance within the RDS storage portfolio.
  • Supports up to 256,000 IOPS, 4,000 MiB/s throughput, and up to 64 TiB per volume.
  • Delivers consistent sub-millisecond latency (99.9% of the time) on AWS Nitro-based instances.
  • Provides 99.999% durability (compared to 99.8-99.9% for other volume types).
  • IOPS to storage ratio of up to 1,000:1 on Nitro-based instances (500:1 for non-Nitro).
  • Available at the same price as io1 volumes.
  • You can upgrade from io1 to io2 Block Express without any downtime using the ModifyDBInstance API.
  • RDS delivers within 10 percent of the provisioned IOPS performance 99.9 percent of the time over a given year.

io1 Storage (Previous Generation)

  • io1 is the previous generation Provisioned IOPS storage.
  • Supports up to 256,000 IOPS (64,000 for SQL Server) and up to 4,000 MiB/s throughput.
  • IOPS to storage ratio of up to 50:1.
  • Delivers single-digit millisecond latency consistently 99.9% of the time.
  • AWS recommends upgrading to io2 Block Express for better performance, higher durability, and same cost.

Adding Storage and Changing Storage Type

  • DB instance can be modified to use additional storage and converted to a different storage type.
  • Storage allocated for a DB instance cannot be decreased directly (you cannot reduce the amount of storage on a volume after allocation).
  • Storage Volume Shrink (Nov 2024): Amazon RDS Blue/Green Deployments now supports shrinking storage volumes, allowing you to reduce storage size by creating a green environment with smaller storage and switching over.
  • For SQL Server DB instances, you can scale storage for only the General Purpose SSD (gp3/gp2) and Provisioned IOPS SSD (io1/io2) storage types.
  • During the scaling process, the DB instance will be available for reads and writes, but may experience performance degradation.
  • After modifying storage, you can’t make further storage modifications for 6 hours or until storage optimization has completed, whichever is longer.
  • While storage is being added, nightly backups are suspended and no other RDS operations can take place, including modify, reboot, delete, create Read Replica, and create DB Snapshot.

RDS Storage Autoscaling

  • RDS Storage Autoscaling continuously monitors actual storage consumption and scales capacity up automatically when actual utilization approaches provisioned storage capacity.
  • Enabled by setting a Maximum Storage Threshold – the upper limit to which RDS can automatically scale the storage.
  • RDS automatically increases storage when:
    • Free available space is less than 10% of the allocated storage
    • The low-storage condition lasts at least 5 minutes
    • At least 6 hours have passed since the last storage modification
  • Storage autoscaling increases storage by the greater of: 10 GiB, 10% of currently allocated storage, or the predicted storage growth for the next 7 hours based on the past hour’s growth rate.
  • Works with gp2, gp3, io1, and io2 storage types. Does not work with magnetic storage.
  • Storage autoscaling doesn’t scale storage during the storage-optimization state.
  • Supported for Single-AZ and Multi-AZ DB instances. Multi-AZ DB clusters with two readable standbys do not support native storage autoscaling.
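
The autoscaling trigger and increment rules above can be expressed as two small functions. This is a simplified model rather than RDS's actual implementation: in particular, it assumes the 7-hour prediction is a straight extrapolation of the past hour's growth.

```python
def should_autoscale(free_gib, allocated_gib, low_storage_minutes,
                     hours_since_last_modification, max_threshold_gib):
    """Apply the three trigger conditions, plus the Maximum Storage Threshold."""
    return (
        free_gib < 0.10 * allocated_gib
        and low_storage_minutes >= 5
        and hours_since_last_modification >= 6
        and allocated_gib < max_threshold_gib
    )


def autoscaled_size(allocated_gib, growth_gib_past_hour, max_threshold_gib):
    """Return the new allocated size in GiB after one autoscaling step."""
    increment = max(
        10,                        # fixed floor of 10 GiB
        0.10 * allocated_gib,      # 10% of currently allocated storage
        7 * growth_gib_past_hour,  # assumed linear forecast for the next 7 hours
    )
    return min(allocated_gib + increment, max_threshold_gib)


print(should_autoscale(50, 1000, 10, 7, 2000))
print(autoscaled_size(1000, 5, 2000))
print(autoscaled_size(200, 10, 2000))
```

The 6-hour condition explains why a fast-growing database can still run out of space with autoscaling enabled, so a generous initial allocation remains useful for bulk loads.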

Additional Storage Volumes

  • Available for RDS for Oracle and RDS for SQL Server (announced December 2025).
  • You can attach up to 3 additional storage volumes to your DB instance, in addition to the primary volume.
  • Each additional volume can be up to 64 TiB, enabling a total of up to 256 TiB per DB instance.
  • You can choose between gp3 and io2 storage types for each additional volume independently.
  • Allows mixing different storage types to optimize cost and performance based on data access patterns (e.g., frequently accessed data on io2, archival data on gp3).
  • Additional volumes can be added and removed (but the primary volume cannot be removed).
  • Volume names: For Oracle – rdsdbdata2, rdsdbdata3, rdsdbdata4. For SQL Server – H:, I:, J:.

Dedicated Log Volume (DLV)

  • A Dedicated Log Volume (DLV) stores database transaction logs on a separate volume from the volume containing database tables (announced October 2023).
  • Makes transaction write logging more efficient and consistent by eliminating I/O contention between data and log operations.
  • DLVs are created with a fixed size of 1,024 GiB and 3,000 Provisioned IOPS.
  • Supported only for PIOPS storage (io1 and io2 Block Express). Not supported for General Purpose storage (gp2/gp3).
  • Ideal for databases with large allocated storage, high IOPS requirements, or latency-sensitive workloads.
  • Supported for MariaDB 10.6.7+, MySQL 8.0.28+/8.4.3+, and PostgreSQL 13.10+/14.7+/15.2+/16+.
  • Moves redo logs and binary logs (MySQL/MariaDB) or WAL segments (PostgreSQL) to the separate volume.

Performance Metrics

  • Amazon RDS provides several metrics that can be used to determine how the DB instance is performing.
    • IOPS (ReadIOPS / WriteIOPS)
      • The number of I/O operations completed per second.
      • It is reported as the average IOPS for a given time interval.
      • RDS reports read and write IOPS separately on one minute intervals.
      • Total IOPS is the sum of the read and write IOPS.
      • Typical values for IOPS range from zero to tens of thousands per second.
      • Measured IOPS values are independent of the size of the individual I/O operation.
    • Latency (ReadLatency / WriteLatency)
      • The elapsed time between the submission of an I/O request and its completion.
      • It is reported as the average latency for a given time interval.
      • RDS reports read and write latency separately on one minute intervals in units of seconds.
      • Typical values for latency are in the millisecond (ms) range.
    • Throughput (ReadThroughput / WriteThroughput)
      • The number of bytes per second transferred to or from disk.
      • It is reported as the average throughput for a given time interval.
      • RDS reports read and write throughput separately on one minute intervals using units of bytes per second (B/s).
      • Typical values for throughput range from zero to the I/O channel’s maximum bandwidth.
    • Queue Depth
      • The number of I/O requests in the queue waiting to be serviced.
      • These are I/O requests that have been submitted by the application but have not been sent to the device because the device is busy servicing other I/O requests.
      • It is reported as the average queue depth for a given time interval.
      • RDS reports queue depth in one minute intervals. Typical values for queue depth range from zero to several hundred.
      • Time spent waiting in the queue is a component of Latency and Service Time (not available as a metric).
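
The metrics above combine into a few derived figures that are handy when sizing storage. The sketch below assumes the inputs are one-minute averages as reported by RDS; average I/O size is not an RDS metric but follows from throughput divided by IOPS.

```python
def summarize_io(read_iops, write_iops, read_bytes_per_s, write_bytes_per_s,
                 read_latency_s, write_latency_s):
    """Derive total IOPS, average I/O size and millisecond latencies from RDS metrics."""
    total_iops = read_iops + write_iops
    total_bytes_per_s = read_bytes_per_s + write_bytes_per_s
    # Average size of one I/O operation; zero when the instance is idle.
    avg_io_kib = total_bytes_per_s / total_iops / 1024 if total_iops else 0.0
    return {
        "total_iops": total_iops,
        "avg_io_size_kib": avg_io_kib,
        "read_latency_ms": read_latency_s * 1000,   # RDS reports latency in seconds
        "write_latency_ms": write_latency_s * 1000,
    }


summary = summarize_io(
    read_iops=1500, write_iops=500,
    read_bytes_per_s=1500 * 16384, write_bytes_per_s=500 * 16384,
    read_latency_s=0.002, write_latency_s=0.004,
)
print(summary)
```

Comparing `total_iops` with the provisioned figure, alongside queue depth, helps tell whether storage or something else is the limiting resource.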

RDS Storage Facts

  • The first time a DB instance is started and accesses an area of disk, the process can take longer than all subsequent accesses to the same disk area. This is known as the “first touch penalty”. Once an area of disk has incurred the first touch penalty, that area of disk does not incur the penalty again for the life of the instance, even if the DB instance is rebooted, restarted, or the DB instance class changes. Note that a DB instance created from a snapshot, a point-in-time restore, or a read replica is a new instance and does incur this first touch penalty.
  • RDS manages the DB instance and it reserves overhead space on the instance. While the amount of reserved storage varies by DB instance class and other factors, this reserved space can be as much as one or two percent of the total storage.
  • Provisioned IOPS provides a way to reserve I/O capacity by specifying IOPS. Like any other system capacity attribute, maximum throughput under load will be constrained by the resource that is consumed first, which could be IOPS, channel bandwidth, CPU, memory, or database internal resources.
  • io2 Block Express volumes support throughput scaling proportionally up to 0.256 MiB/s per provisioned IOPS. Maximum throughput of 4,000 MiB/s can be achieved at 256,000 IOPS with a 16-KiB I/O size.
  • EBS-optimized instances have a baseline and maximum IOPS rate enforced at the DB instance level. Combined IOPS from multiple volumes cannot exceed the instance-level threshold.
  • The maximum ratio of IOPS to allocated storage is 1,000:1 for io2 Block Express on Nitro-based instances, 500:1 for io2 on non-Nitro instances, and 50:1 for io1.
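
The throughput and ratio figures above can be checked with a little arithmetic. The following is a minimal sketch that uses only the numbers quoted in these bullets; the function names and dictionary keys are illustrative, and the ratio helper ignores any absolute per-volume IOPS cap.

```python
# Figures taken from the bullets above; names are illustrative only.
MIB_PER_IOPS_CAP = 0.256      # MiB/s of throughput per provisioned IOPS (io2 Block Express)
MAX_THROUGHPUT_MIBPS = 4000   # volume-level throughput ceiling in MiB/s

# Maximum IOPS-to-storage ratios (IOPS per GiB) quoted above.
MAX_IOPS_PER_GIB = {
    "io2-block-express-nitro": 1000,
    "io2-non-nitro": 500,
    "io1": 50,
}


def estimated_throughput_mibps(provisioned_iops: int, io_size_kib: float) -> float:
    """Estimate achievable throughput for an io2 Block Express volume."""
    demand = provisioned_iops * io_size_kib / 1024       # IOPS x I/O size, in MiB/s
    per_iops_cap = provisioned_iops * MIB_PER_IOPS_CAP   # proportional scaling limit
    return min(demand, per_iops_cap, MAX_THROUGHPUT_MIBPS)


def max_iops_for_storage(storage_gib: int, volume_kind: str) -> int:
    """Upper bound on IOPS implied by the IOPS-to-storage ratio alone."""
    return storage_gib * MAX_IOPS_PER_GIB[volume_kind]
```

For example, `estimated_throughput_mibps(256000, 16)` reproduces the 4,000 MiB/s figure, while a small volume with large I/Os is limited by the per-IOPS scaling factor instead.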

Factors That Impact RDS Storage Performance

  • Several factors can affect the performance of a DB instance, such as instance configuration, I/O characteristics, and workload demand.
  • System related activities also consume I/O capacity and may reduce database instance performance while in progress:
    • DB snapshot creation
    • Nightly backups
    • Multi-AZ standby creation
    • Read replica creation
    • Scaling storage
    • Changing storage types
  • System resources can constrain the throughput of a DB instance, but there can be other reasons for a bottleneck. The database itself could be the issue if:
    • Channel throughput limit is not reached
    • Queue depths are consistently low
    • CPU utilization is under 80%
    • Free memory available
    • No swap activity
    • Plenty of free disk space
    • Application has dozens of threads all submitting transactions as fast as the database will take them, but there is clearly unused I/O capacity
  • DB instance class determines the maximum bandwidth, throughput, and IOPS available. Using a current generation instance class with EBS-optimization and 10-gigabit network connectivity is recommended for best performance.
  • When modifying storage so it goes from one volume to four volumes (crossing the striping threshold), RDS provisions new volumes and transparently moves data, consuming significant IOPS and throughput.
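
The checklist earlier in this section for deciding whether the database itself is the bottleneck can be expressed as a small predicate. This is only a sketch: the `InstanceSnapshot` fields are hypothetical names, and every cut-off except the 80% CPU figure is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class InstanceSnapshot:
    """Hypothetical point-in-time readings; field names are illustrative."""
    throughput_mibps: float
    channel_limit_mibps: float
    avg_queue_depth: float
    cpu_percent: float
    free_memory_mib: float
    swap_used_mib: float
    free_disk_percent: float


def database_is_likely_bottleneck(
    s: InstanceSnapshot,
    low_queue_depth: float = 2.0,        # assumed cut-off for "consistently low"
    min_free_memory_mib: float = 512.0,  # assumed cut-off for "free memory available"
    min_free_disk_percent: float = 20.0, # assumed cut-off for "plenty of free disk space"
) -> bool:
    """Return True when no system resource looks constrained."""
    checks = [
        s.throughput_mibps < s.channel_limit_mibps,   # channel limit not reached
        s.avg_queue_depth <= low_queue_depth,         # queue depths are low
        s.cpu_percent < 80,                           # threshold from the checklist
        s.free_memory_mib >= min_free_memory_mib,     # free memory available
        s.swap_used_mib == 0,                         # no swap activity
        s.free_disk_percent >= min_free_disk_percent, # free disk space
    ]
    return all(checks)
```

If the function returns True while the application is still submitting work as fast as it can, query tuning or database internals are the more likely place to look.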

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. When should I choose Provisioned IOPS over Standard RDS storage?
    1. If you have batch-oriented workloads
    2. If you use production online transaction processing (OLTP) workloads
    3. If you have workloads that are not sensitive to consistent performance
  2. Is decreasing the storage size of a DB Instance permitted?
    1. Depends on the RDBMS used
    2. Yes
    3. No (Direct decrease is not supported. However, Blue/Green Deployments can be used to achieve storage shrink since November 2024)
  3. Because of the extensibility limitations of striped storage attached to Windows Server, Amazon RDS does not currently support increasing storage on a _____ DB Instance using magnetic storage.
    1. SQL Server
    2. MySQL
    3. Oracle
  4. If I want to run a database in an Amazon instance, which is the most recommended Amazon storage option?
    1. Amazon Instance Storage
    2. Amazon EBS
    3. You can’t run a database inside an Amazon instance.
    4. Amazon S3
  5. For each DB Instance class, what is the maximum size of associated storage capacity?
    1. 5 TiB
    2. 16 TiB
    3. 64 TiB (per volume; up to 256 TiB with additional storage volumes for Oracle/SQL Server)
    4. 128 TiB
  6. Which RDS storage type provides the highest durability with 99.999% durability guarantee?
    1. General Purpose gp3
    2. General Purpose gp2
    3. Provisioned IOPS io2 Block Express
    4. Provisioned IOPS io1
  7. Which feature allows RDS to automatically increase storage capacity when running low on space?
    1. Elastic Volumes
    2. RDS Storage Autoscaling
    3. Dynamic Scaling
    4. Predictive Scaling
  8. What is the purpose of a Dedicated Log Volume (DLV) in Amazon RDS?
    1. To store database backups on a separate volume
    2. To store transaction logs on a separate volume from database tables for improved write performance
    3. To provide additional read capacity
    4. To enable cross-region replication
  9. A company needs to deploy an RDS for Oracle database that requires more than 64 TiB of storage. What should they do?
    1. Use multiple RDS instances with application-level sharding
    2. Migrate to Aurora
    3. Use additional storage volumes to scale up to 256 TiB per DB instance
    4. Use magnetic storage which has higher limits
  10. Which of the following statements about RDS gp3 storage is correct? (Choose 2)
    1. gp3 provides a baseline of 3,000 IOPS and 125 MiB/s regardless of volume size
    2. gp3 IOPS scales with volume size at 3 IOPS per GiB
    3. gp3 allows independent provisioning of IOPS and throughput from storage capacity
    4. gp3 requires burst credits for performance above baseline

AWS Relational Database Service – RDS

Relational Database Service – RDS

  • Relational Database Service – RDS is a web service that makes it easier to set up, operate, and scale a relational database in the cloud.
  • provides cost-efficient, resizable capacity for an industry-standard relational database and manages common database administration tasks such as hardware provisioning, database setup, patching, and backups.
  • features & benefits
    • CPU, memory, storage, and IOPS can be scaled independently.
    • manages backups, software patching, automatic failure detection, and recovery.
    • automated backups can be performed as needed, or manual backups can be triggered as well. Backups can be used to restore a database, and the restore process works reliably and efficiently.
    • provides Multi-AZ high availability with a primary instance and a synchronous standby secondary instance that can failover seamlessly when a problem occurs.
    • provides elasticity & scalability by enabling Read Replicas to increase read scaling.
    • supports MySQL, MariaDB, PostgreSQL, Oracle, and Microsoft SQL Server, as well as the MySQL-compatible Aurora DB engine
    • supports IAM users and permissions to control who has access to the RDS database service
    • databases can be further protected by putting them in a VPC, using SSL for data in transit and encryption for data at rest
    • However, as it is a managed service, shell (root ssh) access to DB instances is not provided, and this restricts access to certain system procedures and tables that require advanced privileges.

RDS Components

  • DB Instance
    • is a basic building block of RDS
    • is an isolated database environment in the cloud
    • each DB instance runs a DB engine. AWS currently supports the MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server, and Aurora DB engines
    • can be accessed from AWS command-line tools, RDS APIs, or the AWS Management RDS Console.
    • computation and memory capacity of a DB instance is determined by its DB instance class, which can be selected as per the needs
    • supports three storage types: Magnetic, General Purpose (SSD), and Provisioned IOPS (SSD), which differ in performance and price
    • each DB instance has a DB instance identifier, which is a customer-supplied name and must be unique for that customer in an AWS region. It uniquely identifies the DB instance when interacting with the RDS API and AWS CLI commands.
    • each DB instance can host multiple user-created databases or a single Oracle database with multiple schemas.
    • can be hosted in an AWS VPC environment for better control
  • Regions and Availability Zones
    • AWS resources are housed in highly available data center facilities in different areas of the world. These areas are called regions, and each region contains multiple distinct locations called Availability Zones
    • Each AZ is engineered to be isolated from failures in other AZs and to provide inexpensive, low-latency network connectivity to other AZs in the same region
    • DB instances can be hosted in different AZs, an option called a Multi-AZ deployment.
      • RDS automatically provisions and maintains a synchronous standby replica of the DB instance in a different AZ.
      • Primary DB instance is synchronously replicated across AZs to the standby replica
      • Provides data redundancy, failover support, eliminates I/O freezes, and minimizes latency spikes during system backups.
  • Security Groups
    • security group controls the access to a DB instance, by allowing access to the specified IP address ranges or EC2 instances
  • DB Parameter Groups
    • A DB parameter group contains engine configuration values that can be applied to one or more DB instances that use the same DB engine and version family
    • help define configuration values specific to the selected DB engine, e.g. max_connections, force_ssl, autocommit
    • supports default parameter group, which cannot be edited.
    • supports custom parameter group, to override values
    • supports static and dynamic parameters
      • changes to dynamic parameters are applied immediately (irrespective of apply immediately setting)
      • changes to static parameters are NOT applied immediately and require a manual reboot.
  • DB Option Groups
    • Some DB engines offer tools or optional features that simplify managing the databases and making the best use of data.
    • RDS makes such tools available through option groups for e.g. Oracle Application Express (APEX), SQL Server Transparent Data Encryption, and MySQL Memcached support.
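
The static versus dynamic parameter behaviour described above can be sketched as a small planning helper. This is an illustration, not an RDS API: `ParameterChange` and `plan_parameter_changes` are hypothetical names, and whether a real parameter is static or dynamic has to be looked up for the engine in question.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ParameterChange:
    """One pending edit to a custom parameter group (hypothetical model)."""
    name: str
    value: str
    is_dynamic: bool  # True for dynamic parameters, False for static ones


def plan_parameter_changes(
    changes: List[ParameterChange],
) -> Tuple[List[str], List[str]]:
    """Split changes into those effective immediately and those pending a reboot."""
    applied_now = [c.name for c in changes if c.is_dynamic]
    pending_reboot = [c.name for c in changes if not c.is_dynamic]
    return applied_now, pending_reboot


# Placeholder parameter names; the dynamic/static flags are assumptions.
applied, pending = plan_parameter_changes([
    ParameterChange("example_dynamic_param", "200", is_dynamic=True),
    ParameterChange("example_static_param", "8", is_dynamic=False),
])
needs_manual_reboot = bool(pending)
```

A non-empty `pending` list is the signal that a manual reboot has to be scheduled before the new values take effect.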

RDS Interfaces

  • RDS can be accessed through multiple interfaces
    • AWS RDS Management console
    • Command Line Interface
    • Programmatic Interfaces which include SDKs, libraries in different languages, and RDS API

RDS Multi-AZ & Read Replicas

  • Multi-AZ deployment
    • provides high availability, durability, and automatic failover support
    • helps improve the durability and availability of a critical system, enhancing availability during planned system maintenance, DB instance failure, and Availability Zone disruption.
    • automatically provisions and manages a synchronous standby instance in a different AZ.
    • automatically fails over in case of any issues with the primary instance
    • A Multi-AZ DB instance deployment has one standby DB instance that provides failover support but doesn’t serve read traffic.
    • A Multi-AZ DB cluster deployment has two standby DB instances that provide failover support and can also serve read traffic.
  • Read replicas
    • enable increased scalability and database availability in the case of an AZ failure.
    • allow elastic scaling beyond the capacity constraints of a single DB instance for read-heavy database workloads

RDS Security

  • DB instance can be hosted in a VPC for the greatest possible network access control.
  • IAM policies can be used to assign permissions that determine who is allowed to manage RDS resources.
  • Security groups allow control of what IP addresses or EC2 instances can connect to the databases on a DB instance.
  • RDS supports encryption in transit using SSL connections
  • RDS supports encryption at rest to secure instances and snapshots at rest.
  • Network encryption and transparent data encryption (TDE) with Oracle DB instances
  • Authentication can be implemented using Password, Kerberos, and IAM database authentication.

RDS Backups, Snapshot

  • Automated backups
    • are enabled by default for a new DB instance.
    • enable recovery of the database to any point in time within the backup retention period, using database change logs, up to the last five minutes of database usage.
  • DB snapshots are manual, user-initiated backups that enable backup of the DB instance to a known state, and restore to that specific state at any time.

RDS Monitoring & Notification

  • RDS integrates with CloudWatch and provides metrics for monitoring
  • CloudWatch alarms can be created over a single metric and send an SNS message when the alarm changes state
  • RDS also provides SNS notification whenever any RDS event occurs
  • RDS Performance Insights is a database performance tuning and monitoring feature that illustrates the database’s performance and helps analyze any issues that affect it
  • RDS Recommendations provides automated recommendations for database resources.

RDS Pricing

  • Instance class
    • Pricing is based on the class (e.g., micro) of the DB instance consumed.
  • Running time
    • Usage is billed in one-second increments, with a minimum of 10 minutes.
  • Storage
    • Storage capacity provisioned for the DB instance is billed per GB per month
    • If the provisioned storage capacity is scaled within the month, the bill will be pro-rated.
  • I/O requests per month
    • Total number of storage I/O requests made in a billing cycle.
  • Provisioned IOPS (per IOPS per month)
    • Provisioned IOPS rate, regardless of IOPS consumed, for RDS Provisioned IOPS (SSD) storage only.
    • Provisioned storage for EBS volumes is billed in one-second increments, with a minimum of 10 minutes.
  • Backup storage
    • Automated backups & any active database snapshots consume storage
    • Increasing backup retention period or taking additional database snapshots increases the backup storage consumed by the database.
    • RDS provides backup storage up to 100% of the provisioned database storage at no additional charge. For example, if you have 10 GB-months of provisioned database storage, RDS provides up to 10 GB-months of backup storage at no additional charge.
    • Most databases require less raw storage for a backup than for the primary dataset, so if multiple backups are not maintained, you will typically not pay for backup storage.
    • Backup storage is free only for active DB instances.
  • Data transfer
    • Internet data transfer out of the DB instance.
  • Reserved Instances
    • In addition to regular RDS pricing, reserved DB instances can be purchased
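
Two of the pricing rules above lend themselves to a quick calculation: the 10-minute billing minimum and the free backup storage allowance. The sketch below covers only those two rules as stated here; the function names are illustrative, it assumes an instance that never ran is not billed, and it does not model actual prices.

```python
MIN_BILLED_SECONDS = 10 * 60  # 10-minute minimum from the "Running time" bullet


def billed_seconds(running_seconds: int) -> int:
    """Per-second billing with a 10-minute minimum."""
    if running_seconds <= 0:
        return 0  # assumption: nothing is billed if the instance never ran
    return max(running_seconds, MIN_BILLED_SECONDS)


def billable_backup_gb_months(provisioned_gb_months: float,
                              backup_gb_months: float) -> float:
    """Backup storage beyond 100% of provisioned storage is billable."""
    return max(0.0, backup_gb_months - provisioned_gb_months)
```

With 10 GB-months provisioned, 8 GB-months of backups cost nothing extra, while 25 GB-months of backups leave 15 GB-months billable.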

AWS Certification Exam Practice Questions

  1. What does Amazon RDS stand for?
    1. Regional Data Server.
    2. Relational Database Service
    3. Regional Database Service.
  2. How many relational database engines does RDS currently support?
    1. MySQL, Postgres, MariaDB, Oracle, and Microsoft SQL Server
    2. Just two: MySQL and Oracle.
    3. Five: MySQL, PostgreSQL, MongoDB, Cassandra and SQLite.
    4. Just one: MySQL.
  3. If I modify a DB Instance or the DB parameter group associated with the instance, should I reboot the instance for the changes to take effect?
    1. No
    2. Yes
  4. What is the name of the licensing model in which I can use my existing Oracle Database licenses to run Oracle deployments on Amazon RDS?
    1. Bring Your Own License
    2. Role Based License
    3. Enterprise License
    4. License Included
  5. Will I be charged if the DB instance is idle?
    1. No
    2. Yes
    3. Only if running in GovCloud
    4. Only if running in VPC
  6. What is the minimum charge for the data transferred between Amazon RDS and Amazon EC2 Instances in the same Availability Zone?
    1. USD 0.10 per GB
    2. No charge. It is free.
    3. USD 0.02 per GB
    4. USD 0.01 per GB
  7. Does Amazon RDS allow direct host access via Telnet, Secure Shell (SSH), or Windows Remote Desktop Connection?
    1. Yes
    2. No
    3. Depends on if it is in VPC or not
  8. What are the two types of licensing options available for using Amazon RDS for Oracle?
    1. BYOL and Enterprise License
    2. BYOL and License Included
    3. Enterprise License and License Included
    4. Role based License and License Included
  9. A user plans to use RDS as a managed DB platform. Which of the below mentioned features is not supported by RDS?
    1. Automated backup
    2. Automated scaling to manage a higher load
    3. Automated failure detection and recovery
    4. Automated software patching
  10. A user is launching an AWS RDS with MySQL. Which of the below mentioned options allows the user to configure the InnoDB engine parameters?
    1. Options group
    2. Engine parameters
    3. Parameter groups
    4. DB parameters
  11. A user is planning to use AWS RDS with MySQL. For which of the below mentioned services is the user not going to pay?
    1. Data transfer
    2. RDS CloudWatch metrics
    3. Data storage
    4. I/O requests per month

AWS RDS Best Practices

AWS recommends RDS best practices in terms of Monitoring, Performance, Security, and Operational Excellence.

RDS Basic Operational Guidelines

  • Monitoring
    • Memory, CPU, replica lag, and storage usage should be monitored.
    • CloudWatch can be set up for notifications when usage patterns change or when the capacity of deployment is approached, so that system performance and availability can be maintained.
  • Scaling
    • Scale up the DB instance when approaching storage capacity limits.
    • There should be some buffer in storage and memory to accommodate unforeseen increases in demand from the applications.
    • Enable RDS Storage Autoscaling to automatically scale storage when free storage space is low, helping avoid storage-full issues without manual intervention.
  • Backups
    • Enable Automatic Backups and set the backup window to occur during the daily low in WriteIOPS.
    • Use Multi-AZ to reduce the impact of backups on the primary DB instance.
  • Storage
    • Use General Purpose SSD (gp3) or Provisioned IOPS SSD (io2 Block Express) storage for all new instances.
    • Magnetic (standard) storage is deprecated as of May 1, 2026. Migrate existing magnetic storage instances to gp3 or io2 Block Express.
    • gp3 provides baseline performance of 3,000 IOPS and 125 MiB/s throughput, with the ability to independently scale IOPS up to 64,000 and throughput up to 4,000 MiB/s.
    • For latency-sensitive production workloads, use io2 Block Express which delivers sub-millisecond latency and up to 256,000 IOPS.
  • On a MySQL DB instance,
    • Do not create more than 10,000 tables using Provisioned IOPS or standard storage. Large numbers of tables will significantly increase database recovery time after a failover or database crash.
    • Avoid tables in the database growing too large. Provisioned storage limits restrict the maximum size of a MySQL table file to 16 TiB. Partition large tables so that file sizes are well under the 16 TiB limit. This can also improve performance and recovery time.
  • Performance
    • If the database workload requires more I/O than provisioned, recovery after a failover or database failure will be slow.
    • To increase the I/O capacity of a DB instance,
      • Migrate to a DB instance class with High I/O capacity.
      • Convert from magnetic storage to either General Purpose (gp3) or Provisioned IOPS (io2 Block Express) storage, depending on how much of an increase is needed.
      • If using Provisioned IOPS storage, provision additional throughput capacity.
    • Enable RDS Optimized Reads to achieve up to 2x faster complex query processing by leveraging local NVMe-based SSD storage for temporary tables.
    • Enable RDS Optimized Writes for MySQL and MariaDB to achieve up to 2x improvement in write transaction throughput at no additional cost.
  • Multi-AZ & Failover
    • Deploy applications in all Availability Zones, so if an AZ goes down, applications in other AZs will still be available.
    • Use RDS DB events to monitor failovers.
    • Set a TTL of less than 30 seconds, if the client application is caching the DNS data of the DB instances. As the underlying IP address of a DB instance can change after a failover, caching the DNS data for an extended time can lead to connection failures if the application tries to connect to an IP address that no longer is in service.
    • Multi-AZ requires the transaction logging feature to be enabled. Do not use features like Simple recovery mode, offline mode or Read-only mode which turn off transaction logging.
    • To shorten failover time
      • Ensure that sufficient Provisioned IOPS are allocated for your workload. Inadequate I/O can lengthen failover times. Database recovery requires I/O.
      • Use smaller transactions. Database recovery relies on transactions, so break up large transactions into multiple smaller transactions to shorten failover time.
      • Consider using RDS Proxy to reduce failover time by up to 79% for Aurora MySQL and 32% for RDS MySQL by maintaining application connections during failover.
    • Test failover for your DB instance to understand how long the process takes for your use case and to ensure that the application that accesses your DB instance can automatically connect to the new DB instance after failover.
    • Consider using Multi-AZ DB Cluster deployment for faster failovers (typically under 35 seconds) and readable standby instances.
  • Database Engine Versions
    • Regularly upgrade database engine versions to maintain security, performance, and compliance.
    • Enable automatic minor version upgrades for easier patching.
    • Schedule major version upgrades with proper testing in staging environments.
    • Be aware of RDS Extended Support charges that apply automatically after a major version reaches its end of standard support date. Plan upgrades before end-of-standard-support to avoid additional costs.
  • AWS Database Drivers
    • Use the AWS JDBC Driver, AWS Python Driver, or other AWS suite of drivers for faster switchover and failover times (single-digit seconds vs. tens of seconds for open-source drivers).
    • AWS drivers provide built-in support for authentication with AWS Secrets Manager, IAM, and Federated Identity.
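
The advice above to keep cached DNS data for less than 30 seconds can be illustrated with a small client-side cache. This is a minimal sketch under stated assumptions: `ShortTtlDnsCache` is a hypothetical helper, the resolver is injected by the caller (for example a thin wrapper around `socket.gethostbyname`), and real drivers or runtimes may already expose their own TTL setting.

```python
import time
from typing import Callable, Dict, Tuple


class ShortTtlDnsCache:
    """Caches hostname lookups for a short, fixed TTL (illustrative only)."""

    def __init__(
        self,
        resolve: Callable[[str], str],
        ttl_seconds: float = 25.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if ttl_seconds >= 30:
            raise ValueError("TTL should stay under 30 seconds")
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        # hostname -> (address, expiry time on the injected clock)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, hostname: str) -> str:
        """Return a cached address, re-resolving once the entry has expired."""
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and now < cached[1]:
            return cached[0]
        address = self._resolve(hostname)
        self._entries[hostname] = (address, now + self._ttl)
        return address
```

After a failover the endpoint name resolves to a new IP address, and a cache like this picks it up within one TTL instead of holding on to the address that is no longer in service.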

Multi-AZ Deployment Options

  • RDS provides two Multi-AZ deployment options:
    • Multi-AZ DB Instance – One primary and one standby DB instance with synchronous replication. Failover time is typically 60–120 seconds.
    • Multi-AZ DB Cluster – One writer and two readable standby DB instances across three AZs with semi-synchronous replication. Failover time is typically under 35 seconds.
  • Multi-AZ DB Cluster Advantages:
    • Up to 2x faster transaction commit latency compared to Multi-AZ DB instance.
    • Two readable standby instances that can serve read traffic.
    • Faster automated failovers (typically under 35 seconds).
    • Supports RDS Optimized Writes and RDS Optimized Reads.
    • Supports gp3, io1, and io2 Block Express storage types.
    • IAM database authentication support.
  • Multi-AZ DB Cluster is supported for MySQL, PostgreSQL, and MariaDB engines.

DB Instance RAM Recommendations

  • An RDS performance best practice is to allocate enough RAM so that the working set resides almost completely in memory.
  • The working set is the data and indexes that are frequently in use on the instance.
  • Value of ReadIOPS should be small and stable.
  • ReadIOPS metric can be checked, using AWS CloudWatch while the DB instance is under load, to tell if the working set is almost all in memory.
  • If scaling up the DB instance class with more RAM results in a dramatic drop in ReadIOPS, the working set was not almost completely in memory.
  • Continue to scale up until ReadIOPS no longer drops dramatically after a scaling operation, or ReadIOPS is reduced to a very small amount.
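
The scale-up loop described above can be reduced to a simple decision rule on ReadIOPS before and after a scaling operation. This is a sketch only: the function name is illustrative, and the cut-offs for a "dramatic" drop and a "very small" ReadIOPS value are assumptions that would need tuning per workload.

```python
def should_keep_scaling(
    read_iops_before: float,
    read_iops_after: float,
    dramatic_drop_ratio: float = 0.5,  # assumed: a drop of 50% or more is "dramatic"
    small_read_iops: float = 100.0,    # assumed: at or below this is "very small"
) -> bool:
    """Decide whether another scale-up step is worth trying."""
    if read_iops_after <= small_read_iops:
        return False  # working set already fits almost entirely in memory
    if read_iops_before <= 0:
        return False  # nothing to compare against
    drop = (read_iops_before - read_iops_after) / read_iops_before
    # A dramatic drop means the working set did not fit before, so keep going.
    return drop >= dramatic_drop_ratio
```

Both readings should be taken from CloudWatch while the DB instance is under a comparable load, otherwise the comparison says more about traffic than about memory.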

RDS Security Best Practices

  • Do not use AWS root credentials to manage RDS resources; create IAM users for everyone.
  • Grant each user the minimum set of permissions required to perform his or her duties.
  • Use IAM groups to effectively manage permissions for multiple users.
  • Rotate your IAM credentials regularly.
  • Configure AWS Secrets Manager to automatically rotate the secrets for Amazon RDS, including the master user password.
  • Use IAM Database Authentication to authenticate to DB instances using IAM roles instead of database passwords. This eliminates the need to store credentials in the database and provides short-lived authentication tokens.
  • Use SSL/TLS connections to encrypt data in transit between applications and DB instances.
  • Enable encryption at rest using AWS KMS for database storage and backups.
  • Use AWS Security Hub to monitor RDS usage as it relates to security best practices.
  • Change the master user password using the AWS Management Console, CLI, or RDS API only (not SQL clients, which may unintentionally revoke privileges).

RDS Proxy

  • Amazon RDS Proxy is a fully managed, highly available database proxy that makes applications more scalable, more resilient to database failures, and more secure.
  • Connection Pooling: RDS Proxy establishes a pool of database connections and reuses them, reducing the stress on database compute and memory resources.
  • Improved Failover: RDS Proxy automatically connects to a standby DB instance while preserving application connections, reducing failover times significantly.
  • Security: RDS Proxy enforces IAM authentication and can retrieve credentials from AWS Secrets Manager, eliminating the need for database credentials in application code.
  • Best Use Cases:
    • Serverless and event-driven applications (e.g., AWS Lambda) with many short-lived connections.
    • Applications that open and close database connections rapidly.
    • Applications that require high availability with fast failover.
    • Applications that need to enforce IAM-based access to databases.
  • Supports MySQL, PostgreSQL, MariaDB, and SQL Server engines.
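
The connection pooling idea above can be shown with a toy pool. RDS Proxy is a managed service and this is not how it is implemented; the sketch only illustrates reuse of a fixed set of connections. `TinyConnectionPool` is a hypothetical class, and an in-memory SQLite database stands in for a real DB instance.

```python
import queue
import sqlite3
from contextlib import contextmanager
from typing import Callable, Iterator


class TinyConnectionPool:
    """Keeps a fixed number of open connections and hands them out for reuse."""

    def __init__(self, size: int, connect: Callable[[], sqlite3.Connection]) -> None:
        self._idle: "queue.Queue[sqlite3.Connection]" = queue.Queue(maxsize=size)
        self.opened = 0  # how many connections were ever opened
        for _ in range(size):
            self._idle.put(connect())
            self.opened += 1

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[sqlite3.Connection]:
        conn = self._idle.get(timeout=timeout)  # waits if every connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it to the pool instead of closing it


pool = TinyConnectionPool(2, lambda: sqlite3.connect(":memory:"))
results = []
for _ in range(10):  # ten short-lived "requests" share two connections
    with pool.connection() as conn:
        results.append(conn.execute("SELECT 1").fetchone()[0])
```

Ten requests are served while only two connections are ever opened, which is the effect that reduces pressure on database compute and memory.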

Blue/Green Deployments

  • Amazon RDS Blue/Green Deployments create a fully managed staging (green) environment that mirrors the production (blue) environment for safer database changes.
  • Key Benefits:
    • Perform major version upgrades, schema changes, and parameter group modifications with minimal downtime.
    • Test changes in the green environment while production continues to run on the blue environment.
    • Switchover typically completes in under a minute with built-in safeguards.
    • Automatic rollback if the switchover encounters issues.
  • Best Practices:
    • Thoroughly test the green environment before switching over.
    • Keep databases in the green environment read-only to avoid replication conflicts.
    • Make only replication-compatible schema changes.
    • Ensure data loading (lazy loading) is complete before switching over.
  • Supported for RDS for MySQL, MariaDB, and PostgreSQL.

CloudWatch Database Insights

  • Amazon CloudWatch Database Insights (successor to Performance Insights) provides database monitoring and analysis capabilities.
  • Note: Performance Insights is being replaced by CloudWatch Database Insights. The Performance Insights console experience reaches end-of-life on June 30, 2026. Migrate to the Advanced mode of Database Insights before that date.
  • Key Features:
    • Monitors database load using the DB Load metric based on active sessions.
    • On-demand analysis uses machine learning to identify performance bottlenecks and provide remediation advice.
    • Available for all RDS engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Db2.
  • Modes:
    • Standard mode – Included at no additional cost; provides 7 days of performance data history.
    • Advanced mode – Provides extended retention (up to 24 months), execution plans, and on-demand analysis features.

Using Enhanced Monitoring to Identify Operating System Issues

  • RDS provides metrics in real time for the operating system (OS) that your DB instance runs on.
  • Enhanced Monitoring provides granularity at 1, 5, 10, 15, 30, or 60 second intervals.
  • Enhanced Monitoring is available for all current generation DB instance classes.
  • Note: Previous generation instances (db.t1.micro, db.m1.small) have been deprecated since February 2023.

Using Metrics to Identify Performance Issues

  • To identify performance issues caused by insufficient resources and other common bottlenecks, you can monitor the metrics available for your Amazon RDS DB instance.
  • Performance metrics should be monitored on a regular basis to benchmark the average, maximum, and minimum values for a variety of time ranges to help identify performance degradation.
  • CloudWatch alarms can be set for particular metric thresholds to be alerted when they are reached or breached.
  • A DB instance exposes several categories of metrics, including CPU, memory, disk space, IOPS, DB connections, and network traffic; acceptable values depend on the metric.
  • One of the best ways to improve DB instance performance is to tune the most commonly used and most resource-intensive queries to make them less expensive to run.
  • For Multi-AZ DB clusters, monitor replica lag – the time difference between the latest transaction on the writer DB instance and the latest applied transaction on a reader DB instance.
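
Benchmarking the average, maximum, and minimum of a metric, and alerting when a threshold is breached, can be sketched in a few lines. The helper names are illustrative, the samples would come from CloudWatch in practice, and the "last N datapoints" rule is a simplification of how an alarm might be evaluated.

```python
from statistics import mean
from typing import Dict, Sequence


def baseline(samples: Sequence[float]) -> Dict[str, float]:
    """Summarise a metric over one time range."""
    return {
        "average": mean(samples),
        "maximum": max(samples),
        "minimum": min(samples),
    }


def threshold_breached(
    samples: Sequence[float],
    threshold: float,
    datapoints_to_alarm: int = 3,  # assumed: the last 3 samples must all breach
) -> bool:
    """True when the most recent samples all reach or exceed the threshold."""
    recent = samples[-datapoints_to_alarm:]
    if len(recent) < datapoints_to_alarm:
        return False  # not enough data to decide
    return all(value >= threshold for value in recent)
```

Comparing today's baseline with the same summary from earlier time ranges is what makes gradual degradation visible before an alarm threshold is ever hit.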

Recovery

  • MySQL
    • InnoDB is the recommended and supported storage engine for MySQL DB instances on Amazon RDS.
    • However, MyISAM performs better than InnoDB if you require intense, full-text search capability.
    • Point-In-Time Restore and snapshot restore features of Amazon RDS for MySQL require a crash-recoverable storage engine and are supported for the InnoDB storage engine only.
    • Although MySQL supports multiple storage engines with varying capabilities, not all of them are optimized for crash recovery and data durability.
    • The MyISAM storage engine does not support reliable crash recovery and might prevent a Point-In-Time Restore or snapshot restore from working as intended, which might result in lost or corrupt data when MySQL is restarted after a crash.
    • InnoDB instances can be migrated to Aurora, while MyISAM instances cannot.
    • MySQL table file size limit is 16 TiB. Partition large tables so that file sizes stay well under this limit.
  • MariaDB
    • InnoDB is the recommended and supported storage engine for MariaDB DB instances on Amazon RDS.
    • Point-In-Time Restore and snapshot restore features of Amazon RDS for MariaDB require a crash-recoverable storage engine and are supported for the InnoDB storage engine only.
    • Although MariaDB supports multiple storage engines with varying capabilities, not all of them are optimized for crash recovery and data durability.
    • For example, although Aria is a crash-safe replacement for MyISAM, it might still prevent a Point-In-Time Restore or snapshot restore from working as intended. This might result in lost or corrupt data when MariaDB is restarted after a crash.

RDS Extended Support

  • Amazon RDS Extended Support provides up to three additional years of critical security and bug fixes beyond a major version’s end of standard support date.
  • Extended Support charges apply automatically when running a major version past its end of standard support date.
  • Best Practice: Plan and execute major version upgrades before the end-of-standard-support date to avoid Extended Support charges.
  • Use Blue/Green Deployments to perform major version upgrades with minimal downtime.
  • Currently supported MySQL versions: 8.0 and 8.4 (MySQL 5.7 under Extended Support).

AWS Certification Exam Practice Questions

  1. You are running a database on an EC2 instance, with the data stored on Elastic Block Store (EBS) for persistence. At times throughout the day, you are seeing large variance in the response times of the database queries. Looking into the instance with the iostat command, you see a lot of wait time on the disk volume that the database’s data is stored on. What two ways can you improve the performance of the database’s storage while maintaining the current persistence of the data? Choose 2 answers
    1. Move to an SSD backed instance
    2. Move the database to an EBS-Optimized Instance
    3. Use Provisioned IOPs EBS
    4. Use the ephemeral storage on an m2.4xLarge Instance Instead
  2. Amazon RDS automated backups and DB Snapshots are currently supported for only the __________ storage engine
    1. InnoDB
    2. MyISAM
  3. A company wants to reduce the failover time for their Amazon RDS Multi-AZ deployment from approximately 60-120 seconds. Which approach would provide the fastest failover? Choose the correct answer.
    1. Use RDS Proxy in front of the Multi-AZ DB instance
    2. Migrate to a Multi-AZ DB Cluster deployment which provides failover typically under 35 seconds
    3. Increase the Provisioned IOPS on the DB instance
    4. Enable Enhanced Monitoring to detect failures faster
  4. A serverless application using AWS Lambda functions experiences database connection exhaustion on its Amazon RDS MySQL instance during peak traffic. What is the recommended solution?
    1. Increase the max_connections parameter in the DB parameter group
    2. Scale up to a larger DB instance class
    3. Deploy Amazon RDS Proxy to manage connection pooling and multiplexing
    4. Add a read replica to distribute the load
  5. A company wants to perform a major version upgrade of their production Amazon RDS for PostgreSQL database with minimal downtime. Which approach is recommended?
    1. Take a snapshot, restore to a new instance with the new version, and switch DNS
    2. Use the modify DB instance option to upgrade in-place during a maintenance window
    3. Use Amazon RDS Blue/Green Deployments to create a staging environment, test the upgrade, and switch over
    4. Create a read replica with the new version and promote it
  6. Which of the following are advantages of Amazon RDS Multi-AZ DB Cluster deployment over Multi-AZ DB Instance deployment? (Choose 3)
    1. Readable standby instances that can serve read traffic
    2. Faster automated failover (typically under 35 seconds)
    3. Support for all RDS database engines
    4. Up to 2x faster transaction commit latency
    5. Zero-downtime failover
  7. A company is running Amazon RDS for MySQL 5.7 which has reached end of standard support. What happens if they take no action? Choose the correct answer.
    1. The database will be automatically upgraded to MySQL 8.0
    2. The database will be terminated
    3. AWS will automatically charge RDS Extended Support fees for continued security patches
    4. The database will stop receiving any updates but continue running at the same cost
  8. Which storage type is AWS deprecating for Amazon RDS as of May 2026?
    1. General Purpose SSD (gp2)
    2. Magnetic (standard) storage
    3. Provisioned IOPS (io1)
    4. General Purpose SSD (gp3)


AWS Lambda


  • AWS Lambda offers Serverless computing that allows applications and services to be built and run without thinking about servers.
  • With serverless computing, the application still runs on servers, but all the server management is done by AWS.
  • helps run code without provisioning or managing servers, where you pay only for the compute time when the code is running.
  • is priced on a pay-per-use basis and there are no charges when the code is not running.
  • allows the running of code for any type of application or backend service with zero administration.
  • performs all the operational and administrative activities on your behalf, including capacity provisioning, monitoring fleet health, applying security patches to the underlying compute resources, deploying code, running a web service front end, and monitoring and logging the code.
  • does not provide access to the underlying compute infrastructure.
  • handles scalability and availability as it
    • provides easy scaling and high availability to the code without additional effort on your part.
    • is designed to process events within milliseconds.
    • is designed to run many instances of the functions in parallel.
    • is designed to use replication and redundancy to provide high availability for both the service and the functions it operates.
    • has no maintenance windows or scheduled downtimes for either.
    • has a default safety throttle for the number of concurrent executions per account per region (default 1,000 concurrent executions).
    • scales by 1,000 concurrent executions every 10 seconds until the account’s concurrency limit is reached (12x faster scaling announced Nov 2023).
    • has a higher latency immediately after a function is created, or updated, or if it has not been used recently.
    • during a function update, there is a brief window of time, typically less than a minute, when requests may be served by either the old or the new version.
  • Security
    • stores code in S3 and encrypts it at rest and performs additional integrity checks while the code is in use.
    • each function runs in its own isolated environment, with its own resources and file system view
    • supports Code Signing using AWS Signer, which offers trust and integrity controls that enable you to verify that only unaltered code from approved developers is deployed in the functions.
  • Functions must complete execution within 900 seconds (15 minutes). The default timeout is 3 seconds. The timeout can be set to any value between 1 and 900 seconds.
  • AWS Step Functions can help coordinate a series of Lambda functions in a specific order. Multiple functions can be invoked sequentially, passing the output of one to the other, and/or in parallel, while the state is being maintained by Step Functions.
  • AWS X-Ray helps to trace functions, which provides insights such as service overhead, function init time, and function execution time.
  • Lambda Provisioned Concurrency provides greater control over the performance of serverless applications.
  • Lambda SnapStart reduces cold start latency to sub-second for Java, Python, and .NET functions, typically with minimal or no code changes (no additional cost for Java; snapshot caching and restore charges apply for Python and .NET).
  • Lambda Durable Functions enable multi-step applications and AI workflows with automatic checkpointing, failure recovery, and execution suspension for up to one year.
  • Lambda Managed Instances allow running Lambda functions on EC2 instances with serverless operational simplicity, enabling access to specialized compute and EC2 commitment-based pricing (up to 72% savings).
  • Lambda@Edge allows you to run code across AWS locations globally without provisioning or managing servers, responding to end-users at the lowest network latency.
  • Lambda Extensions allow integration of Lambda with other third-party tools for monitoring, observability, security, and governance.
  • Compute Savings Plan can help save money for Lambda executions.
  • CodePipeline and CodeDeploy can be used to automate the serverless application release process.
  • RDS Proxy provides a highly available database proxy that manages thousands of concurrent connections to relational databases.
  • supports Amazon Elastic File System (EFS), to provide a shared, external, persistent, scalable volume using a fully managed elastic NFS file system without the need for provisioning or capacity management.
  • supports Function URLs, a built-in HTTPS endpoint that can be invoked using the browser, curl, and any HTTP client.
  • supports Response Streaming, allowing functions to send response data to callers as it becomes available, enabling larger payloads and long-running operations with incremental progress reporting.
  • supports Graviton2 (ARM64) architecture, delivering up to 34% better price-performance compared to x86_64 functions.
  • supports configurable ephemeral storage (/tmp) between 512 MB and 10,240 MB (10 GB) for data-intensive workloads.
  • supports up to 10,240 MB (10 GB) of memory with up to 6 vCPUs proportionally allocated.
  • supports asynchronous invocation payload sizes up to 1 MB (increased from 256 KB in Oct 2025).
  • supports Advanced Logging Controls with native JSON structured logging for easier search, filter, and analysis of function logs.
  • supports Recursive Loop Detection that automatically detects and stops recursive invocations between Lambda and supported services (SQS, SNS, S3) to prevent runaway costs.

Functions & Event Sources

  • Core components of Lambda are functions and event sources.
    • Event source – an AWS service or custom application that publishes events.
    • Function – a custom code that processes the events.
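
To make the function half of this pairing concrete, here is a minimal sketch of a Python handler. The `handler(event, context)` signature follows the Python runtime convention; the `name` field and the response shape are assumptions made up for this illustration, not part of any Lambda-defined event.

```python
import json


def handler(event, context):
    """Entry point that Lambda calls once per event; `event` is the parsed payload."""
    # `name` is a hypothetical field of a custom event, not a Lambda-defined key.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Calling `handler({"name": "Lambda"}, None)` locally exercises the same code path; the context object is unused here, so `None` is enough for a quick check.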

Lambda Functions

  • Each function has associated configuration information, such as its name, description, runtime, entry point, and resource requirements
  • Lambda functions should be designed as stateless
    • to allow launching of as many copies of the function as needed as per the demand.
    • Local file system access, child processes, and similar artifacts may not extend beyond the lifetime of the request
    • The state can be maintained externally in DynamoDB or S3
  • Lambda Execution role can be assigned to the function to grant permission to access other resources.
  • Functions have the following restrictions
    • Inbound network connections are blocked
    • For outbound connections, only TCP/IP sockets are supported
    • ptrace (debugging) system calls are blocked
    • TCP port 25 traffic is also blocked as an anti-spam measure.
  • Lambda may choose to retain an instance of the function and reuse it to serve a subsequent request, rather than creating a new copy.
  • Lambda Layers provide a convenient way to package libraries and other dependencies that you can use with your Lambda functions.
  • Function versions can be used to manage the deployment of the functions.
  • Function aliases are mutable, named pointers to a specific function version.
  • Functions are automatically monitored, and real-time metrics are reported through CloudWatch, including total requests, latency, error rates, etc.
  • Lambda automatically integrates with CloudWatch logs, creating a log group for each function and providing basic application lifecycle event log entries, including logging the resources consumed for each use of that function.
  • Functions support code written in
    • Node.js (Node.js 22, Node.js 24)
    • Python (Python 3.12, 3.13, 3.14)
    • Java (Java 21, Java 25)
    • C# (.NET 8, .NET 10)
    • Ruby (Ruby 3.3, 3.4, 4.0)
    • Go (using OS-only runtime provided.al2023)
    • Rust (using OS-only runtime provided.al2023)
    • Custom runtime (provided.al2023)
  • Container images are also supported.
  • Supports both x86_64 and arm64 (Graviton2) architectures for all managed runtimes.
  • Failure Handling
    • For S3 bucket notifications and custom events, Lambda will attempt execution of the function three times in the event of an error condition in the code or if a service or resource limit is exceeded.
    • For ordered event sources that Lambda polls, e.g. DynamoDB Streams and Kinesis streams, it will continue attempting execution in the event of a developer code error until the data expires.
    • Kinesis and DynamoDB Streams retain data for a minimum of 24 hours
    • Dead Letter Queues (SNS or SQS) can be configured to receive events once the retry policy for asynchronous invocations is exhausted
    • Lambda Destinations can be configured for successful and failed asynchronous invocations (recommended over DLQ).
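
The retry-then-park behaviour described under Failure Handling can be modelled locally. The sketch below is a toy simulation, not Lambda's implementation: `invoke_async`, `MAX_ATTEMPTS`, and the in-memory `dead_letters` list are hypothetical names standing in for the service's retry policy and a DLQ or failure destination.

```python
from typing import Any, Callable, Dict, List

MAX_ATTEMPTS = 3  # three attempts in total, as described in the list above


def invoke_async(
    handler: Callable[[Dict[str, Any]], Any],
    event: Dict[str, Any],
    dead_letters: List[Dict[str, Any]],
) -> bool:
    """Try the handler up to MAX_ATTEMPTS times; park the event if all attempts fail."""
    last_error = ""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            handler(event)
            return True
        except Exception as exc:  # any handler error counts as a failed attempt
            last_error = f"attempt {attempt}: {exc}"
    # Retries exhausted: record the event the way a DLQ or failure destination would.
    dead_letters.append({"event": event, "error": last_error})
    return False
```

A handler that always raises ends up with exactly one entry in `dead_letters`, while a handler that succeeds on its third attempt leaves the list untouched.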

Read in-depth @ Lambda Functions

Lambda Event Sources

  • Event Source is an AWS service or developer-created application that produces events that trigger an AWS Lambda function to run
  • Event source mapping refers to the configuration which maps an event source to a Lambda function.
  • Event sources can be both push and pull sources
    • Services like S3 and SNS publish events to Lambda by invoking the function directly.
    • Lambda can also poll resources in services like Kafka and Kinesis streams that do not publish events to Lambda.
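
As an illustration of a push source, the sketch below pulls bucket names and object keys out of an S3 notification event. The `Records[].s3.bucket.name` and `Records[].s3.object.key` paths follow the S3 event notification structure; `extract_s3_objects` is a hypothetical helper name, and the handler's return value is only for demonstration.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces as '+'), so decode them first.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    # A real function would process each object; this sketch only lists them.
    return {"processed": extract_s3_objects(event)}
```

An event with no `Records` key yields an empty list, which keeps the handler safe against test events of a different shape.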

Read in-depth @ Event Sources

Lambda Execution Environment

  • Lambda invokes the function in an execution environment, which provides a secure and isolated runtime environment.
  • Execution Context is a temporary runtime environment that initializes any external dependencies of the Lambda function code, e.g. database connections or HTTP endpoints.
  • When a function is invoked, the Execution environment is launched based on the provided configuration settings i.e. memory and execution time.
  • After a Lambda function is executed, Lambda maintains the execution environment for some time in anticipation of another function invocation which allows it to reuse the /tmp directory and objects declared outside of the function’s handler method e.g. database connection.
  • When a Lambda function is invoked for the first time, or after it has been updated, there is added latency for bootstrapping a new execution environment (a “cold start”).
  • Lambda tries to reuse the Execution Context for subsequent invocations, which perform better because there is no need to “cold-start” or initialize those external dependencies again.
  • Execution environment
    • takes care of provisioning and managing the resources needed to run the function.
    • provides lifecycle support for the function’s runtime and any external extensions associated with the function.
  • Function’s runtime communicates with Lambda using the Runtime API.
  • Extensions communicate with Lambda using the Extensions API.
  • Extensions can also receive log messages from the function by subscribing to logs using the Logs API.
  • Lambda manages the creation and deletion of Execution Environments; there is no AWS Lambda API to manage them.
  • Execution environments support both standard functions (up to 15 minutes) and Durable Functions (up to one year).
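
The reuse of objects declared outside the handler can be sketched as follows. The module-level code stands in for the init phase, and `_connection` is a hypothetical placeholder for an expensive resource such as a database connection; nothing here talks to a real service.

```python
import time

# Module-level code runs once per execution environment (the init phase).
INITIALIZED_AT = time.time()
_connection = None  # hypothetical stand-in for a database connection
_invocations = 0


def get_connection():
    """Create the expensive resource on first use, then reuse it."""
    global _connection
    if _connection is None:
        _connection = {"opened_at": time.time()}  # placeholder, not a real client
    return _connection


def handler(event, context):
    global _invocations
    _invocations += 1
    conn = get_connection()
    return {
        "invocation": _invocations,
        "cold_start": _invocations == 1,  # first call in this environment
        "connection_id": id(conn),        # identical across warm invocations
    }
```

Calling the handler twice in the same process returns the same `connection_id` both times, which mirrors what happens when Lambda keeps the environment warm between invocations.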


Lambda in VPC

  • A Lambda function always runs inside a VPC owned by the Lambda service, which isn’t connected to your account’s default VPC.
  • Lambda applies network access and security rules to this VPC and maintains and monitors the VPC automatically.
  • A function can be configured to be launched in private subnets in a VPC in your AWS account.
  • A function connected to a VPC can access private resources such as databases, cache instances, or internal services during execution.
  • To enable the function to access resources inside the private VPC, additional VPC-specific configuration information that includes private subnet IDs and security group IDs must be provided.
  • Lambda uses this information to set up ENIs that enable the function to connect securely to other resources within your private VPC.
  • Functions connected to a VPC can’t access the Internet by default and need a NAT Gateway to access any external resources outside of AWS.
  • Functions cannot connect directly to a VPC with dedicated instance tenancy; instead, peer it to a second VPC with default tenancy.

Lambda Security

  • All data stored in ephemeral storage is encrypted at rest with a key managed by AWS.
  • Lambda functions provide access only to a single VPC. If multiple subnets are specified, they must all be in the same VPC. Other VPCs can be connected using VPC Peering.
  • Supports Code Signing using AWS Signer, which offers trust and integrity controls that enable you to verify that only unaltered code from approved developers is deployed in the functions.
  • AWS Lambda can perform the following signature checks at deployment:
    • Corrupt signature – This occurs if the code artifact has been altered since signing.
    • Mismatched signature – This occurs if the code artifact is signed by a signing profile that is not approved.
    • Expired signature – This occurs if the signature is past the configured expiry date.
    • Revoked signature – This occurs if the signing profile owner revokes the signing jobs.
  • For sensitive information, e.g. passwords, AWS recommends client-side encryption using AWS Key Management Service (KMS) and storing the resulting values as ciphertext in the environment variables.
  • Function code should include the logic to decrypt these values.
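
The decryption logic mentioned above might look like the sketch below. It assumes the environment variable holds base64-encoded KMS ciphertext and that `kms_client` exposes a `decrypt(CiphertextBlob=...)` call returning a `Plaintext` field, as the boto3 KMS client does; `decrypt_env` is a hypothetical helper name. Depending on how the value was encrypted, an `EncryptionContext` argument may also be required.

```python
import base64
import os


def decrypt_env(name: str, kms_client) -> str:
    """Decrypt a base64-encoded KMS ciphertext stored in an environment variable."""
    ciphertext = base64.b64decode(os.environ[name])
    # KMS Decrypt returns the plaintext bytes under the "Plaintext" key.
    response = kms_client.decrypt(CiphertextBlob=ciphertext)
    return response["Plaintext"].decode("utf-8")


# In a real function the client would be created once, outside the handler,
# e.g. kms_client = boto3.client("kms"), and the result cached for reuse.
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object, without needing AWS credentials.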

Lambda Permissions

  • IAM – Use IAM to manage access to the Lambda API and resources like functions and layers.
  • Execution Role – A Lambda function can be provided with an Execution Role, that grants it permission to access AWS services and resources e.g. send logs to CloudWatch and upload trace data to AWS X-Ray.
  • Function Policy – Resource-based Policies
    • Use resource-based policies to give other accounts and AWS services permission to use the Lambda resources.
    • Resource-based permissions policies are supported for functions and layers.
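
A resource-based policy statement that lets S3 invoke a function could be assembled as in the sketch below. The statement keys follow the IAM policy grammar; `s3_invoke_statement` is a hypothetical helper, and the statement ID and any ARNs or account IDs passed in are placeholders chosen for illustration.

```python
import json


def s3_invoke_statement(function_arn: str, bucket_arn: str, account_id: str) -> dict:
    """Build a statement granting S3 permission to invoke one function."""
    return {
        "Sid": "AllowS3Invoke",  # arbitrary statement ID for this example
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "lambda:InvokeFunction",
        "Resource": function_arn,
        # Conditions restrict the grant to one bucket in one account.
        "Condition": {
            "StringEquals": {"AWS:SourceAccount": account_id},
            "ArnLike": {"AWS:SourceArn": bucket_arn},
        },
    }


def to_policy_json(statements: list) -> str:
    """Wrap statements in a policy document and serialize it."""
    return json.dumps({"Version": "2012-10-17", "Statement": statements}, indent=2)
```

Scoping the statement with both the source account and the source ARN is a common way to prevent another account's bucket from triggering the function.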

Invoking Lambda Functions

  • Lambda functions can be invoked
    • directly, using the Lambda console or API, a function URL HTTP(S) endpoint, an AWS SDK, the AWS CLI, or AWS toolkits.
    • by other AWS services, like S3 and SNS, that invoke the function in response to events.
    • by an event source mapping, where Lambda reads from a stream or queue and invokes the function.
  • Functions can be invoked
    • Synchronously
      • You wait for the function to process the event and return a response.
      • Error handling and retries need to be handled by the Client.
      • Examples include direct API or SDK calls and calls from API Gateway.
      • Maximum request/response payload size is 6 MB.
    • Asynchronously
      • queues the event for processing and returns a response immediately.
      • handles retries and can send invocation records to a destination for successful and failed events.
      • Invocation sources include S3, SNS, and CloudWatch Events (now Amazon EventBridge).
      • can define DLQ for handling failed events. AWS recommends using destinations instead of DLQ.
      • Maximum payload size is 1 MB (increased from 256 KB in Oct 2025).
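
The two payload limits can be checked before invoking, as in the sketch below. `RequestResponse` and `Event` are the invocation type values for synchronous and asynchronous calls; the byte limits are taken from the list above and treated as binary megabytes, which is an assumption to verify against current quotas. `payload_fits` is a hypothetical helper.

```python
import json

MB = 1024 * 1024
# Limits as quoted in the list above; confirm against current service quotas.
PAYLOAD_LIMITS = {
    "RequestResponse": 6 * MB,  # synchronous
    "Event": 1 * MB,            # asynchronous
}


def payload_fits(payload: dict, invocation_type: str = "RequestResponse") -> bool:
    """Return True if the JSON-serialized payload is within the limit."""
    size = len(json.dumps(payload).encode("utf-8"))
    return size <= PAYLOAD_LIMITS[invocation_type]
```

A payload of roughly 2 MB passes the synchronous check but fails the asynchronous one, which is the case where passing a reference to an S3 object instead of the data itself is the usual workaround.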

Lambda SnapStart

  • Lambda SnapStart reduces cold start latency from several seconds to as low as sub-second, typically with no or minimal code changes.
  • Supported for Java, Python, and .NET runtimes (Python and .NET became GA in November 2024).
  • Works by taking a snapshot of the initialized execution environment (memory and disk state) after initialization completes.
  • When the function is invoked, Lambda resumes the execution environment from the cached snapshot instead of initializing from scratch.
  • Can improve startup performance by up to 10x for Java functions.
  • SnapStart is an opt-in capability configured at the function level.
  • For SnapStart-enabled functions, initialization code can run for up to 15 minutes when creating a snapshot.
  • Available in most AWS Regions (expanded to 23+ additional regions in 2025).
  • Unlike Provisioned Concurrency, SnapStart does not charge for keeping environments pre-initialized; it is free for Java, while Python and .NET functions incur snapshot caching and restore charges.

Lambda Provisioned Concurrency

  • Lambda Provisioned Concurrency provides greater control over the performance of serverless applications.
  • When enabled, Provisioned Concurrency keeps functions initialized and hyper-ready to respond in double-digit milliseconds.
  • Provisioned Concurrency is ideal for building latency-sensitive applications, such as web or mobile backends, synchronously invoked APIs, and interactive microservices.
  • The amount of concurrency can be increased during times of high demand and lowered, or turned off completely, when demand decreases.
  • If the concurrency of a function reaches the configured level, subsequent invocations of the function have the latency and scale characteristics of regular functions.
  • Application Auto Scaling can be used to automatically manage provisioned concurrency based on utilization.

Lambda Durable Functions

  • Lambda Durable Functions (announced Dec 2025) enable multi-step applications and AI workflows with automatic checkpointing and failure recovery.
  • Durable functions use a checkpoint and replay mechanism (durable execution) to persist progress at specific points in code.
  • Key capabilities:
    • Checkpoints – act as save points, persisting progress at specific moments in the execution.
    • Steps – define logical units of work that are automatically checkpointed upon completion.
    • Waits – allow pausing execution for up to one year without incurring compute charges (for on-demand functions).
    • Automatic Failure Recovery – when a failure occurs, execution resumes from the most recent checkpoint rather than starting over.
  • Requires the open source Durable Execution SDK (available for Node.js and Python runtimes).
  • When a function resumes, it replays from the beginning but skips completed work using saved checkpoint results.
  • Ideal for:
    • Human-in-the-loop processes
    • AI and LLM orchestration workflows
    • Multi-step data processing pipelines
    • Long-running approval workflows
  • Available in 14+ AWS Regions.
  • Does not require managing additional infrastructure or writing custom state management code.
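
The checkpoint-and-replay idea can be illustrated with a toy model. The sketch below is not the Durable Execution SDK: `DurableContext`, `step`, `wait_for`, and `WaitPending` are hypothetical names, and a plain dictionary stands in for the durable checkpoint store.

```python
from typing import Any, Callable, Dict, List


class WaitPending(Exception):
    """Raised to suspend the run until an external result is available."""


class DurableContext:
    def __init__(self, checkpoints: Dict[str, Any]):
        self.checkpoints = checkpoints  # survives across runs (the durable store)

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        # Replay: a completed step returns its saved result without re-running.
        if name in self.checkpoints:
            return self.checkpoints[name]
        result = fn()
        self.checkpoints[name] = result  # checkpoint on completion
        return result

    def wait_for(self, name: str) -> Any:
        # Suspend unless an external actor has already stored the result.
        if name not in self.checkpoints:
            raise WaitPending(name)
        return self.checkpoints[name]


def workflow(ctx: DurableContext, side_effects: List[str]) -> str:
    def create_order() -> str:
        side_effects.append("create_order")  # visible side effect, must run once
        return "order-1"

    order = ctx.step("create_order", create_order)
    approved = ctx.wait_for("approval")
    return ctx.step("finalize", lambda: f"{order}:{'approved' if approved else 'rejected'}")
```

The first run executes `create_order` and then suspends at the wait. Once an approval is written to the store, a second run starts from the top, skips the completed step using its saved result, and finishes without repeating the side effect.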

Lambda Managed Instances

  • Lambda Managed Instances (announced Nov 2025, re:Invent) allow running Lambda functions on Amazon EC2 instances while maintaining serverless operational simplicity.
  • AWS handles all infrastructure tasks: instance lifecycle, OS and runtime patching, routing, load balancing, and auto scaling.
  • Key benefits:
    • EC2 Pricing Models – Access to Compute Savings Plans and Reserved Instances for up to 72% discount over On-Demand pricing.
    • Specialized Compute – Access to latest-generation EC2 instances including Graviton4, network-optimized, and large-memory instances.
    • Multi-concurrency – Each execution environment can handle multiple concurrent requests, improving resource utilization.
    • No Cold Starts – Pre-provisioned execution environments eliminate cold start latency.
  • Organized into Capacity Providers that define compute characteristics (instance type, networking, scaling parameters).
  • Instances have a maximum 14-day lifetime for security and compliance.
  • Pricing: Standard Lambda request charges + EC2 instance charges + 15% compute management fee.
  • Supports Node.js, Java, .NET, and Python runtimes.
  • Existing Lambda functions can be migrated without code changes (must validate thread safety for multi-concurrency).
  • Available in US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland).

Lambda@Edge

Read in-depth @ Lambda@Edge

Lambda Extensions

  • Lambda Extensions allow integration of Lambda with other third-party tools for monitoring, observability, security, and governance.
  • Extensions run as companion processes within the Lambda execution environment.
  • Internal extensions run as part of the runtime process; external extensions run as separate processes.
  • Extensions can run during the invocation phase and up to 2 seconds during the shutdown phase.

Lambda Concurrency and Scaling

  • Default account concurrency limit is 1,000 concurrent executions across all functions per region (can be increased via Service Quotas).
  • Lambda scales by 1,000 concurrent executions every 10 seconds until the account limit is reached (12x faster than previous scaling model).
  • Each function scales independently from other functions in the same account.
  • Reserved Concurrency – Guarantees a set number of concurrent executions for a specific function; also acts as a maximum concurrency limit for that function.
  • Provisioned Concurrency – Pre-initializes execution environments to eliminate cold starts.
  • Maximum Concurrency for SQS – Allows setting a maximum number of concurrent function invocations when using SQS as an event source.
  • New AWS accounts may have reduced concurrency quotas that are raised automatically based on usage.
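
A quick way to size concurrency is to multiply the request rate by the average duration. The sketch below shows that arithmetic with hypothetical helper names; it ignores burst behaviour, retries, and duration variance, so treat the results as rough estimates.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Concurrent executions needed: arrival rate times average duration."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def max_requests_per_second(concurrency_limit: int, avg_duration_seconds: float) -> float:
    """Steady-state throughput a given concurrency limit can sustain."""
    return concurrency_limit / avg_duration_seconds
```

For example, 400 requests per second at 500 ms each needs about 200 concurrent executions, and a limit of 1,000 at the same duration sustains roughly 2,000 requests per second.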

Lambda Best Practices

  • Lambda function code should be stateless and ensure there is no affinity between the code and the underlying compute infrastructure.
  • Instantiate AWS clients outside the scope of the handler to take advantage of connection re-use.
  • Make sure you have set +rx permissions on your files in the uploaded ZIP to ensure Lambda can execute code on your behalf.
  • Lower costs and improve performance by minimizing the use of startup code not directly related to processing the current event.
  • Use the built-in CloudWatch monitoring of the Lambda functions to view and optimize request latencies.
  • Delete old Lambda functions that you are no longer using.
  • Use Graviton2 (arm64) architecture for up to 34% better price-performance with minimal code changes.
  • Use SnapStart for Java, Python, and .NET functions to reduce cold start latency (free for Java; Python and .NET incur snapshot caching and restore charges).
  • Use Lambda Power Tuning to find the optimal memory configuration for cost and performance.
  • Include the AWS SDK in your deployment package rather than relying on the runtime-included version for version consistency.
  • Use structured JSON logging with Advanced Logging Controls for better observability.
  • Keep runtimes up to date – Lambda runtimes are deprecated when the underlying language version reaches end of community LTS support.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. Your serverless architecture using AWS API Gateway, AWS Lambda, and AWS DynamoDB experienced a large increase in traffic to a sustained 400 requests per second, and dramatically increased in failure rates. Your requests, during normal operation, last 500 milliseconds on average. Your DynamoDB table did not exceed 50% of provisioned throughput, and Table primary keys are designed correctly. What is the most likely issue?
    1. Your API Gateway deployment is throttling your requests.
    2. Your AWS API Gateway Deployment is bottlenecking on request (de)serialization.
    3. You did not request a limit increase on concurrent Lambda function executions. (Lambda has a default account concurrency limit of 1,000. At 500 milliseconds per request, each concurrent execution handles 2 requests/second. The default 1,000 concurrency supports 2,000 requests/second, which exceeds 400 rps. However, new accounts may have lower limits. The key concept is understanding Lambda concurrency limits and requesting increases when needed.)
    4. You used Consistent Read requests on DynamoDB and are experiencing semaphore lock.
  2. A company wants to reduce cold start latency for their Java-based Lambda functions that power a latency-sensitive API. Which TWO approaches can help? (Choose 2)
    1. Enable Lambda SnapStart on the function
    2. Configure Provisioned Concurrency for the function
    3. Increase the function’s memory allocation to 10 GB
    4. Switch the function to asynchronous invocation
    5. Enable Lambda Extensions for the function
    (SnapStart reduces cold starts by up to 10x by caching initialized snapshots. Provisioned Concurrency keeps environments pre-initialized. Both reduce cold start latency.)
  3. A team needs to build a multi-step workflow that involves calling multiple APIs, waiting for human approval (which may take days), and then processing the results. The team wants to minimize infrastructure management. Which Lambda capability is most suitable?
    1. Lambda Provisioned Concurrency
    2. Lambda Managed Instances
    3. Lambda Durable Functions
    4. AWS Step Functions with Lambda
    (Lambda Durable Functions can checkpoint progress, suspend execution for up to one year during waits like human approvals, and automatically recover from failures—all without additional infrastructure. While Step Functions can also orchestrate workflows, Durable Functions provide this within the Lambda programming model itself.)
  4. A company has steady-state Lambda workloads processing 10,000 requests per second. They want to reduce costs while maintaining the serverless programming model. What should they use?
    1. Lambda Provisioned Concurrency with Compute Savings Plans
    2. Lambda Managed Instances with EC2 Reserved Instances or Compute Savings Plans
    3. Migrate to Amazon ECS with Fargate
    4. Use Lambda with Graviton2 architecture only
    (Lambda Managed Instances allow using EC2 commitment-based pricing (Savings Plans, Reserved Instances) for up to 72% discount while maintaining the Lambda programming model and serverless operational simplicity.)
  5. Which of the following is NOT a supported Lambda runtime as of 2025?
    1. Python 3.13
    2. Node.js 22
    3. Java 21
    4. Go 1.x managed runtime
    (The Go 1.x managed runtime was deprecated on Jan 8, 2024. Go functions should now use the OS-only runtime provided.al2023 with a custom runtime approach.)
  6. A Lambda function processes files uploaded to S3 and writes processed results back to the same S3 bucket. The team notices unexpectedly high Lambda invocations and costs. What AWS feature helps prevent this?
    1. Lambda Reserved Concurrency
    2. S3 Event notification filtering
    3. Lambda Recursive Loop Detection
    4. Lambda function timeout configuration
    (Lambda Recursive Loop Detection automatically detects and stops recursive invocations between Lambda and supported services including S3, SQS, and SNS after 16 invocations, preventing runaway costs.)
  7. A company wants their Lambda functions to send response data progressively to clients as it becomes available, rather than waiting for the entire response. Which feature should they use?
    1. Lambda Provisioned Concurrency
    2. Lambda Response Streaming
    3. Lambda Function URLs with buffered response
    4. Lambda Asynchronous Invocation with Destinations
    (Lambda Response Streaming allows sending response data to callers as it becomes available, supporting larger payloads and enabling progressive rendering for web applications.)