AWS Aurora is a relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases.
Aurora is a fully managed, MySQL- and PostgreSQL-compatible relational database engine, i.e., applications developed with MySQL can switch to Aurora with little or no changes.
Aurora delivers up to 5x performance of MySQL without requiring any changes to most MySQL applications
Aurora PostgreSQL delivers up to 3x performance of PostgreSQL.
RDS manages the Aurora databases, handling time-consuming tasks such as provisioning, patching, backup, recovery, failure detection and repair.
Based on database usage, Aurora storage automatically grows from 10 GB to 64 TiB, in 10 GB increments, with no impact on database performance.
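The growth rule above can be sketched as simple arithmetic. This is only an illustration of the 10 GB increment and 64 TiB ceiling stated here, not how Aurora allocates storage internally; the function name is hypothetical, and the sketch assumes the 10 GB increments are binary gigabytes so they can be compared with the TiB limit.

```python
import math

# Assumption: treat "10 GB" as 10 GiB so it lines up with the 64 TiB cap.
INCREMENT_GIB = 10
MIN_GIB = 10
MAX_GIB = 64 * 1024  # 64 TiB expressed in GiB


def allocated_storage_gib(used_gib: float) -> int:
    """Return the volume size implied by 10 GiB growth steps (hypothetical helper)."""
    if used_gib < 0:
        raise ValueError("used_gib must be non-negative")
    if used_gib > MAX_GIB:
        raise ValueError("usage exceeds the 64 TiB cluster volume limit")
    # Round usage up to the next 10 GiB step, never going below the minimum.
    steps = max(1, math.ceil(used_gib / INCREMENT_GIB))
    return min(max(steps * INCREMENT_GIB, MIN_GIB), MAX_GIB)
```

For example, a database holding 8 TiB (8192 GiB) would sit on a volume rounded up to the next 10 GiB step.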
Aurora DB Clusters
Aurora DB cluster consists of one or more DB instances and a cluster volume that manages the data for those DB instances.
An Aurora cluster volume is a virtual database storage volume that spans multiple AZs, with each AZ having a copy of the DB cluster data
Two types of DB instances make up an Aurora DB cluster:
Primary DB instance
Supports read and write operations, and performs all of the data modifications to the cluster volume.
Each Aurora DB cluster has one primary DB instance.
Aurora Replica
Connects to the same storage volume as the primary DB instance and supports only read operations.
Each Aurora DB cluster can have up to 15 Aurora Replicas in addition to the primary DB instance.
Provides high availability by locating Replicas in separate AZs
Aurora automatically fails over to an Aurora Replica in case the primary DB instance becomes unavailable.
Failover priority for Aurora Replicas can be specified.
Aurora Replicas can also offload read workloads from the primary DB instance
For Aurora multi-master clusters
all DB instances have read/write capability, with no difference between primary and replica.
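How a failover priority might drive the choice of a new primary can be sketched as below. This is a simplified model, not Aurora's implementation. It assumes the documented promotion-tier scheme, where tier 0 is the highest priority and 15 the lowest, with ties going to the larger instance. The `Replica` class and its `size_rank` field are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Replica:
    name: str
    promotion_tier: int     # assumption: 0 = highest priority, 15 = lowest
    size_rank: int          # hypothetical: larger value = larger instance class
    available: bool = True


def pick_failover_target(replicas: Sequence[Replica]) -> Optional[Replica]:
    """Pick the replica to promote: lowest tier first, then largest size."""
    candidates = [r for r in replicas if r.available]
    if not candidates:
        # No replica to promote; a new primary would have to be created instead.
        return None
    return min(candidates, key=lambda r: (r.promotion_tier, -r.size_rank))
```

With replicas in tiers 1, 0 and 0, the larger of the two tier-0 replicas would be chosen.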
Aurora involves a cluster of DB instances instead of a single instance
Endpoint refers to an intermediate handler with the host name and port specified to connect to the cluster
Aurora uses the endpoint mechanism to abstract these connections
Cluster endpoint (or writer endpoint) for an Aurora DB cluster connects to the current primary DB instance for that DB cluster.
Cluster endpoint is the only one that can perform write operations such as DDL statements as well as read operations
Each Aurora DB cluster has one cluster endpoint and one primary DB instance.
Cluster endpoint provides failover support for read/write connections to the DB cluster. If the current primary DB instance of a DB cluster fails, Aurora automatically fails over to a new primary DB instance. During a failover, the DB cluster continues to serve connection requests to the cluster endpoint from the new primary DB instance, with minimal interruption of service.
Reader endpoint for an Aurora DB cluster provides load-balancing support for read-only connections to the DB cluster.
Use the reader endpoint for read operations, such as queries.
Reader endpoint reduces the overhead on the primary instance by processing the statements on the read-only Aurora Replicas.
Each Aurora DB cluster has one reader endpoint.
If the cluster contains one or more Aurora Replicas, the reader endpoint load-balances each connection request among the Aurora Replicas.
Custom endpoint for an Aurora cluster represents a set of DB instances that you choose.
Aurora performs load balancing and chooses one of the instances in the group to handle the connection.
An Aurora DB cluster has no custom endpoints until one is created, and up to five custom endpoints can be created for each provisioned Aurora cluster.
Aurora Serverless clusters do not support custom endpoints.
An instance endpoint connects to a specific DB instance within an Aurora cluster and provides direct control over connections to the DB cluster.
Each DB instance in a DB cluster has its own unique instance endpoint. So there is one instance endpoint for the current primary DB instance of the DB cluster, and there is one instance endpoint for each of the Aurora Replicas in the DB cluster.
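The split between the cluster (writer) endpoint and the reader endpoint can be illustrated with a small routing helper. This is a minimal sketch: the hostnames are placeholders, the `ClusterEndpoints` class and `endpoint_for` function are hypothetical, and classifying a statement by its first keyword is a deliberate simplification that a real application would likely replace with explicit read/write connection pools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterEndpoints:
    writer: str  # cluster endpoint: always points at the current primary
    reader: str  # reader endpoint: load-balances across Aurora Replicas


# Assumption: only these leading keywords are treated as read-only.
READ_ONLY_KEYWORDS = ("select", "show", "explain")


def endpoint_for(sql: str, endpoints: ClusterEndpoints) -> str:
    """Send read-only statements to the reader endpoint, everything else to the writer."""
    words = sql.lstrip().split(None, 1)
    keyword = words[0].lower() if words else ""
    if keyword in READ_ONLY_KEYWORDS:
        return endpoints.reader
    return endpoints.writer


# Placeholder hostnames for illustration only.
endpoints = ClusterEndpoints(
    writer="example-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",
    reader="example-cluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com",
)
```

Because the application connects to endpoint names rather than to individual instances, a failover changes which instance answers on the writer endpoint without a configuration change in the application.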
High Availability and Replication
Aurora is designed to offer greater than 99.99% availability
Aurora provides data durability and reliability
by replicating the database volume six ways across three Availability Zones in a single region
backing up the data continuously to S3.
Aurora transparently recovers from physical storage failures; instance failover typically takes less than 30 seconds.
If the primary DB instance fails, Aurora automatically fails over to a new primary DB instance, by either promoting an existing Aurora Replica to a new primary DB instance or creating a new primary DB instance
Aurora automatically divides the database volume into 10GB segments spread across many disks. Each 10GB chunk of the database volume is replicated six ways, across three Availability Zones
By contrast, other RDS databases, e.g. MySQL, Oracle, etc., have the data in a single AZ
Aurora is designed to transparently handle
the loss of up to two copies of data without affecting database write availability and
up to three copies without affecting read availability.
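The tolerance figures above follow from quorum arithmetic over the six copies: writes need 4 of 6 copies and reads need 3 of 6. A minimal sketch of that arithmetic, with hypothetical constant and function names:

```python
TOTAL_COPIES = 6   # six copies of each segment across three AZs
WRITE_QUORUM = 4   # copies required for a write
READ_QUORUM = 3    # copies required for a read


def availability(lost_copies: int) -> dict:
    """Report whether writes and reads still have a quorum after losing copies."""
    if not 0 <= lost_copies <= TOTAL_COPIES:
        raise ValueError("lost_copies must be between 0 and 6")
    healthy = TOTAL_COPIES - lost_copies
    return {
        "writes": healthy >= WRITE_QUORUM,
        "reads": healthy >= READ_QUORUM,
    }
```

Losing two copies leaves four healthy ones, so both quorums hold; losing three leaves only the read quorum; losing four breaks both.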
Aurora storage is also self-healing. Data blocks and disks are continuously scanned for errors and repaired automatically.
Aurora Replicas share the same underlying volume as the primary instance. Updates made by the primary are visible to all Aurora Replicas
As Aurora Replicas share the same data volume as the primary instance, there is virtually no replication lag
Any Aurora Replica can be promoted to become primary without any data loss and therefore can be used for enhancing fault tolerance in the event of a primary DB Instance failure.
To increase database availability, 1 to 15 replicas can be created in any of 3 AZs, and RDS will automatically include them in failover primary selection in the event of a database outage.
Aurora uses SSL (AES-256) to secure the connection between the database instance and the application
Aurora allows database encryption using keys managed through AWS Key Management Service (KMS).
Encryption and decryption are handled seamlessly.
With Aurora encryption, data stored at rest in the underlying storage is encrypted, as are its automated backups, snapshots, and replicas in the same cluster.
Encryption of an existing unencrypted Aurora instance is not supported. Create a new encrypted Aurora instance and migrate the data.
Backup and Restore
Automated backups are always enabled on Aurora DB Instances.
Backups do not impact database performance.
Aurora also allows creation of manual snapshots
Aurora automatically maintains 6 copies of the data across 3 AZs and will automatically attempt to recover the database in a healthy AZ with no data loss
If in any case the data is unavailable within Aurora storage,
DB Snapshot can be restored or
point-in-time restore operation can be performed to a new instance. Latest restorable time for a point-in-time restore operation can be up to 5 minutes in the past.
Restoring a snapshot creates a new Aurora DB instance
Deleting Aurora database deletes all the automated backups (with an option to create a final snapshot), but would not remove the manual snapshots.
Snapshots (including encrypted ones) can be shared with other AWS accounts
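The point-in-time restore window mentioned above can be checked with a small helper. This is a conservative sketch: it treats the latest restorable time as exactly five minutes behind the current time, and it takes the backup retention period as a parameter because this section does not state it. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Latest restorable time can be up to 5 minutes in the past; assume the worst case.
MAX_RESTORE_LAG = timedelta(minutes=5)


def is_restorable(target: datetime, now: datetime, retention_days: int) -> bool:
    """Return True if `target` falls inside the assumed point-in-time restore window."""
    earliest = now - timedelta(days=retention_days)  # retention_days is an assumption
    latest = now - MAX_RESTORE_LAG
    return earliest <= target <= latest
```

A restore to a point inside this window creates a new instance; the existing one is not modified in place.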
Amazon Aurora Serverless is an on-demand, autoscaling configuration for the MySQL-compatible and PostgreSQL-compatible editions of Aurora.
An Aurora Serverless DB cluster automatically starts up, shuts down, and scales capacity up or down based on the application’s needs.
Aurora Serverless provides a relatively simple, cost-effective option for infrequent, intermittent, or unpredictable workloads.
Aurora Global Database
AWS Certification Exam Practice Questions
Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
Open to further feedback, discussion and correction.
Company wants to use MySQL compatible relational database with greater performance. Which AWS service can be used?
An application requires a highly available relational database with an initial storage capacity of 8 TB. The database will grow by 8 GB every day. To support expected traffic, at least eight read replicas will be required to handle database reads. Which option will meet these requirements?
A company is migrating their on-premise 10TB MySQL database to AWS. As a compliance requirement, the company wants to have the data replicated across three availability zones. Which Amazon RDS engine meets the above business requirement?
Aurora
is a relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases
is a managed service and handles time-consuming tasks such as provisioning, patching, backup, recovery, failure detection and repair
is a proprietary technology from AWS (not open sourced)
provides PostgreSQL and MySQL compatibility
is “AWS cloud optimized” and claims a 5x performance improvement over MySQL on RDS and over 3x the performance of PostgreSQL on RDS
scales storage automatically in increments of 10GB, up to 64 TB with no impact to database performance. Storage is striped across 100s of volumes.
no need to provision storage in advance.
provides self-healing storage. Data blocks and disks are continuously scanned for errors and repaired automatically.
provides instantaneous failover
replicates each chunk of the database volume six ways across three Availability Zones, i.e. 6 copies of the data across 3 AZs
requires 4 out of 6 copies for writes
requires 3 out of 6 copies for reads
costs more than RDS (20% more) – but is more efficient
can have 15 replicas while MySQL has 5, and the replication process is faster (sub 10 ms replica lag)
replicas share the same data volume as the primary instance in the same AWS Region, so there is virtually no replication lag
supports Automated failover for master in less than 30 seconds
supports Cross Region Replication using either physical or logical replication.
supports Encryption at rest using KMS
supports Encryption in flight using SSL (same process as MySQL or Postgres)
Automated backups, snapshots and replicas are also encrypted
Possibility to authenticate using IAM token (same method as RDS)
supports protecting the instance with security groups
does not support SSH access to the underlying servers
Aurora Serverless
provides automated database Client instantiation and on-demand autoscaling based on actual usage
provides a relatively simple, cost-effective option for infrequent, intermittent, or unpredictable workloads
automatically starts up, shuts down, and scales capacity up or down based on the application’s needs. No capacity planning needed
Pay per second, can be more cost-effective
Aurora Global Database
allows a single Aurora database to span multiple AWS regions.
provides Physical replication, which uses dedicated infrastructure that leaves the databases entirely available to serve the application
supports 1 Primary Region (read / write)
replicates across up to 5 secondary (read-only) regions, replication lag is less than 1 second
supports up to 16 Read Replicas per secondary region
recommended for low-latency global reads and disaster recovery with an RTO of < 1 minute
failover is not automated. If the primary region becomes unavailable, a secondary region can be manually removed from the Aurora Global Database and promoted to take full reads and writes. The application needs to be updated to point to the newly promoted region.
supports parallel or distributed query using Aurora Parallel Query, which refers to the ability to push down and distribute the computational load of a single query across thousands of CPUs in Aurora’s storage layer.
DynamoDB
fully managed NoSQL database service
synchronously replicates data across three facilities in an AWS Region, giving high availability and data durability
runs exclusively on SSDs to provide high I/O performance
provides provisioned table reads and writes
automatically partitions, reallocates and re-partitions the data and provisions additional server capacity as data or throughput changes
provides Eventually Consistent (by default) or Strongly Consistent reads, with the option specified during a read operation
creates and maintains indexes for the primary key attributes for efficient access of data in the table
supports secondary indexes
allows querying attributes other than the primary key attributes without impacting performance.
are automatically maintained as sparse objects
Local vs Global secondary index
shares partition key + different sort key vs different partition + sort key
search limited to partition vs across all partition
unique attributes vs non unique attributes
linked to the base table vs independent separate index
only created during the base table creation vs can be created later
cannot be deleted after creation vs can be deleted
consumes provisioned throughput capacity of the base table vs independent throughput
returns all attributes for item vs only projected attributes
Eventually or Strongly vs Only Eventually consistent reads
size limited to 10 GB per partition vs unlimited
supports cross region replication using DynamoDB streams which leverages Kinesis and provides time-ordered sequence of item-level changes and can help for lower RPO, lower RTO disaster recovery
Data Pipeline jobs with EMR can be used for disaster recovery with higher RPO, lower RTO requirements
supports triggers to allow execution of custom actions or notifications based on item-level updates
ElastiCache
managed web service that provides in-memory caching to deploy and run Memcached or Redis protocol-compliant cache clusters
ElastiCache with Redis,
like RDS, supports Multi-AZ, Read Replicas and Snapshots
Read Replicas are created across AZ within same region using Redis’s asynchronous replication technology
Multi-AZ differs from RDS as there is no standby, but if the primary goes down a Read Replica is promoted as primary
Read Replicas cannot span across regions, unlike RDS, which supports cross-region Read Replicas
cannot be scaled out and if scaled up cannot be scaled down
allows snapshots for backup and restore
AOF can be enabled for recovery scenarios, to recover the data in case the node fails or service crashes. But it does not help in case the underlying hardware fails
enabling Redis Multi-AZ is a better approach to fault tolerance
ElastiCache with Memcached
can be scaled up by increasing size and scaled out by adding nodes
nodes can span across multiple AZs within the same region
cached data is spread across the nodes, and a node failure will always result in some data loss from the cluster
supports auto discovery
every node should be homogenous and of same instance type
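Why a Memcached node failure loses part of the cache can be seen from how keys are spread across nodes. The sketch below uses plain modulo hashing, which is an assumption made for brevity; real Memcached clients often use consistent hashing instead. The node names and helper functions are hypothetical.

```python
import hashlib
from typing import Dict, List


def node_for(key: str, nodes: List[str]) -> str:
    """Map a key to one node using a stable hash (simplified modulo placement)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def place_keys(keys: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Group keys by the node that would hold them."""
    placement: Dict[str, List[str]] = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


def surviving_keys(placement: Dict[str, List[str]], failed_node: str) -> List[str]:
    """Keys still cached after one node fails; that node's keys are simply gone."""
    return [k for node, ks in placement.items() if node != failed_node for k in ks]
```

Each key lives on exactly one node and there is no replica, so whatever the failed node held must be fetched again from the backing data store.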
ElastiCache Redis vs Memcached
complex data objects vs simple key value storage
persistent vs non persistent, pure caching
automatic failover with Multi-AZ vs Multi-AZ not supported
scaling using Read Replicas vs using multiple nodes
backup & restore supported vs not supported
can be used for state management to keep the web application stateless
Redshift
fully managed, fast and powerful, petabyte scale data warehouse service
uses replication and continuous backups to enhance availability and improve data durability and can automatically recover from node and component failures
provides Massively Parallel Processing (MPP) by distributing & parallelizing queries across multiple physical resources
columnar data storage improving query performance and allowing advanced compression techniques
only supports Single-AZ deployments and the nodes are available within the same AZ, if the AZ supports Redshift clusters
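The MPP idea noted above, distributing a query and combining partial results, can be sketched as a scatter-gather aggregate. This is a toy model, not Redshift's execution engine: the function names are hypothetical, the "slices" are in-memory lists, and Python threads are used only to show the shape of the pattern rather than to gain real speed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_slices(rows: List[int], slice_count: int) -> List[List[int]]:
    """Distribute rows round-robin across a fixed number of slices."""
    return [rows[i::slice_count] for i in range(slice_count)]


def partial_sum(rows: List[int]) -> int:
    """Work done independently on each slice."""
    return sum(rows)


def parallel_total(rows: List[int], slice_count: int = 4) -> int:
    """Scatter the aggregate to every slice, then combine the partial results."""
    slices = split_into_slices(rows, slice_count)
    with ThreadPoolExecutor(max_workers=slice_count) as pool:
        partials = list(pool.map(partial_sum, slices))
    return sum(partials)
```

Each slice only touches its own rows, so adding slices shrinks the work per slice while the final combine step stays small.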