AWS RDS Read Replicas

RDS Read Replicas

  • RDS Read Replica is a read-only copy of the DB instance.
  • RDS Read Replicas provide enhanced performance and durability for RDS.
  • RDS Read Replicas allow elastic scaling beyond the capacity constraints of a single DB instance for read-heavy database workloads.
  • RDS Read Replicas provide increased scalability and improved database availability in the case of an AZ failure.
  • Read Replicas can help reduce the load on the source DB instance by routing read queries from applications to the Read Replica.
  • Read replicas can also be promoted when needed to become standalone DB instances.
  • RDS read replicas can be Multi-AZ i.e. set up with their own standby instances in a different AZ.
  • One or more replicas of a given source DB Instance can serve high-volume application read traffic from multiple copies of the data, thereby increasing aggregate read throughput.
  • RDS uses DB engines’ built-in replication functionality to create a special type of DB instance called a Read Replica from a source DB instance. It uses the engines’ native asynchronous replication to update the read replica whenever there is a change to the source DB instance.
  • Read Replicas are eventually consistent due to asynchronous replication; the replication lag can be monitored with the CloudWatch ReplicaLag metric (see the sketch after this list).
  • RDS sets up a secure communications channel using public-key encryption between the source DB instance and the read replica, even when replicating across regions.
  • Read replica operates as a DB instance that allows only read-only connections. Applications can connect to a read replica just as they would to any DB instance.
  • Read replicas are available in RDS for MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server as well as Aurora.
  • RDS replicates all databases in the source DB instance.
  • RDS supports replication between an RDS MySQL or MariaDB DB instance and a MySQL or MariaDB instance that is external to RDS using Binary Log File Position or Global Transaction Identifiers (GTIDs) replication.
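
Because replication is asynchronous, it helps to keep an eye on how far behind a replica is. Below is a minimal boto3 sketch, assuming a replica named mydb-read-replica-1 (a placeholder), that reads the CloudWatch ReplicaLag metric for a read replica:

```python
# Minimal sketch: check replica lag (in seconds) for a read replica via the
# CloudWatch "ReplicaLag" metric. The instance identifier is a placeholder.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="ReplicaLag",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "mydb-read-replica-1"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,                 # 5-minute data points
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1), "seconds behind source")
```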

RDS Read Replicas

Read Replicas Creation

  • Read Replicas can be created within the same AZ, different AZ within the same region, and cross-region as well.
  • Up to five Read Replicas can be created from one source DB instance.
  • Creation process (a minimal boto3 sketch follows this list)
    • Automatic backups must be enabled on the source DB instance by setting the backup retention period to a value other than 0
    • An existing DB instance needs to be specified as the source.
    • RDS takes a snapshot of the source instance and creates a read-only instance from the snapshot.
    • RDS then uses the asynchronous replication method for the DB engine to update the Read Replica for any changes to the source DB instance.
  • RDS replicates all databases in the source DB instance.
  • RDS sets up a secure communications channel between the source DB instance and the Read Replica if that Read Replica is in a different AWS region from the DB instance.
  • RDS establishes any AWS security configurations, such as adding security group entries, needed to enable the secure channel.
  • During the Read Replica creation, a brief I/O suspension on the source DB instance can be experienced as the DB snapshot occurs.
  • I/O suspension typically lasts about one minute and can be avoided if the source DB instance is a Multi-AZ deployment (in the case of Multi-AZ deployments, DB snapshots are taken from the standby).
  • Read Replica creation can be slow if any long-running transactions are executing on the source DB instance; it is best to wait for them to complete before creating the replica.
  • For multiple Read Replicas created in parallel from the same source DB instance, only one snapshot is taken at the start of the first create action.
  • A Read Replica can be promoted to a new independent source DB, in which case the replication link between the Read Replica and the source DB is broken. However, replication continues for the other replicas, which still use the original source DB as their replication source.
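
As referenced in the creation-process bullet above, here is a minimal boto3 sketch of creating a read replica in a different AZ of the same Region; the identifiers, instance class, and AZ are assumed placeholders:

```python
# Minimal sketch: create a read replica of an existing source DB instance.
import boto3

rds = boto3.client("rds")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="mydb-read-replica-1",   # name for the new replica
    SourceDBInstanceIdentifier="mydb-primary",    # existing source DB instance
    DBInstanceClass="db.r6g.large",
    AvailabilityZone="us-east-1b",                # different AZ from the source
)

# Block until the replica has been created and is available.
rds.get_waiter("db_instance_available").wait(
    DBInstanceIdentifier="mydb-read-replica-1"
)
```

As noted above, the source DB instance must already have automated backups enabled (backup retention period greater than 0).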

Read Replica Deletion & DB Failover

  • Read Replicas must be explicitly deleted, using the same mechanisms for deleting a DB instance.
  • If the source DB instance is deleted without deleting the replicas, each replica is promoted to a stand-alone, single-AZ DB instance.
  • If the source instance of a Multi-AZ deployment fails over to the standby, any associated Read Replicas are switched to use the secondary as their replication source.

Read Replica Storage & Compute requirements

  • A Read Replica, by default, is created with the same storage type as the source DB instance.
  • For replication to operate effectively, each Read Replica should have the same amount of compute & storage resources as the source DB instance.
  • Read Replicas should be scaled accordingly if the source DB instance is scaled.

Read Replicas Promotion

  • A read replica can be promoted into a standalone DB instance (a minimal boto3 sketch follows this list).
  • When the read replica is promoted
    • The new DB instance is rebooted before it becomes available.
    • The new DB instance retains the option group and the parameter group of the former read replica.
    • The promotion process can take several minutes or longer to complete, depending on the size of the read replica.
    • If a source DB instance has several read replicas, promoting one of the read replicas to a DB instance has no effect on the other replicas.
  • If you plan to promote a read replica to a standalone instance, AWS recommends that you enable backups and complete at least one backup prior to promotion.
  • Read Replicas Promotion can help with
    • Performing DDL operations (MySQL and MariaDB only)
      • DDL Operations such as creating or rebuilding indexes can take time and can be performed on the read replica once it is in sync with its primary DB instance.
    • Sharding
      • Sharding embodies the “shared-nothing” architecture and essentially involves breaking a large database into several smaller databases.
      • Read Replicas can be created and promoted to correspond to each of the shards, with a hashing algorithm then determining which shard receives a given update.
    • Implementing failure recovery
      • Read replica promotion can be used as a data recovery scheme if the primary DB instance fails.
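
A minimal boto3 sketch of promoting a read replica, e.g. as part of a failure-recovery runbook; the identifier is an assumed placeholder, and backups are enabled on the promoted instance as recommended above:

```python
# Minimal sketch: promote a read replica to a standalone DB instance.
import boto3

rds = boto3.client("rds")

rds.promote_read_replica(
    DBInstanceIdentifier="mydb-read-replica-1",
    BackupRetentionPeriod=7,   # keep automated backups on the promoted instance
)

# The promoted instance reboots before it becomes available again.
rds.get_waiter("db_instance_available").wait(
    DBInstanceIdentifier="mydb-read-replica-1"
)
```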

Read Replicas Multi-AZ

  • RDS Read Replicas can be Multi-AZ i.e. configured with their own standby instances in a different AZ.
  • Multi-AZ Read Replicas are currently supported for the MySQL, MariaDB, PostgreSQL, and Oracle database engines.
  • Read Replicas with Multi-AZ help build a resilient disaster recovery strategy and simplify the database engine upgrade process.
  • Configuring a read replica as Multi-AZ allows it to be used as a DR target with automatic failover.
  • Also, when you promote the read replica to be a standalone database, it will already be Multi-AZ enabled.

Cross-Region Read Replicas

  • Supported for MySQL, PostgreSQL, MariaDB, and Oracle.
  • Not supported for SQL Server
  • Cross-Region Read Replicas help to improve
    • disaster recovery capabilities (reduces RTO and RPO),
    • scale read operations into a region closer to end users,
    • migration from a data center in one region to another region
  • A source DB instance can have cross-region read replicas in multiple AWS Regions (a minimal boto3 sketch follows this list).
  • A cross-Region RDS read replica can only be created from a source RDS DB instance that is not itself a read replica of another RDS DB instance.
  • Replica lags are higher for Cross-region replicas. This lag time comes from the longer network channels between regional data centers.
  • RDS can’t guarantee more than five cross-region read replica instances, due to the limit on the number of access control list (ACL) entries for a VPC
  • Read Replica uses the default DB parameter group and DB option group for the specified DB engine.
  • Read Replica uses the default security group.
  • Deleting the source for a cross-Region read replica will result in
    • read replica promotion for MariaDB, MySQL, and Oracle DB instances
    • no read replica promotion for PostgreSQL DB instances and the replication status of the read replica is set to terminated.
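
A minimal boto3 sketch of creating a cross-Region read replica: the RDS client is created in the destination Region and the source instance is referenced by ARN. The identifiers, Regions, and KMS key alias are assumed placeholders:

```python
# Minimal sketch: create a cross-Region read replica.
import boto3

# Client in the *destination* Region where the replica will live.
rds_destination = boto3.client("rds", region_name="ap-southeast-1")

rds_destination.create_db_instance_read_replica(
    DBInstanceIdentifier="mydb-replica-singapore",
    SourceDBInstanceIdentifier=(
        "arn:aws:rds:us-west-1:123456789012:db:mydb-primary"   # source by ARN
    ),
    DBInstanceClass="db.r6g.large",
    # Destination-Region KMS key; required only if the source is encrypted.
    KmsKeyId="alias/rds-replica-key",
    # boto3 uses SourceRegion to generate the pre-signed URL for the cross-Region copy.
    SourceRegion="us-west-1",
)
```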

Cross-Region Read Replicas

Read Replica Features & Limitations

  • RDS does not support circular replication.
  • An existing DB instance cannot be configured to serve as a replication source for another existing DB instance; a new Read Replica can only be created from an existing DB instance. For example, if MyDBInstance replicates to ReadReplica1, ReadReplica1 can’t be configured to replicate back to MyDBInstance; from ReadReplica1, only a new Read Replica can be created, such as ReadRep2.
  • Read Replica can be created from other Read replicas as well. However, the replica lag is higher for these instances and there cannot be more than four instances involved in a replication chain.

Read Replica Comparison

RDS Read Replicas Use Cases

  • Scaling beyond the compute or I/O capacity of a single DB instance for read-heavy database workloads, directing excess read traffic to Read Replica(s)
  • Serving read traffic while the source DB instance is unavailable e.g. if the source DB instance cannot take I/O requests due to I/O suspension for backups or scheduled maintenance, the read traffic can be directed to the Read Replica(s); however, the data might be stale.
  • Business reporting or data warehousing scenarios where business reporting queries can be executed against a Read Replica, rather than the primary, production DB instance.
  • Implementing disaster recovery by promoting the read replica to a standalone instance as a disaster recovery solution, if the primary DB instance fails.

RDS Read Replicas vs Multi-AZ

RDS Multi-AZ vs Multi-Region vs Read Replicas

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. You are running a successful multi-tier web application on AWS and your marketing department has asked you to add a reporting tier to the application. The reporting tier will aggregate and publish status reports every 30 minutes from user-generated information that is being stored in your web applications database. You are currently running a Multi-AZ RDS MySQL instance for the database tier. You also have implemented ElastiCache as a database caching layer between the application tier and database tier. Please select the answer that will allow you to successfully implement the reporting tier with as little impact as possible to your database.
    1. Continually send transaction logs from your master database to an S3 bucket and generate the reports off the S3 bucket using S3 byte range requests.
    2. Generate the reports by querying the synchronously replicated standby RDS MySQL instance maintained through Multi-AZ (Standby instance cannot be used as a scaling solution)
    3. Launch a RDS Read Replica connected to your Multi-AZ master database and generate reports by querying the Read Replica.
    4. Generate the reports by querying the ElastiCache database caching tier. (ElastiCache does not maintain full data and is simply a caching solution)
  2. Your company is getting ready to do a major public announcement of a social media site on AWS. The website is running on EC2 instances deployed across multiple Availability Zones with a Multi-AZ RDS MySQL Extra Large DB Instance. The site performs a high number of small reads and writes per second and relies on an eventual consistency model. After comprehensive tests you discover that there is read contention on RDS MySQL. Which are the best approaches to meet these requirements? (Choose 2 answers)
    1. Deploy ElastiCache in-memory cache running in each availability zone
    2. Implement sharding to distribute load to multiple RDS MySQL instances (this is only a read contention, the writes work fine)
    3. Increase the RDS MySQL Instance size and Implement provisioned IOPS (not scalable, this is only a read contention, the writes work fine)
    4. Add an RDS MySQL read replica in each availability zone
  3. Your company has HQ in Tokyo and branch offices all over the world and is using logistics software with a multi-regional deployment on AWS in Japan, Europe and US. The logistic software has a 3-tier architecture and currently uses MySQL 5.6 for data persistence. Each region has deployed its own database. In the HQ region you run an hourly batch process reading data from every region to compute cross-regional reports that are sent by email to all offices this batch process must be completed as fast as possible to quickly optimize logistics. How do you build the database architecture in order to meet the requirements?
    1. For each regional deployment, use RDS MySQL with a master in the region and a read replica in the HQ region
    2. For each regional deployment, use MySQL on EC2 with a master in the region and send hourly EBS snapshots to the HQ region
    3. For each regional deployment, use RDS MySQL with a master in the region and send hourly RDS snapshots to the HQ region
    4. For each regional deployment, use MySQL on EC2 with a master in the region and use S3 to copy data files hourly to the HQ region
    5. Use Direct Connect to connect all regional MySQL deployments to the HQ region and reduce network latency for the batch process
  4. Your business is building a new application that will store its entire customer database on a RDS MySQL database, and will have various applications and users that will query that data for different purposes. Large analytics jobs on the database are likely to cause other applications to not be able to get the query results they need to, before time out. Also, as your data grows, these analytics jobs will start to take more time, increasing the negative effect on the other applications. How do you solve the contention issues between these different workloads on the same data?
    1. Enable Multi-AZ mode on the RDS instance
    2. Use ElastiCache to offload the analytics job data
    3. Create RDS Read-Replicas for the analytics work
    4. Run the RDS instance on the largest size possible
  5. If I have multiple Read Replicas for my master DB Instance and I promote one of them, what happens to the rest of the Read Replicas?
    1. The remaining Read Replicas will still replicate from the older master DB Instance
    2. The remaining Read Replicas will be deleted
    3. The remaining Read Replicas will be combined to one read replica
  6. You need to scale an RDS deployment. You are operating at 10% writes and 90% reads, based on your logging. How best can you scale this in a simple way?
    1. Create a second master RDS instance and peer the RDS groups.
    2. Cache all the database responses on the read side with CloudFront.
    3. Create read replicas for RDS since the load is mostly reads.
    4. Create a Multi-AZ RDS installs and route read traffic to standby.
  7. A customer is running an application in US-West (Northern California) region and wants to setup disaster recovery failover to the Asian Pacific (Singapore) region. The customer is interested in achieving a low Recovery Point Objective (RPO) for an Amazon RDS multi-AZ MySQL database instance. Which approach is best suited to this need?
    1. Synchronous replication
    2. Asynchronous replication
    3. Route53 health checks
    4. Copying of RDS incremental snapshots
  8. A user is using a small MySQL RDS DB. The user is experiencing high latency due to the Multi AZ feature. Which of the below mentioned options may not help the user in this situation?
    1. Schedule the automated back up in non-working hours
    2. Use a large or higher size instance
    3. Use PIOPS
    4. Take a snapshot from standby Replica
  9. My Read Replica appears “stuck” after a Multi-AZ failover and is unable to obtain or apply updates from the source DB Instance. What do I do?
    1. You will need to delete the Read Replica and create a new one to replace it.
    2. You will need to disassociate the DB Engine and re associate it.
    3. The instance should be deployed to Single AZ and then moved to Multi- AZ once again
    4. You will need to delete the DB Instance and create a new one to replace it.
  10. A company is running a batch analysis every hour on their main transactional DB running on an RDS MySQL instance to populate their central Data Warehouse running on Redshift. During the execution of the batch their transactional applications are very slow. When the batch completes they need to update the top management dashboard with the new data. The dashboard is produced by another system running on-premises that is currently started when a manually-sent email notifies that an update is required. The on-premises system cannot be modified because it is managed by another team. How would you optimize this scenario to solve performance issues and automate the process as much as possible?
    1. Replace RDS with Redshift for the batch analysis and SNS to notify the on-premises system to update the dashboard
    2. Replace RDS with Redshift for the batch analysis and SQS to send a message to the on-premises system to update the dashboard
    3. Create an RDS Read Replica for the batch analysis and SNS to notify the on-premises system to update the dashboard
    4. Create an RDS Read Replica for the batch analysis and SQS to send a message to the on-premises system to update the dashboard.


AWS RDS Multi-AZ Deployment

RDS Multi-AZ Instance Deployment

RDS Multi-AZ Deployment

  • RDS Multi-AZ deployments provide high availability and automatic failover support for DB instances
  • Multi-AZ helps improve the durability and availability of a critical system, enhancing availability during planned system maintenance, DB instance failure, and Availability Zone disruption.
  • A Multi-AZ DB instance deployment
    • has one standby DB instance that provides failover support but doesn’t serve read traffic.
    • There is only one row for the DB instance.
    • The value of Role is Instance or Primary.
    • The value of Multi-AZ is Yes.
  • A Multi-AZ DB cluster deployment
    • has two standby DB instances that provide failover support and can also serve read traffic.
    • There is a cluster-level row with three DB instance rows under it.
    • For the cluster-level row, the value of Role is Multi-AZ DB cluster.
    • For each instance-level row, the value of Role is Writer instance or Reader instance.
    • For each instance-level row, the value of Multi-AZ is 3 Zones.

RDS Multi-AZ DB Instance Deployment

  • RDS automatically creates a primary DB Instance and synchronously replicates the data to a standby instance in a different AZ.
  • RDS performs an automatic failover to the standby, so that database operations can be resumed as soon as the failover is complete.
  • RDS Multi-AZ deployment maintains the same endpoint for the DB Instance after a failover, so the application can resume database operation without the need for manual administrative intervention.
  • Multi-AZ is a high-availability feature and NOT a scaling solution for read-only scenarios; a standby replica can’t be used to serve read traffic. To service read-only traffic, use a Read Replica.
  • Multi-AZ deployments for Oracle, PostgreSQL, MySQL, and MariaDB DB instances use Amazon technology, while SQL Server DB instances use SQL Server Mirroring. (A minimal boto3 sketch for enabling Multi-AZ and forcing a failover follows this list.)
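
A minimal boto3 sketch, referenced above, that converts an existing DB instance to Multi-AZ and then simulates a failover with a forced reboot; the identifier is an assumed placeholder:

```python
# Minimal sketch: enable Multi-AZ on an existing instance, then force a failover.
import boto3

rds = boto3.client("rds")

# Add a synchronous standby in another AZ.
rds.modify_db_instance(
    DBInstanceIdentifier="mydb-primary",
    MultiAZ=True,
    ApplyImmediately=True,
)
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="mydb-primary")

# Simulate a Multi-AZ failover (the endpoint's DNS CNAME flips to the standby).
rds.reboot_db_instance(
    DBInstanceIdentifier="mydb-primary",
    ForceFailover=True,
)
```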

RDS Multi-AZ Instance Deployment

RDS Multi-AZ DB Cluster Deployment

  • RDS Multi-AZ DB cluster deployment is a high-availability deployment mode of RDS with two readable standby DB instances.
  • RDS Multi-AZ DB cluster has a writer DB instance and two reader DB instances in three separate AZs in the same AWS Region.
  • With a Multi-AZ DB cluster, RDS semi-synchronously replicates data from the writer DB instance to both of the reader DB instances using the DB engine’s native replication capabilities.
  • Multi-AZ DB clusters provide high availability, increased capacity for read workloads, and lower write latency when compared to Multi-AZ DB instance deployments.
  • In the event of an outage, RDS manages failover from the writer DB instance to one of the reader DB instances, based on which reader DB instance has the most recent change record.

RDS Multi-AZ DB Cluster

Multi-AZ DB Instance vs Multi-AZ DB Cluster

RDS Multi-AZ DB Instance vs DB Cluster

RDS Multi-AZ vs Read Replicas

RDS Multi-AZ vs Multi-Region vs Read Replicas

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. A company is deploying a new two-tier web application in AWS. The company has limited staff and requires high availability, and the application requires complex queries and table joins. Which configuration provides the solution for the company’s requirements?
    1. MySQL Installed on two Amazon EC2 Instances in a single Availability Zone (does not provide High Availability out of the box)
    2. Amazon RDS for MySQL with Multi-AZ
    3. Amazon ElastiCache (Just a caching solution)
    4. Amazon DynamoDB (Not suitable for complex queries and joins)
  2. What would happen to an RDS (Relational Database Service) multi-Availability Zone deployment if the primary DB instance fails?
    1. IP of the primary DB Instance is switched to the standby DB Instance.
    2. A new DB instance is created in the standby availability zone.
    3. The canonical name record (CNAME) is changed from primary to standby.
    4. The RDS (Relational Database Service) DB instance reboots.
  3. Will my standby RDS instance be in the same Availability Zone as my primary?
    1. Only for Oracle RDS types
    2. Yes
    3. Only if configured at launch
    4. No
  4. Is creating a Read Replica of another Read Replica supported?
    1. Only in certain regions
    2. Only with MySQL based RDS
    3. Only for Oracle RDS types
    4. No
  5. A user is planning to set up the Multi-AZ feature of RDS. Which of the below mentioned conditions won’t take advantage of the Multi-AZ feature?
    1. Availability zone outage
    2. A manual failover of the DB instance using Reboot with failover option
    3. Region outage
    4. When the user changes the DB instance’s server type
  6. When you run a DB Instance as a Multi-AZ deployment, the “_____” serves database writes and reads
    1. secondary
    2. backup
    3. stand by
    4. primary
  7. When running my DB Instance as a Multi-AZ deployment, can I use the standby for read or write operations?
    1. Yes
    2. Only with MSSQL based RDS
    3. Only for Oracle RDS instances
    4. No
  8. Read Replicas require a transactional storage engine and are only supported for the _________ storage engine
    1. OracleISAM
    2. MSSQLDB
    3. InnoDB
    4. MyISAM
  9. A user is configuring the Multi-AZ feature of an RDS DB. The user came to know that this RDS DB does not use the AWS technology, but uses server mirroring to achieve replication. Which DB is the user using right now?
    1. MySQL
    2. Oracle
    3. MS SQL
    4. PostgreSQL
  10. If you have chosen Multi-AZ deployment, in the event of a planned or unplanned outage of your primary DB Instance, Amazon RDS automatically switches to the standby replica. The automatic failover mechanism simply changes the ______ record of the main DB Instance to point to the standby DB Instance.
    1. DNAME
    2. CNAME
    3. TXT
    4. MX
  11. When automatic failover occurs, Amazon RDS will emit a DB Instance event to inform you that automatic failover occurred. You can use the _____ to return information about events related to your DB Instance
    1. FetchFailure
    2. DescriveFailure
    3. DescribeEvents
    4. FetchEvents
  12. The new DB Instance that is created when you promote a Read Replica retains the backup window period.
    1. TRUE
    2. FALSE
  13. Will I be alerted when automatic failover occurs?
    1. Only if SNS configured
    2. No
    3. Yes
    4. Only if Cloudwatch configured
  14. Can I initiate a “forced failover” for my MySQL Multi-AZ DB Instance deployment?
    1. Only in certain regions
    2. Only in VPC
    3. Yes
    4. No
  15. A user is accessing RDS from an application. The user has enabled the Multi-AZ feature with the MS SQL RDS DB. During a planned outage how will AWS ensure that a switch from DB to a standby replica will not affect access to the application?
    1. RDS will have an internal IP which will redirect all requests to the new DB
    2. RDS uses DNS to switch over to standby replica for seamless transition
    3. The switch over changes Hardware so RDS does not need to worry about access
    4. RDS will have both the DBs running independently and the user has to manually switch over
  16. Which of the following is part of the failover process for a Multi-AZ Amazon Relational Database Service (RDS) instance?
    1. The failed RDS DB instance reboots.
    2. The IP of the primary DB instance is switched to the standby DB instance.
    3. The DNS record for the RDS endpoint is changed from primary to standby.
    4. A new DB instance is created in the standby availability zone.
  17. Which of these is not a reason a Multi-AZ RDS instance will failover?
    1. An Availability Zone outage
    2. A manual failover of the DB instance was initiated using Reboot with failover
    3. To autoscale to a higher instance class (Refer link)
    4. Master database corruption occurs
    5. The primary DB instance fails
  18. How does Amazon RDS multi Availability Zone model work?
    1. A second, standby database is deployed and maintained in a different availability zone from master, using synchronous replication. (Refer link)
    2. A second, standby database is deployed and maintained in a different availability zone from master using asynchronous replication.
    3. A second, standby database is deployed and maintained in a different region from master using asynchronous replication.
    4. A second, standby database is deployed and maintained in a different region from master using synchronous replication.
  19. A user is using a small MySQL RDS DB. The user is experiencing high latency due to the Multi AZ feature. Which of the below mentioned options may not help the user in this situation?
    1. Schedule the automated back up in non-working hours
    2. Use a large or higher size instance
    3. Use PIOPS
    4. Take a snapshot from standby Replica
  20. What is the charge for the data transfer incurred in replicating data between your primary and standby?
    1. No charge. It is free.
    2. Double the standard data transfer charge
    3. Same as the standard data transfer charge
    4. Half of the standard data transfer charge
  21. A user has enabled the Multi AZ feature with the MS SQL RDS database server. Which of the below mentioned statements will help the user understand the Multi AZ feature better?
    1. In a Multi AZ, AWS runs two DBs in parallel and copies the data asynchronously to the replica copy
    2. In a Multi AZ, AWS runs two DBs in parallel and copies the data synchronously to the replica copy
    3. In a Multi AZ, AWS runs just one DB but copies the data synchronously to the standby replica
    4. AWS MS SQL does not support the Multi AZ feature

Choosing the Right Data Science Specialization: Where to Focus Your Skills

In the rapidly evolving world of technology, data science stands out as a field of endless opportunities and diverse pathways. With its foundations deeply rooted in statistics, computer science, and domain-specific knowledge, data science has become indispensable for organizations seeking to make data-driven decisions. However, the vastness of this field can be overwhelming, making specialization a strategic necessity for aspiring data scientists.

This article aims to navigate through the labyrinth of data science specializations, helping you align your career with your interests, skills, and the evolving demands of the job market.

Understanding the Breadth of Data Science

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to draw knowledge and discover insights from structured and unstructured data. It encompasses a wide range of activities, from data collection and cleaning to complex algorithmic computation and predictive modeling.

Key Areas Within Data Science

  • Machine Learning: This involves creating algorithms that can learn from pre-fed data and make predictions or decisions based on it.
  • Deep Learning: A specialized subdomain of machine learning, focusing on neural networks and algorithms inspired by the structure and function of the brain.
  • Data Engineering: This is the backbone of data science, focusing on the practical aspects of data collection, storage, and retrieval.
  • Data Visualization: It involves converting complex data sets into understandable and interactive graphical representations.
  • Big Data Analytics: This deals with extracting meaningful insights from very large, diverse data sets that are often beyond the capability of traditional data-processing applications.
  • AI and Robotics: This cutting-edge field combines data science with robotics, focusing on creating machines that can perform actions/operations that typically require human intelligence.

Interconnectivity of These Areas

While these specializations are distinct, they are interconnected. For instance, data engineering is foundational for machine learning, and AI applications often rely on insights derived from big data analytics.

Factors to Consider When Choosing a Specialization

  • Personal Interests and Strengths
    • Your choice should resonate with your personal interests. If you are fascinated by how algorithms can mimic human learning, deep learning could be your calling. Alternatively, if you enjoy the challenges of handling and organizing large data sets, data engineering might suit you.
  • Industry Demand and Job Market Trends
    • It’s crucial to align your specialization with the market demand. Fields like AI and machine learning are rapidly growing and offer numerous job opportunities. Tracking industry trends can provide valuable insights into which specializations are most in demand.
  • Long-term Career Goals
    • Consider where you want to be in your career in the next five to ten years. Some specializations may offer more opportunities for growth, leadership roles, or transitions into different areas of data science.
  • Impact of Emerging Technologies
    • Emerging technologies can redefine the landscape of data science. Staying up to date on these changes can help you choose a specialization that remains relevant in the future.

Deep Dive into Popular Data Science Specializations

  • Machine Learning
    • Overview and Applications: From predictive modeling in finance to recommendation systems in e-commerce, machine learning is revolutionizing various industries.
    • Required Skills and Tools: Proficiency in programming languages like Python or R, understanding of algorithms, and familiarity with machine learning frameworks like TensorFlow or Scikit-learn are essential.
  • Data Engineering
    • Role in Data Science: Data engineers build and maintain the infrastructure that allows data scientists to analyze and utilize data effectively.
    • Key Skills and Technologies: Skills in database management, ETL (Extract, Transform, Load) processes, and knowledge of SQL, NoSQL, Hadoop, and Spark are crucial.
  • Big Data Analytics
    • Understanding Big Data: This specialization deals with analyzing extremely large data sets to discover patterns, trends, and associations, particularly relating to human behavior and interactions.
    • Tools and Techniques: Familiarity with big data platforms like Apache Hadoop and Spark, along with data mining and statistical analysis, is important.
  • AI and Robotics
    • The Frontier of Data Science: This field is at the cutting edge, developing intelligent systems capable of performing tasks that typically require human intelligence.
    • Skills and Knowledge Base: A deep understanding of AI principles, programming, and robotics is necessary, along with skills in machine learning and neural networks.

Educational Pathways for Each Specialization

  • Academic Courses and Degrees
    • Pursuing a formal education in data science or a related field can provide a strong theoretical foundation. Many universities like MIT now offer specialized courses in machine learning, AI, and big data analytics, like the Data Analysis Certificate program.
  • Online Courses and Bootcamps
    • Online platforms like Great Learning offer specialized courses that are more flexible and often industry-oriented. Bootcamps, on the other hand, provide intensive, hands-on training in specific areas of data science.
  • Certifications and Workshops
    • Professional certifications from recognized bodies can add significant value to your resume. Educational choices like the Data Science course showcase your expertise and commitment to professional development.
  • Self-learning Resources
    • The internet is replete with resources for self-learners. From online tutorials and forums to webinars and eBooks, the opportunities for self-paced learning in data science are abundant.

Building Experience in Your Chosen Specialization

  • Internships and Entry-level Positions
    • Gaining practical experience is crucial. Internships and entry-level positions provide real-world experience and help you understand the practical challenges and applications of your chosen specialization.
  • Personal and Open-source Projects
    • Working on personal data science projects or contributing to open-source projects can be a great way to apply your skills. These projects can also be a valuable addition to your portfolio.
  • Networking and Community Involvement
    • Building a professional network and participating in data science communities can lead to job opportunities and collaborations. Attending industry conferences and seminars is also a great way to stay updated and connected.
  • Industry Conferences and Seminars
    • These events are excellent for learning about the latest industry trends, best data science practices, and emerging technologies. They also offer opportunities to meet industry leaders and peers.

Future Trends and Evolving Specializations

  • Predicting the Future of Data Science
    • The field of data science is constantly evolving. Staying informed about future trends is crucial for choosing a specialization that will remain relevant and in demand.
  • Emerging Specializations and Technologies
    • Areas like quantum computing, edge analytics, and ethical AI are emerging as new frontiers in data science. These fields are likely to offer exciting new opportunities for specialization in the coming years.
  • Staying Adaptable and Continuous Learning
    • The key to a successful career in data science is adaptability and a commitment to continuous learning. The field is dynamic, and staying abreast of new developments is essential.

Conclusion

Choosing the right data science specialization is a critical decision that can shape your career trajectory. It requires a careful consideration of your personal interests, the current job market, and future industry trends. Whether your passion lies in the intricate algorithms of machine learning, the structural challenges of data engineering, or the innovative frontiers of AI and robotics, there is a niche for every aspiring data scientist. The journey is one of continuous learning, adaptability, and an unwavering curiosity about the power of data. As the field continues to grow and diversify, the opportunities for data scientists are bound to expand, offering a rewarding and dynamic career path.

 

AWS Certified Database – Specialty (DBS-C01) Exam Learning Path

AWS Database - Specialty Certificate

AWS Certified Database – Specialty (DBS-C01) Exam Learning Path

I recently revalidated my AWS Certified Database – Specialty (DBS-C01) certification just before it expired. The format and domains are pretty much the same as the previous exam, however, it has been enhanced to cover a lot of new services.

AWS Certified Database – Specialty (DBS-C01) Exam Content

AWS Certified Database – Specialty (DBS-C01) exam validates your understanding of databases, including the concepts of design, migration, deployment, access, maintenance, automation, monitoring, security, and troubleshooting, and covers the following tasks:

  • Understand and differentiate the key features of AWS database services.
  • Analyze needs and requirements to design and recommend appropriate database solutions using AWS services

Refer to AWS Database – Specialty Exam Guide

DBS-C01 Domains

AWS Certified Database – Specialty (DBS-C01) Exam Summary

  • Specialty exams are tough, lengthy, and tiresome. Most of the questions and answers options have a lot of prose and a lot of reading that needs to be done, so be sure you are prepared and manage your time well.
  • DBS-C01 exam has 65 questions to be solved in 170 minutes which gives you roughly 2 1/2 minutes to attempt each question.
  • DBS-C01 exam includes two types of questions, multiple-choice and multiple-response.
  • DBS-C01 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 750.
  • Specialty exams currently cost $300 + tax.
  • You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.
  • As always, mark the questions for review, move on, and come back to them after you are done with all.
  • As always, having a rough architecture or mental picture of the setup helps focus on the areas that you need to improve. You will usually be able to eliminate 2 answers for sure and then need to focus on only the other two; compare where the remaining 2 answers differ, which will help you reach the right answer or at least have a 50% chance of getting it right.
  • AWS exams can be taken either at a test center or online; I prefer to take them online as it provides a lot of flexibility. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
  • Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Certified Database – Specialty (DBS-C01) Exam Resources

AWS Certified Database – Specialty (DBS-C01) Exam Summary

  • AWS Certified Database – Specialty exam focuses completely on AWS Data services from relational, non-relational, graph, caching, and data warehousing. It also covers deployments, automation, migration, security, monitoring, and troubleshooting aspects of them.

Database Services

  • Make sure you know and cover all the services in-depth, as 80% of the exam is focused on topics like Aurora, RDS, DynamoDB
  • DynamoDB
    • is a fully managed NoSQL database service providing single-digit millisecond latency.
    • DynamoDB supports On-demand and Provisioned throughput capacity modes.
      • On-demand mode
        • provides a flexible billing option capable of serving thousands of requests per second without capacity planning
        • does not support reserved capacity
      • Provisioned mode
        • requires you to specify the number of reads and writes per second as required by the application
        • Understand the provisioned capacity calculations
    • DynamoDB Auto Scaling uses the AWS Application Auto Scaling service to dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns.
    • Know DynamoDB Burst capacity, Adaptive capacity
    • DynamoDB Consistency mode determines the manner and timing in which the successful write or update of a data item is reflected in a subsequent read operation of that same item.
      • supports eventual and strongly consistent reads.
      • Eventually consistent reads require less throughput but might return stale data, whereas strongly consistent reads require higher throughput but always return the latest data.
    • DynamoDB secondary indexes provide efficient access to data with attributes other than the primary key.
      • LSI uses the same partition key but a different sort key, whereas, GSI is a separate table with a different partition key and/or sort key.
      • GSI can cause primary table throttling if under-provisioned.
      • Make sure you understand the difference between the Local Secondary Index and the Global Secondary Index
    • DynamoDB Global Tables is a new multi-master, cross-region replication capability of DynamoDB to support data access locality and regional fault tolerance for database workloads.
    • DynamoDB Time to Live – TTL enables a per-item timestamp to determine when an item is no longer needed. (hint: know TTL can expire the data and this can be captured by using DynamoDB Streams)
    • DynamoDB cross-region replication allows identical copies (called replicas) of a DynamoDB table (called master table) to be maintained in one or more AWS regions.
    • DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table.
    • DynamoDB Triggers (just like database triggers) is a feature that allows the execution of custom actions based on item-level updates on a table.
    • DynamoDB Accelerator – DAX is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement even at millions of requests per second.
      • DAX does not support fine-grained access control like DynamoDB.
    • DynamoDB Backups support PITR
      • AWS Backup can be used to backup and restore, and it supports cross-region snapshot copy as well.
    • VPC Gateway Endpoints provide private access to DynamoDB from within a VPC without the need for an internet gateway or NAT gateway
    • Understand DynamoDB Best practices (hint: selection of keys to avoid hot partitions and creation of LSI and GSI)
  • Aurora
    • is a relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases.
    • provides MySQL and PostgreSQL compatibility
    • Aurora Disaster Recovery & High Availability can be achieved using Read Replicas with very minimal downtime.
      • Aurora promotes read replicas as per the priority tier (tier 0 is the highest), and the largest size if the tiers match
    • Aurora Global Database provides cross-region read replicas for low-latency reads. Remember it is not multi-master and would not provide low latency writes across regions as DynamoDB Global tables.
    • Aurora Connection endpoints support
      • Cluster for primary read/write
      • Reader for read replicas
      • Custom for a specific group of instances
      • Instance for specific single instance – Not recommended
    • Aurora Fast Failover techniques
      • set TCP keepalives low
      • set Java DNS caching timeouts low
      • Set the timeout variables used in the JDBC connection string as low
      • Use the provided read and write Aurora endpoints
      • Use cluster cache management for Aurora PostgreSQL. Cluster cache management ensures that application performance is maintained if there’s a failover.
    • Aurora Serverless is an on-demand, autoscaling configuration for the MySQL-compatible and PostgreSQL-compatible editions of Aurora.
    • Aurora Backtrack feature helps rewind the DB cluster to the specified time. It is not a replacement for backups.
    • Aurora Server Auditing Events for different activities cover log-in, DML, permission changes DCL, schema changes DDL, etc.
    • Aurora Cluster Cache management feature which helps fast failover
    • Aurora Clone feature which allows you to create quick and cost-effective clones
    • Aurora supports fault injection queries to simulate various failovers like node down, primary failover, etc.
    • RDS PostgreSQL and MySQL can be migrated to Aurora, by creating an Aurora Read Replica from the instance. Once the replica lag is zero, switch the DNS with no data loss
    • Aurora Database Activity Streams help stream audit logs to external services like Kinesis
    • Supports stored procedures calling lambda functions
  • Relational Database Service (RDS)
    • provides a relational database in the cloud with multiple database options.
    • RDS Snapshots, Backups, and Restore
      • restoring a DB from a snapshot does not retain the parameter group and security group
      • automated snapshots cannot be shared. Make a manual backup from the snapshot before sharing the same.
    • RDS Read Replicas
      • allow elastic scaling beyond the capacity constraints of a single DB instance for read-heavy database workloads.
      • increased scalability and database availability in the case of an AZ failure.
      • supports cross-region replicas.
    • RDS Multi-AZ provides high availability and automatic failover support for DB instances.
    • Understand the differences between RDS Multi-AZ vs Read Replicas
      • Multi-AZ failover can be simulated using the Reboot with Failover option
      • Read Replicas require automated backups enabled
    • Understand DB components esp. DB parameter group, DB options groups
      • Dynamic parameters are applied immediately
      • Static parameters need manual reboot.
      • Default parameter group cannot be modified. Need to create custom parameter group and associate to RDS
      • Know max connections also depends on DB instance size
    • RDS Custom automates database administration tasks and operations, while making it possible for you as a database administrator to access and customize the database environment and operating system.
    • RDS Performance Insights is a database performance tuning and monitoring feature that helps you quickly assess the load on the database, and determine when and where to take action. 
    • RDS Security
      • RDS supports security groups to control who can access RDS instances
      • RDS supports data at rest encryption and SSL for data in transit encryption
      • RDS supports IAM database authentication with temporary credentials.
      • An existing RDS instance cannot be encrypted; create a snapshot -> encrypt the snapshot copy -> restore as an encrypted DB (a minimal boto3 sketch of this workflow appears after this list)
      • RDS PostgreSQL requires rds.force_ssl=1 and sslmode=ca/verify-full to enable SSL encryption
      • Know RDS Encrypted Database limitations
    • Understand RDS Monitoring and Notification
      • Know RDS supports notification events through SNS for events like database creation, deletion, snapshot creation, etc.
      • CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB instance, and Enhanced Monitoring gathers its metrics from an agent on the instance.
      • Enhanced Monitoring metrics are useful to understand how different processes or threads on a DB instance use the CPU.
      • RDS Performance Insights is a database performance tuning and monitoring feature that helps illustrate the database’s performance and help analyze any issues that affect it
    • RDS instance cannot be stopped if with read replicas
  • ElastiCache
    • is a managed web service that helps deploy and run Memcached or Redis protocol-compliant cache clusters in the cloud easily.
    • Understand the differences between Redis vs. Memcached
  • Neptune
    • is a fully managed database service built for the cloud that makes it easier to build and run graph applications. Neptune provides built-in security, continuous backups, serverless compute, and integrations with other AWS services.  
    • provides Neptune loader to quickly import data from S3
    • supports VPC endpoints
  • Amazon Keyspaces (for Apache Cassandra) is a scalable, highly available, and managed Apache Cassandra–compatible database service.
  • Amazon Quantum Ledger Database (Amazon QLDB) is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log.
  • Redshift
    • is a fully managed, fast, and powerful, petabyte-scale data warehouse service. It is not covered in depth.
    • Know Redshift Best Practices w.r.t selection of Distribution style, Sort key, importing/exporting data
      • A single COPY command allows parallelism and performs better than multiple COPY commands
      • COPY command can use manifest files to load data
      • COPY command handles encrypted data
    • Know Redshift cross region encrypted snapshot copy
      • Create a new key in destination region
      • Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the destination region.
      • In the source region, enable cross-region replication and specify the name of the copy grant created.
    • Know Redshift supports Audit logging which covers authentication attempts, connections and disconnections usually for compliance reasons.
  • Data Migration Service (DMS)
    • DMS helps in the migration of homogeneous and heterogeneous databases
    • DMS with Full load plus Change Data Capture (CDC) migration capability can be used to migrate databases with zero downtime and no data loss.
    • DMS with SCT (Schema Conversion Tool) can be used to migrate heterogeneous databases.
    • Premigration Assessment evaluates specified components of a database migration task to help identify any problems that might prevent a migration task from running as expected.
    • Multiserver assessment report evaluates multiple servers based on input that you provide for each schema definition that you want to assess.
    • DMS provides support for data validation to ensure that your data was migrated accurately from the source to the target.
    • DMS supports LOB migration as a 2-step process. It can do a full or limited LOB migration
      • In full LOB mode, AWS DMS migrates all LOBs from source to target regardless of size. Full LOB mode can be quite slow.
      • In limited LOB mode, a maximum LOB size can be set that AWS DMS should accept. Doing so allows AWS DMS to pre-allocate memory and load the LOB data in bulk. LOBs that exceed the maximum LOB size are truncated and a warning is issued to the log file. In limited LOB mode, you get significant performance gains over full LOB mode.
      • Recommended to use limited LOB mode whenever possible.
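
As referenced in the RDS Security bullet above, a minimal boto3 sketch of the snapshot -> encrypted copy -> restore workflow for encrypting an existing unencrypted RDS instance; the identifiers and KMS key alias are assumed placeholders:

```python
# Minimal sketch: encrypt an existing unencrypted RDS instance via an
# encrypted snapshot copy. All identifiers are placeholders.
import boto3

rds = boto3.client("rds")

# 1. Snapshot the unencrypted instance.
rds.create_db_snapshot(
    DBInstanceIdentifier="mydb-unencrypted",
    DBSnapshotIdentifier="mydb-plain-snap",
)
rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier="mydb-plain-snap")

# 2. Copy the snapshot with a KMS key to produce an encrypted snapshot.
rds.copy_db_snapshot(
    SourceDBSnapshotIdentifier="mydb-plain-snap",
    TargetDBSnapshotIdentifier="mydb-encrypted-snap",
    KmsKeyId="alias/rds-encryption-key",
)
rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier="mydb-encrypted-snap")

# 3. Restore a new, encrypted instance from the encrypted snapshot.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="mydb-encrypted",
    DBSnapshotIdentifier="mydb-encrypted-snap",
)
```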

Security, Identity & Compliance

  • Identity and Access Management (IAM)
  • Key Management Services
    • is a managed encryption service that allows the creation and control of encryption keys to enable data encryption.
    • provides data at rest encryption for the databases.
  • AWS Secrets Manager
    • protects secrets needed to access applications, services, etc.
    • enables you to easily rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle (a minimal retrieval sketch follows this list)
    • supports automatic rotation of credentials for RDS, DocumentDB, etc.
  • Secrets Manager vs. Systems Manager Parameter Store
    • Secrets Manager supports automatic rotation while SSM Parameter Store does not
    • Parameter Store is cost-effective as compared to Secrets Manager.
  • Trusted Advisor provides RDS Idle instances
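
A minimal boto3 sketch, referenced in the Secrets Manager bullets above, of an application retrieving database credentials from Secrets Manager at runtime; the secret name is an assumed placeholder, and the secret is assumed to hold the usual JSON with username/password keys:

```python
# Minimal sketch: fetch RDS credentials from Secrets Manager instead of
# hard-coding them. The secret name is a placeholder.
import json

import boto3

secrets = boto3.client("secretsmanager")

secret = secrets.get_secret_value(SecretId="prod/mydb/credentials")
creds = json.loads(secret["SecretString"])

db_user = creds["username"]
db_password = creds["password"]
```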

Management & Governance Tools

  • Understand AWS CloudWatch for Logs and Metrics.
    • EventBridge (CloudWatch Events) provides real-time alerts
    • CloudWatch can be used to store RDS logs with a custom retention period, which is indefinite by default.
    • CloudWatch Application Insights support .Net and SQL Server monitoring
  • Know CloudFormation for provisioning, in terms of
    • Stack drift – to detect differences between the stack’s expected configuration and the actual environment caused by manual changes
    • Change Set – allows you to verify the changes before being propagated
    • parameters – allows you to configure variables or environment-specific values
    • Stack policy defines the update actions that can be performed on designated resources.
    • Deletion policy for RDS allows you to configure if the resources are retained, snapshot, or deleted once destroy is initiated
    • Supports secrets manager for DB credentials generation, storage, and easy rotation
    • System parameter store for environment-specific parameters

Whitepapers and articles

On the Exam Day

  • Make sure you are relaxed and get a good night’s sleep. The exam is not tough if you are well-prepared.
  • If you are taking the AWS Online exam
    • Try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.
    • The online verification process does take some time and usually, there are glitches.
    • Remember, you would not be allowed to take the exam if you are late by more than 30 minutes.
    • Make sure you have your desk clear, no hand-watches, or external monitors, keep your phones away, and nobody can enter the room.

Finally, All the Best 🙂

Amazon DynamoDB with VPC Endpoints

DynamoDB VPC Endpoint

DynamoDB with VPC Endpoints

  • By default, communications to and from DynamoDB use the HTTPS protocol, which protects network traffic by using SSL/TLS encryption.
  • A VPC endpoint for DynamoDB enables EC2 instances in the VPC to use their private IP addresses to access DynamoDB with no exposure to the public internet.
  • Traffic between the VPC and the AWS service does not leave the Amazon network.
  • EC2 instances do not require public IP addresses, an internet gateway, a NAT device, or a virtual private gateway in the VPC.

  • VPC endpoint for DynamoDB routes any requests to a DynamoDB endpoint within the Region to a private DynamoDB endpoint within the Amazon network.
  • Applications running on EC2 instances in the VPC don’t need to be modified.
  • Endpoint name remains the same, but the route to DynamoDB stays entirely within the Amazon network and does not access the public internet.
  • VPC Endpoint Policies can be used to control access to DynamoDB.

DynamoDB VPC Endpoint

Types of VPC Endpoints for DynamoDB

  • DynamoDB supports two types of VPC endpoints: Gateway Endpoints and Interface Endpoints (using AWS PrivateLink); a minimal boto3 sketch for creating each follows the lists below.
  • Both types keep network traffic on the AWS network.
  • Gateway endpoints and interface endpoints can be used together in the same VPC.

Gateway Endpoints

  • A gateway endpoint is specified in the route table to access DynamoDB from the VPC over the AWS network.
  • Use DynamoDB public IP addresses.
  • Do not allow access from on-premises networks.
  • Do not allow access from another AWS Region.
  • Not billed – Gateway endpoints are free of charge.
  • Available only in the Region where created.
  • Supported for both DynamoDB tables and DynamoDB Streams.

Interface Endpoints (AWS PrivateLink)

  • Announced in March 2024, DynamoDB now supports AWS PrivateLink for interface endpoints.
  • Use private IP addresses from the VPC to route requests to DynamoDB.
  • Represented by one or more elastic network interfaces (ENIs) with private IP addresses.
  • Allow access from on-premises networks via AWS Direct Connect or Site-to-Site VPN.
  • Allow cross-region access from another VPC using VPC peering or AWS Transit Gateway.
  • Billed – Interface endpoints incur hourly charges and data processing charges.
  • Support up to 50,000 requests per second per endpoint.
  • Compatible with existing gateway endpoints in the same VPC.
  • Enable simplified private network connectivity from on-premises workloads to DynamoDB.
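For comparison, a sketch of creating an interface endpoint for DynamoDB with boto3; the subnet and security group IDs are placeholders. Since private DNS is not supported for DynamoDB PrivateLink, clients would use the endpoint-specific DNS names returned in DnsEntries.

```python
import boto3

# Hypothetical IDs -- replace with values from your own VPC.
VPC_ID = "vpc-0123456789abcdef0"
SUBNET_IDS = ["subnet-0123456789abcdef0"]
SECURITY_GROUP_IDS = ["sg-0123456789abcdef0"]
REGION = "us-east-1"

ec2 = boto3.client("ec2", region_name=REGION)

# An interface endpoint creates ENIs with private IPs in the chosen subnets.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId=VPC_ID,
    ServiceName=f"com.amazonaws.{REGION}.dynamodb",
    SubnetIds=SUBNET_IDS,
    SecurityGroupIds=SECURITY_GROUP_IDS,
    PrivateDnsEnabled=False,  # private DNS is not supported for DynamoDB PrivateLink
)
print(response["VpcEndpoint"]["DnsEntries"])
```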

Choosing Between Gateway and Interface Endpoints

  • Use Gateway Endpoints when:
    • Access is only needed from within the VPC.
    • Cost optimization is a priority (gateway endpoints are free).
    • Simple VPC-only connectivity is sufficient.
  • Use Interface Endpoints when:
    • Access is needed from on-premises networks via Direct Connect or VPN.
    • Cross-region access is required via VPC peering or Transit Gateway.
    • Private IP addressing is required for compliance or security policies.
    • Integration with AWS Management Console Private Access is needed.
  • Use Both Together when:
    • In-VPC applications can use the free gateway endpoint.
    • On-premises applications use interface endpoints for private connectivity.
    • This approach optimizes costs while enabling hybrid connectivity.

DynamoDB Streams with AWS PrivateLink

  • Announced in March 2025, DynamoDB Streams now supports AWS PrivateLink.
  • Allows invoking DynamoDB Streams APIs from within the VPC without traversing the public internet.
  • Only interface endpoints are supported for DynamoDB Streams – gateway endpoints are not supported.
  • Enables private connectivity for stream processing applications running on-premises or in other regions.
  • Supports FIPS endpoints in US and Canada commercial AWS Regions (announced November 2025).
  • To use DynamoDB console with AWS Management Console Private Access, create VPC endpoints for both:
    • com.amazonaws.<region>.dynamodb
    • com.amazonaws.<region>.dynamodb-streams

DynamoDB Accelerator (DAX) with AWS PrivateLink

  • Announced in October 2025, DAX now supports AWS PrivateLink.
  • Enables secure access to DAX management APIs (CreateCluster, DescribeClusters, DeleteCluster) over private IP addresses within the VPC.
  • Customers can access DAX using private DNS names.
  • Provides private connectivity for DAX cluster management operations.

IPv6 Support

  • Announced in October 2025, DynamoDB now supports Internet Protocol version 6 (IPv6).
  • IPv6 addresses can be used in VPCs when connecting to:
    • DynamoDB tables
    • DynamoDB Streams
    • DynamoDB Accelerator (DAX)
  • IPv6 support covers both gateway endpoints and interface (AWS PrivateLink) endpoints.
  • DAX supports IPv6 addressing with IPv4-only, IPv6-only, or dual-stack networking modes.
  • Available in all commercial AWS Regions and AWS GovCloud (US) Regions.

VPC Endpoint Policies

  • Endpoint policies can be attached to VPC endpoints to control access to DynamoDB.
  • Policies specify:
    • IAM principals that can perform actions
    • Actions that can be performed
    • Resources on which actions can be performed
  • Can restrict access to specific DynamoDB tables from a VPC endpoint.
  • Useful for implementing least-privilege access controls.
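A sketch of attaching a restrictive endpoint policy with boto3, assuming a hypothetical endpoint ID and an Orders table; it allows only read actions against that table (and its indexes) through the endpoint.

```python
import json
import boto3

ENDPOINT_ID = "vpce-0123456789abcdef0"  # hypothetical endpoint ID
TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/Orders"  # hypothetical table

# Allow only read actions against a single table and its indexes.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["dynamodb:GetItem", "dynamodb:BatchGetItem", "dynamodb:Query"],
            "Resource": [TABLE_ARN, f"{TABLE_ARN}/index/*"],
        }
    ],
}

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.modify_vpc_endpoint(VpcEndpointId=ENDPOINT_ID, PolicyDocument=json.dumps(policy))
```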

Considerations and Limitations

  • AWS PrivateLink for DynamoDB does not support:
    • Transport Layer Security (TLS) 1.1
    • Private and Hybrid Domain Name System (DNS) services
  • Network connectivity timeouts to AWS PrivateLink endpoints need to be handled by applications.
  • Interface endpoints support up to 50,000 requests per second per endpoint.
  • When using both gateway and interface endpoints together, applications must use endpoint-specific DNS names to route traffic through interface endpoints.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. What are the services supported by VPC endpoints, using the Gateway endpoint type?
    1. Amazon EFS
    2. Amazon DynamoDB
    3. Amazon Glacier
    4. Amazon SQS
  2. A business application is hosted on Amazon EC2 and uses Amazon DynamoDB for its storage. The chief information security officer has directed that no application traffic between the two services should traverse the public internet. Which capability should the solutions architect use to meet the compliance requirements?
    1. AWS Key Management Service (AWS KMS)
    2. VPC endpoint
    3. Private subnet
    4. Virtual private gateway
  3. A company runs an application in the AWS Cloud and uses Amazon DynamoDB as the database. The company deploys Amazon EC2 instances to a private network to process data from the database. The company uses two NAT instances to provide connectivity to DynamoDB.
    The company wants to retire the NAT instances. A solutions architect must implement a solution that provides connectivity to DynamoDB and that does not require ongoing management. What is the MOST cost-effective solution that meets these requirements?

    1. Create a gateway VPC endpoint to provide connectivity to DynamoDB.
    2. Configure a managed NAT gateway to provide connectivity to DynamoDB.
    3. Establish an AWS Direct Connect connection between the private network and DynamoDB.
    4. Deploy an AWS PrivateLink endpoint service between the private network and DynamoDB.
  4. A company has an on-premises data center connected to AWS via AWS Direct Connect. The company needs to access DynamoDB tables from on-premises applications without traversing the public internet. What is the BEST solution?
    1. Create a gateway VPC endpoint for DynamoDB.
    2. Create an interface VPC endpoint (AWS PrivateLink) for DynamoDB.
    3. Configure a NAT gateway in the VPC.
    4. Use an internet gateway with security groups.
  5. A solutions architect needs to enable private connectivity to DynamoDB Streams for a stream processing application. Which VPC endpoint type should be used?
    1. Gateway endpoint only
    2. Interface endpoint only
    3. Either gateway or interface endpoint
    4. Both gateway and interface endpoints together
  6. A company wants to minimize costs for accessing DynamoDB from EC2 instances within the same VPC while maintaining private connectivity. What should they implement?
    1. Interface VPC endpoint
    2. Gateway VPC endpoint
    3. NAT gateway
    4. Internet gateway with security groups
  7. Which of the following are true about DynamoDB interface endpoints? (Select TWO)
    1. They support access from on-premises networks via Direct Connect or VPN.
    2. They are free of charge.
    3. They use private IP addresses from the VPC.
    4. They cannot be used with gateway endpoints in the same VPC.
    5. They support unlimited requests per second.

References

Amazon DynamoDB Time to Live – TTL

DynamoDB Time to Live – TTL

  • DynamoDB Time to Live – TTL enables a per-item timestamp to determine when an item is no longer needed.
  • After the date and time of the specified timestamp, DynamoDB deletes the item from the table without consuming any write throughput.

  • DynamoDB TTL is provided at no extra cost and can help reduce data storage by retaining only required data.
  • Items that are deleted from the table are also removed from any local secondary index and global secondary index in the same way as a DeleteItem operation.
  • DynamoDB Stream tracks the delete operation as a system delete, not a regular one.

How TTL Works

  • TTL allows defining a per-item expiration timestamp that indicates when an item is no longer needed.
  • DynamoDB automatically deletes expired items within a few days of their expiration time, without consuming write throughput.
  • Deletion Timeline: DynamoDB typically deletes expired items within two days (48 hours) of expiration.
    • The exact duration depends on the workload nature and table size.
    • Deletion rate is proportional to the total number of TTL-expired items.
  • Items pending deletion: Expired items that haven’t been deleted yet will still appear in reads, queries, and scans.
    • Use filter expressions to remove expired items from Scan and Query results (see the sketch after this list).
    • Expired items can still be updated, including changing or removing their TTL attributes.
    • When updating expired items, use a condition expression to ensure the item hasn’t been subsequently deleted.
  • TTL process runs in the background as a low-priority task to avoid impacting table performance.
  • TTL deletions do not consume Write Capacity Units (WCU) in provisioned mode or Write Request Units in on-demand mode.
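Since expired items linger until the background process removes them, a filter expression can hide them from query results. A minimal sketch, assuming a hypothetical UserSessions table keyed on sessionId with a Number TTL attribute named expireAt:

```python
import time
import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("UserSessions")  # hypothetical table
now = int(time.time())

# Expired-but-not-yet-deleted items still appear in reads, so filter them
# out by comparing the TTL attribute with the current epoch time in seconds.
response = table.query(
    KeyConditionExpression=Key("sessionId").eq("session-123"),
    FilterExpression=Attr("expireAt").gt(now),
)
items = response["Items"]
```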

TTL Requirements and Limitations

  • Data Type: TTL attributes must use the Number data type. Other data types, such as String, are not supported.
  • Time Format: TTL attributes must use the Unix epoch time format (seconds since January 1, 1970, 00:00:00 UTC).
    • Be sure that the timestamp is in seconds, not milliseconds.
    • Items with TTL attributes that are not a Number type are ignored by the TTL process.
  • Five-Year Past Limitation: To be considered for expiry and deletion, the TTL timestamp cannot be more than five years in the past.
    • This prevents accidental deletion of historical data with very old timestamps.
    • Items with TTL values older than five years in the past are ignored by the TTL process.
  • Future Expiration: No limit on how far in the future the TTL timestamp can be set.
  • Attribute Selection: Only one attribute per table can be designated as the TTL attribute.
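A minimal sketch of writing an item with a TTL timestamp that satisfies the requirements above (Number type, epoch seconds); the table, key, and attribute names are assumptions for illustration.

```python
import time
import boto3

TABLE_NAME = "UserSessions"             # hypothetical table
RETENTION_SECONDS = 30 * 24 * 60 * 60   # e.g. 30-day retention

table = boto3.resource("dynamodb").Table(TABLE_NAME)

now = int(time.time())                  # epoch seconds, not milliseconds
table.put_item(
    Item={
        "sessionId": "session-123",            # partition key (assumed)
        "createdAt": now,
        "expireAt": now + RETENTION_SECONDS,   # attribute designated as the TTL attribute
    }
)
```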

TTL with Global Tables

  • When using Global Tables version 2019.11.21 (Current), DynamoDB replicates TTL deletes to all replica tables.
  • Write Capacity Consumption:
    • The initial TTL delete does not consume WCU in the region where the TTL expiry occurs.
    • The replicated TTL delete to replica table(s) consumes a replicated Write Capacity Unit (provisioned mode) or Replicated Write Unit (on-demand mode) in each replica region.
    • Applicable charges apply for replicated TTL deletes.

TTL and DynamoDB Streams

  • Deleted items are sent to DynamoDB Streams as system deletions (not user deletions).
  • Stream records for TTL deletions include a special attribute to identify them as TTL-triggered deletions.
  • Can be used to trigger downstream actions via AWS Lambda, such as:
    • Archiving expired items to S3 or Amazon S3 Glacier.
    • Sending notifications when items expire.
    • Maintaining audit logs of deleted items.
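A hedged sketch of the archive-on-expiry pattern: a Lambda function subscribed to the table’s stream detects TTL deletions (system deletes carry a userIdentity with principalId dynamodb.amazonaws.com and type Service) and writes the old image to S3. The bucket name is a placeholder, and the stream view type is assumed to include old images.

```python
import json
import boto3

s3 = boto3.client("s3")
ARCHIVE_BUCKET = "my-ttl-archive-bucket"  # hypothetical bucket


def handler(event, context):
    """Archive items removed by the TTL background process to S3."""
    for record in event.get("Records", []):
        user_identity = record.get("userIdentity") or {}
        is_ttl_delete = (
            record.get("eventName") == "REMOVE"
            and user_identity.get("type") == "Service"
            and user_identity.get("principalId") == "dynamodb.amazonaws.com"
        )
        if not is_ttl_delete:
            continue

        # OLD_IMAGE / NEW_AND_OLD_IMAGES view types include the item as it
        # looked before deletion, so it can be written out for archival.
        old_image = record["dynamodb"].get("OldImage", {})
        key = record["dynamodb"]["SequenceNumber"]
        s3.put_object(
            Bucket=ARCHIVE_BUCKET,
            Key=f"ttl-archive/{key}.json",
            Body=json.dumps(old_image),
        )
```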

Common Use Cases

  • TTL is useful if the stored items lose relevance after a specific time. For example:
    • Session Management: Remove user session data after inactivity period (e.g., 30 days).
    • IoT and Sensor Data: Remove sensor data after a year of inactivity.
    • Temporary Data: Delete temporary records like shopping carts, draft documents, or cache entries.
    • Compliance and Data Retention: Retain sensitive data for a certain amount of time according to contractual or regulatory obligations (e.g., GDPR, HIPAA).
    • Event Data: Remove event logs, audit trails, or metrics after a retention period.
    • Archive to S3: Archive expired items to an S3 data lake via DynamoDB Streams and AWS Lambda before deletion.

Best Practices

  • Calculate TTL on Write: Compute the expiration timestamp when creating or updating items.
    • For new items: TTL = createdAt + retention_period
    • For updated items: TTL = updatedAt + retention_period
  • Use Filter Expressions: Filter out expired items in application queries to avoid processing items pending deletion.
  • Archive Before Deletion: Use DynamoDB Streams with Lambda to archive important data to S3 before TTL deletion.
  • Monitor TTL Deletions: Track TTL deletion metrics using CloudWatch to ensure the deletion rate meets expectations.
  • Test TTL Behavior: Use the TTL preview feature in the DynamoDB console to simulate deletions before enabling TTL.
  • Avoid Very Old Timestamps: Ensure TTL values are not more than five years in the past to prevent them from being ignored.
  • Consider Global Table Costs: Account for replicated write costs when using TTL with Global Tables.

Enabling TTL

  • TTL can be enabled on a table through:
    • AWS Management Console
    • AWS CLI
    • AWS SDKs
    • AWS CloudFormation
  • Specify the attribute name that will store the TTL timestamp.
  • TTL can be enabled or disabled at any time without impacting table performance.
  • Changing the TTL attribute requires disabling TTL first, then re-enabling with the new attribute.
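A minimal boto3 sketch of enabling TTL on a table; the table and attribute names are the same hypothetical ones used earlier.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Designate the Number attribute that holds the epoch-seconds timestamp.
dynamodb.update_time_to_live(
    TableName="UserSessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expireAt"},
)

# Check the TTL status (ENABLING / ENABLED / DISABLING / DISABLED).
print(dynamodb.describe_time_to_live(TableName="UserSessions"))
```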

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. A company developed an application by using AWS Lambda and Amazon DynamoDB. The Lambda function periodically pulls data from the company’s S3 bucket based on date and time tags and inserts specific values into a DynamoDB table for further processing. The company must remove data that is older than 30 days from the DynamoDB table. Which solution will meet this requirement with the MOST operational efficiency?
    1. Update the Lambda function to add the Version attribute in the DynamoDB table. Enable TTL on the DynamoDB table to expire entries that are older than 30 days based on the TTL attribute.
    2. Update the Lambda function to add the TTL attribute in the DynamoDB table. Enable TTL on the DynamoDB table to expire entries that are older than 30 days based on the TTL attribute.
    3. Use AWS Step Functions to delete entries that are older than 30 days.
    4. Use EventBridge to schedule the Lambda function to delete entries that are older than 30 days.
  2. A company stores IoT sensor data in a DynamoDB table. The data must be retained for 90 days for analysis and then automatically deleted. The solution must minimize costs. What should a solutions architect recommend?
    1. Create a Lambda function to scan and delete items older than 90 days, triggered daily by EventBridge.
    2. Enable TTL on the DynamoDB table with an expiration attribute set to 90 days from the item creation time.
    3. Use DynamoDB Streams with Lambda to move data to S3 Glacier after 90 days and delete from DynamoDB.
    4. Create a scheduled AWS Batch job to delete items older than 90 days.
  3. A DynamoDB table has TTL enabled. A developer notices that some items with expired TTL timestamps are still appearing in query results. What is the MOST likely explanation?
    1. TTL is not working correctly and needs to be disabled and re-enabled.
    2. The TTL attribute is using the wrong data type.
    3. Items are expired but have not yet been deleted by the background TTL process, which can take up to 48 hours.
    4. The TTL timestamp is more than five years in the past.
  4. A company uses DynamoDB Global Tables across three regions. TTL is enabled on the table. How are write capacity units consumed for TTL deletions?
    1. TTL deletions consume WCU in all regions including the region where expiration occurs.
    2. TTL deletions do not consume WCU in the region where expiration occurs, but consume replicated write units in replica regions.
    3. TTL deletions do not consume any WCU in any region.
    4. TTL deletions consume double WCU in the region where expiration occurs.
  5. What is the correct format for a DynamoDB TTL attribute value to expire an item on January 1, 2027, at 00:00:00 UTC?
    1. 2027-01-01T00:00:00Z (ISO 8601 format)
    2. 1798761600000 (milliseconds since epoch)
    3. 1798761600 (seconds since epoch)
    4. “1798761600” (string representation of seconds)
  6. A company wants to archive expired DynamoDB items to S3 before they are deleted by TTL. What is the BEST approach?
    1. Create a Lambda function that scans the table for expired items and copies them to S3 before TTL deletes them.
    2. Enable DynamoDB Streams and use a Lambda function to detect TTL deletions and archive items to S3.
    3. Disable TTL and use a scheduled Lambda function to manually delete items after archiving to S3.
    4. Use AWS Backup to automatically archive items before TTL deletion.
  7. Which of the following statements about DynamoDB TTL are correct? (Select TWO)
    1. TTL deletions consume write capacity units in the source region.
    2. TTL timestamps must be in Unix epoch time format in seconds.
    3. TTL can use String data type for the expiration attribute.
    4. TTL timestamps cannot be more than five years in the past to be considered for deletion.
    5. TTL guarantees deletion within exactly 48 hours of expiration.

References

Amazon DynamoDB Global Tables

Amazon DynamoDB Global Tables

  • DynamoDB Global Tables is a fully managed, serverless, multi-Region, multi-active database.
  • Global tables provide 99.999% availability, increased application resiliency, and improved business continuity.
  • Global table’s automatic cross-Region replication capability helps achieve fast, local read and write performance and regional fault tolerance for database workloads.
  • Applications can now perform reads and writes to DynamoDB in AWS Regions around the world, with changes in any Region propagated to every Region where a table is replicated.
  • Global Tables help in building applications to advantage of data locality to reduce overall latency.
  • Global Tables replicates data among Regions within a single AWS account.

Global Tables Working

  • Global Table is a collection of one or more replica tables, all owned by a single AWS account.
  • A single Amazon DynamoDB global table can only have one replica table per AWS Region.
  • Each replica table stores the same set of data items, has the same table name, and the same primary key schema.
  • When an application writes data to a replica table in one Region, DynamoDB replicates the writes to other replica tables in the other AWS Regions.
  • All replicas in a global table share the same table name, primary key schema, and item data.

Consistency Modes

  • When creating a global table, a consistency mode must be configured.
  • Global tables support two consistency modes: Multi-Region Eventual Consistency (MREC) and Multi-Region Strong Consistency (MRSC).
  • If no consistency mode is specified, the global table defaults to MREC.
  • A global table cannot contain replicas configured with different consistency modes.
  • Consistency mode cannot be changed after creation.

Multi-Region Eventual Consistency (MREC) – Default

  • MREC is the default consistency mode for global tables.
  • Item changes are asynchronously replicated to all other replicas, typically within a second or less.
  • Conflict Resolution: Uses Last Write Wins approach based on the latest internal timestamp on a per-item basis.
  • Consistency Behavior:
    • Supports eventual consistency for cross-Region reads.
    • Supports strong consistency for same-Region reads (returns latest version if item was last updated in that Region).
    • May return stale data for strongly consistent reads if the item was last updated in a different Region.
  • Recovery Point Objective (RPO): Equal to replication delay between replicas (usually a few seconds).
  • Replica Management:
    • Create by adding a replica to an existing DynamoDB table (see the sketch after this list).
    • Can add replicas to expand to more Regions or remove replicas if no longer needed.
    • Can have a replica in any Region where DynamoDB is available.
    • No performance impact when adding replicas.
  • Requirements: Requires DynamoDB Streams enabled with New and Old image settings.
  • Use Cases:
    • Applications that can tolerate stale data from strongly consistent reads if data was updated in another Region.
    • Prioritize lower write and strongly consistent read latencies over multi-Region read consistency.
    • Multi-Region high availability strategy can tolerate RPO greater than zero.

Multi-Region Strong Consistency (MRSC) – January 2025

  • Announced at AWS re:Invent 2024 (preview) and generally available in January 2025.
  • Item changes are synchronously replicated to at least one other Region before the write operation returns a successful response.
  • Zero RPO: Provides Recovery Point Objective (RPO) of zero for highest resilience.
  • Consistency Behavior:
    • Strongly consistent read operations on any MRSC replica always return the latest version of an item.
    • Conditional writes always evaluate against the latest version of an item.
    • Provides strong read-after-write consistency across all Regions.
  • Deployment Requirements:
    • Must be deployed in exactly three Regions.
    • Can configure with three replicas OR two replicas + one witness.
    • Witness: A component that maintains replicated change data to support MRSC’s availability architecture but does not store a full copy of the table data. Read/write operations cannot be performed on a witness. The witness is owned and managed by DynamoDB.
  • Regional Availability: Available in three Region sets:
    • US Region set: US East (N. Virginia), US East (Ohio), US West (Oregon)
    • EU Region set: Europe (Ireland), Europe (London), Europe (Paris), Europe (Frankfurt)
    • AP Region set: Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Osaka)
    • MRSC global tables cannot span Region sets (e.g., cannot mix US and EU Regions).
  • Creation Requirements:
    • Create by adding one replica and a witness OR two replicas to an existing DynamoDB table.
    • Table must be empty when converting to MRSC (existing items not supported).
    • Cannot add additional replicas after creation.
    • Cannot delete a single replica or witness (must delete two replicas or one replica + witness to convert back to single-Region table).
  • Write Conflicts: Write operation fails with ReplicatedWriteConflictException when attempting to modify an item already being modified in another Region. Failed writes can be retried.
  • Limitations:
    • Time to Live (TTL): Not supported
    • Local Secondary Indexes (LSIs): Not supported
    • Transactions: Not supported (TransactWriteItems and TransactGetItems return errors)
    • DynamoDB Streams: Not used for replication (can be enabled separately)
  • Performance Trade-off: Higher write and strongly consistent read latencies compared to MREC.
  • Use Cases:
    • Need strongly consistent reads across multiple Regions.
    • Prioritize global read consistency over lower write latency.
    • Multi-Region high availability strategy requires RPO of zero.
    • Financial applications, inventory management, or any system requiring strict consistency.

Pricing Reduction (November 2024)

  • Effective November 1, 2024, DynamoDB reduced prices for global tables by up to 67%.
  • On-demand mode: Global tables cost 67% less than before.
  • Provisioned capacity mode: Global tables cost 33% less than before.
  • Makes global tables significantly more cost-effective for multi-Region deployments.

Replication and Throughput

  • MREC Replication:
    • Uses DynamoDB Streams to replicate changes.
    • Streams are enabled by default on all replicas and cannot be disabled.
    • Replication process may combine multiple changes into a single replicated write.
    • Stream records are ordered per-item but ordering between items may differ between replicas.
  • MRSC Replication:
    • Does not use DynamoDB Streams for replication.
    • Streams can be enabled separately if needed.
    • Stream records are identical for every replica, including ordering.
  • Provisioned Mode:
    • Replication consumes write capacity.
    • Auto scaling settings for read and write capacities are synchronized between replicas.
    • Read capacity can be independently configured per replica using ProvisionedThroughputOverride.
  • On-demand Mode:
    • Write capacity is automatically synchronized across all replicas.
    • DynamoDB automatically adjusts capacity based on traffic.

Monitoring and Testing

  • Replication Latency (MREC only):
    • MREC global tables publish ReplicationLatency metric to CloudWatch.
    • Tracks elapsed time between item write in one replica and appearance in another replica.
    • Expressed in milliseconds for every source-destination Region pair.
    • MRSC global tables do not publish this metric (synchronous replication).
  • Fault Injection Testing:
    • Both MREC and MRSC integrate with AWS Fault Injection Service (AWS FIS).
    • Can simulate Region isolation by pausing replication to/from a selected replica.
    • Test error handling, recovery mechanisms, and multi-Region traffic shift behavior.

Additional Features and Considerations

  • Time to Live (TTL):
    • MREC: Supported. TTL settings synchronized across all replicas. TTL deletes replicated to all replicas (charged for replicated deletes).
    • MRSC: Not supported.
  • Transactions:
    • MREC: Supported but only atomic within the Region where invoked. Not replicated as a unit.
    • MRSC: Not supported.
  • Point-in-Time Recovery (PITR):
    • Can be enabled on each local replica independently.
    • PITR settings are not synchronized between replicas.
  • DynamoDB Accelerator (DAX):
    • Writes to global table replicas bypass DAX, updating DynamoDB directly.
    • DAX caches can become stale and are only refreshed when cache TTL expires.
  • Settings Synchronization:
    • Always synchronized: Capacity mode, write capacity, GSI definitions, encryption, TTL (MREC)
    • Can be overridden per replica: Read capacity, table class
    • Never synchronized: Deletion protection, PITR, tags, Contributor Insights

DynamoDB Global Tables vs. Aurora Global Databases

AWS Aurora Global Database vs DynamoDB Global Tables

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. A company is building a web application on AWS. The application requires the database to support read and write operations in multiple AWS Regions simultaneously. The database also needs to propagate data changes between Regions as the changes occur. The application must be highly available and must provide a latency of single-digit milliseconds. Which solution meets these requirements?
    1. Amazon DynamoDB global tables
    2. Amazon DynamoDB streams with AWS Lambda to replicate the data
    3. An Amazon ElastiCache for Redis cluster with cluster mode enabled and multiple shards
    4. An Amazon Aurora global database
  2. A financial services company requires a multi-Region database with zero data loss (RPO = 0) and strongly consistent reads across all Regions. Which DynamoDB global tables consistency mode should they use?
    1. Multi-Region Eventual Consistency (MREC)
    2. Multi-Region Strong Consistency (MRSC)
    3. Single-Region Strong Consistency
    4. Cross-Region Read Replicas
  3. A company wants to create a DynamoDB global table with MRSC for their inventory management system. They have existing data in a table in us-east-1. What must they do before converting to MRSC?
    1. Enable DynamoDB Streams on the table.
    2. Configure three replicas in different Regions.
    3. Empty the table of all existing data.
    4. Enable Point-in-Time Recovery (PITR).
  4. A company has a DynamoDB global table with MREC configured across us-east-1, eu-west-1, and ap-southeast-1. An item is updated simultaneously in us-east-1 and eu-west-1. How does DynamoDB resolve this conflict?
    1. The write in the primary Region takes precedence.
    2. Last Write Wins based on the latest internal timestamp.
    3. Both writes are rejected and must be retried.
    4. The write with the larger data size takes precedence.
  5. A company wants to deploy a DynamoDB global table with MRSC. They need replicas in us-east-1, eu-west-1, and ap-southeast-1. What will happen?
    1. The global table will be created successfully.
    2. The creation will fail because MRSC cannot span Region sets.
    3. The global table will be created with MREC instead.
    4. A witness will be automatically placed in a fourth Region.
  6. Which of the following features are NOT supported with DynamoDB MRSC global tables? (Select THREE)
    1. Time to Live (TTL)
    2. DynamoDB Streams
    3. Local Secondary Indexes (LSIs)
    4. Global Secondary Indexes (GSIs)
    5. Transaction operations (TransactWriteItems)
    6. Point-in-Time Recovery (PITR)
  7. A company has a DynamoDB global table with MREC. They perform a strongly consistent read in us-west-2, but the item was last updated in eu-west-1. What will the read return?
    1. The latest version of the item from eu-west-1.
    2. Potentially stale data (the version before the eu-west-1 update).
    3. An error indicating the item is being replicated.
    4. The read will be automatically redirected to eu-west-1.
  8. What is the typical replication latency for DynamoDB global tables with MREC?
    1. 5-10 seconds
    2. Within a second or less
    3. Within 5 minutes
    4. Synchronous (no latency)

References

Amazon DynamoDB Streams

Amazon DynamoDB Streams

  • DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table.
  • DynamoDB Streams is a serverless data streaming feature that makes it straightforward to track, process, and react to item-level changes in DynamoDB tables in near real-time.
  • DynamoDB Streams stores the data for the last 24 hours, after which they are erased.
  • DynamoDB Streams maintains an ordered sequence of the events per item; however, sequence across items is not maintained.
  • Example:
    • For example, suppose that you have a DynamoDB table tracking high scores for a game, where each item represents an individual player, and you make the following three updates in this order:
      • Update 1: Change Player 1’s high score to 100 points
      • Update 2: Change Player 2’s high score to 50 points
      • Update 3: Change Player 1’s high score to 125 points
    • DynamoDB Streams maintains the order of the Player 1 score events; however, it does not maintain order across players, so the Player 2 event is not guaranteed to appear between the two Player 1 events.
  • Applications can access this log and view the data items as they appeared before and after they were modified, in near-real time.
  • DynamoDB Streams APIs help developers consume updates and receive the item-level data before and after items are changed.

DynamoDB Streams Features

  • Streams allow reads at up to twice the rate of the provisioned write capacity of the DynamoDB table.
  • Streams have to be enabled on a per-table basis. When enabled on a table, DynamoDB captures information about every modification to data items in the table.
  • Streams support Encryption at rest to encrypt the data.
  • Streams are designed for No Duplicates so that every update made to the table will be represented exactly once in the stream.
  • Streams write stream records in near-real time so that applications can consume these streams and take action based on the contents.
  • Stream records contain information about a data modification to a single item in a DynamoDB table.
  • Each stream record has a sequence number that reflects the order in which the record was published to the stream.

Stream View Types

  • When enabling a stream on a table, you must specify the stream view type, which determines what information is written to the stream:
  • KEYS_ONLY: Only the key attributes of the modified item.
  • NEW_IMAGE: The entire item, as it appears after it was modified.
  • OLD_IMAGE: The entire item, as it appeared before it was modified.
  • NEW_AND_OLD_IMAGES: Both the new and the old images of the item (recommended for maximum flexibility).
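A minimal boto3 sketch of enabling a stream with a view type on an existing table; the table name is a placeholder.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Enable the stream and capture both images of each modified item.
response = dynamodb.update_table(
    TableName="Orders",  # hypothetical table
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)
print(response["TableDescription"]["LatestStreamArn"])
```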

Use Cases

  • Multi-Region Replication: Keep other data stores up-to-date with the latest changes to DynamoDB (used by DynamoDB Global Tables).
  • Real-time Analytics: Stream data to analytics services for real-time insights.
  • Event-Driven Architectures: Trigger actions based on changes made to the table.
  • Data Aggregation: Aggregate data from multiple tables into a single view.
  • Audit and Compliance: Maintain audit logs of all changes to data.
  • Search Index Updates: Keep search indexes (e.g., OpenSearch) synchronized with DynamoDB data.
  • Cache Invalidation: Invalidate caches when data changes.
  • Notifications: Send notifications when specific data changes occur.

Processing DynamoDB Streams

  • Stream records can be processed using multiple methods:

AWS Lambda

  • Most common and recommended approach for processing DynamoDB Streams.
  • Lambda polls the stream and invokes the function synchronously when new records are available.
  • Lambda automatically handles scaling, retries, and error handling.
  • Supports batch processing of stream records.
  • Can filter events using event filtering to reduce invocations and costs.
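A sketch of wiring the stream to a Lambda function with an event filter so that only INSERT events invoke the function; the stream ARN and function name are placeholders.

```python
import json
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn=(
        "arn:aws:dynamodb:us-east-1:123456789012:table/Orders"
        "/stream/2025-01-01T00:00:00.000"  # hypothetical stream ARN
    ),
    FunctionName="process-orders",         # hypothetical function
    StartingPosition="LATEST",
    BatchSize=100,
    # Only INSERT events reach the function, reducing invocations and cost.
    FilterCriteria={"Filters": [{"Pattern": json.dumps({"eventName": ["INSERT"]})}]},
)
```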

Kinesis Data Streams

  • DynamoDB can stream change data directly to Amazon Kinesis Data Streams.
  • Provides longer data retention (up to 365 days vs. 24 hours for DynamoDB Streams).
  • Enables integration with Kinesis Data Firehose, Kinesis Data Analytics, and other Kinesis consumers.
  • Supports fan-out to multiple consumers.
  • Better for high-throughput scenarios requiring multiple consumers.

Kinesis Client Library (KCL)

  • KCL can be used to build custom applications that process DynamoDB Streams.
  • DynamoDB Streams Kinesis Adapter allows KCL applications to consume DynamoDB Streams.
  • KCL 3.0 Support (June 2025): DynamoDB Streams now supports Kinesis Client Library 3.0.
    • Reduces compute costs to process streaming data by up to 33% compared to previous KCL versions.
    • Improved load balancing algorithm based on CPU utilization.
    • Enhanced performance and efficiency.
    • Note: KCL 1.x reaches end-of-support on January 30, 2026. Migrate to KCL 3.x.

AWS PrivateLink Support (March 2025)

  • Announced in March 2025, DynamoDB Streams now supports AWS PrivateLink.
  • Allows invoking DynamoDB Streams APIs from within your Amazon VPC without traversing the public internet.
  • Only interface endpoints are supported for DynamoDB Streams (gateway endpoints are not supported).
  • Enables private connectivity for stream processing applications running on-premises or in other Regions.
  • Supports FIPS endpoints in US and Canada commercial AWS Regions (announced November 2025).
  • Enhances security by keeping stream data within the AWS network.
  • Critical for compliance requirements that mandate private network connectivity.
  • Can be accessed from on-premises via AWS Direct Connect or Site-to-Site VPN.

DynamoDB Streams vs. Kinesis Data Streams

  • DynamoDB Streams:
    • 24-hour data retention
    • Automatically scales with table
    • No additional cost (included with DynamoDB)
    • Simpler to set up and use
    • Best for simple event-driven architectures
  • Kinesis Data Streams:
    • Up to 365 days data retention
    • Manual capacity management (or on-demand mode)
    • Additional cost for Kinesis
    • More complex but more flexible
    • Best for multiple consumers and longer retention needs
  • Recommendation: Use DynamoDB Streams for simple use cases with Lambda. Use Kinesis Data Streams for complex scenarios requiring multiple consumers or longer retention.

Best Practices

  • Choose the Right View Type: Use NEW_AND_OLD_IMAGES for maximum flexibility unless you have specific requirements.
  • Handle Duplicates: Although designed for no duplicates, implement idempotent processing logic.
  • Monitor Stream Processing: Use CloudWatch metrics to monitor Lambda invocations, errors, and iterator age.
  • Use Event Filtering: Filter events in Lambda to reduce unnecessary invocations and costs.
  • Batch Processing: Configure appropriate batch sizes for Lambda to optimize throughput and cost.
  • Error Handling: Implement proper error handling and configure dead-letter queues for failed records.
  • Consider Kinesis for Multiple Consumers: If you need multiple consumers, use Kinesis Data Streams instead.
  • Migrate to KCL 3.0: If using KCL, migrate to version 3.0 for cost savings and performance improvements.
  • Use PrivateLink for Security: Enable AWS PrivateLink for enhanced security and compliance.

Limitations and Considerations

  • Stream records are available for only 24 hours.
  • Streams do not guarantee ordering across different items (only per-item ordering).
  • Stream records are eventually consistent with the table.
  • Enabling streams does not affect table performance.
  • Streams cannot be enabled on tables with local secondary indexes that use non-key attributes in the projection.
  • For Global Tables with MREC, streams are enabled by default and cannot be disabled.
  • For Global Tables with MRSC, streams are not used for replication but can be enabled separately.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. An application currently writes a large number of records to a DynamoDB table in one region. There is a requirement for a secondary application to retrieve new records written to the DynamoDB table every 2 hours and process the updates accordingly. Which of the following is an ideal way to ensure that the secondary application gets the relevant changes from the DynamoDB table?
    1. Insert a timestamp for each record and then scan the entire table for the timestamp as per the last 2 hours.
    2. Create another DynamoDB table with the records modified in the last 2 hours.
    3. Use DynamoDB Streams to monitor the changes in the DynamoDB table.
    4. Transfer records to S3 which were modified in the last 2 hours.
  2. A company needs to process DynamoDB stream records from an on-premises application without exposing traffic to the public internet. What should they implement?
    1. Use a NAT gateway to access DynamoDB Streams.
    2. Create an interface VPC endpoint for DynamoDB Streams using AWS PrivateLink.
    3. Create a gateway VPC endpoint for DynamoDB Streams.
    4. Use an internet gateway with security groups.
  3. A company wants to reduce costs for processing DynamoDB Streams using KCL. What should they do?
    1. Switch from KCL to Lambda for processing.
    2. Migrate from KCL 1.x to KCL 3.0 for up to 33% cost reduction.
    3. Reduce the number of shards in the stream.
    4. Increase the batch size for stream processing.
  4. A company needs to maintain an audit log of all changes to a DynamoDB table for 90 days. DynamoDB Streams only retains data for 24 hours. What is the BEST solution?
    1. Enable PITR on the DynamoDB table.
    2. Stream DynamoDB changes to Kinesis Data Streams with 90-day retention.
    3. Use Lambda to copy stream records to S3 every 24 hours.
    4. Create on-demand backups every 24 hours.
  5. A developer wants to capture both the old and new values of items when they are modified in a DynamoDB table. Which stream view type should they configure?
    1. KEYS_ONLY
    2. NEW_IMAGE
    3. OLD_IMAGE
    4. NEW_AND_OLD_IMAGES
  6. Which of the following statements about DynamoDB Streams are correct? (Select TWO)
    1. Stream records are available for 24 hours.
    2. Streams guarantee ordering across all items in the table.
    3. Streams maintain ordered sequence of events per item.
    4. Streams can be processed only by Lambda functions.
    5. Enabling streams impacts table write performance.
  7. A company has multiple applications that need to process the same DynamoDB change events. What is the BEST approach?
    1. Create multiple Lambda functions triggered by the same DynamoDB Stream.
    2. Stream DynamoDB changes to Kinesis Data Streams and use multiple consumers.
    3. Enable multiple DynamoDB Streams on the same table.
    4. Use DynamoDB Streams with fan-out to multiple Lambda functions.

References

Amazon DynamoDB Consistency

DynamoDB Consistency

  • AWS has a Region, which is a physical location around the world where we cluster data centers, with one or more Availability Zones which are discrete data centers with redundant power, networking, and connectivity in an AWS Region.
  • DynamoDB automatically stores each table across three geographically distributed Availability Zones within a Region for durability.
  • DynamoDB consistency represents the manner and timing in which the successful write or update of a data item is reflected in a subsequent read operation of that same item.

DynamoDB Consistency Modes

Eventually Consistent Reads (Default)

  • Eventual consistency option maximizes the read throughput.
  • Consistency across all copies is usually reached within a second.
  • However, an eventually consistent read might not reflect the results of a recently completed write.
  • Repeating a read after a short time should return the updated data.
  • DynamoDB uses eventually consistent reads, by default.
  • Use Cases:
    • Applications that can tolerate reading slightly stale data.
    • Read-heavy workloads where throughput is more important than immediate consistency.
    • Cost-sensitive applications (eventually consistent reads are half the cost of strongly consistent reads).

Strongly Consistent Reads

  • Strongly consistent read returns a result that reflects all writes that received a successful response prior to the read.
  • Ensures that the most up-to-date data is returned.
  • Cost: Strongly consistent reads are 2x the cost of eventually consistent reads (consume twice the read capacity units).
  • Disadvantages:
    • A strongly consistent read might not be available if there is a network delay or outage. In this case, DynamoDB may return a server error (HTTP 500).
    • Strongly consistent reads may have higher latency than eventually consistent reads.
    • Not supported on Global Secondary Indexes (GSIs) – only eventually consistent reads are supported on GSIs.
    • Strongly consistent reads use more throughput capacity than eventually consistent reads.
  • Use Cases:
    • Applications requiring immediate read-after-write consistency.
    • Financial transactions or inventory management where stale data is unacceptable.
    • Scenarios where data accuracy is critical.

Specifying Consistency Mode

  • DynamoDB allows the user to specify whether the read should be eventually consistent or strongly consistent at the time of the request.
  • Read operations (such as GetItem, Query, and Scan) provide a ConsistentRead parameter. If set to true, DynamoDB uses strongly consistent reads during the operation.
  • Default Behavior: Query, GetItem, and BatchGetItem operations perform eventually consistent reads by default.
  • Forcing Strong Consistency:
    • Query and GetItem operations can be forced to be strongly consistent by setting ConsistentRead=true.
    • Query operations cannot perform strongly consistent reads on Global Secondary Indexes (GSIs).
    • BatchGetItem operations can be forced to be strongly consistent on a per-table basis.
    • Scan operations can be forced to be strongly consistent.
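A minimal sketch of requesting both read modes for the same item with boto3; the table and key are placeholders.

```python
import boto3

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table

# Default: eventually consistent read (half the read capacity cost).
eventual = table.get_item(Key={"orderId": "o-1001"})

# Strongly consistent read: reflects all prior successful writes; not
# supported on Global Secondary Indexes.
strong = table.get_item(Key={"orderId": "o-1001"}, ConsistentRead=True)
```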

Transactional Consistency

  • DynamoDB supports transactions with full ACID (Atomicity, Consistency, Isolation, Durability) properties.
  • Transactions provide all-or-nothing execution for multiple operations across one or more tables.
  • Transaction Operations:
    • TransactWriteItems – Perform multiple write operations atomically.
    • TransactGetItems – Perform multiple read operations with snapshot isolation.
  • Consistency Guarantees:
    • Atomicity: All operations in a transaction succeed or fail together.
    • Consistency: Transactions move the database from one valid state to another.
    • Isolation: Transactions are isolated from each other using snapshot isolation.
    • Durability: Once a transaction is committed, it is durable.
  • Regional Scope: Transactional operations provide ACID guarantees within a single Region.
  • Global Tables Consideration: For Global Tables with MREC, transactions are only atomic within the Region where invoked (not replicated as a unit).
  • Cost: Transactional operations consume 2x the write capacity units compared to standard writes.
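A minimal sketch of an all-or-nothing write across two tables with TransactWriteItems; the table, key, and attribute names are assumptions for illustration.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Either both operations succeed or neither does; the condition on the
# inventory count makes the whole transaction fail if stock would go negative.
dynamodb.transact_write_items(
    TransactItems=[
        {
            "Put": {
                "TableName": "Orders",      # hypothetical table
                "Item": {"orderId": {"S": "o-1001"}, "status": {"S": "PLACED"}},
            }
        },
        {
            "Update": {
                "TableName": "Inventory",   # hypothetical table
                "Key": {"sku": {"S": "widget-42"}},
                "UpdateExpression": "SET #qty = #qty - :one",
                "ConditionExpression": "#qty >= :one",
                "ExpressionAttributeNames": {"#qty": "stock"},
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            }
        },
    ]
)
```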

Multi-Region Strong Consistency (MRSC) – January 2025

  • Announced at AWS re:Invent 2024 and generally available in January 2025.
  • Available for DynamoDB Global Tables configured with Multi-Region Strong Consistency mode.
  • Capability: Provides strong consistency across multiple AWS Regions.
  • Guarantee: Strongly consistent reads on an MRSC table always return the latest version of an item, irrespective of the Region where the read is performed.
  • Zero RPO: Enables Recovery Point Objective (RPO) of zero for highest resilience.
  • How It Works:
    • Item changes are synchronously replicated to at least one other Region before the write operation returns success.
    • Strongly consistent reads always reflect the latest committed write across all Regions.
    • Conditional writes always evaluate against the latest version of an item globally.
  • Deployment Requirements:
    • Must be deployed in exactly three Regions.
    • Can configure with 3 replicas OR 2 replicas + 1 witness.
    • Available in three Region sets: US, EU, and AP (cannot span Region sets).
  • Trade-offs:
    • Higher write latency compared to MREC (eventual consistency) due to synchronous replication.
    • Higher strongly consistent read latency compared to MREC.
  • Use Cases:
    • Financial applications requiring global strong consistency.
    • Inventory management systems across multiple Regions.
    • Applications requiring zero data loss (RPO = 0).
    • Compliance scenarios requiring strict consistency guarantees.
  • Limitations:
    • Transactions not supported on MRSC tables.
    • TTL not supported on MRSC tables.
    • Local Secondary Indexes not supported on MRSC tables.

Consistency Comparison

Consistency Type      | Scope         | Latency    | Cost   | Use Case
Eventually Consistent | Single Region | Lowest     | 1x RCU | Read-heavy, can tolerate stale data
Strongly Consistent   | Single Region | Low-Medium | 2x RCU | Immediate consistency required
Transactional         | Single Region | Medium     | 2x WCU | ACID guarantees, multiple operations
MRSC (Global Tables)  | Multi-Region  | Higher     | Varies | Global strong consistency, zero RPO

Best Practices

  • Default to Eventually Consistent: Use eventually consistent reads by default for cost and performance optimization.
  • Use Strong Consistency Selectively: Only use strongly consistent reads when immediate consistency is required.
  • Avoid Strong Consistency on GSIs: Design data models to avoid needing strongly consistent reads on GSIs (not supported).
  • Consider Read-After-Write Patterns: If your application writes and immediately reads, use strongly consistent reads or implement retry logic.
  • Use Transactions for Multi-Item Operations: When multiple items must be updated atomically, use transactions.
  • Evaluate MRSC for Global Applications: For applications requiring global strong consistency, consider MRSC Global Tables.
  • Monitor Consistency Metrics: Use CloudWatch to monitor read/write patterns and adjust consistency settings accordingly.
  • Handle Errors Gracefully: Implement retry logic for strongly consistent reads that may fail during network issues.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. Which of the following statements is true about DynamoDB?
    1. Requests are eventually consistent unless otherwise specified.
    2. Requests are strongly consistent.
    3. Tables do not contain primary keys.
    4. None of the above
  2. How is provisioned throughput affected by the chosen consistency model when reading data from a DynamoDB table?
    1. Strongly consistent reads use the same amount of throughput as eventually consistent reads
    2. Strongly consistent reads use variable throughput depending on read activity
    3. Strongly consistent reads use more throughput than eventually consistent reads.
    4. Strongly consistent reads use less throughput than eventually consistent reads
  3. A company needs to perform a query on a Global Secondary Index (GSI) and requires the most up-to-date data. What consistency mode should they use?
    1. Strongly consistent reads
    2. Eventually consistent reads (GSIs do not support strongly consistent reads)
    3. Transactional reads
    4. Multi-Region strong consistency
  4. A financial application requires strong consistency across multiple AWS Regions with zero data loss (RPO = 0). Which DynamoDB feature should they use?
    1. DynamoDB Global Tables with MREC (eventual consistency)
    2. DynamoDB Global Tables with MRSC (multi-Region strong consistency)
    3. DynamoDB with strongly consistent reads
    4. DynamoDB transactions
  5. A developer needs to update multiple items across two DynamoDB tables atomically. Which feature should they use?
    1. Strongly consistent writes
    2. BatchWriteItem operation
    3. DynamoDB transactions (TransactWriteItems)
    4. Conditional writes
  6. What is the cost difference between eventually consistent reads and strongly consistent reads in DynamoDB?
    1. No difference in cost
    2. Strongly consistent reads cost 2x more (consume 2x RCU)
    3. Strongly consistent reads cost 3x more
    4. Eventually consistent reads cost 2x more
  7. Which of the following statements about DynamoDB consistency are correct? (Select TWO)
    1. Eventually consistent reads are the default for Query and GetItem operations.
    2. Strongly consistent reads are supported on Global Secondary Indexes.
    3. Strongly consistent reads may return HTTP 500 errors during network issues.
    4. Transactions provide ACID guarantees across multiple Regions.
    5. MRSC Global Tables support local secondary indexes.

References