AWS RDS Multi-AZ Deployment


RDS Multi-AZ Deployment

  • RDS Multi-AZ deployments provide high availability and automatic failover support for DB instances
  • Multi-AZ helps improve the durability and availability of a critical system, enhancing availability during planned system maintenance, DB instance failure, and Availability Zone disruption.
  • A Multi-AZ DB instance deployment
    • has one standby DB instance that provides failover support but doesn’t serve read traffic.
    • In the console, there is only one row for the DB instance.
    • The value of Role is Instance or Primary.
    • The value of Multi-AZ is Yes.
  • A Multi-AZ DB cluster deployment
    • has two standby DB instances that provide failover support and can also serve read traffic.
    • In the console, there is a cluster-level row with three DB instance rows under it.
    • For the cluster-level row, the value of Role is Multi-AZ DB cluster.
    • For each instance-level row, the value of Role is Writer instance or Reader instance.
    • For each instance-level row, the value of Multi-AZ is 3 Zones.
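
These console values can also be checked from the API. Below is a minimal boto3 sketch that inspects both deployment types; the identifiers mydb and mycluster and the region are placeholder assumptions, not values from this post.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Multi-AZ DB instance deployment: a single instance row, MultiAZ=True
# corresponds to the "Multi-AZ: Yes" console column.
inst = rds.describe_db_instances(DBInstanceIdentifier="mydb")["DBInstances"][0]
print(inst["DBInstanceIdentifier"], "MultiAZ:", inst["MultiAZ"])

# Multi-AZ DB cluster deployment: one cluster-level entry with a writer
# and two reader members underneath it.
cluster = rds.describe_db_clusters(DBClusterIdentifier="mycluster")["DBClusters"][0]
for member in cluster["DBClusterMembers"]:
    role = "Writer instance" if member["IsClusterWriter"] else "Reader instance"
    print(member["DBInstanceIdentifier"], role)
```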

RDS Multi-AZ DB Instance Deployment

  • RDS automatically creates a primary DB Instance and synchronously replicates the data to a standby instance in a different AZ.
  • RDS performs an automatic failover to the standby, so that database operations can be resumed as soon as the failover is complete.
  • RDS Multi-AZ deployment maintains the same endpoint for the DB Instance after a failover, so the application can resume database operation without the need for manual administrative intervention.
  • Multi-AZ is a high-availability feature and NOT a scaling solution for read-only scenarios; a standby replica can’t be used to serve read traffic. To service read-only traffic, use a Read Replica.
  • Multi-AZ deployments for Oracle, PostgreSQL, MySQL, and MariaDB DB instances use Amazon failover technology, while SQL Server DB instances use SQL Server Database Mirroring.
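
A Multi-AZ failover can be simulated and observed from the API, which is also what several practice questions below refer to. This boto3 sketch forces a failover via Reboot with Failover and then lists the resulting DB instance events; mydb is a placeholder identifier.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Reboot with failover: promotes the standby to primary.
rds.reboot_db_instance(DBInstanceIdentifier="mydb", ForceFailover=True)

# DescribeEvents returns failover-related events for the instance.
events = rds.describe_events(
    SourceIdentifier="mydb",
    SourceType="db-instance",
    Duration=60,  # look back 60 minutes
)
for event in events["Events"]:
    print(event["Date"], event["Message"])
```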


RDS Multi-AZ DB Cluster Deployment

  • RDS Multi-AZ DB cluster deployment is a high-availability deployment mode of RDS with two readable standby DB instances.
  • RDS Multi-AZ DB cluster has a writer DB instance and two reader DB instances in three separate AZs in the same AWS Region.
  • With a Multi-AZ DB cluster, RDS semi-synchronously replicates data from the writer DB instance to both of the reader DB instances using the DB engine’s native replication capabilities.
  • Multi-AZ DB clusters provide high availability, increased capacity for read workloads, and lower write latency when compared to Multi-AZ DB instance deployments.
  • In the event of an outage, RDS manages the failover from the writer DB instance to one of the reader DB instances, choosing the reader DB instance with the most recent change record.
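
As a rough illustration, a Multi-AZ DB cluster is created through the CreateDBCluster API rather than CreateDBInstance. The boto3 sketch below is a minimal example under assumed settings; all names, sizes, and storage values are illustrative, and the engine version used must be one that supports Multi-AZ DB clusters.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Creates a Multi-AZ DB cluster: one writer plus two readable standbys
# spread across three AZs. Identifiers and sizes are placeholders.
rds.create_db_cluster(
    DBClusterIdentifier="my-multiaz-cluster",
    Engine="mysql",
    MasterUsername="admin",
    ManageMasterUserPassword=True,  # RDS keeps the password in Secrets Manager
    DBClusterInstanceClass="db.m6gd.large",
    StorageType="io1",
    Iops=1000,
    AllocatedStorage=100,
)
```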


Multi-AZ DB Instance vs Multi-AZ DB Cluster


RDS Multi-AZ vs Read Replicas

RDS Multi-AZ vs Multi-Region vs Read Replicas

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day, and both the questions and answers might soon be outdated, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. A company is deploying a new two-tier web application in AWS. The company has limited staff and requires high availability, and the application requires complex queries and table joins. Which configuration provides the solution for the company’s requirements?
    1. MySQL Installed on two Amazon EC2 Instances in a single Availability Zone (does not provide High Availability out of the box)
    2. Amazon RDS for MySQL with Multi-AZ
    3. Amazon ElastiCache (Just a caching solution)
    4. Amazon DynamoDB (Not suitable for complex queries and joins)
  2. What would happen to an RDS (Relational Database Service) multi-Availability Zone deployment if the primary DB instance fails?
    1. IP of the primary DB Instance is switched to the standby DB Instance.
    2. A new DB instance is created in the standby availability zone.
    3. The canonical name record (CNAME) is changed from primary to standby.
    4. The RDS (Relational Database Service) DB instance reboots.
  3. Will my standby RDS instance be in the same Availability Zone as my primary?
    1. Only for Oracle RDS types
    2. Yes
    3. Only if configured at launch
    4. No
  4. Is creating a Read Replica of another Read Replica supported?
    1. Only in certain regions
    2. Only with MySQL based RDS
    3. Only for Oracle RDS types
    4. No
  5. A user is planning to set up the Multi-AZ feature of RDS. Which of the below mentioned conditions won’t take advantage of the Multi-AZ feature?
    1. Availability zone outage
    2. A manual failover of the DB instance using Reboot with failover option
    3. Region outage
    4. When the user changes the DB instance’s server type
  6. When you run a DB Instance as a Multi-AZ deployment, the “_____” serves database writes and reads
    1. secondary
    2. backup
    3. stand by
    4. primary
  7. When running my DB Instance as a Multi-AZ deployment, can I use the standby for read or write operations?
    1. Yes
    2. Only with MSSQL based RDS
    3. Only for Oracle RDS instances
    4. No
  8. Read Replicas require a transactional storage engine and are only supported for the _________ storage engine
    1. OracleISAM
    2. MSSQLDB
    3. InnoDB
    4. MyISAM
  9. A user is configuring the Multi-AZ feature of an RDS DB. The user came to know that this RDS DB does not use the AWS technology, but uses server mirroring to achieve replication. Which DB is the user using right now?
    1. MySQL
    2. Oracle
    3. MS SQL
    4. PostgreSQL
  10. If you have chosen Multi-AZ deployment, in the event of a planned or unplanned outage of your primary DB Instance, Amazon RDS automatically switches to the standby replica. The automatic failover mechanism simply changes the ______ record of the main DB Instance to point to the standby DB Instance.
    1. DNAME
    2. CNAME
    3. TXT
    4. MX
  11. When automatic failover occurs, Amazon RDS will emit a DB Instance event to inform you that automatic failover occurred. You can use the _____ to return information about events related to your DB Instance
    1. FetchFailure
    2. DescriveFailure
    3. DescribeEvents
    4. FetchEvents
  12. The new DB Instance that is created when you promote a Read Replica retains the backup window period.
    1. TRUE
    2. FALSE
  13. Will I be alerted when automatic failover occurs?
    1. Only if SNS configured
    2. No
    3. Yes
    4. Only if Cloudwatch configured
  14. Can I initiate a “forced failover” for my MySQL Multi-AZ DB Instance deployment?
    1. Only in certain regions
    2. Only in VPC
    3. Yes
    4. No
  15. A user is accessing RDS from an application. The user has enabled the Multi-AZ feature with the MS SQL RDS DB. During a planned outage how will AWS ensure that a switch from DB to a standby replica will not affect access to the application?
    1. RDS will have an internal IP which will redirect all requests to the new DB
    2. RDS uses DNS to switch over to standby replica for seamless transition
    3. The switch over changes Hardware so RDS does not need to worry about access
    4. RDS will have both the DBs running independently and the user has to manually switch over
  16. Which of the following is part of the failover process for a Multi-AZ Amazon Relational Database Service (RDS) instance?
    1. The failed RDS DB instance reboots.
    2. The IP of the primary DB instance is switched to the standby DB instance.
    3. The DNS record for the RDS endpoint is changed from primary to standby.
    4. A new DB instance is created in the standby availability zone.
  17. Which of these is not a reason a Multi-AZ RDS instance will failover?
    1. An Availability Zone outage
    2. A manual failover of the DB instance was initiated using Reboot with failover
    3. To autoscale to a higher instance class
    4. Master database corruption occurs
    5. The primary DB instance fails
  18. How does Amazon RDS multi Availability Zone model work?
    1. A second, standby database is deployed and maintained in a different availability zone from master, using synchronous replication.
    2. A second, standby database is deployed and maintained in a different availability zone from master using asynchronous replication.
    3. A second, standby database is deployed and maintained in a different region from master using asynchronous replication.
    4. A second, standby database is deployed and maintained in a different region from master using synchronous replication.
  19. A user is using a small MySQL RDS DB. The user is experiencing high latency due to the Multi AZ feature. Which of the below mentioned options may not help the user in this situation?
    1. Schedule the automated back up in non-working hours
    2. Use a large or higher size instance
    3. Use PIOPS
    4. Take a snapshot from standby Replica
  20. What is the charge for the data transfer incurred in replicating data between your primary and standby?
    1. No charge. It is free.
    2. Double the standard data transfer charge
    3. Same as the standard data transfer charge
    4. Half of the standard data transfer charge
  21. A user has enabled the Multi AZ feature with the MS SQL RDS database server. Which of the below mentioned statements will help the user understand the Multi AZ feature better?
    1. In a Multi AZ, AWS runs two DBs in parallel and copies the data asynchronously to the replica copy
    2. In a Multi AZ, AWS runs two DBs in parallel and copies the data synchronously to the replica copy
    3. In a Multi AZ, AWS runs just one DB but copies the data synchronously to the standby replica
    4. AWS MS SQL does not support the Multi AZ feature

Choosing the Right Data Science Specialization: Where to Focus Your Skills

Choosing the Right Data Science Specialization: Where to Focus Your Skills

In the rapidly evolving world of technology, data science stands out as a field of endless opportunities and diverse pathways. With its foundations deeply rooted in statistics, computer science, and domain-specific knowledge, data science has become indispensable for organizations seeking to make data-driven decisions. However, the vastness of this field can be overwhelming, making specialization a strategic necessity for aspiring data scientists.

This article aims to navigate through the labyrinth of data science specializations, helping you align your career with your interests, skills, and the evolving demands of the job market.

Understanding the Breadth of Data Science

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to draw knowledge and discover insights from structured and unstructured data. It encompasses a wide range of activities, from data collection and cleaning to complex algorithmic computation and predictive modeling.

Key Areas Within Data Science

  • Machine Learning: This involves creating algorithms that can learn from data and make predictions or decisions based on it.
  • Deep Learning: A specialized subdomain of machine learning, focusing on neural networks and algorithms inspired by the structure and function of the brain.
  • Data Engineering: This is the backbone of data science, focusing on the practical aspects of data collection, storage, and retrieval.
  • Data Visualization: It involves converting complex data sets into understandable and interactive graphical representations.
  • Big Data Analytics: This deals with extracting meaningful insights from very large, diverse data sets that are often beyond the capability of traditional data-processing applications.
  • AI and Robotics: This cutting-edge field combines data science with robotics, focusing on creating machines that can perform actions/operations that typically require human intelligence.

Interconnectivity of These Areas

While these specializations are distinct, they are interconnected. For instance, data engineering is foundational for machine learning, and AI applications often rely on insights derived from big data analytics.

Factors to Consider When Choosing a Specialization

  • Personal Interests and Strengths
    • Your choice should resonate with your personal interests. If you are fascinated by how algorithms can mimic human learning, deep learning could be your calling. Alternatively, if you enjoy the challenges of handling and organizing large data sets, data engineering might suit you.
  • Industry Demand and Job Market Trends
    • It’s crucial to align your specialization with the market demand. Fields like AI and machine learning are rapidly growing and offer numerous job opportunities. Tracking industry trends can provide valuable insights into which specializations are most in demand.
  • Long-term Career Goals
    • Consider where you want to be in your career in the next five to ten years. Some specializations may offer more opportunities for growth, leadership roles, or transitions into different areas of data science.
  • Impact of Emerging Technologies
    • Emerging technologies can redefine the landscape of data science. Staying current with these changes can help you choose a specialization that remains relevant in the future.

Deep Dive into Popular Data Science Specializations

  • Machine Learning
    • Overview and Applications: From predictive modeling in finance to recommendation systems in e-commerce, machine learning is revolutionizing various industries.
    • Required Skills and Tools: Proficiency in programming languages like Python or R, an understanding of algorithms, and familiarity with machine learning frameworks like TensorFlow and Scikit-learn are essential.
  • Data Engineering
    • Role in Data Science: Data engineers build and maintain the infrastructure that allows data scientists to analyze and utilize data effectively.
    • Key Skills and Technologies: Skills in database management, ETL (Extract, Transform, Load) processes, and knowledge of SQL, NoSQL, Hadoop, and Spark are crucial.
  • Big Data Analytics
    • Understanding Big Data: This specialization focuses on examining extremely large, diverse data sets to discover patterns, trends, and associations, particularly those relating to human behavior and interactions.
    • Tools and Techniques: Familiarity with big data platforms like Apache Hadoop and Spark, along with data mining and statistical analysis, is important.
  • AI and Robotics
    • The Frontier of Data Science: This field is at the cutting edge, developing intelligent systems capable of performing tasks that typically require human intelligence.
    • Skills and Knowledge Base: A deep understanding of AI principles, programming, and robotics is necessary, along with skills in machine learning and neural networks.

Educational Pathways for Each Specialization

  • Academic Courses and Degrees
    • Pursuing a formal education in data science or a related field can provide a strong theoretical foundation. Many universities like MIT now offer specialized courses in machine learning, AI, and big data analytics, like the Data Analysis Certificate program.
  • Online Courses and Bootcamps
    • Online platforms like Great Learning offer specialized courses that are more flexible and often industry-oriented. Bootcamps, on the other hand, provide intensive, hands-on training in specific areas of data science.
  • Certifications and Workshops
    • Professional certifications from recognized bodies can add significant value to your resume, showcasing your expertise and commitment to professional development.
  • Self-learning Resources
    • The internet is replete with resources for self-learners. From online tutorials and forums to webinars and eBooks, the opportunities for self-paced learning in data science are abundant.

Building Experience in Your Chosen Specialization

  • Internships and Entry-level Positions
    • Gaining practical experience is crucial. Internships and entry-level positions provide real-world experience and help you understand the practical challenges and applications of your chosen specialization.
  • Personal and Open-source Projects
    • Working on personal data science projects or contributing to open-source projects can be a great way to apply your skills. These projects can also be a valuable addition to your portfolio.
  • Networking and Community Involvement
    • Building a professional network and participating in data science communities can lead to job opportunities and collaborations. Attending industry conferences and seminars is also a great way to stay updated and connected.
  • Industry Conferences and Seminars
    • These events are excellent for learning about the latest industry trends, best data science practices, and emerging technologies. They also offer opportunities to meet industry leaders and peers.

Future Trends and Evolving Specializations

  • Predicting the Future of Data Science
    • The field of data science is constantly evolving. Staying informed about future trends is crucial for choosing a specialization that will remain relevant and in demand.
  • Emerging Specializations and Technologies
    • Areas like quantum computing, edge analytics, and ethical AI are emerging as new frontiers in data science. These fields are likely to offer exciting new opportunities for specialization in the coming years.
  • Staying Adaptable and Continuous Learning
    • The key to a successful career in data science is adaptability and a commitment to continuous learning. The field is dynamic, and staying abreast of new developments is essential.

Conclusion

Choosing the right data science specialization is a critical decision that can shape your career trajectory. It requires a careful consideration of your personal interests, the current job market, and future industry trends. Whether your passion lies in the intricate algorithms of machine learning, the structural challenges of data engineering, or the innovative frontiers of AI and robotics, there is a niche for every aspiring data scientist. The journey is one of continuous learning, adaptability, and an unwavering curiosity about the power of data. As the field continues to grow and diversify, the opportunities for data scientists are bound to expand, offering a rewarding and dynamic career path.

 


AWS Certified Database – Specialty (DBS-C01) Exam Learning Path


AWS Certified Database – Specialty (DBS-C01) Exam Learning Path

I recently revalidated my AWS Certified Database – Specialty (DBS-C01) certification just before it expired. The format and domains are pretty much the same as the previous exam, however, it has been enhanced to cover a lot of new services.

AWS Certified Database – Specialty (DBS-C01) Exam Content

AWS Certified Database – Specialty (DBS-C01) exam validates your understanding of databases, including the concepts of design, migration, deployment, access, maintenance, automation, monitoring, security, and troubleshooting, and covers the following tasks:

  • Understand and differentiate the key features of AWS database services.
  • Analyze needs and requirements to design and recommend appropriate database solutions using AWS services

Refer to AWS Database – Specialty Exam Guide

DBS-C01 Domains

AWS Certified Database – Specialty (DBS-C01) Exam Summary

  • Specialty exams are tough, lengthy, and tiresome. Most of the questions and answer options have a lot of prose and require a lot of reading, so be sure you are prepared and manage your time well.
  • DBS-C01 exam has 65 questions to be solved in 170 minutes which gives you roughly 2 1/2 minutes to attempt each question.
  • DBS-C01 exam includes two types of questions, multiple-choice and multiple-response.
  • DBS-C01 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 750.
  • Specialty exams currently cost $300 + tax.
  • You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.
  • As always, mark the questions for review, move on, and come back to them after you are done with all.
  • As always, having a rough architecture or mental picture of the setup helps focus on the areas that you need to improve. Trust me, you will be able to eliminate two answers for sure and then need to focus on only the other two. Read the remaining two answers to spot the differentiating details; that would help you reach the right answer or at least give you a 50% chance of getting it right.
  • AWS exams can be taken either at a testing center or online. I prefer to take them online as it provides a lot of flexibility. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
  • Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Certified Database – Specialty (DBS-C01) Exam Resources

AWS Certified Database – Specialty (DBS-C01) Exam Summary

  • AWS Certified Database – Specialty exam focuses completely on AWS Data services from relational, non-relational, graph, caching, and data warehousing. It also covers deployments, automation, migration, security, monitoring, and troubleshooting aspects of them.

Database Services

  • Make sure you know and cover all the services in-depth, as 80% of the exam is focused on topics like Aurora, RDS, and DynamoDB.
  • DynamoDB
    • is a fully managed NoSQL database service providing single-digit millisecond latency.
    • DynamoDB supports On-demand and Provisioned throughput capacity modes.
      • On-demand mode
        • provides a flexible billing option capable of serving thousands of requests per second without capacity planning
        • does not support reserved capacity
      • Provisioned mode
        • requires you to specify the number of reads and writes per second as required by the application
        • Understand the provisioned capacity calculations (see the worked sketch after this services list)
    • DynamoDB Auto Scaling uses the AWS Application Auto Scaling service to dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns.
    • Know DynamoDB Burst capacity, Adaptive capacity
    • DynamoDB Consistency mode determines the manner and timing in which the successful write or update of a data item is reflected in a subsequent read operation of that same item.
      • supports eventual and strongly consistent reads.
      • Eventually consistent reads require less throughput but might return stale data, whereas strongly consistent reads require more throughput but always return up-to-date data.
    • DynamoDB secondary indexes provide efficient access to data with attributes other than the primary key.
      • LSI uses the same partition key but a different sort key, whereas, GSI is a separate table with a different partition key and/or sort key.
      • GSI can cause primary table throttling if under-provisioned.
      • Make sure you understand the difference between the Local Secondary Index and the Global Secondary Index
    • DynamoDB Global Tables is a new multi-master, cross-region replication capability of DynamoDB to support data access locality and regional fault tolerance for database workloads.
    • DynamoDB Time to Live – TTL enables a per-item timestamp to determine when an item is no longer needed. (hint: know TTL can expire the data and this can be captured by using DynamoDB Streams)
    • DynamoDB cross-region replication allows identical copies (called replicas) of a DynamoDB table (called master table) to be maintained in one or more AWS regions.
    • DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table.
    • DynamoDB Triggers (just like database triggers) is a feature that allows the execution of custom actions based on item-level updates on a table.
    • DynamoDB Accelerator – DAX is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement even at millions of requests per second.
      • DAX does not support fine-grained access control like DynamoDB.
    • DynamoDB Backups support PITR
      • AWS Backup can be used to backup and restore, and it supports cross-region snapshot copy as well.
    • VPC Gateway Endpoints provide private access to DynamoDB from within a VPC without the need for an internet gateway or NAT gateway
    • Understand DynamoDB Best practices (hint: selection of keys to avoid hot partitions and creation of LSI and GSI)
  • Aurora
    • is a relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases.
    • provides MySQL and PostgreSQL compatibility
    • Aurora Disaster Recovery & High Availability can be achieved using Read Replicas with very minimal downtime.
      • Aurora promotes read replicas as per the priority tier (tier 0 is the highest), and the largest size if the tiers match
    • Aurora Global Database provides cross-region read replicas for low-latency reads. Remember that, unlike DynamoDB Global Tables, it is not multi-master and does not provide low-latency writes across regions.
    • Aurora Connection endpoints support
      • Cluster for primary read/write
      • Reader for read replicas
      • Custom for a specific group of instances
      • Instance for specific single instance – Not recommended
    • Aurora Fast Failover techniques
      • Set TCP keepalives low
      • Set Java DNS caching timeouts low
      • Set the timeout variables used in the JDBC connection string low
      • Use the provided read and write Aurora endpoints
      • Use cluster cache management for Aurora PostgreSQL. Cluster cache management ensures that application performance is maintained if there’s a failover.
    • Aurora Serverless is an on-demand, autoscaling configuration for the MySQL-compatible and PostgreSQL-compatible editions of Aurora.
    • Aurora Backtrack feature helps rewind the DB cluster to the specified time. It is not a replacement for backups.
    • Aurora Server Auditing Events cover different activities like logins, DML, permission changes (DCL), schema changes (DDL), etc.
    • Aurora Cluster Cache Management feature, which helps achieve fast failover
    • Aurora Clone feature, which allows you to create quick and cost-effective clones
    • Aurora supports fault injection queries to simulate various failovers like node down, primary failover, etc.
    • RDS PostgreSQL and MySQL can be migrated to Aurora, by creating an Aurora Read Replica from the instance. Once the replica lag is zero, switch the DNS with no data loss
    • Aurora Database Activity Streams help stream audit logs to external services like Kinesis
    • Supports stored procedures calling lambda functions
  • Relational Database Service (RDS)
    • provides a relational database in the cloud with multiple database options.
    • RDS Snapshots, Backups, and Restore
      • restoring a DB from a snapshot does not retain the parameter group and security group
      • automated snapshots cannot be shared directly; copy the automated snapshot to a manual snapshot before sharing it.
    • RDS Read Replicas
      • allow elastic scaling beyond the capacity constraints of a single DB instance for read-heavy database workloads.
      • provide increased scalability and database availability in the case of an AZ failure.
      • supports cross-region replicas.
    • RDS Multi-AZ provides high availability and automatic failover support for DB instances.
    • Understand the differences between RDS Multi-AZ vs Read Replicas
      • Multi-AZ failover can be simulated using the Reboot with Failover option
      • Read Replicas require automated backups enabled
    • Understand DB components esp. DB parameter group, DB options groups
      • Dynamic parameters are applied immediately
      • Static parameters require a manual reboot.
      • The default parameter group cannot be modified; create a custom parameter group and associate it with the RDS instance.
      • Know that the max connections limit also depends on the DB instance size
    • RDS Custom automates database administration tasks and operations, while making it possible for you as a database administrator to access and customize the database environment and operating system.
    • RDS Performance Insights is a database performance tuning and monitoring feature that helps you quickly assess the load on the database, and determine when and where to take action. 
    • RDS Security
      • RDS supports security groups to control who can access RDS instances
      • RDS supports data at rest encryption and SSL for data in transit encryption
      • RDS supports IAM database authentication with temporary credentials.
      • An existing RDS instance cannot be encrypted; create a snapshot -> copy it as an encrypted snapshot -> restore the encrypted copy as a new encrypted DB
      • RDS PostgreSQL requires rds.force_ssl=1 and sslmode=ca/verify-full to enable SSL encryption
      • Know RDS Encrypted Database limitations
    • Understand RDS Monitoring and Notification
      • Know RDS supports notification events through SNS for events like database creation, deletion, snapshot creation, etc.
      • CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB instance, and Enhanced Monitoring gathers its metrics from an agent on the instance.
      • Enhanced Monitoring metrics are useful to understand how different processes or threads on a DB instance use the CPU.
      • RDS Performance Insights is a database performance tuning and monitoring feature that helps illustrate the database’s performance and help analyze any issues that affect it
    • An RDS instance cannot be stopped if it has read replicas.
  • ElastiCache
    • is a managed web service that helps deploy and run Memcached or Redis protocol-compliant cache clusters in the cloud easily.
    • Understand the differences between Redis vs. Memcached
  • Neptune
    • is a fully managed database service built for the cloud that makes it easier to build and run graph applications. Neptune provides built-in security, continuous backups, serverless compute, and integrations with other AWS services.  
    • provides Neptune loader to quickly import data from S3
    • supports VPC endpoints
  • Amazon Keyspaces (for Apache Cassandra) is a scalable, highly available, and managed Apache Cassandra–compatible database service.
  • Amazon Quantum Ledger Database (Amazon QLDB) is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log.
  • Redshift
    • is a fully managed, fast, and powerful, petabyte-scale data warehouse service. It is not covered in depth.
    • Know Redshift Best Practices w.r.t selection of Distribution style, Sort key, importing/exporting data
      • A single COPY command leverages parallel loading and performs better than multiple COPY commands
      • COPY command can use manifest files to load data
      • COPY command handles encrypted data
    • Know Redshift cross region encrypted snapshot copy
      • Create a new key in destination region
      • Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the destination region.
      • In the source region, enable cross-region snapshot copy and specify the name of the copy grant created.
    • Know that Redshift supports audit logging, which covers authentication attempts, connections, and disconnections, usually for compliance reasons.
  • Data Migration Service (DMS)
    • DMS helps in the migration of homogeneous and heterogeneous databases
    • DMS with Full load plus Change Data Capture (CDC) migration capability can be used to migrate databases with zero downtime and no data loss.
    • DMS with SCT (Schema Conversion Tool) can be used to migrate heterogeneous databases.
    • Premigration Assessment evaluates specified components of a database migration task to help identify any problems that might prevent a migration task from running as expected.
    • Multiserver assessment report evaluates multiple servers based on input that you provide for each schema definition that you want to assess.
    • DMS provides support for data validation to ensure that your data was migrated accurately from the source to the target.
    • DMS supports LOB migration as a 2-step process. It can do a full or limited LOB migration
      • In full LOB mode, AWS DMS migrates all LOBs from source to target regardless of size. Full LOB mode can be quite slow.
      • In limited LOB mode, a maximum LOB size can be set that AWS DMS should accept. Doing so allows AWS DMS to pre-allocate memory and load the LOB data in bulk. LOBs that exceed the maximum LOB size are truncated and a warning is issued to the log file. In limited LOB mode, you get significant performance gains over full LOB mode.
      • Recommended to use limited LOB mode whenever possible.
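
For the provisioned capacity calculations called out in the DynamoDB notes above, here is a small worked sketch in Python. The 4 KB read unit and 1 KB write unit sizes are the documented DynamoDB rules; the item sizes and request rates are made-up examples.

```python
import math

def read_capacity_units(item_size_bytes: int, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """1 RCU = one strongly consistent read/sec of an item up to 4 KB;
    an eventually consistent read consumes half an RCU."""
    units_per_read = math.ceil(item_size_bytes / 4096)
    rcus = units_per_read * reads_per_second
    return rcus if strongly_consistent else math.ceil(rcus / 2)

def write_capacity_units(item_size_bytes: int, writes_per_second: int) -> int:
    """1 WCU = one write/sec of an item up to 1 KB."""
    return math.ceil(item_size_bytes / 1024) * writes_per_second

# 6 KB items read 10 times/sec: ceil(6/4)=2 units x 10 = 20 RCUs
print(read_capacity_units(6 * 1024, 10))          # 20 (strongly consistent)
print(read_capacity_units(6 * 1024, 10, False))   # 10 (eventually consistent)
# 1.5 KB items written 10 times/sec: ceil(1.5/1)=2 units x 10 = 20 WCUs
print(write_capacity_units(1536, 10))             # 20
```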

Security, Identity & Compliance

  • Identity and Access Management (IAM)
  • Key Management Services
    • is a managed encryption service that allows the creation and control of encryption keys to enable data encryption.
    • provides data at rest encryption for the databases.
  • AWS Secrets Manager
    • protects secrets needed to access applications, services, etc.
    • enables you to easily rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle (see the retrieval sketch after this list)
    • supports automatic rotation of credentials for RDS, DocumentDB, etc.
  • Secrets Manager vs. Systems Manager Parameter Store
    • Secrets Manager supports automatic rotation while SSM Parameter Store does not
    • Parameter Store is cost-effective as compared to Secrets Manager.
  • Trusted Advisor checks for idle RDS DB instances
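
As a minimal illustration of retrieving a database secret at runtime, the boto3 sketch below assumes a hypothetical secret named prod/mysql/creds that stores a JSON body with username and password keys (the shape RDS-managed rotation uses).

```python
import json

import boto3

sm = boto3.client("secretsmanager", region_name="us-east-1")

# "prod/mysql/creds" is a placeholder secret name.
secret = sm.get_secret_value(SecretId="prod/mysql/creds")
creds = json.loads(secret["SecretString"])

# Use the credentials to build the DB connection (never hardcode them).
print(creds["username"])
```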

Management & Governance Tools

  • Understand AWS CloudWatch for Logs and Metrics.
    • EventBridge (CloudWatch Events) provides real-time alerts
    • CloudWatch can be used to store RDS logs with a custom retention period, which is indefinite by default.
    • CloudWatch Application Insights support .Net and SQL Server monitoring
  • Know CloudFormation for provisioning, in terms of
    • Stack drift detection – to identify differences between a stack’s expected configuration and the actual environment caused by manual changes
    • Change Set – allows you to verify the changes before being propagated
    • parameters – allows you to configure variables or environment-specific values
    • Stack policy defines the update actions that can be performed on designated resources.
    • Deletion policy for RDS allows you to configure whether the resource is retained, snapshotted, or deleted when the stack is deleted
    • Supports Secrets Manager for DB credentials generation, storage, and easy rotation
    • Supports Systems Manager Parameter Store for environment-specific parameters

Whitepapers and articles

On the Exam Day

  • Make sure you are relaxed and get a good night’s sleep. The exam is not tough if you are well-prepared.
  • If you are taking the AWS Online exam
    • Try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.
    • The online verification process does take some time and usually, there are glitches.
    • Remember, you would not be allowed to take the exam if you are late by more than 30 minutes.
    • Make sure your desk is clear with no watches or external monitors, keep your phones away, and ensure nobody can enter the room.

Finally, All the Best 🙂

Amazon DynamoDB with VPC Endpoints


DynamoDB with VPC Endpoints

  • By default, communications to and from DynamoDB use the HTTPS protocol, which protects network traffic by using SSL/TLS encryption.
  • VPC endpoint for DynamoDB enables EC2 instances in the VPC to use their private IP addresses to access DynamoDB with no exposure to the public internet.
  • Traffic between the VPC and the AWS service does not leave the Amazon network.
  • EC2 instances do not require public IP addresses, an internet gateway, a NAT device, or a virtual private gateway in the VPC.

  • VPC endpoint for DynamoDB routes any requests to a DynamoDB endpoint within the Region to a private DynamoDB endpoint within the Amazon network.
  • Applications running on EC2 instances in the VPC don’t need to be modified.
  • Endpoint name remains the same, but the route to DynamoDB stays entirely within the Amazon network and does not access the public internet.
  • VPC endpoint policies can be used to control access to DynamoDB, as the sketch below illustrates.
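
A gateway endpoint for DynamoDB can be created with a single API call. The boto3 sketch below is illustrative only; the VPC ID, route table ID, account ID, table name, and region are placeholders, and the endpoint policy shown simply restricts access to one hypothetical table.

```python
import json

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Endpoint policy limiting access to a single table (placeholder ARN).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "dynamodb:*",
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable",
    }],
}

resp = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.dynamodb",
    RouteTableIds=["rtb-0123456789abcdef0"],  # routes to DynamoDB stay in-network
    PolicyDocument=json.dumps(policy),
)
print(resp["VpcEndpoint"]["VpcEndpointId"])
```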


AWS Certification Exam Practice Questions

  1. What are the services supported by VPC endpoints, using the Gateway endpoint type?
    1. Amazon EFS
    2. Amazon DynamoDB
    3. Amazon Glacier
    4. Amazon SQS
  2. A business application is hosted on Amazon EC2 and uses Amazon DynamoDB for its storage. The chief information security officer has directed that no application traffic between the two services should traverse the public internet. Which capability should the solutions architect use to meet the compliance requirements?
    1. AWS Key Management Service (AWS KMS)
    2. VPC endpoint
    3. Private subnet
    4. Virtual private gateway
  3. A company runs an application in the AWS Cloud and uses Amazon DynamoDB as the database. The company deploys Amazon EC2 instances to a private network to process data from the database. The company uses two NAT instances to provide connectivity to DynamoDB.
    The company wants to retire the NAT instances. A solutions architect must implement a solution that provides connectivity to DynamoDB and that does not require ongoing management. What is the MOST cost-effective solution that meets these requirements?

    1. Create a gateway VPC endpoint to provide connectivity to DynamoDB.
    2. Configure a managed NAT gateway to provide connectivity to DynamoDB.
    3. Establish an AWS Direct Connect connection between the private network and DynamoDB.
    4. Deploy an AWS PrivateLink endpoint service between the private network and DynamoDB.


Amazon DynamoDB Time to Live – TTL

DynamoDB Time to Live – TTL

  • DynamoDB Time to Live – TTL enables a per-item timestamp to determine when an item is no longer needed.
  • After the date and time of the specified timestamp, DynamoDB deletes the item from the table without consuming any write throughput.

  • DynamoDB TTL is provided at no extra cost and can help reduce data storage by retaining only required data.
  • Items that are deleted from the table are also removed from any local secondary index and global secondary index in the same way as a DeleteItem operation.
  • Expired items get removed from the table and indexes within about 48 hours.
  • DynamoDB Stream tracks the delete operation as a system delete, not a regular one.
  • TTL requirements
    • TTL attributes must use the Number data type. Other data types, such as String, aren’t supported.
    • TTL attributes must use the epoch time format, and the timestamp must be in seconds, not milliseconds (see the sketch after this list).
  • TTL is useful if the stored items lose relevance after a specific time, e.g.
    • Remove user or sensor data after a year of inactivity in an application
    • Archive expired items to an S3 data lake via DynamoDB Streams and AWS Lambda.
    • Retain sensitive data for a certain amount of time according to contractual or regulatory obligations.
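
To make the mechanics concrete, here is a minimal boto3 sketch; the table name SensorData and the attribute name expires_at are hypothetical. Note that the timestamp is written in epoch seconds, per the requirement above.

```python
import time

import boto3

ddb = boto3.client("dynamodb", region_name="us-east-1")

# Enable TTL on a placeholder table, using a Number attribute "expires_at".
ddb.update_time_to_live(
    TableName="SensorData",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Write an item that expires in 30 days; the TTL value must be
# epoch *seconds*, not milliseconds.
ddb.put_item(
    TableName="SensorData",
    Item={
        "sensor_id": {"S": "sensor-42"},
        "reading": {"N": "21.5"},
        "expires_at": {"N": str(int(time.time()) + 30 * 24 * 60 * 60)},
    },
)
```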

AWS Certification Exam Practice Questions

  1. A company developed an application by using AWS Lambda and Amazon DynamoDB. The Lambda function periodically pulls data from the company’s S3 bucket based on date and time tags and inserts specific values into a DynamoDB table for further processing. The company must remove data that is older than 30 days from the DynamoDB table. Which solution will meet this requirement with the MOST operational efficiency?

    1. Update the Lambda function to add the Version attribute in the DynamoDB table. Enable TTL on the DynamoDB table to expire entries that are older than 30 days based on the TTL attribute.
    2. Update the Lambda function to add the TTL attribute in the DynamoDB table. Enable TTL on the DynamoDB table to expire entries that are older than 30 days based on the TTL attribute.
    3. Use AWS Step Functions to delete entries that are older than 30 days.
    4. Use EventBridge to schedule the Lambda function to delete entries that are older than 30 days.


Amazon DynamoDB Backup and Restore

DynamoDB Backup and Restore

  • DynamoDB Backup and Restore provides fully automated on-demand backup, restore, and point-in-time recovery for data protection and archiving.
  • On-demand backup allows the creation of full backups of a DynamoDB table for data archiving, helping you meet corporate and governmental regulatory requirements.
  • Point-in-time recovery (PITR) provides continuous backups of your DynamoDB table data. When enabled, DynamoDB maintains incremental backups of your table for the last 35 days until you explicitly turn it off.

On-demand Backups

  • DynamoDB on-demand backup helps create full backups of the tables for long-term retention, and archiving for regulatory compliance needs.
  • Backup and restore actions run with no impact on table performance or availability.
  • Backups are preserved regardless of table deletion and retained until they are explicitly deleted.
  • On-demand backups are cataloged and discoverable.
  • On-demand backups can be created using
    • DynamoDB
      • can be used to back up and restore DynamoDB tables.
      • DynamoDB on-demand backups cannot be copied to a different account or Region.
    • AWS Backup (Recommended)
      • is a fully managed data protection service that makes it easy to centralize and automate backups across AWS services, in the cloud, and on-premises
      • provides enhanced backup features
      • can configure backup schedules & policies and monitor activity for the AWS resources and on-premises workloads in one place
      • can copy on-demand backups across AWS accounts and Regions
      • supports encryption using an AWS KMS key that is independent of the DynamoDB table encryption key
      • can apply a write-once-read-many (WORM) setting for the backups using the AWS Backup Vault Lock policy
      • can add cost allocation tags to on-demand backups
      • can transition on-demand backups to cold storage for lower costs
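
A native DynamoDB on-demand backup is a single API call. In the boto3 sketch below, the table and backup names are placeholders.

```python
import boto3

ddb = boto3.client("dynamodb", region_name="us-east-1")

# Full on-demand backup; runs with no impact on table performance.
resp = ddb.create_backup(TableName="Orders", BackupName="orders-pre-migration")
print(resp["BackupDetails"]["BackupArn"])
```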

PITR – Point-In-Time Recovery

  • DynamoDB point-in-time recovery – PITR enables automatic, continuous, incremental backup of the table with per-second granularity.
  • A deleted PITR-enabled table can be recovered within the preceding 35 days and restored to its state just before deletion.
  • PITR helps protect against accidental writes and deletes.
  • PITR can back up tables with hundreds of terabytes of data with no impact on the performance or availability of the production applications.
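
Enabling PITR and restoring from it are both single API calls; a minimal boto3 sketch follows, with Orders and Orders-restored as placeholder table names.

```python
import boto3

ddb = boto3.client("dynamodb", region_name="us-east-1")

# Turn on continuous backups (35-day point-in-time recovery window).
ddb.update_continuous_backups(
    TableName="Orders",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Restore to a new table at the latest restorable time; a specific
# RestoreDateTime within the window can be passed instead.
ddb.restore_table_to_point_in_time(
    SourceTableName="Orders",
    TargetTableName="Orders-restored",
    UseLatestRestorableTime=True,
)
```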

AWS Certification Exam Practice Questions

  1. A sysops engineer must create nightly backups of an Amazon DynamoDB table. Which backup methodology should the database specialist use to MINIMIZE management overhead?
    1. Install the AWS CLI on an Amazon EC2 instance. Write a CLI command that creates a backup of the DynamoDB table. Create a scheduled job or task that runs the command on a nightly basis.
    2. Create an AWS Lambda function that creates a backup of the DynamoDB table. Create an Amazon CloudWatch Events rule that runs the Lambda function on a nightly basis.
    3. Create a backup plan using AWS Backup, specify a backup frequency of every 24 hours, and give the plan a nightly backup window.
    4. Configure DynamoDB backup and restore for an on-demand backup frequency of every 24 hours.


Amazon DynamoDB Global Tables

Amazon DynamoDB Global Tables

  • DynamoDB Global Tables is a fully managed, serverless, multi-master, active-active database.
  • Global tables provide 99.999% availability, increased application resiliency, and improved business continuity.
  • Global table’s automatic cross-region replication capability helps achieve fast, local read and write performance and regional fault tolerance for database workloads.
  • Applications can now perform reads and writes to DynamoDB in AWS regions around the world, with changes in any region propagated to every region where a table is replicated.
  • Global Tables help in building applications that take advantage of data locality to reduce overall latency.
  • Global Tables supports eventual consistency & strong consistency for same region reads, but only eventual consistency for cross-region reads.
  • Global Tables replicates data among regions within a single AWS account and currently does not support cross-account access.
  • Global Tables uses the Last Write Wins approach for conflict resolution.

Global Tables Working

  • Global Table is a collection of one or more replica tables, all owned by a single AWS account.
  • A single Amazon DynamoDB global table can only have one replica table per AWS Region.
  • Each replica table stores the same set of data items, has the same table name, and the same primary key schema.
  • When an application writes data to a replica table in one Region, DynamoDB automatically replicates the writes to other replica tables in the other AWS Regions.
  • Global Tables requires DynamoDB streams enabled with New and Old image settings.
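
A sketch of turning a table into a global table under version 2019.11.21, where a replica is added via UpdateTable; the table name and regions are placeholders, and note the stream requirement called out above.

```python
import boto3

ddb = boto3.client("dynamodb", region_name="us-east-1")

# Create the base table; global tables (version 2019.11.21) require
# streams enabled with NEW_AND_OLD_IMAGES.
ddb.create_table(
    TableName="PlayerScores",
    AttributeDefinitions=[{"AttributeName": "player_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "player_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
    StreamSpecification={"StreamEnabled": True,
                         "StreamViewType": "NEW_AND_OLD_IMAGES"},
)
ddb.get_waiter("table_exists").wait(TableName="PlayerScores")

# Add a replica in another Region, turning the table into a global table.
ddb.update_table(
    TableName="PlayerScores",
    ReplicaUpdates=[{"Create": {"RegionName": "us-west-2"}}],
)
```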

DynamoDB Global Tables vs. Aurora Global Databases


AWS Certification Exam Practice Questions

  1. A company is building a web application on AWS. The application requires the database to support read and write operations in multiple AWS Regions simultaneously. The database also needs to propagate data changes between Regions as the changes occur. The application must be highly available and must provide a latency of single-digit milliseconds. Which solution meets these requirements?
    1. Amazon DynamoDB global tables
    2. Amazon DynamoDB streams with AWS Lambda to replicate the data
    3. An Amazon ElastiCache for Redis cluster with cluster mode enabled and multiple shards
    4. An Amazon Aurora global database


Amazon DynamoDB Streams

Amazon DynamoDB Streams

  • DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table.
  • DynamoDB Streams stores the data for the last 24 hours, after which they are erased.
  • DynamoDB Streams maintains an ordered sequence of events per item; however, ordering across items is not maintained.
  • Example
    • For example, suppose that you have a DynamoDB table tracking high scores for a game, and that each item in the table represents an individual player. If you make the following three updates in this order:
      • Update 1: Change Player 1’s high score to 100 points
      • Update 2: Change Player 2’s high score to 50 points
      • Update 3: Change Player 1’s high score to 125 points
    • DynamoDB Streams will maintain the order of the Player 1 score events. However, it would not maintain order across players, so the Player 2 score event is not guaranteed to appear between the two Player 1 events.
  • Applications can access this log and view the data items as they appeared before and after they were modified, in near-real time.
  • DynamoDB Streams APIs help developers consume updates and receive the item-level data before and after items are changed.
  • Streams allow reads at up to twice the rate of the provisioned write capacity of the DynamoDB table.
  • Streams have to be enabled on a per-table basis. When enabled on a table, DynamoDB captures information about every modification to data items in the table.
  • Streams support Encryption at rest to encrypt the data.
  • Streams are designed for No Duplicates so that every update made to the table will be represented exactly once in the stream.
  • Streams write stream records in near-real time so that applications can consume these streams and take action based on the contents.
  • Streams can be used for multi-region replication to keep other data stores up-to-date with the latest changes to DynamoDB or to take actions based on the changes made to the table
  • Stream records can be processed using Kinesis Data Streams, Lambda, or KCL application.
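
For a low-level view of what a consumer sees, the boto3 sketch below reads records from one shard of a stream. GameScores is a placeholder table that must already have streams enabled, and a real consumer (e.g., a KCL application or Lambda) would iterate over all shards and page through records.

```python
import boto3

ddb = boto3.client("dynamodb", region_name="us-east-1")
streams = boto3.client("dynamodbstreams", region_name="us-east-1")

# Look up the stream ARN of the table (streams must be enabled).
stream_arn = ddb.describe_table(TableName="GameScores")["Table"]["LatestStreamArn"]

# Walk the first shard from its oldest available record (TRIM_HORIZON).
shard = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"][0]
iterator = streams.get_shard_iterator(
    StreamArn=stream_arn,
    ShardId=shard["ShardId"],
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

for record in streams.get_records(ShardIterator=iterator)["Records"]:
    # eventName is INSERT, MODIFY, or REMOVE; TTL deletions appear as
    # REMOVE records attributed to a service principal.
    print(record["eventName"], record["dynamodb"].get("Keys"))
```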

AWS Certification Exam Practice Questions

  1. An application currently writes a large number of records to a DynamoDB table in one region. There is a requirement for a secondary application to retrieve new records written to the DynamoDB table every 2 hours and process the updates accordingly. Which of the following is an ideal way to ensure that the secondary application gets the relevant changes from the DynamoDB table?
    1. Insert a timestamp for each record and then scan the entire table for the timestamp as per the last 2 hours.
    2. Create another DynamoDB table with the records modified in the last 2 hours.
    3. Use DynamoDB Streams to monitor the changes in the DynamoDB table.
    4. Transfer records to S3 which were modified in the last 2 hours.


Amazon DynamoDB Consistency

DynamoDB Consistency

  • An AWS Region is a physical location around the world where AWS clusters data centers; each Region consists of one or more Availability Zones, which are discrete data centers with redundant power, networking, and connectivity.
  • DynamoDB automatically stores each table in three geographically distributed Availability Zones for durability.
  • DynamoDB consistency represents the manner and timing in which the successful write or update of a data item is reflected in a subsequent read operation of that same item.

DynamoDB Consistency Modes

Eventually Consistent Reads (Default)

  • Eventual consistency option maximizes the read throughput.
  • Consistency across all copies is usually reached within a second
  • However, an eventually consistent read might not reflect the results of a recently completed write.
  • Repeating a read after a short time should return the updated data.
  • DynamoDB uses eventually consistent reads, by default.

Strongly Consistent Reads

  • Strongly consistent read returns a result that reflects all writes that received a successful response prior to the read
  • Strongly consistent reads are 2x the cost of Eventually consistent reads
  • Strongly Consistent Reads come with disadvantages
    • A strongly consistent read might not be available if there is a network delay or outage. In this case, DynamoDB may return a server error (HTTP 500).
    • Strongly consistent reads may have higher latency than eventually consistent reads.
    • Strongly consistent reads are not supported on global secondary indexes.
    • Strongly consistent reads use more throughput capacity than eventually consistent reads.
  • DynamoDB allows the user to specify whether the read should be eventually consistent or strongly consistent at the time of the request
  • Read operations (such as GetItem, Query, and Scan) provide a ConsistentRead parameter; if set to true, DynamoDB uses strongly consistent reads during the operation.
  • Query, GetItem, and BatchGetItem operations perform eventually consistent reads by default.
    • Query and GetItem operations can be forced to be strongly consistent
    • Query operations cannot perform strongly consistent reads on Global Secondary Indexes
    • BatchGetItem operations can be forced to be strongly consistent on a per-table basis
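
The choice is made per request, as the minimal boto3 sketch below shows; GameScores and its player_id key are placeholder names.

```python
import boto3

ddb = boto3.resource("dynamodb", region_name="us-east-1")
table = ddb.Table("GameScores")  # placeholder table name

# Default: eventually consistent read (may briefly return stale data).
eventual = table.get_item(Key={"player_id": "p1"})

# Strongly consistent read: reflects all prior successful writes,
# consumes twice the read capacity, and is not supported on GSIs.
strong = table.get_item(Key={"player_id": "p1"}, ConsistentRead=True)
print(strong.get("Item"))
```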

AWS Certification Exam Practice Questions

  1. Which of the following statements is true about DynamoDB?
    1. Requests are eventually consistent unless otherwise specified.
    2. Requests are strongly consistent.
    3. Tables do not contain primary keys.
    4. None of the above
  2. How is provisioned throughput affected by the chosen consistency model when reading data from a DynamoDB table?
    1. Strongly consistent reads use the same amount of throughput as eventually consistent reads
    2. Strongly consistent reads use variable throughput depending on read activity
    3. Strongly consistent reads use more throughput than eventually consistent reads.
    4. Strongly consistent reads use less throughput than eventually consistent reads
