AWS Certified Machine Learning – Specialty (MLS-C01) Exam Learning Path


June 4, 2024 ~ Last updated on: June 4, 2024 ~ jayendrapatil


Finally re-certified with the updated AWS Certified Machine Learning – Specialty (MLS-C01) certification exam after 3 months of preparation.

In terms of difficulty level, I find this to be the toughest of all the professional and specialty certifications. Partly this is because I am still diving deep into machine learning and relearned everything from the basics for this certification.
Machine Learning is a vast specialization in itself, and with AWS services there is a lot to cover and know for the exam. This is the only exam where the majority of the focus is on concepts outside of AWS, i.e., pure machine learning. It also covers AWS Machine Learning and Data Engineering services.

AWS Certified Machine Learning – Specialty (MLS-C01) Exam Content

The AWS Certified Machine Learning – Specialty (MLS-C01) exam validates a candidate's ability to:
- Select and justify the appropriate ML approach for a given business problem.
- Identify appropriate AWS services to implement ML solutions.
- Design and implement scalable, cost-optimized, reliable, and secure ML solutions.

Refer to the AWS Certified Machine Learning – Specialty Exam Guide for details.

AWS Certified Machine Learning – Specialty Domains

AWS Certified Machine Learning – Specialty (MLS-C01) Exam Summary

Specialty exams are tough, lengthy, and tiresome. Most of the questions and answer options are wordy and require a lot of reading, so be sure you are prepared and manage your time well.

The MLS-C01 exam has 65 questions to be solved in 170 minutes, which gives you roughly 2.5 minutes per question.
The MLS-C01 exam includes two types of questions: multiple-choice and multiple-response.
MLS-C01 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 750.

Specialty exams currently cost $300 + tax.
You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.
As always, mark the questions for review, move on, and come back to them after you are done with all.

As always, having a rough architecture or mental picture of the setup helps you focus on the areas you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then only need to focus on the other two. Compare the remaining 2 answers to find where they differ. That will help you reach the right answer, or at least give you a 50% chance of getting it right.
AWS exams can be taken either at a test center or online (remotely). I prefer to take them online because it provides a lot of flexibility. Just make sure you have a proper place to take the exam, with no disturbance and nothing around you.
Also, if you are taking the AWS online exam for the first time, try to join at least 30 minutes before the actual time. I have had long wait times with both PSI and Pearson.

AWS Certified Machine Learning – Specialty (MLS-C01) Exam Resources

Online Courses
- Stephane Maarek – AWS Certified Machine Learning Specialty Exam
- Whizlabs – AWS Certified Machine Learning Specialty Course
- Exam Readiness: AWS Certified Machine Learning – Specialty
Practice tests
- Braincert – AWS Certified Machine Learning – Specialty MLS-C01 Practice Exams
- Whizlabs – AWS Certified Machine Learning Specialty Practice Tests

AWS Certified Machine Learning – Specialty (MLS-C01) Exam Topics

The AWS Certified Machine Learning – Specialty exam covers a lot of Machine Learning concepts and digs deep into them, and most of these are not related to AWS.
The exam covers the end-to-end (E2E) Machine Learning lifecycle. This runs from data collection and transformation, through making the data usable and efficient for Machine Learning and pre-processing it, to training, validation, and implementation.

Machine Learning Concepts

Exploratory Data Analysis
- Feature selection and engineering
  - remove features that are not related to training
  - remove features that have the same values, very low correlation, very little variance, or a lot of missing values
  - apply techniques like Principal Component Analysis (PCA) for dimensionality reduction, i.e., reduce the number of features
  - apply techniques such as one-hot encoding and label encoding to convert strings to numeric values, which are easier to process
  - apply normalization, i.e., rescale values to between 0 and 1, to handle features whose values span very different ranges
  - apply feature engineering for feature reduction, e.g., using a single height/weight ratio feature instead of both the height and weight features
- Handle missing data
  - remove the feature or the rows with missing data
  - impute using mean/median values – valid only for numeric features, not categorical ones, and does not factor in correlation between features
  - impute using k-NN, Multivariate Imputation by Chained Equations (MICE), or deep learning – more accurate, and takes correlation between features into account
- Handle unbalanced data
  - Source more data
  - Oversample minority or Undersample majority
  - Data augmentation using techniques like Synthetic Minority Oversampling Technique (SMOTE).
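The preprocessing steps above can be sketched in plain Python. The sketch uses no pandas or scikit-learn so that it stays self-contained. All helper names are hypothetical. Note that random_oversample only duplicates existing minority rows, which is plain random oversampling and not SMOTE. SMOTE instead synthesizes new points by interpolating between minority-class neighbors.

```python
import random
from collections import Counter
from statistics import median


def min_max_normalize(values):
    """Rescale numeric values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map every value to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(categories):
    """One-hot encode category strings; columns follow the sorted vocabulary."""
    vocab = sorted(set(categories))
    rows = [[1 if c == v else 0 for v in vocab] for c in categories]
    return vocab, rows


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels


ages = impute_median([25, None, 40, 35])  # [25, 35, 40, 35]
scaled = min_max_normalize(ages)          # [0.0, 0.666..., 1.0, 0.666...]
colors, encoded = one_hot(["red", "blue", "red"])
```

Median imputation is used here because it is less sensitive to outliers than the mean. Imputers that account for correlation (k-NN, MICE) need the other features as input and are not shown.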
Modeling
- Know the algorithm types (supervised, unsupervised, and reinforcement learning) and which algorithm is best suited, based on whether the available data is labeled or unlabeled.
  - Supervised learning trains on labeled data, e.g., linear regression, logistic regression, decision trees, random forests
  - Unsupervised learning trains on unlabeled data, e.g., PCA, SVD, K-means
  - Reinforcement learning trains based on actions and rewards, e.g., Q-Learning
- Hyperparameters
  - are parameters exposed by machine learning algorithms that control how the underlying algorithm operates; their values affect the quality of the trained models
  - some of the common hyperparameters are learning rate, batch size, and number of epochs (hint: if the learning rate is too large, the minimum might be overshot and the loss would oscillate; if the learning rate is too small, training needs too many steps, which makes it slower and less efficient)
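The learning-rate hint can be checked numerically. Below is a minimal sketch that assumes a toy loss f(x) = x**2 instead of a real model. Each gradient step multiplies x by (1 - 2*lr):
- a tiny rate barely moves;
- a moderate rate converges;
- 0.95 flips the sign of x every step, which is oscillation;
- 1.1 diverges.

```python
def gradient_descent(lr, steps=50, x0=10.0):
    """Minimize f(x) = x**2, whose gradient is 2x, and return every x visited."""
    x = x0
    path = [x]
    for _ in range(steps):
        x = x - lr * 2 * x  # step against the gradient, scaled by the learning rate
        path.append(x)
    return path


for lr in (0.001, 0.1, 0.95, 1.1):
    path = gradient_descent(lr)
    print(f"lr={lr}: x after {len(path) - 1} steps = {path[-1]:.4g}")
```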

Evaluation
- Know which metric to use when evaluating model accuracy
  - Use Area Under the (Receiver Operating Characteristic) Curve (AUC) for Binary classification
  - Use root mean square error (RMSE) metric for regression
- Understand Confusion matrix
  - A true positive is an outcome where the model correctly predicts the positive class. Similarly, a true negative is an outcome where the model correctly predicts the negative class.
  - A false positive is an outcome where the model incorrectly predicts the positive class. A false negative is an outcome where the model incorrectly predicts the negative class.
  - Recall or Sensitivity or TPR (True Positive Rate): number of items correctly identified as positive out of all actual positives – TP/(TP+FN) (hint: use this for cases like fraud detection, where the cost of marking non-fraud as fraud is lower than the cost of marking fraud as non-fraud)
  - Specificity or TNR (True Negative Rate): number of items correctly identified as negative out of all actual negatives – TN/(TN+FP) (hint: use this for cases like videos for kids, where the cost of dropping a few valid videos is lower than the cost of showing a few bad ones)
- Handle Overfitting problems
  - Simplify the model, e.g., by reducing the number of layers
  - Early Stopping – form of regularization while training a model with an iterative method, such as gradient descent
  - Data Augmentation
  - Regularization – technique to reduce the complexity of the model
  - Dropout is a regularization technique that prevents overfitting
  - Never train on test data
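The evaluation metrics above reduce to a few lines of arithmetic. This is a minimal sketch with hypothetical helper names and binary labels (1 = positive, 0 = negative). AUC is computed here as the probability that a randomly chosen positive scores higher than a randomly chosen negative. That pairwise formulation is equivalent to the area under the ROC curve, with ties counted as half.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def recall(tp, fn):
    return tp / (tp + fn)  # sensitivity / TPR


def specificity(tn, fp):
    return tn / (tn + fp)  # TNR


def rmse(y_true, y_pred):
    """Root mean square error for regression."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive gets the higher score."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)  # (2, 1, 4, 1)
print(recall(tp, fn), specificity(tn, fp))
```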

Machine Learning Services

SageMaker
- supports File mode, Pipe mode, and Fast File mode for training input
  - File mode loads all of the data from S3 to the training instance volumes, whereas Pipe mode streams data directly from S3
  - File mode needs disk space to store both the final model artifacts and the full training dataset, whereas Pipe mode helps reduce the required size of EBS volumes.
  - Fast File mode combines the ease of use of the existing File mode with the performance of Pipe mode.
- Using RecordIO format allows algorithms to take advantage of Pipe mode when training the algorithms that support it.
- supports Model tracking capability to manage up to thousands of machine learning model experiments
- supports automatic scaling for production variants. Automatic scaling dynamically adjusts the number of instances provisioned for a production variant in response to changes in your workload
- provides pre-built Docker images for its built-in algorithms and the supported deep learning frameworks used for training & inference
- SageMaker Automatic Model Tuning
  - is the process of finding a set of hyperparameters for an algorithm that can yield an optimal model.
  - Best practices
    - limit the search to a smaller number of hyperparameters, as the difficulty of a hyperparameter tuning job depends primarily on the number of hyperparameters that Amazon SageMaker has to search
    - DO NOT specify a very large range to cover every possible value for a hyperparameter, as it affects the success of hyperparameter optimization.
    - use log scaling for hyperparameters whose values span several orders of magnitude (e.g., learning rate) to improve hyperparameter optimization.
    - running one training job at a time achieves the best results with the least amount of compute time, as each job can benefit from the results of the previous ones.
    - design distributed training jobs so that they report the objective metric that you want.
- know how to take advantage of multiple GPUs (hint: increase the learning rate and batch size in proportion to the increase in GPUs)
- Elastic Inference (now deprecated in favor of AWS Inferentia) helps attach low-cost GPU-powered acceleration to EC2 and SageMaker instances or ECS tasks to reduce the cost of running deep learning inference.
- SageMaker Inference options.
  - Real-time inference is ideal for online inferences that have low latency or high throughput requirements.
  - Serverless Inference is ideal for intermittent or unpredictable traffic patterns as it manages all of the underlying infrastructure with no need to manage instances or scaling policies.
  - Batch Transform is suitable for offline processing when large amounts of data are available upfront and you don’t need a persistent endpoint.
  - Asynchronous Inference is ideal when you want to queue requests and have large payloads with long processing times.
- SageMaker Model deployment allows deploying multiple variants of a model to the same SageMaker endpoint to test new models without impacting the user experience
  - Production Variants
    - supports A/B or Canary testing where you can allocate a portion of the inference requests to each variant.
    - helps compare production variants’ performance relative to each other.
  - Shadow Variants
    - replicates a portion of the inference requests that go to the production variant to the shadow variant.
    - logs the responses of the shadow variant for comparison; these responses are not returned to the caller.
    - helps test the performance of the shadow variant without exposing the caller to the response produced by the shadow variant.
- SageMaker Managed Spot Training can use Spot Instances to save cost, and the checkpointing feature saves the state of ML models during training so that interrupted jobs can resume
- SageMaker Feature Store
  - helps to create, share, and manage features for ML development.
  - is a centralized store for features and associated metadata so features can be easily discovered and reused.
- SageMaker Debugger provides tools to debug training jobs and resolve problems such as overfitting, saturated activation functions, and vanishing gradients to improve the model’s performance.
- SageMaker Model Monitor monitors the quality of SageMaker machine learning models in production and can help set alerts that notify when there are deviations in the model quality.
- SageMaker Data Wrangler
  - reduces the time it takes to aggregate and prepare tabular and image data for ML from weeks to minutes.
  - simplifies the process of data preparation (including data selection, cleansing, exploration, visualization, and processing at scale) and feature engineering.
- SageMaker Experiments is a capability of SageMaker that lets you create, manage, analyze, and compare machine learning experiments.
- SageMaker Clarify helps improve the ML models by detecting potential bias and helping to explain the predictions that the models make.
- SageMaker Model Governance is a framework that gives systematic visibility into ML model development, validation, and usage.
- SageMaker Autopilot is an automated machine learning (AutoML) feature set that automates the end-to-end process of building, training, tuning, and deploying machine learning models.
- SageMaker Neo enables machine learning models to train once and run anywhere in the cloud and at the edge.
- SageMaker API and SageMaker Runtime support VPC interface endpoints powered by AWS PrivateLink. These connect the VPC directly to the SageMaker API or SageMaker Runtime without using an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.
- Algorithms –
  - BlazingText provides Word2vec and text classification algorithms
  - DeepAR is a supervised learning algorithm for forecasting scalar (one-dimensional) time series (hint: it can forecast for new products based on existing products' sales data).
  - Factorization Machines is a supervised algorithm for classification and regression tasks that economically captures interactions between features within high-dimensional sparse datasets.
  - Image classification algorithm is a supervised learning algorithm that supports multi-label classification.
  - IP Insights is an unsupervised learning algorithm that learns the usage patterns for IPv4 addresses.
  - K-means is an unsupervised learning algorithm for clustering as it attempts to find discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups.
  - k-nearest neighbors (k-NN) algorithm is an index-based algorithm. It uses a non-parametric method for classification or regression.
  - Latent Dirichlet Allocation (LDA) algorithm is an unsupervised learning algorithm that attempts to describe a set of observations as a mixture of distinct categories. It is used to discover the topics shared by documents within a text corpus (the number of topics is specified up front).
  - Neural Topic Model (NTM) Algorithm is an unsupervised learning algorithm that is used to organize a corpus of documents into topics that contain word groupings based on their statistical distribution
  - Linear models are supervised learning algorithms used for solving either classification or regression problems.
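    - For regression (predictor_type='regressor'), the score is the prediction produced by the model.
    - For classification (predictor_type='binary_classifier' or predictor_type='multiclass_classifier'), the model returns a predicted_label along with a score. For binary classification the score is a single value indicating how strongly the model believes the label is 1; for multiclass classification it is one score per class.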
  - Object Detection algorithm detects and classifies objects in images using a single deep neural network
  - Principal Component Analysis (PCA) is an unsupervised machine learning algorithm that attempts to reduce the dimensionality (number of features) (hint: dimensionality reduction)
  - Random Cut Forest (RCF) is an unsupervised algorithm for detecting anomalous data points (hint: anomaly detection)
  - Sequence to Sequence is a supervised learning algorithm where the input is a sequence of tokens (for example, text, audio) and the output generated is another sequence of tokens. (hint: text summarization is the key use case)
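SageMaker's A/B and canary testing with production variants is driven by variant weights. Below is a minimal sketch that assumes the boto3-style request fields of CreateEndpointConfig (EndpointConfigName, ProductionVariants, VariantName, ModelName, InitialInstanceCount, InstanceType, InitialVariantWeight). The endpoint name, model names, and instance type are hypothetical. Each variant receives its weight divided by the sum of all weights. Shadow variants go in a separate ShadowProductionVariants list, which is not shown here.

```python
import json


def endpoint_config(name, variants, instance_type="ml.m5.large"):
    """Build a CreateEndpointConfig request with weighted production variants.

    variants: list of (variant_name, model_name, weight) tuples.
    """
    return {
        "EndpointConfigName": name,
        "ProductionVariants": [
            {
                "VariantName": variant_name,
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,
                "InitialVariantWeight": weight,
            }
            for variant_name, model_name, weight in variants
        ],
    }


def traffic_shares(config):
    """Share of live requests each variant receives: its weight over the sum of all weights."""
    variants = config["ProductionVariants"]
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Hypothetical canary: 3 parts of traffic to the current model, 1 part to the candidate
cfg = endpoint_config(
    "churn-ab-test",
    [("current", "churn-model-v1", 3.0), ("candidate", "churn-model-v2", 1.0)],
)
print(json.dumps(cfg, indent=2))
print(traffic_shares(cfg))  # {'current': 0.75, 'candidate': 0.25}
# With AWS credentials, the same dict could be sent via
# boto3.client("sagemaker").create_endpoint_config(**cfg)
```

Because weights are relative, shifting more traffic to the candidate later only requires changing the weights (via UpdateEndpointWeightsAndCapacities), not redeploying the models.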

SageMaker Ground Truth
- provides automated data labeling using machine learning
- helps build highly accurate training datasets for machine learning quickly using Amazon Mechanical Turk
- provides annotation consolidation to help improve the accuracy of the data object's labels. It combines the results of multiple workers' annotation tasks into one high-fidelity label.
- automated data labeling uses machine learning to label portions of the data automatically without having to send them to human workers
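To make annotation consolidation concrete, here is a deliberately simplified stand-in: a plain majority vote over several workers' labels for one data object. Ground Truth's built-in consolidation is more sophisticated than this. The function name is hypothetical.

```python
from collections import Counter


def consolidate(annotations):
    """Combine several workers' labels for one data object by simple majority vote.

    Returns (label, agreement), where agreement is the fraction of workers who
    chose the winning label. Ties go to the label encountered first.
    """
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(annotations)


print(consolidate(["cat", "dog", "cat"]))  # ('cat', 0.666...)
```

The agreement fraction gives a crude confidence signal. Objects with low agreement are natural candidates for additional human review.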

Machine Learning & AI Managed Services

Comprehend
- natural language processing (NLP) service to find insights and relationships in text.
- identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is; analyzes text using tokenization and parts of speech; and automatically organizes a collection of text files by topic.
Lex
- provides conversational interfaces using voice and text, helpful in building voice and text chatbots
Polly
- converts text into speech
- supports Speech Synthesis Markup Language (SSML) tags like prosody so users can adjust the speech rate, pitch or volume.
- supports pronunciation lexicons to customize the pronunciation of words
Rekognition – analyze images and video
- helps identify objects, people, text, scenes, and activities in images and videos, as well as detect any inappropriate content.
Translate – natural and fluent language translation
Transcribe – automatic speech recognition (ASR) speech-to-text

Kendra – an intelligent search service that uses NLP and advanced ML algorithms to return specific answers to search questions from your data.
Panorama brings computer vision to on-premises camera networks.
Augmented AI (Amazon A2I) is an ML service that makes it easy to build the workflows required for human review.

Forecast – time-series forecasting service that provides highly accurate forecasts.

Analytics

Make sure you know and understand data engineering concepts mainly in terms of data capture, migration, transformation, and storage.
Kinesis
- Understand Kinesis Data Streams and Kinesis Data Firehose in depth
- Kinesis Data Analytics can process and analyze streaming data using standard SQL and integrates with Data Streams and Firehose
- Know Kinesis Data Streams vs Kinesis Firehose
  - Know that Kinesis Data Streams is open-ended on both the producer and consumer sides. It supports the KCL and works with Spark.
  - Know that Kinesis Firehose is open-ended on the producer side only. Data is delivered to destinations such as S3, Redshift, and OpenSearch (Elasticsearch).
  - Kinesis Firehose works in batches with a minimum 60-second buffer interval.
  - Kinesis Data Firehose supports data transformation using a Lambda function, plus record format conversion from JSON to Parquet or ORC (hint: CSV data can first be transformed to JSON with Lambda and then converted to Parquet).
- Kinesis Video Streams provides a fully managed service to ingest, index, store, and stream live video. HLS can be used to view a Kinesis video stream, either for live playback or to view archived video.
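The Firehose transformation hint can be sketched as a Python Lambda handler. Firehose invokes the function with a batch of base64-encoded records. It expects each record back with:
- the same recordId;
- a result of Ok, Dropped, or ProcessingFailed;
- base64-encoded data.

The CSV layout "id,amount" is a hypothetical example. The handler emits one JSON object per line, which record format conversion can then turn into Parquet.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation: turn each CSV record "id,amount" into a JSON line."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        try:
            rec_id, amount = line.split(",")
            payload = json.dumps({"id": rec_id, "amount": float(amount)}) + "\n"
            output.append({
                "recordId": record["recordId"],  # must echo the incoming ID
                "result": "Ok",
                "data": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
            })
        except ValueError:
            # Wrong column count or a non-numeric amount: report the record as failed
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Record format conversion takes its schema from a Glue Data Catalog table, so that table would need columns matching the JSON keys.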
OpenSearch (ElasticSearch) is a search service that supports indexing, full-text search, faceting, etc.
Data Pipeline helps define data-driven flows to automate and schedule regular data movement and data processing activities in AWS
Glue is a fully managed, ETL (extract, transform, and load) service that automates the time-consuming steps of data preparation for analytics
- helps set up, orchestrate, and monitor complex data flows.
- Glue Data Catalog is a central repository to store structural and operational metadata for all the data assets.
- Glue crawler connects to a data store, extracts the schema of the data, and then populates the Glue Data Catalog with this metadata
- Glue DataBrew is a visual data preparation tool that enables users to clean and normalize data without writing any code.
DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between storage systems and services.

Security, Identity & Compliance

Security is covered very lightly. (hint: SageMaker can read data from KMS-encrypted S3; make sure the KMS key policy includes the IAM role attached to SageMaker.)
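The KMS hint amounts to one extra statement in the key policy. Below is a minimal sketch that builds such a statement as a Python dict. The account ID and role name are hypothetical. The action list assumes the role needs to decrypt input data and write encrypted output. The statement would be appended to the key policy's existing Statement list, next to the default account statement.

```python
import json


def sagemaker_kms_statement(role_arn):
    """Key policy statement allowing a SageMaker execution role to use this KMS key."""
    return {
        "Sid": "AllowSageMakerExecutionRole",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        # Decrypt to read encrypted S3 objects; GenerateDataKey to write encrypted output
        "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
        # Inside a key policy, "*" means the key this policy is attached to
        "Resource": "*",
    }


# Hypothetical account ID and role name
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"
print(json.dumps(sagemaker_kms_statement(role), indent=2))
```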

Management & Governance Tools

Understand Amazon CloudWatch for logs and metrics. (hint: SageMaker is integrated with CloudWatch, and its logs and metrics are stored there.)

Storage

Understand data storage options and know the patterns for S3 vs RDS vs DynamoDB vs Redshift. (hint: S3 is, by default, the data storage option for big data, so look for it in the answers.)

Whitepapers and articles

Certified Kubernetes Security Specialist CKS Learning Path


November 26, 2023 ~ Last updated on: May 14, 2024 ~ jayendrapatil


With the Certified Kubernetes Security Specialist (CKS) certification, I have recertified the triad of Kubernetes certifications. After knowing how to use and administer Kubernetes, the last piece was to understand the security intricacies, and CKS preparation provides a deep dive into them.

CKS is an open-book test in the sense that you have access to the official Kubernetes documentation during the exam, but it focuses more on hands-on experience.
CKS focuses on securing container-based applications and Kubernetes platforms during build, deployment, and runtime.

Unlike AWS and GCP certifications, you would be required to solve, debug actual problems, and provision resources on a live Kubernetes cluster.
Even though it is an open book test, you need to know where the information is.
Trust me, if you are not prepared, the time is not going to be sufficient.

CKS Exam Pattern

CKS exam curriculum includes these general domains and their weights on the exam:
- Cluster Setup – 10%
- Cluster Hardening – 15%
- System Hardening – 15%
- Minimize Microservice Vulnerabilities – 20%
- Supply Chain Security – 20%
- Monitoring, Logging and Runtime Security – 20%
The CKS exam has been upgraded and requires you to solve 15-20 questions in 2 hours. I got 16 questions.

CKS was already upgraded to use Kubernetes 1.28, and it keeps being upgraded with new Kubernetes versions.
You are allowed to open one additional browser tab for kubernetes.io or other product documentation like Falco. Do not open any other windows.
Exam questions can be attempted in any order and don’t have to be sequential. So be sure to move ahead and come back later.

CKS Exam Preparation and Tips

I used the courses from KodeKloud CKS for practicing and it would be good enough to cover what is required for the exam.
Prepare yourself with the imperative commands as much as you can. This will help cut down the time required to solve half of the questions.
Each exam question carries a weight, so attempt the questions with higher weights before focusing on the lower ones. Target the ones with higher weights and quicker solutions first, like the debugging ones.

The CKS exam provides 6-8 different preconfigured K8s clusters. Each question refers to a different Kubernetes cluster, so the context needs to be switched. Be sure to execute the kubectl config use-context command that comes with every question; you just need to copy and paste it.
Check the namespace mentioned in the question when finding and creating resources, and use -n <namespace>.
You will perform most of the interaction from the client node. However, pay attention to the node (master or worker) on which you need to execute the commands, and make sure you return to the base node.

With CKS, it is important to move to the master node for any changes to the cluster's kube-apiserver, e.g., edits to its static pod manifest.
SSH to nodes and gaining root access is allowed if needed.
Carefully read the information provided within the questions under the i mark. It gives very useful hints for addressing the question and saves time. For example, it can tell you the namespaces to look into for a failed pod, or what has already been created (configmaps, secrets, network policies) so that you do not create the same.

Make sure you know the imperative commands to create resources, as you won’t have much time to create and edit YAML files.
If you need to edit further use --dry-run=client -o yaml to get a headstart with the YAML spec file and edit the same.
I personally use alias kk=kubectl to avoid typing kubectl

CKS Resources

Go through the CKS Curriculum
Linux Foundation CKS Course and CKS Certification Bundle
KodeKloud – Mumshad Mannambeth Certified Kubernetes Security Specialist (CKS) with Practice Tests
- Excellent course which covers the right topics required for the CKS
- It also provides hands-on labs for each of the topics, giving you actual experience working on the Kubernetes cluster.
- Make sure to practice the labs until you no longer need to refer to the hints and can do most of them without documentation.

Udemy Kubernetes CKS 2021 Complete Course – Theory – Practice
Cover Kubernetes Security Overview
Practice CKS Exercises

Cover Kubernetes tutorials which provide a good hands-on guide
Cover kubectl cheatsheet for commands
Cover Tasks from Kubernetes documentation

CKS Key Topics

Cluster Setup – 10%

Practice CKS Exercises – Cluster Setup
Securing a Cluster covers a lot of these features
Use Network security policies to restrict cluster level access
- Understand Network Policies
- Exam tip: Know how to create Network Policies using proper selectors
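Kubernetes manifests are usually written in YAML, but kubectl apply also accepts JSON. The sketch below therefore builds two NetworkPolicies as Python dicts and prints them as a JSON List. The first is a default-deny-ingress policy. The second allows only frontend pods to reach backend pods on one port. The namespace, labels, and port are hypothetical. A podSelector inside "from" only matches pods in the policy's own namespace; cross-namespace sources need a namespaceSelector.

```python
import json


def default_deny_ingress(namespace):
    """Select every pod in the namespace and allow no ingress, i.e. deny all incoming traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress"]},
    }


def allow_from_labels(name, namespace, target_labels, source_labels, port):
    """Let pods matching source_labels (same namespace) reach pods matching target_labels on a TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": source_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical names and labels; save the output and run: kubectl apply -f policies.json
policies = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        default_deny_ingress("prod"),
        allow_from_labels("backend-allow-frontend", "prod", {"app": "backend"}, {"role": "frontend"}, 8080),
    ],
}
print(json.dumps(policies, indent=2))
```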

Use CIS benchmark to review the security configuration of Kubernetes components (etcd, kubelet, kubedns, kubeapi)
- Center for Internet Security (CIS) defines security best practices for Kubernetes that can help evaluate the cluster and recommend fixes.
- Aqua Security kube-bench is a free tool that can help evaluate the k8s cluster for CIS rules.
- Exam tip: Know how to read the CIS report, identify failures, map it to the recommendation, and fix the same.
Properly set up Ingress objects with security control
- Ingress can be configured with a TLS endpoint
- Exam tip: Know how to create a TLS secret and associate the same with the Ingress
Protect node metadata and endpoints
- Authentication using Certificates and Service Accounts
- Authorization using Node and RBAC
- Exam tip: Know how to create Service Accounts, Roles, and Cluster Roles and associate them together using Role Binding and Cluster Role Binding.
- Exam tip: Know how to create Service Accounts with automount disabled using the automountServiceAccountToken field.
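The ServiceAccount, Role, and RoleBinding pattern from the exam tips can be sketched as JSON manifests, which kubectl apply accepts. All names and the "dev" namespace are hypothetical. The Role grants read-only access to pods. The ServiceAccount sets automountServiceAccountToken to false, so pods using it do not get a token unless a pod explicitly asks for one.

```python
import json

RBAC = "rbac.authorization.k8s.io"


def service_account(name, namespace):
    """ServiceAccount whose token is not auto-mounted into pods."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
        "automountServiceAccountToken": False,
    }


def pod_reader_role(name, namespace):
    """Role granting read-only access to pods in one namespace."""
    return {
        "apiVersion": f"{RBAC}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [{"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]}],
    }


def role_binding(name, namespace, role_name, sa_name):
    """Bind the Role to the ServiceAccount."""
    return {
        "apiVersion": f"{RBAC}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "subjects": [{"kind": "ServiceAccount", "name": sa_name, "namespace": namespace}],
        "roleRef": {"apiGroup": RBAC, "kind": "Role", "name": role_name},
    }


# Hypothetical names in a hypothetical "dev" namespace
manifest = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        service_account("app-sa", "dev"),
        pod_reader_role("pod-reader", "dev"),
        role_binding("app-sa-pod-reader", "dev", "pod-reader", "app-sa"),
    ],
}
print(json.dumps(manifest, indent=2))
```

In the exam, the imperative route is usually faster:
- kubectl create serviceaccount app-sa -n dev
- kubectl create role pod-reader --verb=get,list,watch --resource=pods -n dev
- kubectl create rolebinding app-sa-pod-reader --role=pod-reader --serviceaccount=dev:app-sa -n dev

The automount setting is then added to the ServiceAccount spec afterwards, for example with kubectl edit.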

Minimize use of, and access to, GUI elements
- Kubernetes Dashboard is a GUI component that needs to be secured.
Verify platform binaries before deploying
- Exam tip: Know how to verify the digests of platform binaries using SHA checksums (e.g., sha256sum or sha512sum).
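In the exam this check is normally done with the sha256sum or sha512sum CLI. The same check is shown here with Python's hashlib for illustration. The file is hashed in chunks and compared against the published checksum. The function names and the commented usage path are hypothetical. hmac.compare_digest is used for a constant-time comparison.

```python
import hashlib
import hmac


def file_digest_hex(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large binaries need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_binary(path, expected_hex, algorithm="sha512"):
    """True if the file's digest matches the published checksum (case-insensitive)."""
    actual = file_digest_hex(path, algorithm)
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Hypothetical usage: compare a downloaded binary against the checksum from the release page
# verify_binary("kubelet", "<published sha512 value>")
```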

Cluster Hardening – 15%

Practice CKS Exercises – Cluster Hardening
Restrict access to Kubernetes API
- Control anonymous requests to Kube-apiserver
Use Role-Based Access Controls to minimize exposure
- Exam tip: Know how to create Service Accounts, Roles, and Cluster Roles and associate them together using Role Binding and Cluster Role Binding.

Exercise caution in using service accounts e.g. disable defaults, minimize permissions on newly created ones.
- Exam tip: Know automountServiceAccountToken can be used to prevent the service account from being auto-mounted.

Update Kubernetes frequently
- Kubernetes supports the N to N-2 minor versions, and it is recommended to upgrade the components regularly
- Exam tip: Know how to upgrade a Kubernetes cluster (although it did not appear on my exam)

System Hardening – 15%

Practice CKS Exercises – System Hardening
Minimize host OS footprint (reduce attack surface)
- Control SSH access, and disable root and password-based logins
- Remove unwanted packages and ports
Minimize IAM roles
- IAM roles usually apply to cloud providers and relate to the principle of least privilege access.

Minimize external access to the network
- External access can be controlled using egress Network Policies.
Appropriately use kernel hardening tools such as AppArmor, seccomp
- Runtime classes provided by gvisor and kata containers can help provide further isolation of the containers
- Secure Computing – Seccomp tool helps control syscalls made by containers
- AppArmor can be configured for any application to reduce its potential host attack surface and provide a greater in-depth defense.
- PodSecurityPolicies – PSP enables fine-grained authorization of pod creation and updates (note: PSP was deprecated in Kubernetes 1.21 and removed in 1.25 in favor of Pod Security Admission).
  - Apply host updates
  - Install the minimal required OS footprint
  - Identify and address open ports
  - Remove unnecessary packages
  - Protect access to data with permissions
  - Restrict allowed hostPaths
- Exam tip: Know how to load AppArmor profiles on the node and enable them for pods. AppArmor support is in beta and is enabled per container using the annotation container.apparmor.security.beta.kubernetes.io/<container_name>: <profile_ref>
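A pod that uses the beta AppArmor annotation looks like the sketch below. It is built as a Python dict and printed as JSON, which kubectl apply accepts. The pod also sets the runtime's default seccomp profile. The pod, image, and profile names are hypothetical. The value localhost/<profile> refers to a profile that must already be loaded on the node where the pod is scheduled (for example with apparmor_parser). The annotation key must use the container's name.

```python
import json

APPARMOR_PREFIX = "container.apparmor.security.beta.kubernetes.io/"


def hardened_pod(name, image, container_name, apparmor_profile):
    """Pod whose container runs under a node-loaded AppArmor profile and the default seccomp profile."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {
                # "localhost/<profile>" refers to a profile already loaded on the node
                APPARMOR_PREFIX + container_name: f"localhost/{apparmor_profile}",
            },
        },
        "spec": {
            # Restrict syscalls with the container runtime's default seccomp profile
            "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}},
            "containers": [{"name": container_name, "image": image}],
        },
    }


# Hypothetical pod, image, and profile names
pod = hardened_pod("web", "nginx", "nginx", "k8s-deny-write")
print(json.dumps(pod, indent=2))
```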

Minimize Microservice Vulnerabilities – 20%

Practice CKS Exercises – Minimize Microservice Vulnerabilities

Setup appropriate OS-level security domains e.g. using PSP, OPA, security contexts.
- Security Contexts define security settings for pods and containers at the pod or container level. Capabilities can be added at the container level only.
- Pod Security Policies enable fine-grained authorization of pod creation and updates and are implemented as an optional admission controller.
- Open Policy Agent helps enforce custom policies on Kubernetes objects without recompiling or reconfiguring the Kubernetes API server.
- Admission controllers
  - can be used for validating configurations as well as mutating the configurations.
  - Mutating controllers are triggered before validating controllers.
  - Allows extension by adding custom controllers using MutatingAdmissionWebhook and ValidatingAdmissionWebhook.
- Exam tip: Know how to configure Pod Security Contexts and Pod Security Policies

Manage Kubernetes secrets
- Exam Tip: Know how to read secret values, create secrets and mount the same on the pods.
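Secret values returned by kubectl get secret <name> -o json are base64-encoded. This is an encoding, not encryption, which is why the exam expects you to decode them. The sketch below shows the encode/decode round trip. The Secret and its key are hypothetical. On the command line, the same decode is typically done by piping a jsonpath query into base64 -d.

```python
import base64


def encode_secret_data(values):
    """Base64-encode plain-text values the way a Secret's data field stores them."""
    return {k: base64.b64encode(v.encode("utf-8")).decode("ascii") for k, v in values.items()}


def decode_secret_data(secret):
    """Return plain-text values from a Secret object (e.g. parsed from kubectl get secret -o json)."""
    return {k: base64.b64decode(v).decode("utf-8") for k, v in secret.get("data", {}).items()}


# Hypothetical Secret with one key
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-creds"},
    "data": encode_secret_data({"password": "s3cr3t"}),
}
print(decode_secret_data(secret))  # {'password': 's3cr3t'}
```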
Use container runtime sandboxes in multi-tenant environments (e.g. gvisor, kata containers)
- Exam tip: Know how to create a RuntimeClass and associate it with a pod using runtimeClassName
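A RuntimeClass maps a name that pods can request to a handler configured in the node's container runtime. Below is a minimal sketch emitting both objects as a JSON List for kubectl apply. The names are hypothetical. runsc is the handler name commonly used for gVisor, but it must match whatever the node's runtime configuration actually defines.

```python
import json


def runtime_class(name, handler):
    """RuntimeClass mapping a name to a handler configured in the node's container runtime."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def sandboxed_pod(name, image, runtime_class_name):
    """Pod that asks to run under the given RuntimeClass."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "containers": [{"name": name, "image": image}],
        },
    }


# "runsc" is assumed to be the gVisor handler name configured on the nodes
rc = runtime_class("gvisor", "runsc")
pod = sandboxed_pod("sandboxed-nginx", "nginx", rc["metadata"]["name"])
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [rc, pod]}, indent=2))
```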
Implement pod to pod encryption by use of mTLS
- Practice managing TLS certificates in a cluster
- The Istio service mesh can be used to establish mTLS for pod-to-pod communication.
- Istio automatically configures workload sidecars to use mutual TLS when calling other workloads. By default, Istio configures the destination workloads using PERMISSIVE mode. When PERMISSIVE mode is enabled, a service can accept both plain text and mutual TLS traffic. In order to only allow mutual TLS traffic, the configuration needs to be changed to STRICT mode.
- Exam tip: No questions related to mTLS appeared in the exam
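Switching Istio from PERMISSIVE to STRICT mode is done with a PeerAuthentication resource. The sketch below builds one as a Python dict and prints it as JSON. It assumes the security.istio.io/v1beta1 API version; newer Istio releases also serve v1. When placed in the root namespace (istio-system in a default install), the policy applies mesh-wide. When placed in an application namespace, it applies only to that namespace.

```python
import json


def strict_mtls(namespace="istio-system"):
    """Istio PeerAuthentication requiring mutual TLS for workloads in its scope."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": "STRICT"}},
    }


print(json.dumps(strict_mtls(), indent=2))
```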

Supply Chain Security – 20%

Practice CKS Exercises – Supply Chain Security
Minimize base image footprint
- Remove unnecessary tools such as shells, package managers, and vi.
- Use slim/minimal images with required packages only. Do not include unnecessary software like build tools and utilities, troubleshooting, and debug binaries.
- Build the smallest image possible – To reduce the size of the image, install only what is strictly needed
- Use distroless, Alpine, or relevant base images for the app.
- Use official images from verified sources only.
Secure your supply chain: whitelist allowed registries, sign and validate images
- Work with images securely using a private repository
- Consider before using public images as you cannot control what’s inside them
- Configure the Kubernetes cluster to pull the images from a private registry instead of an external registry.
- Use the ImagePolicyWebhook admission controller to whitelist allowed image registries and validate images.
- Task: Pull an Image from a Private Registry

Use static analysis of user workloads (e.g., Kubernetes resources, Dockerfiles)
- Tools like Kubesec can be used to perform a static security risk analysis of the configuration files.
Scan images for known vulnerabilities
- Aqua Security Trivy & Anchore can be used for scanning vulnerabilities in the container images.
- Exam Tip: Know how to use the Trivy tool to scan images for vulnerabilities. Also, remember to use the --severity flag (e.g. --severity=CRITICAL) to filter for a specific category.
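Besides `--severity`, Trivy can write a JSON report (`trivy image -f json -o report.json <image>`), which is handy for scripting. A sketch that counts findings per severity from such a report. It assumes the layout used by recent Trivy versions (a top-level `Results` list whose entries may carry a `Vulnerabilities` list with a `Severity` field), which may differ between versions; the sample report is invented, with placeholder CVE IDs.

```python
import json
from collections import Counter


def count_by_severity(report: dict, severities=("CRITICAL", "HIGH")) -> Counter:
    # Walk every scanned target and tally vulnerabilities whose severity is in the filter
    counts = Counter()
    for result in report.get("Results", []):
        # "Vulnerabilities" is absent or null when a target has no findings
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in severities:
                counts[vuln["Severity"]] += 1
    return counts


sample = json.loads("""
{"Results": [
  {"Target": "nginx:1.19 (debian 10.9)",
   "Vulnerabilities": [
     {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL"},
     {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"},
     {"VulnerabilityID": "CVE-0000-0003", "Severity": "HIGH"}]},
  {"Target": "app/package-lock.json", "Vulnerabilities": null}
]}
""")
print(count_by_severity(sample))
```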

Monitoring, Logging and Runtime Security – 20%

Practice CKS Exercises – Monitoring, Logging, and Runtime Security

Perform behavioral analytics of syscall process and file activities at the host and container level to detect malicious activities
Detect threats within a physical infrastructure, apps, networks, data, users, and workloads
Detect all phases of attack regardless of where it occurs and how it spreads

Perform deep analytical investigation and identification of bad actors within the environment
- Tools like strace and Aqua Security Tracee can be used to check the syscalls. However, with many processes it is hard to track and monitor them all, and these tools do not provide alerting.
- Tools like Falco & Sysdig provide deep, process-level visibility into dynamic, distributed production environments and can be used to define rules to track, monitor, and alert on activities when a certain rule is violated.
- Exam Tip: Know how to use Falco, define new rules, and enable logging. Use the falco_rules.local.yaml file for overrides. (I did not get any Falco questions in my exam.)
Ensure immutability of containers at runtime
- Immutability prevents any changes from being made to the container or to the underlying host through the container.
- It is recommended to create new images and perform a rolling deployment instead of modifying the existing running containers.
- Launch the container in read-only mode using the --read-only flag of docker run, or by setting readOnlyRootFilesystem in the container's securityContext in Kubernetes.
- The container SecurityContext and PodSecurityPolicy can be used to define and enforce container immutability
  - ReadOnlyRootFilesystem – Requires that containers must run with a read-only root filesystem (i.e. no writable layer).
  - Privileged – determines if any container in a pod can enable privileged mode. This allows the container nearly all the same access as processes running on the host.
- Task @ Configure Pod Container Security Context
- Exam Tip: Know how to define a PodSecurityPolicy to enforce rules. Remember, a ClusterRole and RoleBinding need to be configured to grant access to the PSP for it to take effect. Note that PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25; newer clusters use Pod Security Admission instead.
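A sketch of a pod that enforces immutability through the container-level securityContext, built as a Python dict. Because the root filesystem is read-only, any path the application must write to (assumed here to be `/tmp`) is mounted as an emptyDir volume; the pod name and the busybox sleep command are hypothetical placeholders.

```python
import json


def immutable_pod(name: str, image: str) -> dict:
    container = {
        "name": name,
        "image": image,
        "command": ["sleep", "3600"],
        "securityContext": {
            "readOnlyRootFilesystem": True,     # no writable container layer
            "privileged": False,                # no near-host-level access
            "allowPrivilegeEscalation": False,  # block setuid-style escalation
        },
        # Writable scratch space outside the read-only root filesystem
        "volumeMounts": [{"name": "tmp", "mountPath": "/tmp"}],
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"containers": [container], "volumes": [{"name": "tmp", "emptyDir": {}}]},
    }


print(json.dumps(immutable_pod("immutable-busybox", "busybox"), indent=2))
```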
Use Audit Logs to monitor access
- Kubernetes auditing is handled by the kube-apiserver which requires defining an audit policy file.
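- Audit events are recorded at the stages RequestReceived (as soon as the request is received, before it is processed further), ResponseStarted (only for long-running requests such as watch), ResponseComplete (once the response body has been sent), and Panic (if a panic occurs while handling the request).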
- Exam Tip: Know how to configure audit policies and enable audit on the kube-apiserver. Make sure the kube-apiserver is up and running.
- Task @ Kubernetes Auditing
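A minimal audit policy sketch. Rules are evaluated in order and the first match sets the level; the kube-apiserver is then pointed at the file with `--audit-policy-file` and writes events to `--audit-log-path`. The file paths and the `prod` namespace are assumptions, and on a kubeadm cluster both paths must also be mounted into the kube-apiserver static pod. The `level_for` helper is a hypothetical, simplified model of first-match evaluation (it ignores users, verbs, and nonResourceURLs).

```python
import json

# Levels: None < Metadata < Request < RequestResponse
POLICY = {
    "apiVersion": "audit.k8s.io/v1",
    "kind": "Policy",
    "omitStages": ["RequestReceived"],  # skip the earliest stage to reduce log volume
    "rules": [
        # Secrets: log who/what/when but never the payload
        {"level": "Metadata", "resources": [{"group": "", "resources": ["secrets"]}]},
        # Full request and response bodies for pods in a hypothetical "prod" namespace
        {"level": "RequestResponse", "resources": [{"group": "", "resources": ["pods"]}],
         "namespaces": ["prod"]},
        # Catch-all
        {"level": "Metadata"},
    ],
}


def level_for(policy: dict, group: str, resource: str, namespace: str) -> str:
    """Simplified first-match evaluation of an audit policy."""
    for rule in policy["rules"]:
        if "namespaces" in rule and namespace not in rule["namespaces"]:
            continue
        groups = rule.get("resources")
        # An empty "resources" list inside a group entry means every resource in that group
        if groups and not any(
            g["group"] == group and (not g.get("resources") or resource in g["resources"])
            for g in groups
        ):
            continue
        return rule["level"]
    return "None"  # requests that match no rule are not logged


# Save as e.g. /etc/kubernetes/audit-policy.yaml (JSON is valid YAML), then start:
#   kube-apiserver --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
#                  --audit-log-path=/var/log/kubernetes/audit/audit.log
print(json.dumps(POLICY, indent=2))
```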

CKS Articles

Securing a Cluster
11 ways not to get hacked
GKE Best Practices for Building Containers
Security Best Practices (A bit older but still parts are relevant)

CKS General information and practices

The exam can be taken online from anywhere.
Make sure you have prepared your workspace well before the exam.
Make sure you have a valid government-issued ID card as it would be checked.
You are not allowed to have anything around you and no one should enter the room.
The exam proctor will be watching you at all times, so refrain from doing any other activities. Your screen is also shared throughout.
Copy + Paste works fine.
You will have an online notepad in the right corner for notes. I hardly used it, but it can be useful to type and modify text instead of using the vi editor.

All the Best …

Certified Kubernetes Administrator CKA Learning Path

November 12, 2023 ~ Last updated on : May 14, 2024 ~ jayendrapatil ~ 8 Comments

Certified Kubernetes Administrator CKA Learning Path

Recently recertified the Certified Kubernetes Administrator (CKA) certification with 91%. After knowing how to use Kubernetes, it was really interesting and intriguing to learn Kubernetes internals and how the overall system works.

CKA is more of an open-book test, where you have access to the official Kubernetes documentation during the exam, but it focuses more on hands-on experience.
CKA focuses on “The skills required to be a successful Kubernetes Administrator”. It tests the candidate’s ability to do basic installation as well as configuring and managing production-grade Kubernetes clusters.

Unlike AWS and GCP certifications, you are required to solve and debug actual problems and provision resources on a live Kubernetes cluster.
Even though it is an open book test, you need to know where the information is.
Trust me, if you are not prepared, the time is not going to be sufficient.

CKA Exam Pattern

CKA exam curriculum includes these general domains and their weights on the exam:
- Cluster Architecture, Installation & Configuration – 25%
- Workloads & Scheduling – 15%
- Services & Networking – 20%
- Storage – 10%
- Troubleshooting – 30%
~~CKA requires you to solve 24 questions in 3 hours.~~
CKA exam has been upgraded and requires you to solve 15-20 questions in 2 hours. I got 17 questions.

At the time of writing, CKA used Kubernetes 1.28, but it is regularly upgraded to newer Kubernetes versions.
You are allowed to open another browser tab which can be from kubernetes.io or other product documentation like Falco. Do not open any other windows.
Exam questions can be attempted in any order and don’t have to be sequential, so be sure to move ahead and come back later.

CKA Exam Preparation and Tips

I used the courses from KodeKloud CKA for practicing and it would be good enough to cover what is required for the exam.
Practice the imperative commands as much as you can. This will help cut down the time required to solve half of the questions. I was not pressed for time in the CKA and had plenty of time to review.
Each exam question carries a weight, so attempt the questions with higher weights before focusing on the lower ones, especially the ones with quick solutions like debugging.

The CKA exam provides 6-8 different preconfigured Kubernetes clusters. Each question refers to a specific cluster, so the context needs to be switched. Be sure to execute the kubectl config use-context command provided with every question; you just need to copy and paste it.
Check the namespace mentioned in the question when finding and creating resources. Use -n <namespace>.
You will perform most of the interaction from the client node. However, pay attention to the node (master or worker) on which you need to execute the commands, and make sure you return to the base node.

With CKA, it is important to SSH to the master node for any changes to the cluster's kube-apiserver.
SSH to nodes and gaining root access is allowed if needed.
Carefully read the information provided within the questions with the i mark. It provides very useful hints for addressing the question and saves time, e.g. the namespaces to look into for a failed pod, or what has already been created (ConfigMaps, Secrets, network policies) so that you do not create the same.

Make sure you know the imperative commands to create resources, as you won’t have much time to create and edit YAML files.
If you need to edit further, use --dry-run=client -o yaml to get a head start with the YAML spec file and edit it.
I personally use alias kk=kubectl to avoid typing kubectl

CKA Learning Path

Go through the CKA Curriculum
Mumshad Mannambeth Kodekloud course
- Excellent course which covers the right topics required for the CKA
- It also provides hands-on labs for each of the topics, giving you actual experience working on the Kubernetes cluster.
- Make sure to practice the labs until you no longer need to refer to the hints and can do most of them without documentation.
Udemy Certified Kubernetes Administrator by Zeal Vora. It also offers practical hands-on exercises.

Practice CKA Exercises
Cover Kubernetes tutorials which provide a good hands-on guide
Cover kubectl cheatsheet for commands

Cover Tasks from Kubernetes documentation

CKA Key Topics

Cluster Architecture, Installation & Configuration – 25%

Practice CKA Exercises – Cluster Architecture, Installation & Configuration
Manage role based access control (RBAC)
- Authorization using Node and RBAC
Use Kubeadm to install a basic cluster
- Practice creating Kubernetes Cluster using Kubeadm

Manage a highly-available Kubernetes cluster
- Configure a highly-available Kubernetes cluster
Provision underlying infrastructure to deploy a Kubernetes cluster

Perform a version upgrade on a Kubernetes cluster using Kubeadm
- Practice Upgrading kubeadm clusters
Implement etcd backup and restore
- Make sure you read ETCD backup and practice using documentation
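The backup itself is a single `etcdctl snapshot save` call that must be given the etcd endpoint and client certificates. A sketch that assembles the command; the certificate paths are the kubeadm defaults and the backup path is hypothetical, so check the etcd static pod manifest (`/etc/kubernetes/manifests/etcd.yaml`) for the actual values on the exam cluster.

```python
import shlex


def etcd_snapshot_cmd(snapshot_path: str,
                      endpoint: str = "https://127.0.0.1:2379",
                      pki: str = "/etc/kubernetes/pki/etcd") -> list:
    # kubeadm default certificate locations; verify against etcd.yaml on the node
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={pki}/ca.crt",
        f"--cert={pki}/server.crt",
        f"--key={pki}/server.key",
        "snapshot", "save", snapshot_path,
    ]


cmd = etcd_snapshot_cmd("/opt/backup/etcd-snapshot.db")
# ETCDCTL_API=3 is the default from etcdctl v3.4 on, but setting it is harmless
print("ETCDCTL_API=3 " + shlex.join(cmd))
```

For the restore side, `etcdctl snapshot restore <file> --data-dir <new-dir>` (moved to `etcdutl` in etcd 3.5) writes a new data directory, after which the etcd static pod's data volume must be pointed at it.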

Workloads & Scheduling – 15%

Practice CKA Exercises – Workloads & Scheduling
Understand deployments and how to perform rolling update and rollbacks
- Understand deployments and how to perform rolling update and rollbacks. Practice kubectl rollout commands to check status and undo deployments.
Use ConfigMaps and Secrets to configure applications
- ConfigMaps are used to store non-confidential data in key-value pairs.
- Task Create a ConfigMap and mount it as a volume.
- Know how to Manage Kubernetes secrets
- Task Create Secrets and refer to them in a Pod.
- Exam Tip: Know how to read secret values, create secrets, and mount the same on the pods.
- Exam Tip: Know how to create ConfigMaps and mount the same on the pods.
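Secret values are stored base64-encoded (encoded, not encrypted), which is why `kubectl get secret <name> -o jsonpath='{.data.<key>}'` returns an unreadable string that has to be piped through `base64 -d`. The same round trip in Python, using a hypothetical secret named `db-creds`:

```python
import base64


def encode_secret(name: str, values: dict) -> dict:
    # The "data" field of a Secret holds base64-encoded values
    data = {k: base64.b64encode(v.encode()).decode() for k, v in values.items()}
    return {"apiVersion": "v1", "kind": "Secret", "metadata": {"name": name},
            "type": "Opaque", "data": data}


def decode_secret(secret: dict) -> dict:
    # Equivalent of piping each value through `base64 -d`
    return {k: base64.b64decode(v).decode() for k, v in secret.get("data", {}).items()}


secret = encode_secret("db-creds", {"username": "admin", "password": "s3cr3t"})
print(secret["data"]["username"])  # YWRtaW4=
print(decode_secret(secret))
```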
Know how to scale applications
- Understand Scaling an Application using Deployment
Understand the primitives used to create robust, self-healing, application deployments
- Know how to scale and create self-healing applications using replicas

Understand how resource limits can affect Pod scheduling
- Know how to assign CPU and memory resources to containers and pods
- Exam Tip: Know how to configure pods with requests and limits.
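The scheduler places a pod by its requests, not its limits: a node fits only if the requests of the pods already on it plus the new pod's requests stay within the node's allocatable CPU and memory. Limits are enforced at runtime (CPU is throttled; a container exceeding its memory limit is OOM-killed). A simplified sketch of that fit check that handles only the `m`, `Ki`, `Mi`, and `Gi` quantity suffixes; the node and pod figures are made up.

```python
def cpu_millicores(q: str) -> int:
    # "500m" -> 500, "2" -> 2000
    return int(q[:-1]) if q.endswith("m") else int(float(q) * 1000)


def mem_bytes(q: str) -> int:
    # Only binary suffixes are handled in this sketch
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)


def fits(node_allocatable: dict, running_requests: list, pod_requests: dict) -> bool:
    """True if the pod's requests fit in what is left of the node's allocatable resources."""
    used_cpu = sum(cpu_millicores(r["cpu"]) for r in running_requests)
    used_mem = sum(mem_bytes(r["memory"]) for r in running_requests)
    return (used_cpu + cpu_millicores(pod_requests["cpu"]) <= cpu_millicores(node_allocatable["cpu"])
            and used_mem + mem_bytes(pod_requests["memory"]) <= mem_bytes(node_allocatable["memory"]))


node = {"cpu": "2", "memory": "4Gi"}
running = [{"cpu": "1500m", "memory": "1Gi"}]
print(fits(node, running, {"cpu": "250m", "memory": "512Mi"}))  # True
print(fits(node, running, {"cpu": "750m", "memory": "512Mi"}))  # False: 2250m > 2000m
```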

Awareness of manifest management and common templating tools

Services & Networking – 20%

Practice CKA Exercises – Services & Networking
Understand host networking configuration on the cluster nodes

Understand connectivity between Pods
- Understand Cluster Networking
Understand ClusterIP, NodePort, LoadBalancer service types and endpoints
- Understand Service networking and practice how to expose pods and deployments as services.
Know how to use Ingress controllers and Ingress resources
- Know Ingress and how to use Ingress rules

Know how to configure and use CoreDNS
- Practice DNS for Services and Pods using nslookup
- Understand CoreDNS for Service Discovery
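The names you query with nslookup from a test pod follow fixed patterns. A sketch that builds them, assuming the default cluster domain `cluster.local` (it is configurable on the kubelet); the service, namespace, and IP values are hypothetical.

```python
def service_fqdn(service: str, namespace: str, domain: str = "cluster.local") -> str:
    # Record created for every Service (its ClusterIP, or pod IPs for a headless Service)
    return f"{service}.{namespace}.svc.{domain}"


def pod_fqdn(pod_ip: str, namespace: str, domain: str = "cluster.local") -> str:
    # Pod records replace the dots of the IPv4 address with dashes
    return f"{pod_ip.replace('.', '-')}.{namespace}.pod.{domain}"


print(service_fqdn("nginx-svc", "default"))  # nginx-svc.default.svc.cluster.local
print(pod_fqdn("10.244.1.5", "dev"))        # 10-244-1-5.dev.pod.cluster.local
```

A quick check from a throwaway pod: `kubectl run tmp --image=busybox:1.28 --rm -it --restart=Never -- nslookup nginx-svc.default.svc.cluster.local`. From a pod in the same namespace, the short name `nginx-svc` also resolves thanks to the search domains in the pod's /etc/resolv.conf.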

Choose an appropriate container network interface plugin
- Know Network Plugins

Storage – 10%

Practice CKA Exercises – Storage

Understand storage classes, persistent volumes
- Understand and focus on creating Persistent Volumes
Understand volume mode, access modes, and reclaim policies for volumes
- Understand volume mode, access modes, and reclaim policies
Understand persistent volume claims primitive
- Understand Persistent Volume Claims and associate them with Pods

Know how to configure applications with persistent storage
- Practice Configure a Pod to Use a Volume for Storage – focus on using emptyDir as the volume, so the storage is ephemeral to the pod.
- Practice Configure Pod Container Persistent Volume Storage – focus on creating Pods with host path volumes
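A PVC binds to a PV only when the PV offers every requested access mode, has at least the requested capacity, and has the same storageClassName; among the candidates, the smallest suitable PV is chosen. A simplified sketch of that matching, ignoring selectors, volume mode, the default StorageClass, and dynamic provisioning; the PV names and sizes (in GiB) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PV:
    name: str
    capacity_gi: int
    access_modes: set
    storage_class: str = ""
    bound: bool = False


@dataclass
class PVC:
    request_gi: int
    access_modes: set
    storage_class: str = ""


def find_match(claim: PVC, volumes: list) -> Optional[PV]:
    candidates = [
        pv for pv in volumes
        if not pv.bound
        and claim.access_modes <= pv.access_modes    # all requested modes offered
        and pv.capacity_gi >= claim.request_gi       # big enough
        and pv.storage_class == claim.storage_class  # same class
    ]
    # Prefer the smallest volume that satisfies the claim
    return min(candidates, key=lambda pv: pv.capacity_gi, default=None)


pvs = [PV("pv-a", 10, {"ReadWriteOnce"}),
       PV("pv-b", 5, {"ReadWriteOnce", "ReadOnlyMany"}),
       PV("pv-c", 20, {"ReadWriteMany"})]
print(find_match(PVC(3, {"ReadWriteOnce"}), pvs).name)  # pv-b
print(find_match(PVC(3, {"ReadWriteMany"}), pvs).name)  # pv-c
print(find_match(PVC(30, {"ReadWriteOnce"}), pvs))      # None
```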

Troubleshooting – 30%

Practice CKA Exercises – Troubleshooting
Evaluate cluster and node logging
- Refer Cluster logging

Understand how to monitor applications
- Know resource usage monitoring, as you may need to check resource usage using the kubectl top command
Manage container stdout & stderr logs
- Know how to Debug running pods using the kubectl logs command
Troubleshoot application failure
- Practice Debug application for troubleshooting application failures

Troubleshoot cluster component failure
- Practice Debug cluster for troubleshooting control plane failure and worker node failure.
  - Understand the control plane architecture.
  - Focus on the kube-apiserver and the static pod manifests (in /etc/kubernetes/manifests on kubeadm clusters) from which the control plane pods are deployed.
  - Check whether all pods in kube-system are running. Use the docker ps -a command (or crictl ps -a on containerd-based nodes) on the node to inspect the reason for exited containers.
  - Check the kubelet service if a worker node shows as NotReady.

Troubleshoot networking

Scheduling

Understand label selectors to schedule Pods on nodes using nodeSelector and Practice Assign Pod Nodes
Understand DaemonSets and how to provision them. Remember there is no imperative way to create a DaemonSet, so either generate a Deployment spec and edit it, or copy one from the documentation.
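One way to get a DaemonSet skeleton is `kubectl create deployment <name> --image=<image> --dry-run=client -o yaml > ds.yaml`, then edit the file: change `kind` to `DaemonSet` and delete `replicas`, `strategy`, and `status`. The same edit as a Python sketch over the parsed manifest; the `fluentd` name and image are hypothetical, and the input dict only approximates kubectl's output.

```python
import copy


def deployment_to_daemonset(deployment: dict) -> dict:
    ds = copy.deepcopy(deployment)
    ds["kind"] = "DaemonSet"          # apiVersion apps/v1 stays the same
    ds["spec"].pop("replicas", None)  # a DaemonSet runs one pod per eligible node
    ds["spec"].pop("strategy", None)  # DaemonSets use spec.updateStrategy instead
    ds.pop("status", None)
    return ds


# Roughly what `kubectl create deployment fluentd --image=fluentd --dry-run=client -o yaml` produces
deploy = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "fluentd", "labels": {"app": "fluentd"}},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "fluentd"}},
        "strategy": {},
        "template": {"metadata": {"labels": {"app": "fluentd"}},
                     "spec": {"containers": [{"name": "fluentd", "image": "fluentd"}]}},
    },
    "status": {},
}
ds = deployment_to_daemonset(deploy)
print(ds["kind"], sorted(ds["spec"]))  # DaemonSet ['selector', 'template']
```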

Understand how resource limits can affect Pod scheduling
Understand how to run multiple schedulers and how to configure Pods to use them
Practice how to create Static Pods, esp. on worker nodes. Static pods are configured using YAML files located in the staticPodPath directory referred to by the kubelet configuration. Make sure the property is defined.

Security

Know how to configure authentication and authorization using CertificateSigningRequest and RBAC authorization
Know how to configure network policies
Practice managing TLS certificates in a cluster

Work with images securely using a private repository
Define security contexts
Secure persistent key value store using Secrets. Practice passing Secrets to Pods using Volumes and Environment variables.

CKA General information and practices

The exam can be taken online from anywhere.
Make sure you have prepared your workspace well before the exam.
Make sure you have a valid government-issued ID card as it would be checked.

You are not allowed to have anything around you and no one should enter the room.
The exam proctor will be watching you at all times, so refrain from doing any other activities. Your screen is also shared throughout.
Copy + Paste works fine.

You will have an online notepad in the right corner for notes. I hardly used it, but it can be useful to type and modify text instead of using the vi editor.

All the Best …

AWS Certified Database – Specialty (DBS-C01) Exam Learning Path

September 19, 2023 ~ Last updated on : May 13, 2024 ~ jayendrapatil ~ 19 Comments

AWS Certified Database – Specialty (DBS-C01) Exam Learning Path

I recently revalidated my AWS Certified Database – Specialty (DBS-C01) certification just before it expired. The format and domains are pretty much the same as the previous exam, however, it has been enhanced to cover a lot of new services.

AWS Certified Database – Specialty (DBS-C01) Exam Content

AWS Certified Database – Specialty (DBS-C01) exam validates your understanding of databases, including the concepts of design, migration, deployment, access, maintenance, automation, monitoring, security, and troubleshooting, and covers the following tasks:

Understand and differentiate the key features of AWS database services.

Analyze needs and requirements to design and recommend appropriate database solutions using AWS services

Refer to AWS Database – Specialty Exam Guide

AWS Certified Database – Specialty (DBS-C01) Exam Summary

Specialty exams are tough, lengthy, and tiresome. Most of the questions and answer options involve a lot of prose and a lot of reading, so be sure you are prepared and manage your time well.
DBS-C01 exam has 65 questions to be solved in 170 minutes which gives you roughly 2 1/2 minutes to attempt each question.
DBS-C01 exam includes two types of questions, multiple-choice and multiple-response.

DBS-C01 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 750.
Specialty exams currently cost $ 300 + tax.
You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.

As always, mark the questions for review, move on, and come back to them after you are done with all.
As always, having a rough architecture or mental picture of the setup helps you focus on the areas that matter. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Compare the remaining 2 answers to find where they differ; that will help you reach the right answer or at least give you a 50% chance of getting it right.
AWS exams can be taken either at a test center or online. I prefer to take them online as it provides a lot of flexibility. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.

Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Certified Database – Specialty (DBS-C01) Exam Resources

Online Courses
- Stephane Maarek – AWS Certified Database Specialty Exam
- Whizlabs – AWS Certified Database Specialty Course
Practice tests
- Braincert – AWS Certified Database – Specialty (DBS-C01) Practice Exams
- Stephane Maarek – AWS Database – Specialty Practice Tests
- Whizlabs – AWS Certified Database Specialty Practice Tests

AWS Certified Database – Specialty (DBS-C01) Exam Summary

AWS Certified Database – Specialty exam focuses completely on AWS Data services from relational, non-relational, graph, caching, and data warehousing. It also covers deployments, automation, migration, security, monitoring, and troubleshooting aspects of them.

Database Services

Make sure you know and cover all the services in depth, as 80% of the exam is focused on topics like Aurora, RDS, and DynamoDB.
DynamoDB
- is a fully managed NoSQL database service providing single-digit millisecond latency.
- DynamoDB supports on-demand and provisioned throughput capacity modes.
  - On-demand mode
    - provides a flexible billing option capable of serving thousands of requests per second without capacity planning
    - does not support reserved capacity
  - Provisioned mode
    - requires you to specify the number of reads and writes per second as required by the application
    - Understand the provisioned capacity calculations
- DynamoDB Auto Scaling uses the AWS Application Auto Scaling service to dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns.
- Know DynamoDB Burst capacity, Adaptive capacity
- DynamoDB Consistency mode determines the manner and timing in which the successful write or update of a data item is reflected in a subsequent read operation of that same item.
  - supports eventual and strongly consistent reads.
  - Eventually consistent reads consume half the read capacity of strongly consistent reads but might return stale data, whereas strongly consistent reads consume more capacity but always return the most up-to-date data.
- DynamoDB secondary indexes provide efficient access to data with attributes other than the primary key.
  - An LSI uses the same partition key but a different sort key, whereas a GSI can use a different partition key and/or sort key and is maintained like a separate table.
  - GSI can cause primary table throttling if under-provisioned.
  - Make sure you understand the difference between the Local Secondary Index and the Global Secondary Index
- DynamoDB Global Tables is a multi-region, multi-master replication capability of DynamoDB that supports data access locality and regional fault tolerance for database workloads.
  - Understand the differences between DynamoDB Global tables and Aurora Global databases esp. in terms of allowing writes in multiple regions.
- DynamoDB Time to Live – TTL enables a per-item timestamp to determine when an item is no longer needed. (hint: know TTL can expire the data and this can be captured by using DynamoDB Streams)
- DynamoDB cross-region replication allows identical copies (called replicas) of a DynamoDB table (called master table) to be maintained in one or more AWS regions.
- DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table.
- DynamoDB Triggers (just like database triggers) is a feature that allows the execution of custom actions based on item-level updates on a table.
- DynamoDB Accelerator – DAX is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement even at millions of requests per second.
  - DAX does not support fine-grained access control like DynamoDB.
- DynamoDB backups support point-in-time recovery (PITR)
  - AWS Backup can be used to backup and restore, and it supports cross-region snapshot copy as well.
- VPC Gateway Endpoints provide private access to DynamoDB from within a VPC without the need for an internet gateway or NAT gateway
- Understand DynamoDB Best practices (hint: selection of keys to avoid hot partitions and creation of LSI and GSI)
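Provisioned-capacity questions come down to a few rounding rules: one RCU is one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half, transactional reads double), and one WCU is one write per second of an item up to 1 KB (transactional writes double). Item sizes are rounded up to the next 4 KB (reads) or 1 KB (writes) boundary. A sketch of the arithmetic, with made-up workloads:

```python
import math


def read_capacity_units(item_kb: float, reads_per_sec: int, mode: str = "strong") -> int:
    units_per_read = math.ceil(item_kb / 4)  # 1 RCU covers up to 4 KB
    if mode == "strong":
        return units_per_read * reads_per_sec
    if mode == "eventual":                   # half the cost of a strong read
        return math.ceil(units_per_read * reads_per_sec / 2)
    if mode == "transactional":              # twice the cost of a strong read
        return 2 * units_per_read * reads_per_sec
    raise ValueError(f"unknown read mode: {mode}")


def write_capacity_units(item_kb: float, writes_per_sec: int, transactional: bool = False) -> int:
    units_per_write = math.ceil(item_kb)     # 1 WCU covers up to 1 KB
    return units_per_write * writes_per_sec * (2 if transactional else 1)


print(read_capacity_units(6, 10))              # 20 (6 KB rounds up to 8 KB = 2 RCU per read)
print(read_capacity_units(6, 10, "eventual"))  # 10
print(write_capacity_units(1.5, 10))           # 20 (1.5 KB rounds up to 2 KB)
```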
Aurora
- is a relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases.
- provides MySQL and PostgreSQL compatibility
- Aurora Disaster Recovery & High Availability can be achieved using Read Replicas with very minimal downtime.
  - Aurora promotes the read replica with the highest priority tier (tier 0 is the highest); if the tiers match, it promotes the largest replica.
- Aurora Global Database provides cross-region read replicas for low-latency reads. Remember it is not multi-master and does not provide low-latency writes across regions like DynamoDB Global Tables.
- Aurora Connection endpoints support
  - Cluster for primary read/write
  - Reader for read replicas
  - Custom for a specific group of instances
  - Instance for specific single instance – Not recommended
- Aurora Fast Failover techniques
  - Set TCP keepalives low
  - Set Java DNS caching timeouts low
  - Set the timeout variables used in the JDBC connection string low
  - Use the provided read and write Aurora endpoints
  - Use cluster cache management for Aurora PostgreSQL. Cluster cache management ensures that application performance is maintained if there’s a failover.
- Aurora Serverless is an on-demand, autoscaling configuration for the MySQL-compatible and PostgreSQL-compatible editions of Aurora.
- Aurora Backtrack feature helps rewind the DB cluster to the specified time. It is not a replacement for backups.
- Aurora server audit events cover activities such as log-ins, DML, permission changes (DCL), and schema changes (DDL).
- Aurora cluster cache management helps with fast failover by maintaining application performance after a failover.
- Aurora cloning allows you to create quick and cost-effective clones.
- Aurora supports fault injection queries to simulate various failovers like node down, primary failover, etc.
- RDS PostgreSQL and MySQL can be migrated to Aurora by creating an Aurora Read Replica of the instance. Once the replica lag is zero, promote the replica and switch the application over with no data loss.
- Aurora Database Activity Streams push a near-real-time stream of database activity (audit data) to Amazon Kinesis Data Streams.
- Supports invoking AWS Lambda functions from stored procedures.
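The Aurora promotion order described above can be sketched as a sort: lowest tier number first, then the largest instance. Instance size is approximated here by a hypothetical memory figure per replica; when tier and size both tie, Aurora picks an arbitrary replica, which this sketch does by list order.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    tier: int        # promotion tier 0-15; 0 is the highest priority
    memory_gib: int  # stand-in for instance size in this sketch


def failover_target(replicas: list) -> Replica:
    if not replicas:
        raise ValueError("no Aurora Replica to promote")
    # Lowest tier wins, then the largest size; min() keeps the first of equal keys
    return min(replicas, key=lambda r: (r.tier, -r.memory_gib))


replicas = [Replica("reader-1", 1, 64), Replica("reader-2", 0, 16), Replica("reader-3", 0, 32)]
print(failover_target(replicas).name)  # reader-3: tier 0 and larger than reader-2
```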

Relational Database Service (RDS)
- provides a relational database in the cloud with multiple database options.
- RDS Snapshots, Backups, and Restore
  - restoring a DB from a snapshot does not retain the parameter group and security group
  - automated snapshots cannot be shared. Copy the automated snapshot to a manual snapshot first and share the copy.
- RDS Read Replicas
  - allow elastic scaling beyond the capacity constraints of a single DB instance for read-heavy database workloads.
  - increased scalability and database availability in the case of an AZ failure.
  - supports cross-region replicas.
- RDS Multi-AZ provides high availability and automatic failover support for DB instances.
- Understand the differences between RDS Multi-AZ vs Read Replicas
  - Multi-AZ failover can be simulated using Reboot with the failover option
  - Read Replicas require automated backups enabled
- Understand DB components, esp. DB parameter groups and DB option groups
  - Dynamic parameters are applied immediately
  - Static parameters require a manual reboot.
  - The default parameter group cannot be modified. Create a custom parameter group and associate it with the RDS instance.
  - Know that the default max_connections also depends on the DB instance size
- RDS Custom automates database administration tasks and operations, while making it possible for you as a database administrator to access and customize the database environment and operating system.
- RDS Performance Insights is a database performance tuning and monitoring feature that helps you quickly assess the load on the database, and determine when and where to take action.
- RDS Security
  - RDS supports security groups to control who can access RDS instances
  - RDS supports data at rest encryption and SSL for data in transit encryption
  - RDS supports IAM database authentication with temporary credentials.
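  - An existing unencrypted RDS instance cannot be encrypted in place: create a snapshot, copy it with encryption enabled, and restore the encrypted copy as a new DB instance.
  - RDS PostgreSQL requires rds.force_ssl=1 on the server and sslmode=verify-ca or verify-full on the client to enforce and verify SSL encryption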
  - Know RDS Encrypted Database limitations
- Understand RDS Monitoring and Notification
  - Know RDS supports notification events through SNS for events like database creation, deletion, snapshot creation, etc.
  - CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB instance, and Enhanced Monitoring gathers its metrics from an agent on the instance.
  - Enhanced Monitoring metrics are useful to understand how different processes or threads on a DB instance use the CPU.
  - RDS Performance Insights is a database performance tuning and monitoring feature that helps illustrate the database’s performance and help analyze any issues that affect it
- An RDS instance that has read replicas cannot be stopped.
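The snapshot, encrypted copy, and restore sequence for encrypting an existing instance maps onto three RDS API calls. A sketch using boto3, assuming the caller passes in an RDS client (e.g. `boto3.client("rds")`) and a KMS key ID, ARN, or alias; the snapshot naming scheme is hypothetical. Unless you pass them to the restore call, the new instance comes up with default parameter and security groups, so reattach them afterwards.

```python
def encrypt_existing_instance(rds, instance_id: str, kms_key_id: str, new_instance_id: str) -> None:
    """Snapshot -> encrypted copy -> restore. `rds` is a boto3 RDS client (or a test double)."""
    snap = f"{instance_id}-plain"
    enc_snap = f"{instance_id}-encrypted"
    waiter = rds.get_waiter("db_snapshot_available")

    # 1. Snapshot the unencrypted instance
    rds.create_db_snapshot(DBSnapshotIdentifier=snap, DBInstanceIdentifier=instance_id)
    waiter.wait(DBSnapshotIdentifier=snap)

    # 2. Copy the snapshot; supplying KmsKeyId produces an encrypted copy
    rds.copy_db_snapshot(SourceDBSnapshotIdentifier=snap,
                         TargetDBSnapshotIdentifier=enc_snap,
                         KmsKeyId=kms_key_id)
    waiter.wait(DBSnapshotIdentifier=enc_snap)

    # 3. Restore the encrypted copy as a new instance, then repoint the application to it
    rds.restore_db_instance_from_db_snapshot(DBInstanceIdentifier=new_instance_id,
                                             DBSnapshotIdentifier=enc_snap)
```

Usage, with hypothetical identifiers: `encrypt_existing_instance(boto3.client("rds"), "orders-db", "alias/rds-key", "orders-db-enc")`.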

ElastiCache
- is a managed web service that helps deploy and run Memcached or Redis protocol-compliant cache clusters in the cloud easily.
- Understand the differences between Redis vs. Memcached

Neptune
- is a fully managed database service built for the cloud that makes it easier to build and run graph applications. Neptune provides built-in security, continuous backups, serverless compute, and integrations with other AWS services.
- provides Neptune loader to quickly import data from S3
- supports VPC endpoints
Amazon Keyspaces (for Apache Cassandra) is a scalable, highly available, and managed Apache Cassandra–compatible database service.
Amazon Quantum Ledger Database (Amazon QLDB) is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log.

Redshift
- is a fully managed, fast, and powerful, petabyte-scale data warehouse service. It is not covered in depth.
- Know Redshift best practices with respect to the selection of distribution style and sort key, and importing/exporting data
  - A single COPY command loads files in parallel and performs better than multiple COPY commands
  - COPY command can use manifest files to load data
  - COPY command handles encrypted data
- Know Redshift cross-region encrypted snapshot copy
  - Create a new KMS key in the destination region
  - Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the destination region.
  - In the source region, enable cross-region snapshot copy and specify the name of the copy grant created.
- Know Redshift supports audit logging, which covers authentication attempts, connections, and disconnections, usually for compliance reasons.
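A Redshift COPY manifest is a JSON file listing exactly which S3 objects to load, which avoids picking up stray files under a prefix. A sketch that builds the manifest and a matching COPY statement; the bucket, table, file format, and IAM role ARN are placeholders.

```python
import json


def build_manifest(urls: list, mandatory: bool = True) -> str:
    # mandatory=True makes COPY fail if a listed file is missing
    return json.dumps({"entries": [{"url": u, "mandatory": mandatory} for u in urls]}, indent=2)


def copy_statement(table: str, manifest_url: str, iam_role_arn: str) -> str:
    # MANIFEST tells COPY that the FROM location is a manifest, not a data-file prefix;
    # listing several files lets Redshift load them in parallel across slices
    return (f"COPY {table} FROM '{manifest_url}' "
            f"IAM_ROLE '{iam_role_arn}' MANIFEST CSV GZIP;")


files = [f"s3://example-bucket/sales/part-{i:04d}.csv.gz" for i in range(4)]
print(build_manifest(files))
print(copy_statement("sales", "s3://example-bucket/sales/load.manifest",
                     "arn:aws:iam::123456789012:role/RedshiftCopyRole"))
```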
Data Migration Service (DMS)
- DMS helps migrate homogeneous and heterogeneous databases
- DMS with full load plus Change Data Capture (CDC) can be used to migrate databases with minimal downtime and no data loss.
- DMS with SCT (Schema Conversion Tool) can be used to migrate heterogeneous databases.
- Premigration Assessment evaluates specified components of a database migration task to help identify any problems that might prevent a migration task from running as expected.
- Multiserver assessment report evaluates multiple servers based on input that you provide for each schema definition that you want to assess.
- DMS provides support for data validation to ensure that your data was migrated accurately from the source to the target.
- DMS supports LOB migration as a 2-step process. It can do a full or limited LOB migration
  - In full LOB mode, AWS DMS migrates all LOBs from source to target regardless of size. Full LOB mode can be quite slow.
  - In limited LOB mode, a maximum LOB size can be set that AWS DMS should accept. Doing so allows AWS DMS to pre-allocate memory and load the LOB data in bulk. LOBs that exceed the maximum LOB size are truncated and a warning is issued to the log file. In limited LOB mode, you get significant performance gains over full LOB mode.
  - Recommended to use limited LOB mode whenever possible.
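LOB handling is controlled in the DMS task settings under `TargetMetadata`. A sketch that builds that fragment for limited LOB mode and illustrates the truncation effect. `LobMaxSize` is in kilobytes; the 32 KB value is only an example, so size it from the largest LOB in the source. The `migrated_lob` helper is a hypothetical illustration, not DMS code.

```python
import json


def limited_lob_settings(lob_max_kb: int) -> dict:
    # Fragment of a DMS task-settings document
    return {"TargetMetadata": {
        "SupportLobs": True,
        "FullLobMode": False,
        "LimitedSizeLobMode": True,
        "LobMaxSize": lob_max_kb,  # LOBs larger than this are truncated, with a warning logged
    }}


def migrated_lob(value: bytes, lob_max_kb: int) -> bytes:
    # Illustration of limited LOB mode: anything beyond LobMaxSize is cut off
    return value[: lob_max_kb * 1024]


settings = limited_lob_settings(32)
print(json.dumps(settings))
print(len(migrated_lob(b"x" * 50_000, 32)))  # 32768: the LOB was truncated
```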

Security, Identity & Compliance

Identity and Access Management (IAM)
- Understand IAM in depth
- Understand IAM Roles
Key Management Services
- is a managed encryption service that allows the creation and control of encryption keys to enable data encryption.
- provides data at rest encryption for the databases.
AWS Secrets Manager
- protects secrets needed to access applications, services, etc.
- enables you to easily rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle
- supports automatic rotation of credentials for RDS, DocumentDB, etc.
Secrets Manager vs. Systems Manager Parameter Store
- Secrets Manager supports automatic rotation while SSM Parameter Store does not
- Parameter Store is more cost-effective than Secrets Manager.
Trusted Advisor provides an Amazon RDS Idle DB Instances check

Management & Governance Tools

Understand AWS CloudWatch for Logs and Metrics.
- EventBridge (CloudWatch Events) provides real-time alerts
- CloudWatch can be used to store RDS logs with a custom retention period, which is indefinite by default.
- CloudWatch Application Insights support .Net and SQL Server monitoring
Know CloudFormation for provisioning, in terms of
- Drift detection – identifies differences between the stack's expected configuration and the actual environment, e.g. due to manual changes
- Change sets – allow you to verify changes before they are applied
- parameters – allows you to configure variables or environment-specific values
- Stack policy defines the update actions that can be performed on designated resources.
- DeletionPolicy for RDS allows you to configure whether the resource is retained, snapshotted, or deleted when the stack is deleted
- Supports Secrets Manager for DB credential generation, storage, and easy rotation
- Systems Manager Parameter Store for environment-specific parameters
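A sketch that combines several of these CloudFormation features in one template, built as a Python dict and serialized to JSON (CloudFormation accepts JSON as well as YAML). Resource names, the engine, and the sizes are assumptions. `DeletionPolicy: Snapshot` keeps a final snapshot when the stack is deleted, and the password is generated by Secrets Manager and pulled in with a dynamic reference rather than stored in the template.

```python
import json


def rds_template() -> dict:
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Environment-specific value supplied at stack creation/update
            "DBInstanceClass": {"Type": "String", "Default": "db.t3.micro"},
        },
        "Resources": {
            "DBSecret": {
                "Type": "AWS::SecretsManager::Secret",
                "Properties": {"GenerateSecretString": {
                    "SecretStringTemplate": json.dumps({"username": "admin"}),
                    "GenerateStringKey": "password",
                    "PasswordLength": 16,
                    "ExcludeCharacters": "\"@/\\",
                }},
            },
            "Database": {
                "Type": "AWS::RDS::DBInstance",
                "DeletionPolicy": "Snapshot",       # take a final snapshot on stack deletion
                "UpdateReplacePolicy": "Snapshot",  # and when an update replaces the instance
                "Properties": {
                    "Engine": "mysql",
                    "DBInstanceClass": {"Ref": "DBInstanceClass"},
                    "AllocatedStorage": "20",
                    "MasterUsername": "admin",
                    # Dynamic reference resolved by CloudFormation at deploy time
                    "MasterUserPassword": {"Fn::Join": ["", [
                        "{{resolve:secretsmanager:", {"Ref": "DBSecret"},
                        ":SecretString:password}}"]]},
                },
            },
        },
    }


print(json.dumps(rds_template(), indent=2))
```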

Whitepapers and articles

AWS Database Services Cheat Sheet

On the Exam Day

Make sure you are relaxed and get a good night’s sleep. The exam is not tough if you are well-prepared.
If you are taking the AWS Online exam
- Try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.
- The online verification process does take some time and usually, there are glitches.
- Remember, you will not be allowed to take the exam if you are more than 30 minutes late.
- Make sure your desk is clear, with no watches or external monitors, keep your phone away, and make sure nobody can enter the room.

Finally, All the Best 🙂

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Learning Path

AWS Data Analytics - Specialty DAS-C01 Certificate

July 11, 2023 ~ Last updated on : October 4, 2023 ~ jayendrapatil ~ 25 Comments

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Learning Path

Recertified with the AWS Certified Data Analytics – Specialty (DAS-C01) which tends to cover a lot of big data topics focused on AWS services.

Data Analytics – Specialty (DAS-C01) has replaced the previous Big Data – Specialty (BDS-C01).

AWS Certified Data Analytics – Specialty (DAS-C01) exam basically validates

Define AWS data analytics services and understand how they integrate with each other.
Explain how AWS data analytics services fit in the data lifecycle of collection, storage, processing, and visualization.

Refer to the AWS Certified Data Analytics – Specialty Exam Guide for details

AWS Certified Data Analytics - Specialty DAS-C01 Domains

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Resources

Online Courses
- Stephane Maarek – AWS Certified Data Analytics Specialty Exam
- Whizlabs – AWS Certified Data Analytics – Specialty Course
Practice tests
- Braincert – AWS Certified Data Analytics – Specialty DAS-C01 Practice Exams
- Stephane Maarek – Practice Exams | AWS Certified Data Analytics Specialty
- Whizlabs – AWS Certified Data Analytics – Specialty Practice Tests

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Summary

Specialty exams are tough, lengthy, and tiresome. Most of the questions and answer options have a lot of prose and a lot of reading that needs to be done, so be sure you are prepared and manage your time well.

DAS-C01 exam has 65 questions to be solved in 170 minutes which gives you roughly 2 1/2 minutes to attempt each question.
DAS-C01 exam includes two types of questions, multiple-choice and multiple-response.
DAS-C01 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 750.

Specialty exams currently cost $300 + tax.
You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.
As always, mark difficult questions for review, move on, and come back to them after you have answered all the others.

As always, having a rough architecture or mental picture of the setup helps you focus on the areas that you need to improve. In most cases, you will be able to eliminate 2 answers right away and then need to focus on only the remaining two. Compare those 2 answers to spot where they differ, which will help you reach the right answer or at least give you a 50% chance of getting it right.
AWS exams can be taken either at a testing center or online (remotely). I prefer to take them online as it provides a lot of flexibility. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Certified Data Analytics – Specialty (DAS-C01) Exam Topics

AWS Certified Data Analytics – Specialty exam, as its name suggests, covers a lot of Big Data concepts right from data collection, ingestion, transfer, storage, pre and post-processing, analytics, and visualization with the added concepts for data security at each layer.

Analytics

Make sure you know and cover all the services in-depth, as 80% of the exam is focused on topics like Glue, Kinesis, and Redshift.
AWS Analytics Services Cheat Sheet

Glue
- DAS-C01 covers Glue in great detail.
- AWS Glue is a fully managed ETL service that automates the time-consuming steps of data preparation for analytics.
- supports server-side encryption for data at rest and SSL for data in motion.
- Glue ETL engine extracts, transforms, and loads data and can automatically generate Scala or Python code.
- Glue Data Catalog is a central repository and persistent metadata store for the structural and operational metadata of all the data assets. It is compatible with the Apache Hive metastore and can be used as the metastore for Hive.
- Glue Crawlers scan various data stores to automatically infer schemas and partition structures to populate the Data Catalog with corresponding table definitions and statistics.
- Glue Job Bookmark tracks data that has already been processed during a previous run of an ETL job by persisting state information from the job run.
- Glue Streaming ETL enables performing ETL operations on streaming data using continuously-running jobs.
- Glue provides a flexible scheduler that handles dependency resolution, job monitoring, and retries.
- Glue Studio offers a graphical interface for authoring AWS Glue jobs to process data allowing you to define the flow of the data sources, transformations, and targets in the visual interface and generating Apache Spark code on your behalf.
- Glue Data Quality helps reduce manual data quality efforts by automatically measuring and monitoring the quality of data in data lakes and pipelines.
- Glue DataBrew helps prepare, visualize, clean, and normalize data directly from the data lake, data warehouses, and databases, including S3, Redshift, Aurora, and RDS.
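Job bookmarks are managed by Glue itself (in a Glue script they rely on `transformation_ctx` on the sources and on `job.commit()`), so there is nothing to implement. The plain-Python simulation below only illustrates the idea of persisting state between runs so that a rerun skips S3 objects it has already processed; the `Bookmark` class, `run_job` function, and `example-bucket` keys are hypothetical and are not the Glue API.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """Hypothetical persisted job state (Glue stores its own bookmark state for you)."""
    last_modified: int = 0                        # newest object timestamp already processed
    processed: set = field(default_factory=set)   # keys already handed to the job


def run_job(objects: dict, bookmark: Bookmark) -> list:
    """Return the S3 keys this run should process, then advance the bookmark.

    `objects` maps an S3 key to its last-modified time (epoch seconds).
    """
    new_keys = sorted(
        key for key, ts in objects.items()
        # Skip anything older than the bookmark, and anything already seen at the same timestamp
        if ts >= bookmark.last_modified and key not in bookmark.processed
    )
    # A real job would extract, transform, and load each of new_keys here
    if new_keys:
        bookmark.last_modified = max(objects[key] for key in new_keys)
        bookmark.processed.update(new_keys)
    return new_keys


bookmark = Bookmark()
first = run_job({"s3://example-bucket/a.csv": 100, "s3://example-bucket/b.csv": 200}, bookmark)
second = run_job({"s3://example-bucket/a.csv": 100, "s3://example-bucket/b.csv": 200,
                  "s3://example-bucket/c.csv": 300}, bookmark)
print(first)   # ['s3://example-bucket/a.csv', 's3://example-bucket/b.csv']
print(second)  # ['s3://example-bucket/c.csv']
```

The second run only picks up `c.csv`, which is exactly the behavior that bookmarks give an incremental ETL job.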
Redshift
- Redshift is also covered in depth.
- Cover Redshift Advanced topics
  - Redshift Distribution Style determines how data is distributed across compute nodes and helps minimize the impact of the redistribution step by locating the data where it needs to be before the query is executed.
  - Redshift Enhanced VPC routing forces all COPY and UNLOAD traffic between the cluster and the data repositories through the VPC.
  - Workload management (WLM) enables users to flexibly manage priorities within workloads so that short, fast-running queries won’t get stuck in queues behind long-running queries.
  - Redshift Spectrum helps query and retrieve structured and semistructured data from files in S3 without having to load the data into Redshift tables.
  - Federated Query feature allows querying and analyzing data across operational databases, data warehouses, and data lakes.
  - Short query acceleration (SQA) prioritizes selected short-running queries ahead of longer-running queries.
  - Redshift Serverless is a serverless option of Redshift that makes it more efficient to run and scale analytics in seconds without the need to set up and manage data warehouse infrastructure.
- Redshift best practices with respect to the selection of distribution style and sort key, and importing/exporting data
  - A single COPY command loads multiple files in parallel and performs better than multiple concurrent COPY commands
  - COPY command can use manifest files to load data
  - COPY command handles encrypted data
- Redshift Resizing cluster options (elastic resize did not support node type changes before, but does now)
- Redshift supports encryption at rest and in transit
- Redshift supports encrypting an unencrypted cluster using KMS. However, you can’t enable hardware security module (HSM) encryption by modifying the cluster. Instead, create a new, HSM-encrypted cluster and migrate your data to the new cluster.
- Know Redshift views to control access to data.
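A COPY manifest is a JSON file that lists exactly which S3 objects to load, which avoids accidentally picking up unwanted files under a prefix. The sketch below builds a manifest in the `{"entries": [{"url": ..., "mandatory": ...}]}` format together with a matching COPY statement; the bucket, table name, and IAM role ARN are hypothetical placeholders, and the statement is built as a string only, not executed.

```python
import json


def build_manifest(urls, mandatory=True):
    """Return a Redshift COPY manifest that lists each S3 object explicitly."""
    return {"entries": [{"url": url, "mandatory": mandatory} for url in urls]}


def build_copy(table, manifest_url, iam_role):
    """COPY statement that reads the object list from a manifest instead of a prefix."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS CSV\n"
        "MANIFEST;"  # tells COPY that FROM points at a manifest file
    )


# Several similarly sized files let one COPY load them in parallel across slices
files = [f"s3://example-bucket/sales/part-{i:04d}.csv" for i in range(4)]
manifest = build_manifest(files)
print(json.dumps(manifest, indent=2))
print(build_copy("sales", "s3://example-bucket/manifests/sales.manifest",
                 "arn:aws:iam::123456789012:role/ExampleRedshiftRole"))
```

Setting `mandatory` to true makes COPY fail if a listed file is missing, rather than silently loading less data.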
Elastic Map Reduce
- Understand EMRFS
  - Consistent view made sure S3 objects referenced by different applications were in sync. It is no longer needed, as S3 now provides strong read-after-write consistency.
- Know EMR Best Practices (hint: start with many small nodes instead of few large nodes)
- Know EMR Encryption options
  - supports SSE-S3, SSE-KMS, CSE-KMS, and CSE-Custom encryption for EMRFS
  - supports LUKS encryption for local disks
  - supports TLS for data in transit encryption
  - supports EBS encryption
- Hive metastore can be externally hosted using RDS, Aurora, and AWS Glue Data Catalog
- Also know the different technologies
  - Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources
  - Spark is a distributed processing framework and programming model that helps do machine learning, stream processing, or graph analytics using Amazon EMR clusters
  - Zeppelin/Jupyter are open-source notebook web applications for interactive data exploration that can be used to create and share documents containing live code, equations, visualizations, and narrative text
  - Phoenix is used for OLTP and operational analytics, allowing you to use standard SQL queries and JDBC APIs to work with an Apache HBase backing store
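EMR encryption options are bundled into a security configuration, which can be created with `aws emr create-security-configuration --name <name> --security-configuration file://config.json`. The sketch below builds such a configuration as a Python dict with EMRFS, local disk, EBS, and TLS encryption enabled. The key names follow the EMR security-configuration JSON as I understand it, so check them against the EMR documentation; the KMS key ARN and certificate bundle location are hypothetical.

```python
import json

# EMRFS (S3) at-rest modes handled in this sketch; CSE-Custom also exists but needs a custom key provider
KMS_MODES = {"SSE-KMS", "CSE-KMS"}
SUPPORTED_MODES = {"SSE-S3"} | KMS_MODES


def security_configuration(s3_mode: str, kms_key_arn: str, pem_zip_s3_uri: str) -> dict:
    """Build an EMR security configuration enabling at-rest and in-transit encryption."""
    if s3_mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported EMRFS mode in this sketch: {s3_mode}")
    s3_config = {"EncryptionMode": s3_mode}
    if s3_mode in KMS_MODES:
        s3_config["AwsKmsKey"] = kms_key_arn
    return {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                "S3EncryptionConfiguration": s3_config,
                # Local disk encryption (LUKS on instance store volumes) plus EBS encryption
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                    "EnableEbsEncryption": True,
                },
            },
            # TLS certificates for data in transit, supplied as a zip of PEM files in S3
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": pem_zip_s3_uri,
                }
            },
        }
    }


config = security_configuration(
    "SSE-KMS",
    "arn:aws:kms:us-east-1:123456789012:key/example-key-id",  # hypothetical key ARN
    "s3://example-bucket/certs/emr-certs.zip",                 # hypothetical certificate bundle
)
print(json.dumps(config, indent=2))
```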
Kinesis
- Understand Kinesis Data Streams and Kinesis Data Firehose in depth
- Know Kinesis Data Streams vs Kinesis Firehose
  - Know Kinesis Data Streams is open-ended for both producer and consumer. It supports KCL and works with Spark.
  - Know Kinesis Firehose is open-ended for producers only. Data is delivered to destinations such as S3, Redshift, and OpenSearch.
  - Kinesis Firehose buffers incoming data and delivers it in batches with a minimum 60-second interval, so it is near real time rather than real time.
  - Kinesis Firehose supports out-of-the-box transformation and custom transformation using Lambda
- Kinesis supports encryption at rest using server-side encryption
- Kinesis Producer Library supports batching
- Kinesis Data Analytics
  - helps transform and analyze streaming data in real time using Apache Flink.
  - supports anomaly detection using Random Cut Forest ML
  - supports reference data stored in S3.
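Kinesis Data Streams routes each record by hashing its partition key with MD5 into a 128-bit integer and sending it to the shard whose hash key range contains that value. The sketch below models that routing, assuming the shards split the hash space evenly (as with a freshly created stream before any resharding); `hash_key` and `shard_for` are illustrative helpers, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # shard hash key ranges cover 0 .. 2**128 - 1


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose range contains the key, assuming evenly split shard ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE


for key in ["sensor-1", "sensor-2", "sensor-3", "sensor-1"]:
    # The same partition key always lands on the same shard, which preserves per-key ordering
    print(key, "-> shard", shard_for(key, 4))
```

This is also why a low-cardinality partition key creates hot shards: every record with that key ends up on a single shard.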
OpenSearch
- OpenSearch is a search service that supports indexing, full-text search, faceting, etc.
- OpenSearch can be used for analysis and supports visualization using OpenSearch Dashboards which can be real-time.
- OpenSearch Service Storage tiers support Hot, UltraWarm, and Cold and the data can be transitioned using Index State management.
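Index State Management policies are JSON documents with states, actions, and transitions. The sketch below builds a policy that moves indices from hot to UltraWarm to cold storage based on index age. `warm_migration` and `cold_migration` are the Amazon OpenSearch Service ISM actions as I understand them (UltraWarm and cold storage must be enabled on the domain), so verify the field names against the current documentation; the 7 and 30 day thresholds are arbitrary examples.

```python
import json


def tiering_policy(warm_after_days: int, cold_after_days: int, timestamp_field: str = "@timestamp") -> dict:
    """ISM policy that moves indices from hot to UltraWarm to cold storage by index age."""
    if not 0 < warm_after_days < cold_after_days:
        raise ValueError("expected 0 < warm_after_days < cold_after_days")
    return {
        "policy": {
            "description": "hot to UltraWarm to cold tiering",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [{"state_name": "warm",
                                     "conditions": {"min_index_age": f"{warm_after_days}d"}}],
                },
                {
                    "name": "warm",
                    # OpenSearch Service action that migrates the index to UltraWarm
                    "actions": [{"warm_migration": {}}],
                    "transitions": [{"state_name": "cold",
                                     "conditions": {"min_index_age": f"{cold_after_days}d"}}],
                },
                {
                    "name": "cold",
                    # Cold storage needs a timestamp field on the index
                    "actions": [{"cold_migration": {"timestamp_field": timestamp_field}}],
                    "transitions": [],
                },
            ],
        }
    }


print(json.dumps(tiering_policy(7, 30), indent=2))
```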

QuickSight
- Know Visual Types (hint: esp. word clouds, plotting line, bar, and story based visualizations)
- Know Supported Data Sources
- QuickSight provides IP addresses that need to be whitelisted for QuickSight to access the data store.
- QuickSight provides direct integration with Microsoft AD
- QuickSight supports Row level security using dataset rules to control access to data at row granularity based on permissions associated with the user interacting with the data.
- QuickSight supports ML insights as well
- QuickSight supports users defined via IAM or email signup.
Athena
- is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats.
- provides a simplified, flexible way to analyze data in an S3 data lake and 30 data sources, including on-premises data sources or other cloud systems using SQL or Python without loading the data.
- integrates with QuickSight for visualizing the data or creating dashboards.
- uses a managed Glue Data Catalog to store information and schemas about the databases and tables that you create for the data stored in S3
- Workgroups can be used to separate users, teams, applications, or workloads, to set limits on the amount of data each query or the entire workgroup can process, and to track costs.
- Athena best practices recommend partitioning the data, using partition projection, and using columnar file formats like ORC or Parquet, as they support compression and are splittable.
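With partition projection, Athena calculates partition values and their S3 locations from table properties instead of looking up partitions in the Glue Data Catalog, which speeds up queries on highly partitioned tables. The property names below (`projection.enabled`, `projection.dt.*`, `storage.location.template`) follow the Athena partition projection configuration; the Python function is only a hypothetical model of how a filter on `dt` prunes a query down to a handful of S3 prefixes, not Athena's implementation, and the bucket is a placeholder.

```python
from datetime import date, timedelta

# Table properties as they could appear in the table's TBLPROPERTIES (bucket and prefix are hypothetical)
TABLE_PROPERTIES = {
    "projection.enabled": "true",
    "projection.dt.type": "date",
    "projection.dt.range": "2023-01-01,NOW",
    "projection.dt.format": "yyyy-MM-dd",
    "projection.dt.interval": "1",
    "projection.dt.interval.unit": "DAYS",
    "storage.location.template": "s3://example-bucket/logs/dt=${dt}/",
}


def projected_locations(query_start: date, query_end: date, props: dict = TABLE_PROPERTIES) -> list:
    """S3 prefixes a query filtered on query_start <= dt <= query_end would need to read.

    Simplified: handles only a daily date projection whose format is yyyy-MM-dd.
    """
    range_start = date.fromisoformat(props["projection.dt.range"].split(",")[0])
    start = max(query_start, range_start)  # no partitions exist before the projection range
    end = min(query_end, date.today())     # the range ends at NOW
    step = timedelta(days=int(props["projection.dt.interval"]))
    locations, day = [], start
    while day <= end:
        locations.append(props["storage.location.template"].replace("${dt}", day.isoformat()))
        day += step
    return locations


print(projected_locations(date(2023, 3, 1), date(2023, 3, 3)))
```

A three-day filter touches three prefixes, regardless of how many partitions the table has in total.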

Know Data Pipeline for data transfer

Security, Identity & Compliance

Data security is a key concept covered in the Data Analytics – Specialty exam
Identity and Access Management (IAM)
- Understand IAM in depth
- Understand IAM Roles
- Understand Identity Providers & Federation
- Understand IAM Policies
Deep dive into Key Management Service (KMS). There would be quite a few questions on this.
- Understand how KMS works
- Understand IAM Policies, Key Policies, Grants
- Know that KMS keys are regional and how to use them in other regions.
Amazon Macie is a security service that uses machine learning to automatically discover, classify, and protect sensitive data in S3.

Understand AWS Cognito esp. authentication across devices

Management & Governance Tools

Understand AWS CloudWatch for Logs and Metrics.
CloudWatch Subscription Filters can be used to route data to Kinesis Data Streams, Kinesis Data Firehose, and Lambda.
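When a subscription filter targets Lambda, the matching log events arrive base64-encoded and gzip-compressed under `event["awslogs"]["data"]` (records delivered to Kinesis Data Streams are also gzip-compressed). The sketch below decodes that payload; the sample log group, stream, and filter names are hypothetical.

```python
import base64
import gzip
import json


def decode_subscription_event(event: dict) -> list:
    """Return the log messages in a CloudWatch Logs subscription event delivered to Lambda."""
    payload = gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    data = json.loads(payload)
    # CONTROL_MESSAGE events only check that the destination is reachable
    if data.get("messageType") != "DATA_MESSAGE":
        return []
    return [log_event["message"] for log_event in data["logEvents"]]


# Build a sample event the same way CloudWatch Logs encodes it: JSON, then gzip, then base64
sample = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "/example/app",
    "logStream": "stream-1",
    "subscriptionFilters": ["errors-only"],
    "logEvents": [{"id": "1", "timestamp": 1700000000000, "message": "ERROR disk full"}],
}
event = {"awslogs": {"data": base64.b64encode(gzip.compress(json.dumps(sample).encode("utf-8"))).decode("ascii")}}
print(decode_subscription_event(event))  # ['ERROR disk full']
```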

Whitepapers and articles

On the Exam Day

Make sure you are relaxed and get a good night’s sleep. The exam is not tough if you are well-prepared.

If you are taking the AWS Online exam
- Try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.
- The online verification process does take some time, and there are often glitches.
- Remember, you will not be allowed to take the exam if you are more than 30 minutes late.
- Make sure your desk is clear, with no wristwatches or external monitors; keep your phone away and make sure nobody can enter the room.

Finally, All the Best 🙂

AWS Data Pipeline

June 27, 2023 ~ Last updated on : July 18, 2023 ~ jayendrapatil


AWS Data Pipeline is a web service that makes it easy to automate and schedule regular data movement and data processing activities in AWS.

helps define data-driven workflows
integrates with on-premises and cloud-based storage systems

helps quickly define a pipeline, which defines a dependent chain of data sources, destinations, and predefined or custom data processing activities
supports scheduling where the pipeline regularly performs processing activities such as distributed data copy, SQL transforms, EMR applications, or custom scripts against destinations such as S3, RDS, or DynamoDB.
ensures that the pipelines are robust and highly available by executing the scheduling, retry, and failure logic for the workflows as a highly scalable and fully managed service.

AWS Data Pipeline features

Distributed, fault-tolerant, and highly available
Managed workflow orchestration service for data-driven workflows
Infrastructure management service, as it will provision and terminate resources as required

Provides dependency resolution
Can be scheduled
Supports Preconditions for readiness checks.

Provides control over retries, including frequency and number
Native integration with S3, DynamoDB, RDS, EMR, EC2 and Redshift
Support for both AWS-based and external on-premises resources

AWS Data Pipeline Concepts

Pipeline Definition

A pipeline definition is how the business logic is communicated to AWS Data Pipeline
A pipeline definition specifies the location of data (Data Nodes), the activities to be performed, the schedule, the resources to run the activities, preconditions, and actions to be performed
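A pipeline definition can be written as a JSON file and uploaded with `aws datapipeline put-pipeline-definition`. As a sketch, the dict below wires together a schedule, a precondition, two data nodes, an activity with retries, an EC2 resource, and an SNS failure action. The bucket, SNS topic, and object IDs are hypothetical, and the fields are a small subset of the definition format rather than a complete, validated pipeline; the `dangling_refs` helper is my own addition that checks every `ref` points to a defined object.

```python
import json

# Hypothetical bucket, topic ARN, and object IDs
pipeline = {
    "objects": [
        {"id": "Default", "name": "Default", "scheduleType": "cron",
         "schedule": {"ref": "DailySchedule"},
         "role": "DataPipelineDefaultRole",
         "resourceRole": "DataPipelineDefaultResourceRole"},
        # Schedule: run once a day
        {"id": "DailySchedule", "type": "Schedule", "period": "1 day",
         "startDateTime": "2023-07-01T00:00:00"},
        # Precondition: only run when the upstream job has written its marker file
        {"id": "InputReady", "type": "S3KeyExists",
         "s3Key": "s3://example-bucket/input/_SUCCESS"},
        # Data nodes: where the activity reads from and writes to
        {"id": "InputData", "type": "S3DataNode",
         "directoryPath": "s3://example-bucket/input/",
         "precondition": {"ref": "InputReady"}},
        {"id": "OutputData", "type": "S3DataNode",
         "directoryPath": "s3://example-bucket/output/"},
        # Activity with retry settings and a failure action
        {"id": "CopyData", "type": "CopyActivity",
         "input": {"ref": "InputData"}, "output": {"ref": "OutputData"},
         "runsOn": {"ref": "CopyInstance"},
         "maximumRetries": "3", "retryDelay": "10 minutes",
         "onFail": {"ref": "FailureAlarm"}},
        # Resource launched by Data Pipeline only when needed, then terminated
        {"id": "CopyInstance", "type": "Ec2Resource", "terminateAfter": "1 hour"},
        # Action: notify via SNS if the activity fails
        {"id": "FailureAlarm", "type": "SnsAlarm",
         "topicArn": "arn:aws:sns:us-east-1:123456789012:example-topic",
         "subject": "Copy failed", "message": "CopyData failed",
         "role": "DataPipelineDefaultRole"},
    ]
}


def dangling_refs(definition: dict) -> list:
    """List (object id, field, missing id) for every {"ref": ...} that points nowhere."""
    ids = {obj["id"] for obj in definition["objects"]}
    return [
        (obj["id"], key, value["ref"])
        for obj in definition["objects"]
        for key, value in obj.items()
        if isinstance(value, dict) and "ref" in value and value["ref"] not in ids
    ]


print(dangling_refs(pipeline))  # []
print(json.dumps(pipeline, indent=2))
```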

Pipeline Components, Instances, and Attempts

Pipeline components represent the business logic of the pipeline and are represented by the different sections of a pipeline definition.

Pipeline components specify the data sources, activities, schedule, and preconditions of the workflow
When AWS Data Pipeline runs a pipeline, it compiles the pipeline components to create a set of actionable instances, each of which contains all the information needed to perform a specific task
Data Pipeline provides durable and robust data management: a failed instance is retried as a new attempt, based on the configured retry delay and maximum number of retries
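The relationship between an instance and its attempts can be illustrated with a small simulation: one instance keeps getting new attempts until it succeeds or the retries run out. The `run_instance` function and the flaky task are hypothetical; the sketch assumes `maximumRetries` counts retries after the first attempt and that each failure is detected immediately, so attempts start `retryDelay` apart.

```python
from typing import Callable


def run_instance(task: Callable[[int], bool], maximum_retries: int, retry_delay_minutes: int):
    """Run one pipeline instance, recording every try as an attempt."""
    attempts = []
    for number in range(1, maximum_retries + 2):  # first attempt plus retries
        succeeded = task(number)
        attempts.append({
            "attempt": number,
            "start_minute": (number - 1) * retry_delay_minutes,
            "status": "FINISHED" if succeeded else "FAILED",
        })
        if succeeded:
            return "FINISHED", attempts
    return "FAILED", attempts


# Hypothetical flaky task that only succeeds on its third attempt
status, attempts = run_instance(lambda n: n >= 3, maximum_retries=3, retry_delay_minutes=10)
print(status, [(a["attempt"], a["start_minute"], a["status"]) for a in attempts])
# FINISHED [(1, 0, 'FAILED'), (2, 10, 'FAILED'), (3, 20, 'FINISHED')]
```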

Task Runners

A task runner is an application that polls AWS Data Pipeline for tasks and then performs those tasks
When Task Runner is installed and configured,
- it polls AWS Data Pipeline for tasks associated with activated pipelines
- after a task is assigned to Task Runner, it performs that task and reports its status back to Pipeline.
A task is a discrete unit of work that the Pipeline service shares with a task runner. It differs from a pipeline, which defines activities and resources and usually yields several tasks
Tasks can be executed on either AWS Data Pipeline-managed or user-managed resources.

Data Nodes

Data Node defines the location and type of data that a pipeline activity uses as source (input) or destination (output)
supports S3, Redshift, DynamoDB, and SQL data nodes

Databases

supports JDBC, RDS, and Redshift databases

Activities

An activity is a pipeline component that defines the work to perform
Data Pipeline provides pre-defined activities for common scenarios like SQL transformation, data movement, Hive queries, etc.

Activities are extensible and can be used to run your own custom scripts to support endless combinations

Preconditions

Precondition is a pipeline component containing conditional statements that must be satisfied (evaluated to True) before an activity can run
A pipeline supports
- System-managed preconditions
  - are run by the AWS Data Pipeline web service on your behalf and do not require a computational resource
  - Include source data and key checks, e.g. DynamoDB data exists, DynamoDB table exists, S3 key exists, or S3 prefix not empty
- User-managed preconditions
  - run on user defined and managed computational resources
  - Can be defined as Exists check or Shell command

Resources

A resource is a computational resource that performs the work that a pipeline activity specifies
supports AWS Data Pipeline-managed and self-managed resources
AWS Data Pipeline-managed resources include EC2 and EMR, which are launched by the Data Pipeline service only when they’re needed

Self-managed on-premises resources can also be used, where a Task Runner package is installed that continuously polls the AWS Data Pipeline service for work to perform
Resources can run in the same region as their working data set, or even in a region different from the AWS Data Pipeline region
Resources launched by AWS Data Pipeline count toward your account's resource limits, which should be taken into account

Actions

Actions are steps that a pipeline takes when a certain event, such as success or failure, occurs.
Pipeline supports SNS notifications and termination action on resources

AWS Certification Exam Practice Questions

Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).

AWS services are updated every day, and both the answers and questions might soon be outdated, so research accordingly.

AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated

Open to further feedback, discussion and correction.

An international company has deployed a multi-tier web application that relies on DynamoDB in a single region. For regulatory reasons they need disaster recovery capability in a separate region with a Recovery Time Objective of 2 hours and a Recovery Point Objective of 24 hours. They should synchronize their data on a regular basis and be able to provision the web application rapidly using CloudFormation. The objective is to minimize changes to the existing web application, control the throughput of DynamoDB used for the synchronization of data and synchronize only the modified elements. Which design would you choose to meet these requirements?
1. Use AWS Data Pipeline to schedule a DynamoDB cross region copy once a day. Create a ‘Lastupdated’ attribute in your DynamoDB table that would represent the timestamp of the last update and use it as a filter. (Refer Blog Post)
2. Use EMR and write a custom script to retrieve data from DynamoDB in the current region using a SCAN operation and push it to DynamoDB in the second region. (No Schedule and throughput control)
3. Use AWS Data Pipeline to schedule an export of the DynamoDB table to S3 in the current region once a day then schedule another task immediately after it that will import data from S3 to DynamoDB in the other region. (With AWS Data Pipeline the data can be copied directly to the other DynamoDB table)
4. Send each item into an SQS queue in the second region; use an auto-scaling group behind the SQS queue to replay the write in the second region. (Not Automated to replay the write)
Your company produces customer-commissioned, one-of-a-kind skiing helmets combining high fashion with custom technical enhancements. Customers can show off their individuality on the ski slopes and have access to head-up displays, GPS rear-view cams, and any other technical innovation they wish to embed in the helmet. The current manufacturing process is data rich and complex, including assessments to ensure that the custom electronics and materials used to assemble the helmets are of the highest standards. Assessments are a mixture of human and automated assessments. You need to add a new set of assessments to model the failure modes of the custom electronics using GPUs with CUDA across a cluster of servers with low-latency networking. What architecture would allow you to automate the existing process using a hybrid approach and ensure that the architecture can support the evolution of processes over time?
1. Use AWS Data Pipeline to manage movement of data & meta-data and assessments. Use an auto-scaling group of G2 instances in a placement group. (Involves mixture of human assessments)
2. Use Amazon Simple Workflow (SWF) to manage assessments, movement of data & meta-data. Use an autoscaling group of G2 instances in a placement group. (Human and automated assessments with GPU and low latency networking)
3. Use Amazon Simple Workflow (SWF) to manage assessments, movement of data & meta-data. Use an autoscaling group of C3 instances with SR-IOV (Single Root I/O Virtualization). (C3 with SR-IOV won’t provide GPUs, and Enhanced Networking would also need to be enabled)
4. Use AWS Data Pipeline to manage movement of data & meta-data and assessments. Use an auto-scaling group of C3 instances with SR-IOV (Single Root I/O Virtualization). (Involves mixture of human assessments)

References

AWS_Data_Pipeline_Developer_Guide

AWS Certified SysOps Administrator – Associate (SOA-C02) Exam Learning Path

AWS SysOps Administrator - Associate SOA-C02 Certification

April 14, 2023 ~ Last updated on : October 4, 2023 ~ jayendrapatil


I recently recertified for the AWS Certified SysOps Administrator – Associate (SOA-C02) exam.

SOA-C02 is the updated version of the SOA-C01 AWS exam and was the first AWS exam to include hands-on labs.

NOTE: As of March 28, 2023, the AWS Certified SysOps Administrator – Associate exam will not include exam labs until further notice. This removal of exam labs is temporary while we evaluate the exam labs and make improvements to provide an optimal candidate experience.

AWS Certified SysOps Administrator – Associate (SOA-C02) Exam Content

AWS SysOps Administrator – Associate SOA-C02 is intended for system administrators in a cloud operations role.

SOA-C02 validates a candidate’s ability to deploy, manage, and operate workloads on AWS which includes
- Deploy, manage, and operate workloads on AWS
- Support and maintain AWS workloads according to the AWS Well-Architected Framework
- Perform operations by using the AWS Management Console and the AWS CLI
- Implement security controls to meet compliance requirements
- Monitor, log, and troubleshoot systems
- Apply networking concepts (for example, DNS, TCP/IP, firewalls)
- Implement architectural requirements (for example, high availability, performance, capacity)
- Perform business continuity and disaster recovery procedures
- Identify, classify, and remediate incidents

Refer to the AWS Certified SysOps Administrator – Associate (SOA-C02) Exam Guide

AWS Certified SysOps Administrator – Associate (SOA-C02) Exam Summary

SOA-C02 was the first AWS exam to include 2 sections
- Objective questions
- Hands-on labs
With Labs
- SOA-C02 Exam consists of around 50 objective-type questions and 3 Hands-on labs to be answered in 190 minutes.
- Labs are performed in a separate instance. Copy-paste works, so make sure you copy the exact names on resource creation.
- Labs are pretty easy if you have worked on AWS.
- Plan to leave 20 minutes to complete each exam lab.
- NOTE: Once you complete a section and click next, you cannot go back to that section. The same applies to the labs: once a lab is completed, you cannot return to it.
- Practice the Sample Lab provided when you book the exam, which would give you a feel of how the hands-on exam would actually be.

Without Labs
- SOA-C02 exam consists of 65 questions in 130 minutes, and the time is more than sufficient if you are well-prepared.
SOA-C02 exam includes two types of questions, multiple-choice and multiple-response.

SOA-C02 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 720.
Associate exams currently cost $150 + tax.
You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.

AWS exams can be taken either at a testing center or online (remotely). I prefer to take them online as it provides a lot of flexibility. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Certified SysOps Administrator – Associate (SOA-C02) Exam Resources

Online Courses
- Stephane Maarek – AWS Certified SysOps Administrator Associate
- Adrian Cantrill – AWS Certified SysOps Administrator – Associate
- Adrian Cantrill – All Associate Bundle
- DolfinEd Udemy AWS Certified Solutions Architect Associate – SAA-C02 (Self-Paced)
- Whizlabs – AWS Certified SysOps Administrator – Associate Course
- Exam Readiness: AWS Certified SysOps Administrator – Associate
Practice Tests
- Braincert AWS Certified SysOps Administrator – Associate SOA-C02 Practice Exams
- Stephane Maarek – Practice Exams: AWS Certified SysOps Administrator Associate
- Whizlabs – AWS Certified SysOps Administrator – Associate Practice Tests
I signed up for the AWS Free Tier account, which lets you try a lot of the services for free within certain limits that are more than enough to get things going. Be sure to decommission anything you are using beyond the free limits to prevent any surprises 🙂

AWS Certified SysOps Administrator – Associate (SOA-C02) Exam Topics

SOA-C02 mainly focuses on SysOps and DevOps tools in AWS and the ability to deploy, manage, operate, and automate workloads on AWS.

Management & Governance Tools

CloudFormation
- provides an easy way to create and manage a collection of related AWS resources, provision and update them in an orderly and predictable fashion.
- CloudFormation Concepts cover
  - Templates act as a blueprint for provisioning of AWS resources
  - Stacks are a collection of resources managed as a single unit, which can be created, updated, and deleted by creating, updating, and deleting the stack.
  - Change Sets present a summary or preview of the proposed changes that CloudFormation will make when a stack is updated.
  - Nested stacks are stacks created as part of other stacks.
- CloudFormation template anatomy includes sections such as Resources (the only required section), Parameters, Mappings, Conditions, and Outputs.
- CloudFormation supports multiple features
  - Drift detection enables you to detect whether a stack’s actual configuration differs, or has drifted, from its expected configuration.
  - Termination protection helps prevent a stack from being accidentally deleted.
  - Stack policy can prevent stack resources from being unintentionally updated or deleted during a stack update.
  - StackSets help create, update, or delete stacks across multiple accounts and Regions with a single operation.
  - Helper scripts with creation policies can help wait for the completion of events before provisioning or marking resources complete.
  - DependsOn attribute specifies the resource creation order, ensuring a specific resource is created only after another.
  - UpdatePolicy attribute supports rolling and replacing updates for Auto Scaling groups.
  - DeletionPolicy attribute helps retain or back up resources during stack deletion.
  - Custom resources can be configured for use cases not natively supported, e.g. retrieving AMI IDs or interacting with external services
- Understand CloudFormation Best Practices esp. Nested Stacks and logical grouping
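A stack policy is a JSON document that can be applied with `aws cloudformation set-stack-policy --stack-name <name> --stack-policy-body <json>`. The sketch below shows a policy that allows all updates except replacing or deleting a hypothetical `ProductionDatabase` resource, plus a simplified evaluator of my own: an explicit Deny wins, and otherwise a matching Allow is required. It does not model `NotAction`, `NotResource`, or conditions, so treat it as an illustration of the rules rather than CloudFormation's actual evaluation.

```python
import json
from fnmatch import fnmatchcase

# Allow all updates except replacing or deleting the (hypothetical) ProductionDatabase resource
stack_policy = {
    "Statement": [
        {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"},
        {"Effect": "Deny", "Action": ["Update:Replace", "Update:Delete"], "Principal": "*",
         "Resource": "LogicalResourceId/ProductionDatabase"},
    ]
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_update_allowed(policy: dict, action: str, logical_id: str) -> bool:
    """Simplified evaluation: explicit Deny wins, otherwise a matching Allow is required."""
    resource = f"LogicalResourceId/{logical_id}"
    allowed = False
    for statement in policy["Statement"]:
        if not any(fnmatchcase(action, a) for a in _as_list(statement["Action"])):
            continue
        if not any(fnmatchcase(resource, r) for r in _as_list(statement["Resource"])):
            continue
        if statement["Effect"] == "Deny":
            return False
        allowed = True
    return allowed


print(json.dumps(stack_policy))  # body for --stack-policy-body
print(is_update_allowed(stack_policy, "Update:Modify", "ProductionDatabase"))   # True
print(is_update_allowed(stack_policy, "Update:Replace", "ProductionDatabase"))  # False
print(is_update_allowed(stack_policy, "Update:Replace", "WebServerGroup"))      # True
```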
Elastic Beanstalk helps to quickly deploy and manage applications in the AWS Cloud without having to worry about the infrastructure that runs those applications.
- Understand Elastic Beanstalk overall – Applications, Versions, and Environments
- Deployment strategies with their advantages and disadvantages
OpsWorks is a configuration management service that helps to configure and operate applications in a cloud enterprise by using Chef.

Understand CloudFormation vs Elastic Beanstalk vs OpsWorks
AWS Organizations
- Difference between Service Control Policies and IAM Policies
- SCPs define the maximum permissions available in an account; users and roles still need to be explicitly granted permissions through IAM policies.
- Consolidated billing enables consolidating payments from multiple AWS accounts and includes combined usage and volume discounts including sharing of Reserved Instances across accounts.
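The SCP vs IAM relationship boils down to an intersection: an action is effectively allowed only if the SCP allows it and an IAM policy grants it, and an explicit deny anywhere wins. The sketch below is a simplified model of my own using wildcard matching; it ignores resource-based policies, permission boundaries, session policies, and conditions, and remember that SCPs do not restrict the management account. The SCP and IAM policy contents are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(patterns, action: str) -> bool:
    # IAM action names are case-insensitive, so compare in lower case
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def is_effectively_allowed(action, scp_allow, iam_allow, explicit_deny=()) -> bool:
    """Simplified model: allowed only if BOTH the SCP and an IAM policy allow it and nothing denies it."""
    if matches(explicit_deny, action):
        return False
    return matches(scp_allow, action) and matches(iam_allow, action)


scp_allow = ["s3:*", "ec2:Describe*"]   # hypothetical SCP attached to the account's OU
iam_allow = ["s3:GetObject", "ec2:*"]   # hypothetical IAM policy attached to the user

for action in ["s3:GetObject", "s3:PutObject", "ec2:DescribeInstances", "ec2:RunInstances"]:
    print(action, is_effectively_allowed(action, scp_allow, iam_allow))
# Expected: True, False, True, False
```

Note that `ec2:*` in the IAM policy does not help with `ec2:RunInstances`, because the SCP caps the account at `ec2:Describe*`.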
Systems Manager is the operations hub and provides various capabilities like Parameter Store, Session Manager, and Patch Manager
- Parameter Store provides secure, scalable, centralized, hierarchical storage for configuration data and secret management. It does not support automatic secrets rotation; use Secrets Manager for that.
- Session Manager provides secure and auditable instance management without the need to open inbound ports, maintain bastion hosts, or manage SSH keys.
- Patch Manager helps automate the process of patching managed instances with both security-related and other types of updates.

CloudWatch
- collects monitoring and operational data in the form of logs, metrics, and events, and visualizes it.
  - Default EC2 metrics track disk I/O, network, CPU, and status checks, but do not capture metrics like memory, disk swap, disk storage, etc.
  - CloudWatch unified agent can be used to gather custom metrics like memory, disk swap, disk storage, etc.
  - CloudWatch Alarm actions can be configured to perform actions based on various metrics, e.g. CPU below 5%
  - CloudWatch alarm can monitor StatusCheckFailed_System status on an EC2 instance and automatically recover the instance if it becomes impaired due to an underlying hardware failure or a problem that requires AWS involvement to repair.
  - Know ELB monitoring
    - Load Balancer metrics SurgeQueueLength and SpilloverCount
    - HealthyHostCount, UnHealthyHostCount determines the number of healthy and unhealthy instances registered with the load balancer.
    - Reasons for 4XX and 5XX errors
- CloudWatch logs can be used to monitor, store, and access log files from EC2 instances, CloudTrail, Route 53, and other sources. You can create metric filters over the logs.
- CloudWatch Subscription Filters can be used to send logs to Kinesis Data Streams, Lambda, or Kinesis Data Firehose.
- EventBridge (CloudWatch Events) is a serverless event bus service that makes it easy to connect applications with data from a variety of sources.
- EventBridge or CloudWatch events can be used as a trigger for periodically scheduled events.
- CloudWatch unified agent helps collect metrics and logs from EC2 instances and on-premises servers and push them to CloudWatch.
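CloudWatch alarms look at the last `EvaluationPeriods` datapoints and go into ALARM when at least `DatapointsToAlarm` of them breach the threshold. The function below is a hypothetical simplification of that "M out of N" logic applied to the "CPU below 5%" example (which would use `LessThanThreshold`); it does not model how missing data is treated, and the CPU values are made up.

```python
import operator


def alarm_state(datapoints, threshold, evaluation_periods, datapoints_to_alarm, compare=operator.lt):
    """Simplified 'M out of N' evaluation over the most recent datapoints.

    Missing-data handling (treatMissingData) is not modeled.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if compare(value, threshold))
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Hypothetical CPUUtilization averages, one per 5-minute period; alarm when CPU < 5% in 2 of the last 3
cpu = [12.0, 4.1, 3.9, 6.0, 2.5]
print(alarm_state(cpu, threshold=5.0, evaluation_periods=3, datapoints_to_alarm=2))  # ALARM
print(alarm_state(cpu, threshold=5.0, evaluation_periods=3, datapoints_to_alarm=3))  # OK
```

Requiring 2 of 3 rather than 3 of 3 breaching datapoints trades some noise tolerance for faster detection.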

CloudTrail for audit and governance
- With Organizations, an organization trail can be configured to log CloudTrail events from all member accounts to a central account.
- CloudTrail log file integrity validation can be used to check whether a log file was modified, deleted, or unchanged after being delivered.
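CloudTrail delivers digest files that record a SHA-256 hash for each log file and are themselves signed (SHA-256 with RSA); `aws cloudtrail validate-logs` performs the real check. The sketch below only models the hash-comparison step to show how modified and deleted files are detected. The digest layout is simplified from the real format, the object keys are hypothetical, and the digest signature is not verified.

```python
import hashlib


def check_log_files(digest: dict, log_contents: dict) -> dict:
    """Classify each log file listed in a simplified digest as valid, modified, or deleted.

    The RSA signature on the digest file itself is not verified here.
    """
    results = {}
    for entry in digest["logFiles"]:
        key = entry["s3Object"]
        content = log_contents.get(key)
        if content is None:
            results[key] = "deleted"    # listed in the digest but no longer present
        elif hashlib.sha256(content).hexdigest() == entry["hashValue"]:
            results[key] = "valid"      # content matches the recorded SHA-256 hash
        else:
            results[key] = "modified"   # content changed after delivery
    return results


original = b'{"Records": []}'
digest = {"logFiles": [
    {"s3Object": "a.json.gz", "hashValue": hashlib.sha256(original).hexdigest()},
    {"s3Object": "b.json.gz", "hashValue": hashlib.sha256(b"x").hexdigest()},
]}
print(check_log_files(digest, {"a.json.gz": original}))
# {'a.json.gz': 'valid', 'b.json.gz': 'deleted'}
```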

AWS Config is a fully managed service that provides AWS resource inventory, configuration history, and configuration change notifications to enable security, compliance, and governance.
- supports managed as well as custom rules that can be evaluated on a periodic basis or as changes occur, to check compliance and trigger automatic remediation
- Conformance pack is a collection of AWS Config rules and remediation actions that can be easily deployed as a single entity in an account and a Region or across an organization in AWS Organizations.

Control Tower
- helps set up, govern, and secure a multi-account environment
- strongly recommended guardrails cover EBS encryption

Service Catalog
- allows organizations to create and manage catalogues of IT services that are approved for use on AWS with minimal permissions.
Trusted Advisor provides recommendations that help follow AWS best practices covering security, performance, cost, fault tolerance & service limits.

AWS Health Dashboard is the single place to learn about the availability and operations of AWS services.
Cost allocation tags can be used to differentiate resource costs, which can then be analyzed using Cost Explorer or a cost allocation report.
Understand how to set up billing alerts using CloudWatch.

Networking & Content Delivery

VPC – Virtual Private Cloud is a virtual network in AWS
- Understand Public Subnet (has access to the Internet) vs Private Subnet (no access to the Internet)
- A route table contains rules, called routes, that determine where network traffic from the subnet is directed
- Internet Gateway enables access to the internet
- Bastion host – allows access to instances in a private subnet without directly exposing them to the internet.
- NAT helps route traffic from private subnets to the internet
- NAT instance vs NAT Gateway
- Virtual Private Gateway – Connectivity between on-premises and VPC
- Egress-Only Internet Gateway – for IPv6 only; allows outbound traffic from a private subnet to the internet without allowing inbound traffic
- VPC Flow Logs enable you to capture information about the IP traffic going to and from network interfaces in the VPC and can help monitor traffic or troubleshoot connectivity issues
- Security Groups vs NACLs esp. Security Groups are stateful and NACLs are stateless.
- VPC Peering provides a connection between two VPCs that enables routing of traffic between them using private IP addresses.
- VPC Endpoints enable a private connection from a VPC to supported AWS services and to VPC endpoint services powered by PrivateLink, using private IP addresses
- Be able to debug networking issues such as an EC2 instance that is not accessible, not reachable, or unable to communicate with other instances or the internet.
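The stateful vs stateless distinction is easiest to see with a request to port 443 whose response goes back to the client's ephemeral port. The toy model below is a simplification, not how AWS evaluates rules: it only has allow rules on destination ports and ignores rule numbers, deny rules, protocols, and CIDRs. The client port 50000 is a hypothetical ephemeral port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Toy allow rule covering an inclusive destination port range."""
    port_from: int
    port_to: int

    def matches(self, port):
        return self.port_from <= port <= self.port_to


def allowed(rules, port):
    return any(rule.matches(port) for rule in rules)


def security_group_allows_response(inbound, outbound, server_port, client_port):
    # Stateful: once the request is allowed in, the reply is allowed out
    # automatically; outbound rules are deliberately not consulted.
    return allowed(inbound, server_port)


def nacl_allows_response(inbound, outbound, server_port, client_port):
    # Stateless: request and reply are evaluated independently, so the reply
    # to the client's ephemeral port needs its own outbound rule.
    return allowed(inbound, server_port) and allowed(outbound, client_port)


if __name__ == "__main__":
    inbound = [Rule(443, 443)]
    outbound_https_only = [Rule(443, 443)]
    outbound_ephemeral = [Rule(1024, 65535)]
    client_port = 50000  # hypothetical ephemeral port picked by the client
    print(security_group_allows_response(inbound, outbound_https_only, 443, client_port))
    print(nacl_allows_response(inbound, outbound_https_only, 443, client_port))
    print(nacl_allows_response(inbound, outbound_ephemeral, 443, client_port))
```

With the same rules, the security group lets the reply through while the NACL drops it until an outbound ephemeral port range is opened.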
Route 53 provides a scalable DNS system
- supports the alias record type, which helps map zone apex records to ELB, CloudFront, and S3 website endpoints.
- Understand Routing Policies and their use cases
  - Failover routing policy helps to configure active-passive failover.
  - Geolocation routing policy helps route traffic based on the location of the users.
  - Geoproximity routing policy helps route traffic based on the location of the resources and, optionally, shift traffic from resources in one location to resources in another.
  - Latency routing policy is used when resources are in multiple AWS Regions and traffic should be routed to the Region that provides the lowest latency (shortest round-trip time).
  - Weighted routing policy helps route traffic to multiple resources in specified proportions.
- Focus on Weighted, Latency routing policies
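Weighted routing sends each record a share of DNS responses equal to its weight divided by the sum of all weights, and if every weight is 0, Route 53 treats them as equal. A minimal sketch of that arithmetic follows; the record names are hypothetical, and the simulation ignores health checks and DNS caching by resolvers.

```python
import random
from fractions import Fraction


def traffic_shares(weights):
    """Expected share of DNS responses per record under weighted routing."""
    total = sum(weights.values())
    if total == 0:
        # All-zero weights are treated as equal weights.
        return {name: Fraction(1, len(weights)) for name in weights}
    return {name: Fraction(w, total) for name, w in weights.items()}


def pick_record(weights, rng=random):
    """Simulate a single weighted answer (no health checks, no caching)."""
    shares = traffic_shares(weights)
    names = list(shares)
    return rng.choices(names, weights=[float(shares[n]) for n in names], k=1)[0]


if __name__ == "__main__":
    # Hypothetical blue/green split sending 10% of traffic to the new stack.
    split = {"blue": 90, "green": 10}
    print(traffic_shares(split))
    print(pick_record(split))
```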
Understand ELB, ALB, and NLB and the features they provide, such as
- Understand the key differences between ELB, ALB, and NLB
- ALB provides content and path routing
- NLB provides the ability to give static IPs to the load balancer esp. if there is a requirement to whitelist IPs.
- LB access logs provide the source IP address
- supports Sticky sessions to enable the load balancer to bind a user’s session to a specific target.
Understand CloudFront and use cases
- CloudFront can be used with S3 to serve static content and websites
Know VPN and Direct Connect to provide AWS to on-premises connectivity. Not covered in detail.

Compute

Understand EC2 in depth
- Understand EC2 instance types and use cases.
- Understand EC2 purchase options, esp. Spot Instances and Reserved Instance options.
- Understand EC2 Metadata & Userdata.
- Understand EC2 Security.
  - Use IAM roles with EC2 instances to access AWS services
  - An IAM role can now be attached to stopped and running instances
- AMIs provide the information required to launch an instance, which is a virtual server in the cloud.
  - AMIs are regional and can be shared publicly or with other accounts
  - Only AMIs with unencrypted volumes or encrypted with a CMK (customer-managed keys) can be shared.
  - The best practice is to use prebaked or golden images to reduce startup time for the applications. Leverage EC2 Image Builder.
- Troubleshooting EC2 issues
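  - RequestLimitExceeded – the API request rate limit has been exceeded. Retry with exponential backoff.
  - InstanceLimitExceeded – the limit on concurrently running instances in the Region has been reached (historically 20 by default; On-Demand limits are now vCPU-based). Request a limit increase.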
  - InsufficientInstanceCapacity – AWS does not currently have enough available capacity to service the request. Change AZ or Instance Type.
- Monitoring EC2 instances
  - System status checks failure – Stop and Start
  - Instance status checks failure – Reboot
- EC2 supports Instance Recovery where the recovered instance is identical to the original instance, including the instance ID, private IP addresses, Elastic IP addresses, and all instance metadata.
- EC2 Image Builder can be used to create pre-baked images with software installed to speed up boot and launch times.
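RequestLimitExceeded is a throttling error, and the usual remedy is to retry with exponential backoff and jitter. AWS SDKs such as boto3 already have built-in retry modes, so the sketch below only illustrates the idea. ThrottledError is a hypothetical stand-in for the SDK's error, and the base, cap, and attempt defaults are assumptions.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an API error with code RequestLimitExceeded."""


def call_with_backoff(fn, max_attempts=5, base=0.5, cap=20.0, sleep=time.sleep):
    """Retry fn() on throttling using exponential backoff with full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)].
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Hypothetical API call that is throttled twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ThrottledError("RequestLimitExceeded")
        return "ok"

    print(call_with_backoff(flaky, sleep=lambda s: None))
```

Randomizing the whole wait spreads retries from many clients apart, instead of having them all retry at the same moment and get throttled again.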
Understand Placement groups
- Cluster placement groups provide low-latency, high-throughput networking for High-Performance Computing by logically grouping instances within a single AZ
- Spread placement groups place each instance on distinct underlying hardware, i.e., each instance on a distinct rack, and can span AZs
- Partition placement groups spread instances across logical partitions, where each partition has its own set of racks, and can span multiple AZs
Understand Auto Scaling
- Auto Scaling can be configured with multiple AZs for high availability to launch instances across multiple AZs
- Auto Scaling attempts to distribute instances evenly between the AZs that are enabled for the Auto Scaling group
- Auto Scaling supports
  - Dynamic scaling, which allows you to scale automatically in response to the changing demand
  - Scheduled scaling, which allows you to scale the application in response to predictable load changes
  - Manual scaling can be performed by changing the desired capacity or adding and removing instances
- Auto Scaling lifecycle hooks can be used to perform custom actions as instances launch or before they terminate.
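Even distribution across AZs can be pictured as "launch into the AZ with the fewest instances, terminate from the AZ with the most", which matches the default termination policy's first step. This toy model is a simplification: real Auto Scaling also rebalances, considers capacity and instance age, and the alphabetical tie-break here is an assumption. The AZ names are hypothetical.

```python
def scale_out(counts, n):
    """Launch n instances one at a time into the least-populated AZ."""
    counts = dict(counts)
    for _ in range(n):
        # Ties are broken alphabetically (an assumption of this sketch).
        az = min(sorted(counts), key=lambda name: counts[name])
        counts[az] += 1
    return counts


def scale_in(counts, n):
    """Terminate n instances one at a time from the most-populated AZ."""
    counts = dict(counts)
    for _ in range(n):
        az = max(sorted(counts), key=lambda name: counts[name])
        counts[az] -= 1
    return counts


if __name__ == "__main__":
    empty = {"us-east-1a": 0, "us-east-1b": 0, "us-east-1c": 0}
    grown = scale_out(empty, 7)
    print(grown)
    print(scale_in(grown, 1))
```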
Understand Lambda and its use cases
- Lambda functions can be connected to a VPC; internet access from such functions then requires a NAT gateway or NAT instance.
- RDS Proxy acts as an intermediary between the application and an RDS database. RDS Proxy establishes and manages the necessary connection pools to the database so that the application creates fewer database connections.

Storage

S3 provides an object storage service
- Understand storage classes with lifecycle policies
- S3 data protection provides encryption at rest and encryption in transit
  - S3 default encryption encrypts new objects at rest, and S3 bucket policies can prevent or reject unencrypted object uploads.
- Multipart upload for fault-tolerant and performant large file uploads
- static website hosting, CORS
- S3 Versioning can help recover from accidental deletes and overwrites.
- Pre-Signed URLs for both upload and download
- S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between the client and an S3 bucket using globally distributed edge locations in CloudFront.
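A lifecycle policy is a list of rules with transitions and an expiration. The sketch below builds one rule in the dictionary shape that boto3's put_bucket_lifecycle_configuration accepts as LifecycleConfiguration={"Rules": [...]}, without calling AWS. The prefix and day counts are hypothetical. The 30-day minimum before STANDARD_IA is an S3 rule, but the ordering check is just this sketch's own sanity check.

```python
import json


def lifecycle_rule(prefix, ia_days, glacier_days, expire_days):
    """Build one lifecycle rule: Standard -> Standard-IA -> Glacier -> expire."""
    # S3 requires objects to be at least 30 days old before moving to STANDARD_IA.
    if ia_days < 30:
        raise ValueError("STANDARD_IA transitions need Days >= 30")
    # Sanity check for this sketch: transitions in order, before expiration.
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "ID": f"archive-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},
    }


if __name__ == "__main__":
    config = {"Rules": [lifecycle_rule("logs/", 30, 90, 365)]}
    print(json.dumps(config, indent=2))
```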
Understand Glacier as archival storage. Glacier does not provide immediate access to the data; even expedited retrievals (S3 Glacier Flexible Retrieval) typically take 1 to 5 minutes.
Understand EBS storage options
- EBS vs Instance store volumes
- EBS volume types and their use cases, limitations esp. IOPS
Storage Gateway allows storage of data in the AWS cloud for scalable and cost-effective storage while maintaining data security.
- Gateway-cached volumes store data in S3 and retain a copy of recently read data locally for low-latency access to frequently accessed data
- Gateway-stored volumes maintain the entire data set locally to provide low latency access
EFS is a cost-optimized, serverless, scalable, and fully managed file storage for use with AWS Cloud and on-premises resources.
- supports encryption at rest only at creation time. An existing unencrypted file system cannot be encrypted; its data must be copied to a new encrypted file system.
- supports General Purpose and Max I/O performance modes.
- If hitting the PercentIOLimit, move to the Max I/O performance mode.
FSx makes it easy and cost-effective to launch, run, and scale feature-rich, high-performance file systems in the cloud
FSx for Windows supports SMB protocol and a Multi-AZ file system to provide high availability across multiple AZs.
AWS Backup can be used to automate backup for EC2 instances and EFS file systems
Data Lifecycle Manager can automate the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs.
AWS DataSync automates moving data between on-premises storage and S3 or Elastic File System (EFS).

Databases

RDS provides cost-efficient, resizable capacity for an industry-standard relational database and manages common database administration tasks.
- Understand RDS Multi-AZ vs Read Replicas and use cases
- Multi-AZ deployment provides high availability, durability, and failover support
- Read replicas enable increased scalability and database availability in the case of an AZ failure.
- Automated backups and database change logs enable point-in-time recovery of the database during the backup retention period, up to the last five minutes of database usage.
Aurora is a fully managed, MySQL- and PostgreSQL-compatible, relational database engine
- Backtracking “rewinds” the DB cluster to the specified time and performs in-place restore and does not create a new instance.
- Automated backups help restore the DB as a new cluster
Know ElastiCache use cases, mainly for caching performance
- Understand ElastiCache Redis vs Memcached
- Redis provides Multi-AZ support for high availability across AZs and online resharding to scale dynamically.
- ElastiCache can be used as a caching layer for RDS.
Know DynamoDB. Not covered in detail

Security

IAM provides Identity and Access Management services.
- Focus on IAM role and its use case, especially with the EC2 instance
- Understand IAM identity providers and federation and use cases
- Understand the process to configure cross-account access
S3 Encryption supports data at rest and in transit encryption
- Understand S3 with SSE-S3, SSE-KMS, and SSE-C
- S3 default encryption can help encrypt new objects; however, it does not encrypt objects that existed before the setting was enabled. You can use S3 Inventory to list the objects and S3 Batch Operations to encrypt them.
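Rejecting unencrypted uploads is usually done with a bucket policy that denies s3:PutObject unless the request asks for SSE-KMS. The sketch below builds the commonly used two-statement version as a Python dict. The bucket name is hypothetical, and it requires SSE-KMS specifically ("AES256" would be the value for SSE-S3). A policy like this checks the request header, so clients that rely only on default encryption and send no header are denied too.

```python
import json


def deny_unencrypted_uploads_policy(bucket):
    """Bucket policy that rejects PutObject requests not using SSE-KMS."""
    objects = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny uploads that request an encryption type other than SSE-KMS.
                "Sid": "DenyWrongEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects,
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
                },
            },
            {
                # Deny uploads that send no encryption header at all.
                "Sid": "DenyMissingEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects,
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_unencrypted_uploads_policy("example-bucket"), indent=2))
```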
Understand KMS for key management and envelope encryption
- KMS keys with imported key material do not support automatic rotation; rotation has to be done manually.
AWS WAF – Web Application Firewall helps protect the applications against common web exploits like XSS or SQL Injection and bots that may affect availability, compromise security, or consume excessive resources
AWS GuardDuty is a threat detection service that continuously monitors the AWS accounts and workloads for malicious activity and delivers detailed security findings for visibility and remediation.
AWS Secrets Manager can help securely expose credentials as well as rotate them.
- Secrets Manager integrates with Lambda and supports credentials rotation
AWS Shield is a managed Distributed Denial of Service (DDoS) protection service that safeguards applications running on AWS
Amazon Inspector
- is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS.
- automatically assesses applications for exposure, vulnerabilities, and deviations from best practices.
AWS Certificate Manager (ACM) handles the complexity of creating, storing, and renewing public and private SSL/TLS X.509 certificates and keys that protect the AWS websites and applications.
Know AWS Artifact as on-demand access to compliance reports

Analytics

Amazon Athena can be used to query data in S3 in place using SQL, without duplicating the data
OpenSearch (Elasticsearch) service is a distributed search and analytics engine built on Apache Lucene.
- A recommended OpenSearch production setup uses 3 AZs, 3 dedicated master nodes, and 6 data nodes (two in each AZ) with two replicas.

Integration Tools

Understand SQS as a message queuing service and SNS as pub/sub notification service
- Focus on SQS as a decoupling service
- Understand SQS FIFO, make sure you know the differences between standard and FIFO
Understand CloudWatch integration with SNS for notification

Practice Labs

Create IAM users, IAM roles with specific limited policies.
Create a private S3 bucket
- enable versioning
- enable default encryption
- enable lifecycle policies to transition and expire the objects
- enable same region replication
Create a public S3 bucket with static website hosting
Set up a VPC with public and private subnets with Routes, SGs, NACLs.
Set up a VPC with public and private subnets and enable communication from private subnets to the Internet using NAT gateway
Create an EC2 instance, take a snapshot, and restore it as a new instance.
Set up Security Groups for ALB and Target Groups, and create ALB, Launch Template, Auto Scaling Group, and target groups with sample applications. Test the flow.
Create a Multi-AZ RDS instance and force a failover.
Set up an SNS topic. Use CloudWatch metrics to create a CloudWatch alarm on specific thresholds and send notifications to the SNS topic.
Set up an SNS topic. Use CloudWatch Logs metric filters to create a CloudWatch alarm on log patterns and send notifications to the SNS topic.
Update a CloudFormation template and re-run the stack and check the impact.
Use AWS Data Lifecycle Manager to define snapshot lifecycle.
Use AWS Backup to define EFS backup with hourly and daily backup rules.

AWS Certified SysOps Administrator – Associate (SOA-C02) Exam Day

Make sure you are relaxed and get a good night’s sleep. The exam is not tough if you are well-prepared.
If you are taking the AWS Online exam
- Try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.
- The online verification process takes some time, and glitches are common.
- Remember, you will not be allowed to take the exam if you are more than 30 minutes late.
- Make sure your desk is clear: no watches or external monitors, keep your phones away, and make sure nobody can enter the room.

Finally, All the Best 🙂

AWS Certified DevOps Engineer – Professional (DOP-C02) Exam Learning Path


April 12, 2023 ~ Last updated on : October 4, 2023 ~ jayendrapatil ~ 1 Comment


AWS Certified DevOps Engineer – Professional (DOP-C02) exam, released in March 2023, is the upgraded version of the DevOps Engineer – Professional (DOP-C01) exam.

I recently attempted the latest pattern and DOP-C02 is quite similar to DOP-C01 with the inclusion of new services and features.

AWS Certified DevOps Engineer – Professional (DOP-C02) Exam Content

AWS Certified DevOps Engineer – Professional (DOP-C02) exam is intended for individuals who perform a DevOps engineer role and focuses on provisioning, operating, and managing distributed systems and services on AWS.
DOP-C02 validates the ability to
- Implement and manage continuous delivery systems and methodologies on AWS
- Implement and automate security controls, governance processes, and compliance validation
- Define and deploy monitoring, metrics, and logging systems on AWS
- Implement systems that are highly available, scalable, and self-healing on the AWS platform
- Design, manage, and maintain tools to automate operational processes

Refer to AWS Certified DevOps Engineer – Professional Exam Guide

AWS Certified DevOps Engineer – Professional (DOP-C02) Exam Resources

Online Courses
- Stephane Maarek – AWS Certified DevOps Engineer Professional
- Adrian Cantrill – AWS Certified DevOps Engineer – Professional
- Adrian Cantrill – AWS Professional Bundle
- Whizlabs – AWS Certified DevOps Engineer Professional Course
- Coursera – DevOps on AWS Specialization

Practice tests
- Braincert AWS Certified DevOps Engineer – Professional DOP-C02 Practice Exams
- Stephane Maarek – AWS Certified DevOps Engineer Professional Practice Tests
- Whizlabs – AWS Certified DevOps Engineer Professional Practice Tests

AWS Certified DevOps Engineer – Professional (DOP-C02) Exam Summary

Professional exams are tough, lengthy, and tiresome. Most of the questions and answer options have a lot of prose and a lot of reading, so be sure you are prepared and manage your time well.
Each solution involves multiple AWS services.

DOP-C02 exam has 75 questions to be solved in 170 minutes. Only 65 affect your score, while 10 unscored questions are for evaluation for future use.
DOP-C02 exam includes two types of questions, multiple-choice and multiple-response.
DOP-C02 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 750.

Each question mainly touches multiple AWS services.
Professional exams currently cost $300 + tax.
You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.

As always, mark difficult questions for review, move on, and come back to them once you are done with the rest.
As always, having a rough architecture or mental picture of the setup helps focus on the areas that you need to improve. Trust me, you will be able to eliminate 2 answers for sure and then need to focus on only the other two. Compare the remaining 2 answers to find where they differ; that should help you reach the right answer or at least give you a 50% chance of getting it right.
AWS exams can be taken either at a test center or online. I prefer to take them online as it provides a lot of flexibility. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.

Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Certified DevOps Engineer – Professional (DOP-C02) Exam Topics

AWS Certified DevOps Engineer – Professional exam covers a lot of concepts and services related to Automation, Deployments, Disaster Recovery, HA, Monitoring, Logging, and Troubleshooting. It also covers security and compliance related topics.

Management & Governance tools

CloudFormation
- provides an easy way to create and manage a collection of related AWS resources, provision and update them in an orderly and predictable fashion.
- Make sure you have gone through and executed a CloudFormation template to provision AWS resources.
- CloudFormation Concepts cover
  - Templates act as a blueprint for provisioning of AWS resources
  - Stacks manage a collection of resources as a single unit that can be created, updated, and deleted together.
  - Change Sets present a summary or preview of the proposed changes that CloudFormation will make when a stack is updated.
  - Nested stacks are stacks created as part of other stacks.
- CloudFormation template anatomy consists of resources, parameters, outputs, and mappings.
- CloudFormation supports multiple features
  - Drift detection enables you to detect whether a stack’s actual configuration differs, or has drifted, from its expected configuration.
  - Termination protection helps prevent a stack from being accidentally deleted.
  - Stack policy can prevent stack resources from being unintentionally updated or deleted during a stack update.
  - StackSets help create, update, or delete stacks across multiple accounts and Regions with a single operation.
  - Helper scripts with creation policies can help wait for the completion of events before provisioning or marking resources complete.
  - Update policy supports rolling and replacing updates with AutoScaling.
  - Deletion policies to help retain or backup resources during stack deletion.
  - Custom resources can be configured for use cases not natively supported, e.g., retrieving AMI IDs or interacting with external services
- Understand CloudFormation Best Practices esp. Nested Stacks and logical grouping
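The template anatomy above can be seen in a small example. The sketch below builds a minimal JSON template as a Python dict containing Parameters, Mappings, Resources (with a DeletionPolicy), and Outputs. The parameter, mapping, and resource names are hypothetical. The printed JSON could then be deployed with something like aws cloudformation deploy --template-file template.json --stack-name example-stack, or previewed first through a change set.

```python
import json


def build_template():
    """Minimal CloudFormation template showing the main anatomy sections."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "Env": {"Type": "String", "AllowedValues": ["dev", "prod"], "Default": "dev"}
        },
        "Mappings": {
            # Hypothetical per-environment setting, looked up with Fn::FindInMap.
            "EnvConfig": {
                "dev": {"Versioning": "Suspended"},
                "prod": {"Versioning": "Enabled"},
            }
        },
        "Resources": {
            "DataBucket": {
                "Type": "AWS::S3::Bucket",
                # Keep the bucket and its data if the stack is deleted.
                "DeletionPolicy": "Retain",
                "Properties": {
                    "VersioningConfiguration": {
                        "Status": {"Fn::FindInMap": ["EnvConfig", {"Ref": "Env"}, "Versioning"]}
                    }
                },
            }
        },
        "Outputs": {
            # Ref on an S3 bucket returns the bucket name.
            "BucketName": {"Value": {"Ref": "DataBucket"}}
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```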

Elastic Beanstalk
- helps to quickly deploy and manage applications in the AWS Cloud without having to worry about the infrastructure that runs those applications.
- Understand Elastic Beanstalk overall – Applications, Versions, and Environments
- Deployment strategies with their advantages and disadvantages
OpsWorks
- is a configuration management service that helps to configure and operate applications in a cloud enterprise by using Chef.
- Understand OpsWorks overall – stacks, layers, recipes
- Understand OpsWorks Lifecycle events esp. the Configure event and how it can be used.
- Understand OpsWorks Deployment Strategies
- Know OpsWorks auto-healing and how to be notified for it.
Understand CloudFormation vs Elastic Beanstalk vs OpsWorks
AWS Organizations
- Difference between Service Control Policies and IAM Policies
- SCPs define the maximum permissions available to users and roles in an account; however, a user still needs to be explicitly granted permissions through an IAM policy.
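One way to remember the SCP rule is as an intersection: an action is effective only if the SCP allows it and an identity policy grants it, with explicit denies winning. The toy model below uses exact action names only. It ignores wildcards, resources, conditions, and permission boundaries, and it does not model the management account, which SCPs do not restrict. The action sets are hypothetical.

```python
def is_allowed(action, scp_allowed, iam_allowed, explicit_denies=frozenset()):
    """Toy check: allowed only if the SCP permits it AND IAM grants it,
    and no policy explicitly denies it."""
    if action in explicit_denies:
        return False
    return action in scp_allowed and action in iam_allowed


def effective_permissions(scp_allowed, iam_allowed, explicit_denies=frozenset()):
    """Effective permissions are the intersection minus explicit denies."""
    return (set(scp_allowed) & set(iam_allowed)) - set(explicit_denies)


if __name__ == "__main__":
    scp = {"s3:GetObject", "ec2:DescribeInstances"}
    iam = {"s3:GetObject", "s3:PutObject"}
    print(is_allowed("s3:PutObject", scp, iam))
    print(effective_permissions(scp, iam))
```

Here s3:PutObject is blocked by the SCP even though IAM grants it, and ec2:DescribeInstances is not usable because no IAM policy grants it.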
Systems Manager
- AWS Systems Manager and its various services like parameter store, patch manager
- Parameter Store provides secure, scalable, centralized, hierarchical storage for configuration data and secret management. Does not support secrets rotation. Use Secrets Manager instead
- Session Manager provides secure and auditable instance management without the need to open inbound ports, maintain bastion hosts, or manage SSH keys.
- Patch Manager helps automate the process of patching managed instances with both security-related and other types of updates.
CloudWatch
- supports monitoring, logging, and alerting.
- CloudWatch logs can be used to monitor, store, and access log files from EC2 instances, CloudTrail, Route 53, and other sources. You can create metric filters over the logs.
- CloudWatch Subscription Filters can be used to send logs to Kinesis Data Streams, Lambda, or Kinesis Data Firehose.
- EventBridge (CloudWatch Events) is a serverless event bus service that makes it easy to connect applications with data from a variety of sources.
- EventBridge (CloudWatch Events) scheduled rules can trigger targets on a periodic schedule.
- The unified CloudWatch agent collects metrics and logs from EC2 instances and on-premises servers and pushes them to CloudWatch.
- CloudWatch Synthetics helps create canaries, configurable scripts that run on a schedule, to monitor your endpoints and APIs

CloudTrail
- for audit and governance
- With Organizations, an organization trail can log CloudTrail events from all member accounts to a central account.

Config is a fully managed service that provides AWS resource inventory, configuration history, and configuration change notifications to enable security, compliance, and governance.
- supports managed as well as custom rules that can be evaluated periodically or when a configuration change occurs, and can trigger automatic remediation
- Conformance pack is a collection of AWS Config rules and remediation actions that can be easily deployed as a single entity in an account and a Region or across an organization in AWS Organizations.

Control Tower
- to set up, govern, and secure a multi-account environment
- strongly recommended guardrails cover EBS encryption

Service Catalog
- allows organizations to create and manage catalogues of IT services that are approved for use on AWS, which users can then launch with minimal permissions.
Trusted Advisor
- helps with cost optimization and service limits in addition to security, performance, and fault tolerance.
AWS Health Dashboard is the single place to learn about the availability and operations of AWS services.

Developer Tools

Know AWS Developer tools

CodeCommit is a secure, scalable, fully-managed source control service that helps to host secure and highly scalable private Git repositories.
- can help handle deployments of code to different environments using the same repository and different branches.
CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy.

CodeDeploy helps automate code deployments to any instance, including EC2 instances and instances running on-premises, Lambda, and ECS.
- Understand CodeDeploy lifecycle event hooks
- Understand CodeDeploy deployment configurations (hint: canary and linear deployments are supported for Lambda and ECS)
- Understand CodeDeploy redeploy and rollbacks
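Canary and linear configurations differ only in how traffic moves to the new version. For example, CodeDeployDefault.LambdaCanary10Percent5Minutes shifts 10% and then the remaining 90% after 5 minutes, while CodeDeployDefault.LambdaLinear10PercentEvery1Minute shifts 10% every minute. The sketch below computes an approximate timeline of (minute, percent on the new version). Treating the first increment as happening at minute 0 is an assumption, and alarms or rollbacks are not modeled.

```python
def canary_schedule(percent, interval_minutes):
    """Two-step canary: shift `percent`, then everything after the interval."""
    return [(0, percent), (interval_minutes, 100)]


def linear_schedule(percent, interval_minutes):
    """Equal increments of `percent` every `interval_minutes` until 100%."""
    schedule, shifted, minute = [], 0, 0
    while shifted < 100:
        shifted = min(100, shifted + percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


if __name__ == "__main__":
    print(canary_schedule(10, 5))  # Canary10Percent5Minutes
    print(linear_schedule(10, 1))  # Linear10PercentEvery1Minute
```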
CodePipeline is a fully managed continuous delivery service that helps automate the release pipelines for fast and reliable application and infrastructure updates.
- CodePipeline pipeline structure (hint: actions in a stage with the same runOrder value run in parallel)
- Understand how to configure notifications on events and failures
- CodePipeline supports Manual Approval
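Within a stage, CodePipeline runs actions in ascending runOrder. Actions that share a value run in parallel, and runOrder defaults to 1 when omitted. The sketch below groups a stage's actions into those parallel "waves". The action names are hypothetical, and the dicts only mimic the runOrder field of an action declaration.

```python
from itertools import groupby


def execution_waves(actions):
    """Group actions by runOrder: same value runs in parallel, lower first."""
    def order(action):
        return action.get("runOrder", 1)  # CodePipeline's default runOrder is 1

    ordered = sorted(actions, key=order)  # stable sort keeps declaration order
    return [[a["name"] for a in group] for _, group in groupby(ordered, key=order)]


if __name__ == "__main__":
    stage = [
        {"name": "UnitTests", "runOrder": 1},
        {"name": "Lint"},  # no runOrder, so it defaults to 1
        {"name": "Package", "runOrder": 2},
        {"name": "ManualApproval", "runOrder": 3},
    ]
    print(execution_waves(stage))
```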
CodeArtifact is a fully managed artifact repository service that makes it easy for organizations of any size to securely store, publish, and share software packages used in their software development process.

CodeGuru provides intelligent recommendations to improve code quality and identify an application’s most expensive lines of code. Reviewer helps improve code quality and Profiler helps optimize performance for applications
EC2 Image Builder helps to automate the creation, management, and deployment of customized, secure, and up-to-date server images that are pre-installed and pre-configured with software and settings to meet specific IT standards.

Disaster Recovery

Disaster recovery is mainly covered as part of the Resilient Cloud Solutions domain.

Although the Disaster Recovery whitepaper is outdated, make sure you understand the differences and implementation of each type, esp. pilot light and warm standby, w.r.t. RTO and RPO.
Compute
- Make components available in an alternate region.
- Backup and restore using either snapshots or AMIs that can be restored.
- Keep minimal, low-scale capacity running that can be scaled up once failover happens.
- Use fully running compute in an active-active configuration with health checks.
- Use CloudFormation to create and scale infrastructure as needed.
Storage
- S3 and EFS support cross-region replication
- DynamoDB supports Global tables for multi-master, active-active inter-region storage needs.
- Aurora Global Database provides cross-region read replicas and failover capabilities.
- RDS supports cross-region read replicas that can be promoted to a standalone primary in case of a disaster. The promotion can be automated using Route 53, CloudWatch, and Lambda functions.

Network
- Route 53 failover routing with health checks to failover across regions.
- CloudFront Origin Groups support primary and secondary endpoints with failover.

Networking & Content Delivery

Networking is covered very lightly.
VPC – Virtual Private Cloud
- Security Groups, NACLs
  - NACLs are stateless and need to open ephemeral ports for response traffic.
- VPC Gateway Endpoints to provide access to S3 and DynamoDB
- VPC Interface Endpoints or PrivateLink provide access to a variety of services like SQS, Kinesis, or Private APIs exposed through NLB.
- VPC Peering to enable communication between VPCs within the same or different regions.
- VPC Peering does not support overlapping CIDRs while PrivateLink does as only the endpoint is exposed.
- VPC Flow Logs to track network traffic and can be published to CloudWatch Logs, S3, or Kinesis Data Firehose.
- NAT Gateway is a managed NAT service that provides better availability and higher bandwidth, and requires less administrative effort than a NAT instance.
Route 53
- Routing Policies
  - focus on Weighted, Latency, and failover routing policies
  - failover routing provides active-passive configuration for disaster recovery while the others are active-active configurations.
CloudFront
- fully managed, fast CDN service that speeds up the distribution of static, dynamic web or streaming content to end-users.
Load Balancer – ELB, ALB and NLB
- ELB with Auto Scaling to provide scalable and highly available applications
- Understand ALB vs NLB and their use cases.
- Access logs need to be explicitly enabled and are delivered only to S3.
Direct Connect & VPN
- provide on-premises to AWS connectivity
- Understand Direct Connect vs VPN
- VPN can provide a cost-effective, quick failover for Direct Connect.
- VPN over Direct Connect provides a secure dedicated connection and requires a public virtual interface.

Security, Identity & Compliance

AWS Identity and Access Management
- IAM Roles and use cases. Understand the process for provisioning cross-account access.
- IAM Web Identity & Federation
- IAM Best Practices
AWS WAF
- protects from common attack techniques like SQL injection and XSS. Conditions can be based on IP addresses, HTTP headers, HTTP body, and URI strings.
- integrates with CloudFront, ALB, and API Gateway.
AWS KMS – Key Management Service
- managed encryption service that allows the creation and control of encryption keys to enable data encryption.
Secrets Manager
- helps protect secrets needed to access applications, services, and IT resources.
AWS GuardDuty
- is a threat detection service that continuously monitors the AWS accounts and workloads for malicious activity and delivers detailed security findings for visibility and remediation.
AWS Security Hub is a cloud security posture management service that performs security best practice checks, aggregates alerts and enables automated remediation.
Firewall Manager helps centrally configure and manage firewall rules across the accounts and applications in AWS Organizations which includes a variety of protections, including WAF, Shield Advanced, VPC security groups, Network Firewall, and Route 53 Resolver DNS Firewall.

Storage

Simple Storage Service – S3
- S3 Permissions & S3 Data Protection
  - S3 bucket policies to control access to VPC Endpoints and provide cross-account access.
- S3 Storage Classes & Lifecycle policies
  - covers S3 Standard, Standard-IA, Intelligent-Tiering, and Glacier for archival, with object transitions and expirations for cost management.
- S3 supports cross-region replication. Understand how the process works in terms of permissions.
- S3 can be used for static website hosting and integrates with CloudFront to improve performance and latency.
- S3 Security
  - S3 supports encryption using KMS
  - S3 supports Object Lock and Glacier supports Vault lock to prevent the deletion of objects, especially required for compliance requirements.
- S3 supports both Same-Region and Cross-Region Replication for disaster recovery.
- S3 Access Logs enable tracking access requests to an S3 bucket.
- S3 Event Notification enables notifications to be triggered when certain events happen in the bucket and supports SNS, SQS, and Lambda as the destination. S3 needs permission to be able to integrate with the services.
Elastic Block Store
- EBS Backup using snapshots for HA and Disaster recovery
- Data Lifecycle Manager can be used to automate the creation, retention, and deletion of snapshots taken to back up the EBS volumes.
Elastic File System – EFS
- provides fully managed, scalable, serverless, shared, and cost-optimized file storage for use with AWS and on-premises resources.
- supports cross-region replication for disaster recovery
- supports storage classes like S3
- supports only Linux-based AMIs

Database

DynamoDB
- provides a fully managed NoSQL database service with fast and predictable performance with seamless scalability.
- DynamoDB Auto Scaling can be used to handle peaks or bursts.
- DynamoDB Streams for tracking changes in real time.
- Global tables for multi-master, active-active inter-region storage needs.
- Global tables do not support strong global consistency
- DynamoDB Accelerator – DAX for seamless caching to reduce the load on DynamoDB for read-heavy requirements.
RDS
- supports cross-region read replicas ideal for disaster recovery with low RTO and RPO.
- provides RDS Proxy for efficient database connection pooling
- RDS Multi-AZ vs Read Replicas
Aurora
- fully managed, MySQL- and PostgreSQL-compatible, relational database engine
- Aurora Serverless provides on-demand, autoscaling configuration.
- Aurora Global Database consists of one primary AWS Region where the data is mastered, and up to five read-only, secondary AWS Regions.
- Aurora Endpoints supports Cluster (writer) and Reader endpoints.
Understand DynamoDB Global Tables vs Aurora Global Databases

Compute

EC2
Auto Scaling provides the ability to ensure a correct number of EC2 instances are always running to handle the load of the application
- Auto Scaling lifecycle hooks enable performing custom actions by pausing instances as an ASG launches or terminates them.
- Blue/green deployments with Auto Scaling can use new launch configurations, new Auto Scaling groups, or CloudFormation update policies.
Lambda
- offers Serverless computing
- supports reserved concurrency limits to reduce the impact of a single function on other functions and downstream resources
- Lambda aliases support weighted traffic shifting, which enables canary deployments
- Reserved Concurrency guarantees the maximum number of concurrent instances for the function
- Provisioned Concurrency
  - provides greater control over the performance of serverless applications and helps keep functions initialized and hyper-ready to respond in double-digit milliseconds.
  - supports Application Auto Scaling.
Step Functions helps developers use AWS services to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning (ML) pipelines.
ECS – Elastic Container Service
- container management service that supports Docker containers
- supports two launch types
  - EC2 and
  - Fargate which provides the serverless capability
ECR provides a fully managed, secure, scalable, reliable container image registry service. It supports lifecycle policies for images.

Integration Tools

SQS in terms of loose coupling and scaling.
- Difference between SQS Standard and FIFO, especially with throughput and ordering
- SQS supports dead-letter queues and a redrive policy, which specifies the source queue, the dead-letter queue, and the condition under which SQS moves messages from the former to the latter: the consumer of the source queue fails to process a message a specified number of times (maxReceiveCount).
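A minimal sketch of the redrive policy described above, assuming Python. It only builds the `RedrivePolicy` queue attribute (the JSON keys `deadLetterTargetArn` and `maxReceiveCount` are the ones SQS expects) and models the move rule locally; the dead-letter queue ARN is hypothetical and no AWS call is made.

```python
import json

# Hypothetical dead-letter queue ARN, for illustration only.
DLQ_ARN = "arn:aws:sqs:us-east-1:123456789012:orders-dlq"


def redrive_policy_attributes(dlq_arn: str, max_receive_count: int) -> dict:
    """Build the Attributes dict accepted by SQS CreateQueue/SetQueueAttributes."""
    policy = {
        "deadLetterTargetArn": dlq_arn,             # where failed messages go
        "maxReceiveCount": str(max_receive_count),  # receives allowed before moving
    }
    # SQS expects the policy as a JSON string, not a nested object.
    return {"RedrivePolicy": json.dumps(policy)}


def moves_to_dlq(receive_count: int, max_receive_count: int) -> bool:
    """Local model of the rule: a message moves once ReceiveCount exceeds maxReceiveCount."""
    return receive_count > max_receive_count


attrs = redrive_policy_attributes(DLQ_ARN, max_receive_count=5)
print(attrs["RedrivePolicy"])
```

With boto3, the returned dict could be passed as `sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=attrs)`, where `queue_url` is the URL of the source queue.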
CloudWatch integration with SNS and Lambda for notifications.

Analytics

Kinesis
- for real-time data ingestion and analytics.
- Difference between Kinesis Data Streams and Kinesis Data Firehose
- Kinesis Data Firehose integrates with S3, Redshift, and OpenSearch.
OpenSearch (Elasticsearch) provides a managed search solution.

Whitepapers

AWS Certified DevOps Engineer – Professional (DOP-C02) Exam Day

Make sure you are relaxed and get a good night’s sleep. The exam is not tough if you are well-prepared.
If you are taking the AWS Online exam
- Try to join at least 30 minutes before the scheduled time, as I have had issues with long wait times with both PSI and Pearson.
- The online verification process does take some time, and there are usually some glitches.
- Remember, you will not be allowed to take the exam if you are more than 30 minutes late.
- Make sure your desk is clear: no wristwatches or external monitors, keep your phones away, and make sure nobody can enter the room.

Finally, All the Best 🙂

AWS Certified Developer – Associate DVA-C02 Exam Learning Path

AWS Certified Developer - Associate Certification

March 10, 2023 ~ Last updated on : October 4, 2023 ~ jayendrapatil


AWS Certified Developer – Associate DVA-C02 exam is the latest AWS exam released on 27th February 2023 and has replaced the previous AWS Developer – Associate DVA-C01 certification exam.

I passed the AWS Developer – Associate DVA-C02 exam with a score of 835/1000.

AWS Certified Developer – Associate DVA-C02 Exam Content

DVA-C02 validates a candidate’s ability to demonstrate proficiency in developing, testing, deploying, and debugging AWS cloud-based applications.
DVA-C02 also validates a candidate’s ability to complete the following tasks:
- Develop and optimize applications on AWS
- Package and deploy by using continuous integration and continuous delivery (CI/CD) workflows
- Secure application code and data
- Identify and resolve application issues

Refer to the AWS Certified Developer – Associate Exam Blueprint

AWS Certified Developer - Associate Domains

AWS Certified Developer – Associate DVA-C02 Summary

DVA-C02 exam consists of 65 questions in 130 minutes, and the time is more than sufficient if you are well-prepared.
DVA-C02 exam includes two types of questions, multiple-choice and multiple-response.

DVA-C02 has a scaled score between 100 and 1,000. The scaled score needed to pass the exam is 720.
Associate exams currently cost $150 + tax.
You can get an additional 30 minutes if English is your second language by requesting Exam Accommodations. It might not be needed for Associate exams but is helpful for Professional and Specialty ones.

AWS exams can be taken either at a test center or online. I prefer to take them online as it provides a lot of flexibility. Just make sure you have a proper place to take the exam with no disturbance and nothing around you.
Also, if you are taking the AWS Online exam for the first time try to join at least 30 minutes before the actual time as I have had issues with both PSI and Pearson with long wait times.

AWS Certified Developer – Associate DVA-C02 Exam Resources

Online courses
- Stephane Maarek – Ultimate AWS Certified Developer Associate DVA-C02
- Adrian Cantrill – AWS Certified Developer – Associate
- Adrian Cantrill – All Associate Bundle
- DolfinEd – AWS Certified Developer Associate
- Whizlabs – AWS Certified Developer Associate Course
Practice tests
- Braincert AWS Certified Developer – Associate DVA-C02 Practice Exams
- Stephane Maarek – AWS Certified Developer – Associate Practice Exams
- Whizlabs – AWS Certified Developer Associate Practice Tests

Sign up for an AWS Free Tier account, which provides a lot of services to try for free within certain limits that are more than enough to get things going. Be sure to decommission anything you use beyond the free limits to prevent any surprises 🙂
Read the FAQs at least for the important topics, as they cover important points and are good for quick review

AWS Certified Developer – Associate DVA-C02 Exam Topics

AWS DVA-C02 exam concepts cover solutions that fall within the AWS Well-Architected Framework and are scalable, highly available, cost-effective, performant, and resilient.

AWS Certified Developer – Associate DVA-C02 exam covers a lot of the latest AWS services, such as Amplify and X-Ray, while focusing mainly on services like Lambda, DynamoDB, Elastic Beanstalk, S3, and EC2.
AWS Certified Developer – Associate DVA-C02 exam is similar to DVA-C01 with more focus on the hands-on development and deployment concepts rather than just the architectural concepts.
If you have been preparing for DVA-C01, DVA-C02 is very similar except for the addition of some new services covering Amplify, X-Ray, etc.

Compute

Elastic Compute Cloud – EC2
Auto Scaling and ELB
- Auto Scaling provides the ability to ensure a correct number of EC2 instances are always running to handle the load of the application
- Elastic Load Balancer allows the incoming traffic to be distributed automatically across multiple healthy EC2 instances
Autoscaling & ELB
- work together to provide High Availability and Scalability.
- Span both ELB and Auto Scaling across Multi-AZs to provide High Availability
- Do not span across regions. Use Route 53 or Global Accelerator to route traffic across regions.
Lambda and serverless architecture, its features, and use cases.
- Lambda integrated with API Gateway to provide a serverless, highly scalable, cost-effective architecture.
- Lambda execution role needs the required permissions to integrate with other AWS services.
- Environment variables to keep functions configurable.
- Lambda Layers provide a convenient way to package libraries and other dependencies that you can use with your Lambda functions.
- Function versions can be used to manage the deployment of the functions.
- Function aliases are mutable pointers to a function version and can be updated to point to a different version.
- provides /tmp ephemeral scratch storage.
- Integrates with X-Ray for distributed tracing.
- Use RDS Proxy for connection pooling.
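The environment-variable and /tmp points above can be illustrated with a small Python handler. This is a sketch, assuming a hypothetical `TABLE_NAME` environment variable; it reads configuration at invocation time and writes scratch data to the temporary directory, and it uses only the standard library so it can also be invoked locally.

```python
import json
import os
import tempfile


def lambda_handler(event, context):
    # Configuration comes from environment variables, so the same code can be
    # deployed to dev and prod with different settings. TABLE_NAME is hypothetical.
    table_name = os.environ.get("TABLE_NAME", "orders-dev")

    # Ephemeral scratch space; tempfile resolves to /tmp on Lambda by default.
    scratch_path = os.path.join(tempfile.gettempdir(), "last_event.json")
    with open(scratch_path, "w") as f:
        json.dump(event, f)

    return {
        "statusCode": 200,
        "body": json.dumps({"table": table_name, "scratch": scratch_path}),
    }
```

Locally, setting `os.environ["TABLE_NAME"]` and calling `lambda_handler({"id": 1}, None)` exercises the same path. Files in /tmp may survive between warm invocations in the same execution environment, but code should not rely on that.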

Elastic Container Service – ECS with its ability to deploy containers and microservices architecture.
- IAM roles for ECS tasks can be provided through the taskRoleArn parameter of the task definition
- ALB supports dynamic port mapping, which allows multiple tasks of the same service to run on the same container instance.

Elastic Kubernetes Service – EKS
- managed Kubernetes service to run Kubernetes in the AWS cloud and on-premises data centers
- ideal for migration of an existing workload on Kubernetes

Elastic Beanstalk
- at a high level, what it provides, and its ability to get an application running quickly.
- Deployment types with their advantages and disadvantages

Databases

Understand relational and NoSQL data storage options which include RDS, DynamoDB, and Aurora with their use cases
Relational Database Service – RDS
- Read Replicas vs Multi-AZ
  - Read Replicas for scalability, Multi-AZ for High Availability
  - Multi-AZ is regional only
  - Read Replicas can span across regions and can be used for disaster recovery

RDS Proxy
- fully managed, highly available database proxy for RDS that makes applications more scalable, more secure, and more resilient to database failures.
- allows apps to pool and share DB connections established with the database

DynamoDB
- is a key-value store that provides low-latency performance
- is not a relational database
- Secondary indexes on a table allow efficient access to data with attributes other than the primary key.
- Know Local Secondary Indexes vs Global Secondary Indexes
- DynamoDB DAX provides caching for DynamoDB
- DynamoDB TTL helps expire data in DynamoDB without any cost or consuming any write throughput.
- DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table and integrates with Lambda.
- DynamoDB Best Practices around designing partition keys and secondary indexes.
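One partition-key practice that AWS documents is write sharding with a calculated suffix, which spreads writes for a hot key (for example, a date) across several partitions. The sketch below assumes a hypothetical date-based base key and a shard count of 10; reads for the whole base key then query every shard key and merge the results.

```python
import hashlib

SHARD_COUNT = 10  # hypothetical; size it to the expected write rate


def sharded_partition_key(base_key: str, item_id: str, shards: int = SHARD_COUNT) -> str:
    """Append a calculated suffix so writes for one hot base key spread over shards."""
    # Use a stable hash (not Python's per-process randomized hash()) so the same
    # item_id always maps to the same shard and can be read back directly.
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    return f"{base_key}#{int(digest, 16) % shards}"


def all_shard_keys(base_key: str, shards: int = SHARD_COUNT) -> list:
    """Partition keys to query (and merge) when reading everything for the base key."""
    return [f"{base_key}#{i}" for i in range(shards)]


print(sharded_partition_key("2024-06-04", "order-42"))
```

A calculated suffix (rather than a random one) is the design choice here because it lets a single item be fetched again without scanning every shard.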

ElastiCache use cases, mainly for caching performance
- ElastiCache Redis vs Memcached

Storage

Simple Storage Service – S3
- S3 storage classes with lifecycle policies
  - Understand the difference between S3 Standard vs S3 Standard-IA vs S3 One Zone-IA in terms of cost, availability, and resilience to the loss of an AZ
- S3 Data Protection
  - S3 Client-side encryption encrypts data before storing it in S3
  - S3 encryption in transit can be enforced with S3 bucket policies using the aws:SecureTransport condition key.
  - S3 encryption at rest can be enforced with S3 bucket policies using the s3:x-amz-server-side-encryption condition key.
- S3 features including
  - S3 provides cost-effective static website hosting. However, the website endpoint does not support HTTPS. It can be integrated with CloudFront for HTTPS, caching, performance, and low-latency access.
  - S3 versioning provides protection against accidental overwrites and deletions and can be combined with the MFA Delete feature.
  - S3 Pre-Signed URLs for both upload and download provide access without needing AWS credentials.
  - S3 CORS allows cross-domain calls
  - S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket.
  - S3 Event Notifications are triggered by S3 events like objects being added or deleted, and support SQS, SNS, and Lambda functions as destinations.
  - Integrates with Amazon Macie to detect PII data
  - Replication supports Same-Region and Cross-Region Replication and requires versioning to be enabled on both the source and destination buckets.
  - Integrates with Athena to analyze data in S3 using standard SQL.
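The "enforce encryption in transit" point can be sketched as a bucket policy built in Python. The statement follows the commonly used deny-unless-HTTPS pattern with the `aws:SecureTransport` condition key; the bucket name is hypothetical, and the script only builds the JSON without calling AWS.

```python
import json


def deny_insecure_transport_policy(bucket: str) -> dict:
    """Bucket policy that denies any S3 request not made over HTTPS."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both bucket-level and object-level operations.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # aws:SecureTransport is "false" when the request used plain HTTP.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "example-bucket" is a hypothetical bucket name.
policy_json = json.dumps(deny_insecure_transport_policy("example-bucket"), indent=2)
print(policy_json)
```

With boto3, the string could then be applied with `s3.put_bucket_policy(Bucket="example-bucket", Policy=policy_json)`. An explicit Deny is used because it overrides any Allow granted elsewhere.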
Instance Store
- is physically attached to the EC2 instance and provides the lowest latency and highest IOPS

Elastic Block Store – EBS
- EBS volume types and their use cases in terms of IOPS and throughput. SSD for IOPS and HDD for throughput
Elastic File System – EFS
- simple, fully managed, scalable, serverless, and cost-optimized file storage for use with AWS Cloud and on-premises resources.
- provides a shared file system across multiple EC2 instances, while an EBS volume can be attached to a single instance in the same AZ or, with EBS Multi-Attach, to multiple instances in the same AZ
- can be mounted with Lambda functions
- supports the NFS protocol, and is compatible with Linux-based AMIs
- supports cross-region replication and storage classes for cost management.
Difference between EBS vs S3 vs EFS

Difference between EBS vs Instance Store
I would recommend referring to the Storage Options whitepaper; although a bit dated, 90% of it still holds true

Security & Identity

Identity Access Management – IAM
- IAM role
  - provides permissions that are not associated with a particular user, group, or service and are intended to be assumable by anyone who needs it.
  - can be used for EC2 application access and Cross-account access
- IAM Best Practices
Cognito
- provides authentication, authorization, and user management for the web and mobile apps.
- User pools are user directories that provide sign-up and sign-in options for the app users.
- Identity pools enable you to grant the users access to other AWS services.
Key Management Service – KMS encryption service
- for key management and envelope encryption
- provides encryption at rest and does not handle encryption in transit.
AWS Certificate Manager – ACM
- helps easily provision, manage, and deploy public and private SSL/TLS certificates for use with AWS services and internally connected resources.
AWS Secrets Manager
- helps protect secrets needed to access applications, services, and IT resources.
- supports automatic rotations of secrets
Secrets Manager vs Systems Manager Parameter Store for secrets management
- Secrets Manager supports automatic credentials rotation using Lambda and integrates natively with services like RDS, Redshift, and DocumentDB.
- Systems Manager Parameter Store provides free standard parameters and is cost-effective as compared to Secrets Manager.

Front-end Web and Mobile

API Gateway
- is a fully managed service that makes it easy for developers to publish, maintain, monitor, and secure APIs at any scale.
- supports powerful, flexible authentication mechanisms, such as AWS IAM policies, Lambda authorizer functions, and Amazon Cognito user pools.
- supports Canary release deployments for safely rolling out changes.
- define usage plans to meter, restrict third-party developer access, configure throttling, and quota limits on a per API key basis
- integrates with AWS X-Ray for understanding and triaging performance latencies.
- API Gateway CORS allows cross-domain calls
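API Gateway documents that its throttling (including the rate and burst limits set in usage plans) is based on the token bucket algorithm. The following is a simplified local model of that idea, not API Gateway's own implementation; the rate and burst values are assumptions, and time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Simplified local model of rate/burst throttling (not API Gateway's own code)."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec    # steady-state refill rate (the "rate" limit)
        self.capacity = burst       # bucket size (the "burst" limit)
        self.tokens = float(burst)  # start full
        self.last = 0.0             # timestamp of the previous check, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last request, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # API Gateway would reject such a request with HTTP 429
```

With `TokenBucket(rate_per_sec=10, burst=5)`, five requests at the same instant succeed and the sixth is throttled; one second later the bucket has refilled to its burst capacity.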
Amplify
- is a complete solution that lets frontend web and mobile developers easily build, ship, and host full-stack applications on AWS, with the flexibility to leverage the breadth of AWS services as use cases evolve.

Management Tools

CloudWatch
- monitoring to provide operational transparency
- is extendable with custom metrics
- does not capture memory metrics by default; these can be collected using the CloudWatch agent.
EventBridge
- is a serverless event bus service that makes it easy to connect applications with data from a variety of sources.
- enables building loosely coupled and distributed event-driven architectures.
CloudTrail
- helps enable governance, compliance, and operational and risk auditing of the AWS account.
- helps to get a history of AWS API calls and related events for the AWS account.
CloudFormation
- easy way to create and manage a collection of related AWS resources, and provision and update them in an orderly and predictable fashion.
- Supports Serverless Application Model – SAM for the deployment of serverless applications including Lambda.
- CloudFormation StackSets extends the functionality of stacks by enabling you to create, update, or delete stacks across multiple accounts and Regions with a single operation.

Integration Tools

Simple Queue Service
- as message queuing service and SNS as pub/sub notification service
- as a decoupling service that provides resiliency
- SQS features like visibility timeout, and long polling vs short polling
- scale the Auto Scaling group based on the SQS queue size (backlog).
- SQS Standard vs SQS FIFO difference
  - FIFO provides exactly-once processing and strict ordering, but with lower throughput
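For scaling an Auto Scaling group on SQS, AWS recommends tracking a "backlog per instance" metric rather than raw queue size. A sketch of the arithmetic, assuming example numbers (1,500 visible messages, 100 ms average processing time, 10 seconds of acceptable latency); the function names are illustrative only.

```python
import math


def backlog_per_instance(visible_messages: int, in_service_instances: int) -> float:
    """Custom metric: ApproximateNumberOfMessagesVisible / InService instances."""
    return visible_messages / max(in_service_instances, 1)


def acceptable_backlog_per_instance(max_latency_ms: int, avg_processing_ms: int) -> float:
    """Target value: longest acceptable latency / average time to process one message."""
    return max_latency_ms / avg_processing_ms


def desired_capacity(visible_messages: int, max_latency_ms: int, avg_processing_ms: int) -> int:
    """Instances needed so that each one's backlog stays at or below the target."""
    target = acceptable_backlog_per_instance(max_latency_ms, avg_processing_ms)
    return math.ceil(visible_messages / target)


print(desired_capacity(1500, max_latency_ms=10_000, avg_processing_ms=100))  # 15
```

In practice the backlog-per-instance value is published as a custom CloudWatch metric and a target tracking policy uses the acceptable backlog as its target value.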
Simple Notification Service – SNS
- is a web service that coordinates and manages the delivery or sending of messages to subscribing endpoints or clients
- Fanout pattern can be used to push messages to multiple subscribers.
Understand SQS as a message queuing service and SNS as a pub/sub notification service.
Know AWS Developer tools
- CodeCommit is a secure, scalable, fully-managed source control service that helps to host secure and highly scalable private Git repositories.
- CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy.
- CodeDeploy helps automate code deployments to any instance, including EC2 instances and instances running on-premises.
- CodePipeline is a fully managed continuous delivery service that helps automate the release pipelines for fast and reliable application and infrastructure updates.
- CodeArtifact is a fully managed artifact repository service that makes it easy for organizations of any size to securely store, publish, and share software packages used in their software development process.
X-Ray
- helps developers analyze and debug distributed applications in production, e.g., applications built using a microservices architecture with Lambda

Analytics

Redshift as a business intelligence tool
Kinesis
- for real-time data capture and analytics.
- Integrates with Lambda functions to perform transformations
AWS Glue
- fully-managed, ETL service that automates the time-consuming steps of data preparation for analytics

Networking

The exam does not cover much networking or network design, but be sure you understand VPC, Subnets, Routes, Security Groups, etc.

AWS Cloud Computing Whitepapers

Architecting for the AWS Cloud: Best Practices
AWS Well-Architected Framework whitepaper (This is a theoretical paper with loads of theory and is tiresome. If you cover the above topics, you can skip this one)
AWS Security Best Practices whitepaper, August 2016
Practicing Continuous Integration and Continuous Delivery on AWS Accelerating Software Delivery with DevOps whitepaper, June 2017
Microservices on AWS whitepaper, September 2017
Serverless Architectures with AWS Lambda whitepaper, November 2017
Optimizing Enterprise Economics with Serverless Architectures whitepaper, October 2017
Running Containerized Microservices on AWS whitepaper, November 2017
Blue/Green Deployments on AWS whitepaper, August 2016

On the Exam Day

Make sure you are relaxed and get a good night’s sleep. The exam is not tough if you are well-prepared.
If you are taking the AWS Online exam
- Try to join at least 30 minutes before the scheduled time, as I have had issues with long wait times with both PSI and Pearson.
- The online verification process does take some time, and there are usually some glitches.
- Remember, you will not be allowed to take the exam if you are more than 30 minutes late.
- Make sure your desk is clear: no wristwatches or external monitors, keep your phones away, and make sure nobody can enter the room.

Finally, All the Best 🙂

AWS Database Services Cheat Sheet

February 8, 2023 ~ Last updated on : February 9, 2023 ~ jayendrapatil ~ 9 Comments


AWS Database Services

Relational Database Service – RDS

provides a managed relational database service
supports MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server, and the new, MySQL-compatible Amazon Aurora DB engine

as it is a managed service, shell (root ssh) access is not provided
manages backups, software patching, automatic failure detection, and recovery
supports user-initiated manual backups and snapshots

daily automated backups with database transaction logs enable point-in-time recovery up to the last five minutes of database usage
snapshots are user-initiated storage volume snapshots of a DB instance, backing up the entire DB instance and not just individual databases, and can be restored as an independent RDS instance
RDS Security
- supports encryption at rest using KMS as well as encryption in transit using SSL endpoints
- supports IAM database authentication, which prevents the need to store static user credentials in the database, because authentication is managed externally using IAM.
- supports encryption only during the creation of an RDS DB instance
- an existing unencrypted DB cannot be encrypted directly; you need to create a snapshot, create an encrypted copy of the snapshot, and restore it as an encrypted DB
- supports Secrets Manager for storing and rotating secrets
- for an encrypted database
  - logs, snapshots, backups, read replicas are all encrypted as well
  - cross-region replicas and snapshot copies were previously not supported (Note – this is now possible with the latest AWS enhancements)
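The snapshot, encrypted-copy, restore sequence for encrypting an existing DB can be sketched with the boto3 RDS client methods `create_db_snapshot`, `copy_db_snapshot` (with `KmsKeyId`), and `restore_db_instance_from_db_snapshot`, plus the `db_snapshot_available` waiter. This is a sketch under assumptions: the identifiers and KMS alias are hypothetical, and a small recording stand-in replaces the real client so the flow can run offline.

```python
def encrypt_existing_instance(rds, source_db_id: str, kms_key_id: str) -> str:
    """Snapshot -> encrypted copy -> restore. `rds` is a boto3 RDS client (or a stand-in)."""
    plain_snap = f"{source_db_id}-plain-snap"      # hypothetical naming scheme
    enc_snap = f"{source_db_id}-encrypted-snap"
    target_db = f"{source_db_id}-encrypted"

    rds.create_db_snapshot(DBSnapshotIdentifier=plain_snap, DBInstanceIdentifier=source_db_id)
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=plain_snap)

    # Supplying a KMS key on the copy is what produces an encrypted snapshot.
    rds.copy_db_snapshot(SourceDBSnapshotIdentifier=plain_snap,
                         TargetDBSnapshotIdentifier=enc_snap, KmsKeyId=kms_key_id)
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=enc_snap)

    # Restoring creates a new, encrypted instance with a new endpoint.
    rds.restore_db_instance_from_db_snapshot(DBInstanceIdentifier=target_db,
                                             DBSnapshotIdentifier=enc_snap)
    return target_db


class RecordingRDS:
    """Offline stand-in for boto3.client("rds"); records calls instead of hitting AWS."""

    def __init__(self):
        self.calls = []

    def _record(self, name, **kwargs):
        self.calls.append((name, kwargs))

    def create_db_snapshot(self, **kwargs):
        self._record("create_db_snapshot", **kwargs)

    def copy_db_snapshot(self, **kwargs):
        self._record("copy_db_snapshot", **kwargs)

    def restore_db_instance_from_db_snapshot(self, **kwargs):
        self._record("restore_db_instance_from_db_snapshot", **kwargs)

    def get_waiter(self, name):
        outer = self

        class _Waiter:
            def wait(self, **kwargs):
                outer._record(f"wait:{name}", **kwargs)

        return _Waiter()
```

Because the restore produces a new instance, the application has to be repointed to the new endpoint before the old unencrypted instance is removed.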
Multi-AZ deployment
- provides high availability and automatic failover support and is NOT a scaling solution
- maintains a synchronous standby replica in a different AZ
- transaction success is returned only if the commit is successful both on the primary and the standby DB
- Oracle, PostgreSQL, MySQL, and MariaDB DB instances use Amazon technology, while SQL Server DB instances use SQL Server Mirroring
- snapshots and backups are taken from standby & eliminate I/O freezes
- during automatic failover, the switch is seamless: RDS switches to the standby instance and updates the DNS record to point to it
- failover can be forced with the Reboot with failover option
Read Replicas
- uses the PostgreSQL, MySQL, and MariaDB DB engines’ built-in replication functionality to create a separate Read Only instance
- updates are asynchronously copied to the Read Replica, and data might be stale
- can help scale applications and reduce read only load
- requires automatic backups enabled
- replicates all databases in the source DB instance
- for disaster recovery, can be promoted to a full-fledged database
- can be created in a different region for disaster recovery, migration and low latency across regions
- can’t create encrypted read replicas from unencrypted DB or read replica
RDS does not support all the features of underlying databases, and if required the database instance can be launched on an EC2 instance
RDS Components
- DB parameter groups contain engine configuration values that can be applied to one or more DB instances of the same DB engine family, e.g., SSL, max connections, etc.
- Default DB parameter group cannot be modified, create a custom one and attach to the DB
- Supports static and dynamic parameters
  - changes to dynamic parameters are applied immediately (irrespective of apply immediately setting)
  - changes to static parameters are NOT applied immediately and require a manual reboot.
RDS Monitoring & Notification
- integrates with CloudWatch and CloudTrail
- CloudWatch provides metrics about CPU utilization from the hypervisor for a DB instance, and Enhanced Monitoring gathers its metrics from an agent on the instance
- Performance Insights is a database performance tuning and monitoring feature that helps illustrate the database’s performance and help analyze any issues that affect it
- supports RDS Event Notification, which uses SNS to provide notifications when an RDS event like creation, deletion, or snapshot creation occurs

Aurora

is a relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases
is a managed service and handles time-consuming tasks such as provisioning, patching, backup, recovery, failure detection, and repair

is a proprietary technology from AWS (not open sourced)
provides PostgreSQL and MySQL compatibility
is “AWS cloud optimized” and claims a 5x performance improvement over MySQL on RDS and over 3x the performance of PostgreSQL on RDS

scales storage automatically in increments of 10 GB, up to 128 TiB, with no impact on database performance. Storage is striped across 100s of volumes.
no need to provision storage in advance.
provides self-healing storage. Data blocks and disks are continuously scanned for errors and repaired automatically.

provides instantaneous failover
replicates each chunk of the database volume six ways across three Availability Zones, i.e. 6 copies of the data across 3 AZs
- requires 4 out of 6 copies for writes
- requires 3 out of 6 copies for reads
costs more than RDS (20% more) – but is more efficient
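The 4-of-6 write and 3-of-6 read quorums above can be checked with a few lines of arithmetic. This is a simplified model of the stated quorum rule only (it ignores how Aurora tracks which copies are current during normal reads), assuming 2 copies per AZ.

```python
TOTAL_COPIES = 6   # 2 copies in each of 3 AZs
COPIES_PER_AZ = 2
WRITE_QUORUM = 4
READ_QUORUM = 3

# Overlap rules that keep the quorums consistent:
assert WRITE_QUORUM + READ_QUORUM > TOTAL_COPIES  # every read quorum overlaps every write quorum
assert 2 * WRITE_QUORUM > TOTAL_COPIES            # two conflicting writes cannot both reach quorum


def still_available(copies_lost: int) -> dict:
    """Whether writes and reads can still reach quorum after losing some copies."""
    remaining = TOTAL_COPIES - copies_lost
    return {"write": remaining >= WRITE_QUORUM, "read": remaining >= READ_QUORUM}


print(still_available(COPIES_PER_AZ))      # lose one AZ -> {'write': True, 'read': True}
print(still_available(COPIES_PER_AZ + 1))  # lose an AZ plus one copy -> {'write': False, 'read': True}
```

In other words, the 6-copy layout keeps writes available through the loss of a whole AZ and keeps reads available through the loss of an AZ plus one more copy.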
Read Replicas
- can have 15 replicas while MySQL has 5, and the replication process is faster (sub 10 ms replica lag)
- share the same data volume as the primary instance in the same AWS Region, so there is virtually no replication lag
- supports automated failover of the primary in less than 30 seconds
- supports Cross Region Replication using either physical or logical replication.
Security
- supports Encryption at rest using KMS
- supports Encryption in flight using SSL (same process as MySQL or Postgres)
- Automated backups, snapshots and replicas are also encrypted
- supports authentication using IAM tokens (same method as RDS)
- supports protecting the instance with security groups
- does not support SSH access to the underlying servers
Aurora Serverless
- provides automated database instantiation and on-demand autoscaling based on actual usage
- provides a relatively simple, cost-effective option for infrequent, intermittent, or unpredictable workloads
- automatically starts up, shuts down, and scales capacity up or down based on the application’s needs. No capacity planning needed
- Pay per second, can be more cost-effective
Aurora Global Database
- allows a single Aurora database to span multiple AWS regions.
- provides Physical replication, which uses dedicated infrastructure that leaves the databases entirely available to serve the application
- supports 1 Primary Region (read / write)
- replicates across up to 5 secondary (read-only) regions, replication lag is less than 1 second
- supports up to 16 Read Replicas per secondary region
- recommended for low-latency global reads and disaster recovery with an RTO of < 1 minute
- failover is not automated; if the primary region becomes unavailable, a secondary region can be manually removed from the Aurora Global Database and promoted to take full reads and writes. The application needs to be updated to point to the newly promoted region.

Aurora Backtrack
- Backtracking “rewinds” the DB cluster to the specified time
- Backtracking performs an in-place restore and does not create a new instance. There is minimal downtime associated with it.

Aurora Clone feature allows quick and cost-effective creation of Aurora Cluster duplicates
supports parallel or distributed query using Aurora Parallel Query, which refers to the ability to push down and distribute the computational load of a single query across thousands of CPUs in Aurora’s storage layer.

DynamoDB

fully managed NoSQL database service

synchronously replicates data across three facilities in an AWS Region, giving high availability and data durability
runs exclusively on SSDs to provide high I/O performance
provides provisioned table reads and writes

automatically partitions, reallocates, and re-partitions the data and provisions additional server capacity as data or throughput changes
creates and maintains indexes for the primary key attributes for efficient access to data in the table
DynamoDB Table classes currently support
- DynamoDB Standard table class is the default and is recommended for the vast majority of workloads.
- DynamoDB Standard-Infrequent Access (DynamoDB Standard-IA) table class which is optimized for tables where storage is the dominant cost.
supports Secondary Indexes
- allows querying attributes other than the primary key attributes without impacting performance.
- are automatically maintained as sparse objects
Local secondary index vs Global secondary index
- shares partition key + different sort key vs different partition + sort key
- search limited to partition vs across all partition
- unique attributes vs non-unique attributes
- linked to the base table vs independent separate index
- only created during the base table creation vs can be created later
- cannot be deleted after creation vs can be deleted
- consumes provisioned throughput capacity of the base table vs independent throughput
- returns all attributes for item vs only projected attributes
- Eventually or Strongly vs Only Eventually consistent reads
- size limited to 10 GB per partition key value vs unlimited
DynamoDB Consistency
- provides Eventually consistent (by default) or Strongly Consistent option to be specified during a read operation
- supports Strongly consistent reads for a few operations like Query, GetItem, and BatchGetItem using the ConsistentRead parameter
DynamoDB Throughput Capacity
- supports On-demand and Provisioned read/write capacity modes
- Provisioned mode requires the number of reads and writes per second as required by the application to be specified
- On-demand mode provides flexible billing option capable of serving thousands of requests per second without capacity planning
DynamoDB Auto Scaling helps dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns.
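For provisioned mode, the capacity to specify follows from item size: 1 RCU covers one strongly consistent read per second (or two eventually consistent reads) of an item up to 4 KB, and 1 WCU covers one write per second of an item up to 1 KB. A worked sketch in Python, with example item sizes and request rates that are assumptions:

```python
import math


def read_capacity_units(item_kb: float, reads_per_sec: int, strongly_consistent: bool = True) -> int:
    # 1 RCU = 1 strongly consistent read/sec of an item up to 4 KB
    # (or 2 eventually consistent reads/sec of the same size).
    units_per_read = math.ceil(item_kb / 4)
    total = units_per_read * reads_per_sec
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    # 1 WCU = 1 write/sec of an item up to 1 KB; sizes are rounded up per 1 KB.
    return math.ceil(item_kb) * writes_per_sec


print(read_capacity_units(6, 10))                             # 20
print(read_capacity_units(6, 10, strongly_consistent=False))  # 10
print(write_capacity_units(2.5, 10))                          # 30
```

Transactional reads and writes consume twice these amounts; the sketch does not model them.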

DynamoDB Adaptive capacity is a feature that enables DynamoDB to run imbalanced workloads indefinitely.
DynamoDB Global Tables provide multi-master, cross-region replication capability of DynamoDB to support data access locality and regional fault tolerance for database workloads.
DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table

DynamoDB Time to Live (TTL)
- enables a per-item timestamp to determine when an item expires
- expired items are deleted from the table without consuming any write throughput.
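The TTL attribute must hold the expiry time as a Number in Unix epoch seconds. A sketch of computing that value in Python, assuming a hypothetical TTL attribute named `expires_at` (the name is whatever is configured when TTL is enabled). Because TTL deletion happens in the background rather than at the exact expiry time, reads can also filter out items that have already expired.

```python
from datetime import datetime, timedelta, timezone


def ttl_epoch(created_at: datetime, retain_for: timedelta) -> int:
    """TTL attribute value: expiry time as a Number in Unix epoch seconds."""
    return int((created_at + retain_for).timestamp())


def is_expired(item: dict, now_epoch: int, ttl_attr: str = "expires_at") -> bool:
    """Read-side filter; items without the TTL attribute never expire."""
    return ttl_attr in item and item[ttl_attr] < now_epoch


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
session = {"pk": "session#abc", "expires_at": ttl_epoch(created, timedelta(days=1))}
print(session["expires_at"])  # 1704153600
```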

DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second.
DynamoDB cross-region replication
- allows identical copies (called replicas) of a DynamoDB table (called master table) to be maintained in one or more AWS regions.
- uses DynamoDB Streams, which leverages Kinesis and provides a time-ordered sequence of item-level changes, and can help achieve lower RPO and lower RTO for disaster recovery
DynamoDB Triggers (just like database triggers) are a feature that allows the execution of custom actions based on item-level updates on a table.
VPC Gateway Endpoints provide private access to DynamoDB from within a VPC without the need for an internet gateway or NAT gateway.

ElastiCache

managed web service that provides in-memory caching to deploy and run Memcached or Redis protocol-compliant cache clusters
ElastiCache with Redis,
- like RDS, supports Multi-AZ, Read Replicas and Snapshots
- Read Replicas are created across AZ within same region using Redis’s asynchronous replication technology
- Multi-AZ differs from RDS as there is no standby, but if the primary goes down a Read Replica is promoted as primary
- Read Replicas cannot span across regions, as RDS supports
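- cannot be scaled out and if scaled up cannot be scaled down (Note – with cluster mode enabled, Redis now supports online scaling out and in, and scaling down is also supported)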
- allows snapshots for backup and restore
- AOF can be enabled for recovery scenarios, to recover the data in case the node fails or service crashes. But it does not help in case the underlying hardware fails
- Enabling Redis Multi-AZ as a Better Approach to Fault Tolerance
ElastiCache with Memcached
- can be scaled up by increasing size and scaled out by adding nodes
- nodes can span across multiple AZs within the same region
- cached data is spread across the nodes, and a node failure will always result in some data loss from the cluster
- supports auto discovery
- every node should be homogeneous and of the same instance type
ElastiCache Redis vs Memcached
- complex data objects vs simple key value storage
- persistent vs non persistent, pure caching
- automatic failover with Multi-AZ vs Multi-AZ not supported
- scaling using Read Replicas vs using multiple nodes
- backup & restore supported vs not supported
can be used for state management to keep the web application stateless
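The caching use case above usually follows the cache-aside (lazy loading) pattern: read from the cache, fall back to the database on a miss, and populate the cache with a TTL. A minimal sketch, assuming a plain Python dict stands in for ElastiCache and a caller-supplied `loader` function stands in for the database read; both substitutions are for illustration only.

```python
import time


class CacheAside:
    """Lazy-loading cache with per-entry TTL; a dict stands in for ElastiCache."""

    def __init__(self, loader, ttl_seconds: float, clock=time.monotonic):
        self._loader = loader    # called on a miss, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}         # key -> (value, expires_at)
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]      # cache hit
        self.misses += 1
        value = self._loader(key)  # cache miss: read from the source of truth
        self._store[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        # Call after writing to the database so the next read reloads fresh data.
        self._store.pop(key, None)
```

The TTL bounds how stale a cached value can get, and explicit invalidation on writes keeps hot keys fresh without waiting for expiry.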

Redshift

fully managed, fast and powerful, petabyte-scale data warehouse service

uses replication and continuous backups to enhance availability and improve data durability and can automatically recover from node and component failures
provides Massive Parallel Processing (MPP) by distributing & parallelizing queries across multiple physical resources
columnar data storage improves query performance and allows advanced compression techniques
only supports Single-AZ deployments and the nodes are available within the same AZ, if the AZ supports Redshift clusters (Note – Multi-AZ deployments are now available for RA3 clusters)
spot instances are NOT an option