AWS SageMaker Built-in Algorithms Summary

SageMaker AI Built-in Algorithms

📌 Naming Update (December 2024): On December 3, 2024, Amazon SageMaker was renamed to Amazon SageMaker AI. The “SageMaker” brand now refers to the next-generation unified platform for data, analytics, and AI. All built-in algorithms remain available under SageMaker AI.

  • SageMaker AI provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and ML practitioners get started on training and deploying ML models quickly.
  • SageMaker AI also provides SageMaker JumpStart with pre-trained foundation models (including LLMs like LLaMA, BLOOM, Falcon) for generative AI tasks such as text generation, summarization, and question answering.

SageMaker AI Built-in Algorithms

Tabular Data – Classification & Regression

AutoGluon-Tabular

  • is an open-source AutoML framework that succeeds by ensembling models and stacking them in multiple layers.
  • automatically performs data processing, model selection, and hyperparameter tuning.
  • used for both classification and regression tasks on tabular data.
  • supports CPU and GPU (single instance only) training.

CatBoost

  • is an implementation of the gradient-boosted trees algorithm that introduces ordered boosting and an innovative algorithm for processing categorical features.
  • used for both classification and regression tasks.
  • handles categorical features natively without requiring manual encoding.
  • supports CPU (single instance only) training.

LightGBM

  • is an implementation of the gradient-boosted trees algorithm that adds two novel techniques for improved efficiency and scalability.
  • uses Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB).
  • used for both classification and regression tasks.
  • supports CPU (single instance only) training.
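To make the GOSS idea above more concrete, here is a minimal sketch in plain Python. It assumes the sampling scheme as usually described for LightGBM: keep the rows with the largest gradient magnitudes, randomly sample some of the remaining rows, and up-weight the sampled rows by (1 - top_rate) / other_rate. The function name `goss_sample`, its parameters, and the toy gradients are illustrative assumptions, not LightGBM's or SageMaker's API.

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {row_index: weight} for the rows a GOSS-style sampler keeps."""
    n = len(gradients)
    # Rank rows by gradient magnitude, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = int(n * top_rate)
    other_n = int(n * other_rate)
    top_rows = order[:top_n]
    # Randomly sample from the remaining small-gradient rows.
    sampled_rows = random.Random(seed).sample(order[top_n:], other_n)
    # Up-weight sampled rows so they stand in for the rows that were dropped.
    amplify = (1.0 - top_rate) / other_rate
    weights = {i: 1.0 for i in top_rows}
    weights.update({i: amplify for i in sampled_rows})
    return weights

# Hypothetical per-row gradients from one boosting iteration.
gradients = [0.05, -0.9, 0.1, 0.02, 0.8, -0.03, 0.07, -0.01, 0.04, 0.06]
weights = goss_sample(gradients)
```

With these toy values, the two large-gradient rows (indices 1 and 4) are always kept with weight 1, and one small-gradient row is kept with a larger weight.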

TabTransformer

  • is a novel deep tabular data modeling architecture built on self-attention-based Transformers.
  • converts categorical features into contextual embeddings using Transformer layers.
  • used for both classification and regression tasks.
  • supports CPU and GPU (single instance only) training.

XGBoost (eXtreme Gradient Boosting)

  • is a popular and efficient open-source implementation of the gradient boosted trees algorithm.
  • Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler, weaker models.
  • supports both classification and regression tasks.
  • supports distributed training across multiple instances.
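The "ensemble of simpler, weaker models" idea can be sketched in a few lines of plain Python. This is a toy version of gradient boosting for squared error on a single feature, using one-split decision stumps as the weak learners. It is not how XGBoost is implemented; the function names, the learning rate, and the data are assumptions chosen for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_boosted(xs, ys, rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
mean_y = sum(ys) / len(ys)
baseline = lambda x: mean_y   # predicts the mean for every input
boosted = fit_boosted(xs, ys)
```

Each round fits a stump to what the current ensemble still gets wrong, so the training error of `boosted` falls well below that of the constant `baseline`.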

Linear Learner

  • is a supervised learning algorithm used for solving either classification or regression problems.
  • learns a linear function for regression or a linear threshold function for classification.
  • supports distributed training.
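As a rough illustration of the two prediction modes described above, the following sketch computes a linear score and either returns it directly (regression) or thresholds it (binary classification). The weights, bias, and threshold are made-up values; training, which is the actual job of Linear Learner, is not shown.

```python
def linear_score(weights, bias, features):
    """Weighted sum of the features plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def predict_regression(weights, bias, features):
    """Regression: the linear score is the prediction."""
    return linear_score(weights, bias, features)

def predict_binary(weights, bias, features, threshold=0.0):
    """Classification: compare the linear score against a threshold."""
    return 1 if linear_score(weights, bias, features) > threshold else 0

# Hypothetical learned parameters.
weights, bias = [2.0, -1.0], 0.5
```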

K-nearest neighbors (k-NN) algorithm

  • is an index-based algorithm.
  • uses a non-parametric method for classification or regression.
  • For classification problems, the algorithm queries the k points that are closest to the sample point and returns the most frequent label among them as the predicted label.
  • For regression problems, the algorithm queries the k closest points to the sample point and returns the average of their target (label) values as the predicted value.
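The two query modes can be sketched with a brute-force search in plain Python: a majority vote over neighbor labels for classification, and the mean of the neighbors' target values for regression. The built-in algorithm builds an index rather than scanning every point; the function names and toy data here are assumptions for illustration.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k (features, target) rows closest to query (Euclidean)."""
    return sorted(train, key=lambda row: math.dist(row[0], query))[:k]

def knn_classify(train, query, k=3):
    """Majority vote over the labels of the k nearest neighbors."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Mean of the target values of the k nearest neighbors."""
    targets = [target for _, target in k_nearest(train, query, k)]
    return sum(targets) / len(targets)

labeled = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
           ((5.0, 5.0), "b"), ((5.2, 4.9), "b")]
valued = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
```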

Factorization Machine

  • is a general-purpose supervised learning algorithm used for both classification and regression tasks.
  • is an extension of a linear model designed to economically capture interactions between features within high-dimensional sparse datasets; typical use cases include click prediction and item recommendation.
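A small sketch may help show what "capturing interactions economically" means. It assumes the standard second-order factorization machine model: a bias, a linear term, and pairwise interactions whose weights are dot products of per-feature factor vectors. The first function sums over all feature pairs; the second uses the usual algebraic rewrite that avoids the double loop. All parameter values are invented for the example.

```python
def fm_predict_naive(x, w0, w, v):
    """Bias + linear term + explicit sum over all feature pairs."""
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise

def fm_predict_fast(x, w0, w, v):
    """Same model, with the pairwise term computed in O(n * k)."""
    n, k = len(x), len(v[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        total = sum(v[i][f] * x[i] for i in range(n))
        squares = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (total * total - squares)
    return w0 + linear + pairwise

# Hypothetical parameters: 3 features, factor vectors of size 2.
x = [1.0, 0.0, 2.0]          # sparse input: the second feature is absent
w0, w = 0.5, [0.1, 0.2, 0.3]
v = [[1.0, 0.5], [0.2, 0.1], [0.3, 2.0]]
```

Because zero-valued features contribute nothing to either term, sparse inputs stay cheap to score.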

Text-based

BlazingText algorithm

  • provides highly optimized implementations of the Word2vec and text classification algorithms.
  • Word2vec algorithm
    • maps words to high-quality distributed vectors, called word embeddings.
    • word embeddings capture the semantic relationships between words.
    • useful for many downstream natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, and machine translation.
    • provides the Skip-gram and continuous bag-of-words (CBOW) training architectures.
  • Text classification
    • is an important task for applications performing web searches, information retrieval, ranking, and document classification.
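The difference between the Skip-gram and CBOW architectures is easiest to see in the training examples each one builds from a sentence. The sketch below only generates those examples, with a context window of one word on each side; it does not train embeddings, and the helper names are assumptions rather than BlazingText's interface.

```python
def context_window(tokens, i, window):
    """Words within `window` positions of index i, excluding the word itself."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right

def skipgram_pairs(tokens, window=1):
    """Skip-gram: predict each context word from the center word."""
    return [(center, ctx)
            for i, center in enumerate(tokens)
            for ctx in context_window(tokens, i, window)]

def cbow_examples(tokens, window=1):
    """CBOW: predict the center word from all of its context words."""
    return [(context_window(tokens, i, window), center)
            for i, center in enumerate(tokens)]

tokens = ["the", "cat", "sat"]
```

Skip-gram yields one (center, context) pair per neighboring word, while CBOW yields one (context words, center) example per position.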

Text Classification – TensorFlow

  • is a supervised learning algorithm that supports transfer learning with many pretrained models from the TensorFlow Hub.
  • uses deep learning networks such as BERT which are highly accurate for text classification.
  • takes text as input and outputs probability for each of the class labels.
  • useful for sentiment analysis, spam detection, and document categorization.

Sequence to Sequence – seq2seq

  • is a supervised learning algorithm where the input is a sequence of tokens (for example, text, audio), and the output generated is another sequence of tokens.
  • key use cases are machine translation (input a sentence from one language and predict what that sentence would be in another language), text summarization (input a longer string of words and predict a shorter string of words that is a summary), and speech-to-text (audio clips converted into output sentences in tokens).

Forecasting

DeepAR

  • is a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNN).
  • the trained model can generate forecasts for new time series that are similar to the ones it was trained on.
  • supports learning complex patterns from multiple related time series simultaneously.

Clustering

K-means algorithm

  • is an unsupervised learning algorithm for clustering
  • attempts to find discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups
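The grouping step can be sketched with the classic iterative procedure (often called Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its members, and repeat until nothing changes. The SageMaker implementation is a scalable variant, so this is only a conceptual sketch; the initial centroids are passed in explicitly to keep the example deterministic.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centroids.append(old)  # keep an empty cluster where it was
            continue
        new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return new_centroids

def kmeans(points, initial_centroids, max_iter=100):
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        new_centroids = update(points, assign(points, centroids), centroids)
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return assign(points, centroids), centroids

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
```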

Topic Modelling

Latent Dirichlet Allocation (LDA)

  • is an unsupervised learning algorithm that attempts to describe a set of observations as a mixture of distinct categories.
  • used to discover a user-specified number of topics shared by documents within a text corpus.

Neural Topic Model (NTM)

  • is an unsupervised learning algorithm that is used to organize a corpus of documents into topics that contain word groupings based on their statistical distribution
  • Topic modeling can be used to classify or summarize documents based on the topics detected or to retrieve information or recommend content based on topic similarities.

Feature Reduction

Object2Vec

  • is a general-purpose neural embedding algorithm that is highly customizable
  • can learn low-dimensional dense embeddings of high-dimensional objects.
  • useful for duplicate detection, finding similar items, and relationship prediction.

Principal Component Analysis – PCA

  • is an unsupervised ML algorithm that attempts to reduce the dimensionality (number of features) within a dataset while still retaining as much information as possible.
  • projects data points onto the first few principal components (eigenvectors of the data’s covariance matrix).
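As a rough sketch of the projection step, the code below computes the covariance matrix of a tiny dataset, estimates its leading eigenvector with power iteration, and projects the centered points onto it. Power iteration is a stand-in chosen to keep the example dependency-free; it is not a claim about how the SageMaker algorithm computes components. The starting vector of all ones is an assumption that would fail for data whose main axis is orthogonal to it.

```python
import math

def covariance_matrix(rows):
    """Return (sample covariance matrix, mean-centered rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centered

def first_principal_component(cov, iters=200):
    """Leading eigenvector of cov via power iteration (unit length)."""
    d = len(cov)
    vec = [1.0] * d
    for _ in range(iters):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

def project(centered, component):
    """One-dimensional coordinates of each row along the component."""
    return [sum(c * w for c, w in zip(row, component)) for row in centered]

rows = [(2.0, 1.0), (4.0, 2.1), (6.0, 2.9), (8.0, 4.0)]
cov, centered = covariance_matrix(rows)
pc = first_principal_component(cov)
projected = project(centered, pc)  # 2 features reduced to 1
```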

Anomaly Detection

Random Cut Forest (RCF)

  • is an unsupervised algorithm for detecting anomalous data points within a data set.
  • detects data points that diverge from otherwise well-structured or patterned data.

IP Insights

  • is an unsupervised learning algorithm that learns the usage patterns for IPv4 addresses.
  • designed to capture associations between IPv4 addresses and various entities, such as user IDs or account numbers
  • useful for detecting suspicious login attempts from anomalous IP addresses.

Computer Vision – CV

Image Classification – MXNet

  • a supervised learning algorithm that supports multi-label classification
  • takes an image as input and outputs one or more labels
  • uses a convolutional neural network (ResNet) that can be trained from scratch or trained using transfer learning when a large number of training images are not available.
  • recommended input format is Apache MXNet RecordIO. Also supports raw images in .jpg or .png format.

Image Classification – TensorFlow

  • is a supervised learning algorithm that supports transfer learning with many pretrained models from the TensorFlow Hub.
  • uses deep learning networks such as MobileNet, ResNet, Inception, and EfficientNet for image classification.
  • takes an image as input and outputs probability for each of the class labels.
  • supports fine-tuning pretrained models for specific image classification tasks.

Object Detection – MXNet

  • detects and classifies objects in images using a single deep neural network.
  • is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene.

Object Detection – TensorFlow

  • is a supervised learning algorithm that supports transfer learning with many pretrained models from the TensorFlow Model Garden.
  • takes an image as input and predicts bounding boxes and object labels.
  • uses deep learning networks such as MobileNet, ResNet, Inception, and EfficientNet for object detection.

Semantic Segmentation

  • provides a fine-grained, pixel-level approach to developing computer vision applications.
  • tags every pixel in an image with a class label from a predefined set of classes and is critical to an increasing number of CV applications, such as self-driving vehicles, medical imaging diagnostics, and robot sensing.
  • also provides information about the shapes of the objects contained in the image. The segmentation output is represented as a grayscale image, called a segmentation mask.
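To illustrate what a segmentation mask looks like as data, the sketch below treats it as a 2-D grid of class IDs, counts pixels per class, and derives a bounding box for one class from the mask. The class IDs, class names, and the tiny 3x4 mask are made up for the example.

```python
from collections import Counter

# Hypothetical class IDs; real label maps depend on the dataset.
CLASS_NAMES = {0: "background", 1: "road", 2: "car"}

# A tiny 3x4 mask: one class ID per pixel.
mask = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [1, 1, 1, 1],
]

def pixel_counts(mask):
    """Number of pixels assigned to each class ID."""
    return Counter(value for row in mask for value in row)

def bounding_box(mask, class_id):
    """(min_row, min_col, max_row, max_col) of a class, or None if absent."""
    cells = [(r, c) for r, row in enumerate(mask)
             for c, value in enumerate(row) if value == class_id]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), min(cols), max(rows), max(cols))

counts = pixel_counts(mask)
```

Because every pixel carries a label, shape information such as the extent of the "car" region falls out of the mask directly.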

SageMaker JumpStart – Pre-trained Models

  • SageMaker JumpStart provides pre-trained foundation models, pre-built solution templates, and example notebooks for popular ML problem types.
  • Foundation models include large language models (LLMs) such as LLaMA, Falcon, BLOOM, FLAN-T5, Mistral, and GPT-J for generative AI tasks.
  • Supports 15+ problem types including:
    • Text Generation, Text Summarization, Question Answering
    • Text Embedding, Named Entity Recognition
    • Image Classification, Object Detection, Instance Segmentation
    • Tabular Classification, Tabular Regression
    • Machine Translation, Sentence Pair Classification
  • Models can be fine-tuned on custom datasets and deployed directly from SageMaker Studio.

SageMaker Autopilot (AutoML)

  • SageMaker Autopilot automatically explores different solutions to find the best model for your data.
  • Analyzes data, selects algorithms, preprocesses data, trains models, and performs hyperparameter optimization.
  • Supports classification, regression, and time-series forecasting problem types.
  • Available as a no-code/low-code option through SageMaker Canvas for business analysts.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day, and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. An Analytics team is leading an organization and wants to use anomaly detection to identify potential risks. What Amazon SageMaker AI machine learning algorithms are best suited for identifying anomalies?
    1. Semantic segmentation
    2. K-nearest neighbors
    3. Latent Dirichlet Allocation (LDA)
    4. Random Cut Forest (RCF)
  2. An ML specialist team that works for a marketing consulting firm wants to apply different marketing strategies per segment of its customer base. Online retailer purchase history from the last 5 years is available, and it has been decided to segment the customers based on their purchase history. Which type of machine learning algorithm would give you segmentation based on purchase history in the most expeditious manner?
    1. K-Nearest Neighbors (KNN)
    2. K-Means
    3. Semantic Segmentation
    4. Neural Topic Model (NTM)
  3. A ML specialist team is looking to improve the quality of searches for their library of documents that are uploaded in PDF, Rich Text Format, or ASCII text. It is looking to use machine learning to automate the identification of key topics for each of the documents. What machine learning resources are best suited for this problem? (Select TWO)
    1. BlazingText algorithm
    2. Latent Dirichlet Allocation (LDA) algorithm
    3. Topic Finder (TF) algorithm
    4. Neural Topic Model (NTM) algorithm
  4. A manufacturing company has a large set of labeled historical sales data. The company would like to predict how many units of a particular part should be produced each quarter. Which machine learning approach should be used to solve this problem?
    1. BlazingText algorithm
    2. Random Cut Forest (RCF)
    3. Principal component analysis (PCA)
    4. Linear regression
  5. An agency collects census information with responses for approximately 500 questions from each citizen. Which algorithm would help reduce the number of features?
    1. Factorization machines (FM) algorithm
    2. Latent Dirichlet Allocation (LDA) algorithm
    3. Principal component analysis (PCA) algorithm
    4. Random Cut Forest (RCF) algorithm
  6. A store wants to understand some characteristics of visitors to the store. The store has security video recordings from the past several years. The store wants to group visitors by hair style and hair color. Which solution will meet these requirements with the LEAST amount of effort?
    1. Object detection algorithm
    2. Latent Dirichlet Allocation (LDA) algorithm
    3. Random Cut Forest (RCF) algorithm
    4. Semantic segmentation algorithm
  7. A data scientist needs to build a model that can automatically classify product reviews as positive or negative. The dataset contains millions of labeled reviews. Which SageMaker AI built-in algorithm is MOST suitable for this text classification task with transfer learning?
    1. Sequence-to-Sequence (seq2seq)
    2. BlazingText in Word2Vec mode
    3. Text Classification – TensorFlow
    4. Neural Topic Model (NTM)
  8. A company wants to predict customer churn using a tabular dataset with both numerical and categorical features. The team wants an AutoML approach that automatically ensembles multiple models. Which SageMaker AI built-in algorithm should they use?
    1. XGBoost
    2. Linear Learner
    3. AutoGluon-Tabular
    4. Factorization Machines
  9. A team needs to detect objects in images and draw bounding boxes around them. They want to leverage pretrained models and use transfer learning. Which SageMaker AI algorithm should they choose?
    1. Image Classification – MXNet
    2. Semantic Segmentation
    3. Image Classification – TensorFlow
    4. Object Detection – TensorFlow
  10. A company has tabular data with many categorical features and wants a gradient-boosted trees algorithm that handles categorical features natively without manual encoding. Which algorithm is BEST suited?
    1. XGBoost
    2. LightGBM
    3. CatBoost
    4. Linear Learner

AWS Organizations Service Control Policies – SCPs

AWS Organizations Service Control Policies

  • AWS Organizations Service control policies – SCPs offer central control over the maximum available permissions for all of the accounts in the organization, ensuring member accounts stay within the organization’s access control guidelines.
  • are one type of policy that help manage the organization.
  • are available only in an organization that has all features enabled, and aren’t available if the organization has enabled only the consolidated billing features.
  • are NOT sufficient for granting access to the accounts in the organization.
  • define a guardrail for what actions accounts within the organization root or OU can do, but IAM policies need to be attached to the users and roles in the organization’s accounts to grant permissions to them.
  • Effective permissions are the logical intersection between what is allowed by the SCP and what is allowed by the IAM and resource-based policies.
  • with an SCP attached to member accounts, identity-based and resource-based policies grant permissions to entities only if those policies and the SCP allow the action.
  • don’t affect users or roles in the management account. They affect only the member accounts in your organization.
  • SCPs also apply to member accounts that are designated as delegated administrators.
  • work alongside Resource Control Policies (RCPs) and Declarative Policies to provide comprehensive preventive controls across an organization.

SCPs Effects on Permissions

  • never grant permissions but define the maximum permissions for the affected accounts.
  • Users and roles must still be granted permissions with appropriate IAM permission policies. A user without any IAM permission policies has no access at all, even if the applicable SCPs allow all services and all actions.
  • limit permissions for entities in member accounts, including each AWS account root user.
  • do not limit actions performed by the management account.
  • do not affect any service-linked role. Service-linked roles enable other AWS services to integrate with AWS Organizations and can’t be restricted by SCPs.
  • affect only IAM users or roles that are managed by accounts that are part of the organization. They don’t affect users or roles from accounts outside the organization.
  • don’t affect resource-based policies directly.
  • SCPs focus on identity-based (principal) permissions, while RCPs focus on resource-based permissions. Together they establish a comprehensive data perimeter.
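The "SCPs never grant, they only cap" rule can be sketched as a tiny evaluator. This is a deliberately simplified model: it considers only action names, treats an explicit deny as final, and requires an allow from both the SCP side and the IAM side. It ignores resources, conditions, resource-based policies, permission boundaries, and the OU hierarchy, and the function names are assumptions for illustration only.

```python
from fnmatch import fnmatchcase

def matches(action, patterns):
    """True if the action matches any pattern (supports * and ? wildcards)."""
    return any(fnmatchcase(action, pattern) for pattern in patterns)

def is_allowed(action, scp_allow, scp_deny, iam_allow, iam_deny=()):
    """Simplified check: explicit deny wins, then both layers must allow."""
    if matches(action, scp_deny) or matches(action, iam_deny):
        return False
    return matches(action, scp_allow) and matches(action, iam_allow)

# Hypothetical setup: FullAWSAccess-style allow plus one SCP deny.
scp_allow = ["*"]
scp_deny = ["s3:DeleteBucket"]
iam_allow = ["s3:*"]
```

In this setup `s3:GetObject` is permitted, `s3:DeleteBucket` is blocked by the SCP even though IAM allows it, and `ec2:RunInstances` is blocked because no IAM policy grants it, regardless of the SCP allowing everything.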

SCPs Strategies

  • By default, an SCP named FullAWSAccess is attached to every root, OU, and account, which allows all actions and all services.
  • Blacklist or Deny Strategy
    • actions are allowed by default and services and actions to be prohibited need to be specified.
    • blacklist permissions using deny statements can be assigned in combination with the default FullAWSAccess SCP.
    • using deny statements in SCPs requires less maintenance because they don’t need to be updated when AWS adds new services.
    • deny statements usually use less space, thus making it easier to stay within SCP size limits.
  • Whitelist or Allow Strategy
    • actions are prohibited by default, and you specify what services and actions are allowed.
    • whitelist permissions can be assigned, by removing the default FullAWSAccess SCP.
    • attach an SCP that explicitly allows only the permitted services and actions.
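A deny-list SCP is just a JSON document, so one way to picture the Deny strategy is to build one as a Python dictionary. The sketch below denies EC2 actions outside a set of Regions using the `aws:RequestedRegion` condition key with `StringNotEquals`. The helper name, the statement ID, and the chosen Regions are assumptions; a production policy would normally also exempt global services and specific roles.

```python
import json

def region_deny_scp(allowed_regions, actions=("ec2:*",)):
    """Build a deny-list SCP blocking `actions` outside `allowed_regions`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideAllowedRegions",  # illustrative name
                "Effect": "Deny",
                "Action": list(actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": list(allowed_regions)
                    }
                },
            }
        ],
    }

policy = region_deny_scp(["us-east-1", "eu-west-1", "ap-southeast-2"])
policy_json = json.dumps(policy, separators=(",", ":"))
policy_size = len(policy_json)  # compare against the SCP size quota
```

Such a policy is attached alongside the default FullAWSAccess SCP, so everything stays allowed except what the deny statement carves out.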

SCP Full IAM Policy Language Support

  • As of September 2025, SCPs now support the full IAM policy language, removing previous limitations.
  • Newly supported capabilities include:
    • Condition element in Allow statements – enables contextual boundaries like restricting by Region or account.
    • NotAction in Allow statements – allows specifying exempt actions.
    • Resource with specific ARNs in Allow statements – enables scoped resource access.
    • NotResource in both Allow and Deny statements – simplifies exceptions for service-owned resources.
    • Wildcards (*, ?) anywhere in Action/NotAction elements (e.g., "servicename:*action", "servicename:some*action").
  • These enhancements enable more precise, concise, and scalable policies without complex workarounds.
  • AWS recommends using explicit Deny statements as best practice and avoiding overlapping Allow statements.
  • Use IAM Access Analyzer to validate SCPs before applying them.

SCP Quotas (Updated May 2026)

  • Maximum SCP size: 10,240 characters (doubled from previous 5,120 limit in May 2026).
  • Maximum SCPs per node (root, OU, or account): 10 (increased from previous limit of 5).
  • Maximum SCPs in an organization: 2,000.
  • Maximum nesting depth of OUs: 5 levels.
  • These increased quotas are automatically available across all commercial, GovCloud, and China Regions with no request needed.

SCPs Testing Effects

  • don’t attach SCPs to the root of the organization without thoroughly testing the impact that the policy has on accounts.
  • Create an OU that the accounts can be moved into one at a time, or at least in small numbers, to ensure that users are not inadvertently locked out of key services.
  • Use IAM Access Analyzer policy validation and custom policy checks to verify SCP correctness before deployment.

Resource Control Policies (RCPs)

  • Resource Control Policies (RCPs), launched in November 2024, are a new authorization policy type in AWS Organizations.
  • RCPs set the maximum available permissions on resources within your organization, complementing SCPs which set maximum permissions on principals.
  • Help centrally establish a data perimeter by restricting external access to resources at scale.
  • RCPs are evaluated when resources are accessed, irrespective of who is making the API request.
  • Use Deny statements to restrict access (similar to SCPs).
  • A default RCPFullAWSAccess policy is automatically attached to every entity when RCPs are enabled.
  • RCPs don’t affect resources in the management account.
  • Supported services (expanding): Amazon S3, AWS STS, AWS KMS, Amazon SQS, AWS Secrets Manager, Amazon ECR, Amazon OpenSearch Serverless, Amazon Cognito, Amazon CloudWatch Logs, and more being added.
  • SCPs and RCPs have independent quotas — each RCP can have up to 5,120 characters, with up to 5 RCPs per node and 1,000 RCPs per organization.
  • Neither SCPs nor RCPs grant permissions — they only restrict the maximum available permissions.

SCP vs RCP Comparison

Feature | SCP (Service Control Policy) | RCP (Resource Control Policy)
Controls | Maximum permissions for principals (IAM users/roles) | Maximum permissions on resources
Scope | What principals can do | Who can access resources
Evaluation | Evaluated based on who is making the request | Evaluated when resources are accessed, regardless of requester
Management account | Not affected | Not affected
Default policy | FullAWSAccess | RCPFullAWSAccess
Max size | 10,240 characters | 5,120 characters
Max per node | 10 | 5

Declarative Policies

  • Declarative Policies, launched in December 2024, are a new management policy type in AWS Organizations.
  • Allow you to declare and enforce desired configuration for AWS services at scale across the organization.
  • Unlike SCPs/RCPs (which restrict API actions), declarative policies enforce the desired state of service attributes.
  • Once set, the configuration is maintained even as new features or APIs are added — no policy maintenance overhead.
  • Enforcement applies regardless of whether the action was invoked by an IAM role or a service-linked role.
  • Support custom error messages so end users see actionable guidance when actions are restricted.
  • Provide an account status report to assess current state before applying policies.
  • Supported service attributes (at launch — EC2, VPC, EBS):
    • Enforce IMDSv2 for EC2 instances
    • Block public access for Amazon EBS snapshots
    • Block public access for Amazon EC2 AMIs
    • Block public access for Amazon VPC (internet gateway control)
    • Allowed AMI image settings (restrict to trusted providers)
    • Serial console access control
  • Can be applied at organization, OU, or account level.
  • Manageable via AWS Organizations console, CLI, CloudFormation, or AWS Control Tower.

AWS Organizations Policy Types Summary

Policy Type | Purpose | Mechanism
SCPs | Restrict maximum permissions for principals | Allow/Deny API actions for IAM users and roles
RCPs | Restrict maximum permissions on resources | Deny external access to resources
Declarative Policies | Enforce desired service configuration | Set desired state for service attributes

AWS Certification Exam Practice Questions

  1. Your company is planning on setting up multiple accounts in AWS. The IT Security department has a requirement to ensure that certain services and actions are not allowed across all accounts. How would the system admin achieve this in the most EFFECTIVE way possible?
    1. Create a common IAM policy that can be applied across all accounts
    2. Create an IAM policy per account and apply them accordingly
    3. Deny the services to be used across accounts by contacting AWS support
    4. Use AWS Organizations and Service Control Policies
  2. You are in the process of implementing AWS Organizations for your company. At your previous company, you saw an Organizations implementation go bad when an SCP (Service Control Policy) was applied at the root of the organization before being thoroughly tested. In what way can an SCP be properly tested and implemented?
    1. Back up your entire Organization to S3, then roll back and restore if something goes wrong
    2. The SCP must be verified with AWS before it is implemented to avoid any problems.
    3. Mirror your Organizational Unit in another region. Apply the SCP and test it. Once testing is complete, attach the SCP to the root of your organization.
    4. Create an Organizational Unit (OU). Attach the SCP to this new OU. Move your accounts in one at a time to ensure that you don’t inadvertently lock users out of key services.
  3. A security team wants to prevent any external AWS accounts from accessing their organization’s S3 buckets, regardless of what resource-based policies individual developers might configure. Which approach should they use?
    1. Apply an SCP to deny all S3 actions from external principals
    2. Use AWS Config rules to detect non-compliant bucket policies
    3. Apply a Resource Control Policy (RCP) that restricts S3 access to principals within the organization
    4. Configure S3 Block Public Access at the account level
  4. An organization needs to ensure that all EC2 instances launched across hundreds of accounts use IMDSv2, even if new APIs or features are added in the future. They also want end users to see a custom error message explaining why their configuration was blocked. What is the BEST solution?
    1. Create an SCP denying ec2:RunInstances without the IMDSv2 metadata condition
    2. Use AWS Config with auto-remediation to terminate non-compliant instances
    3. Apply a Declarative Policy for EC2 that enforces IMDSv2 with a custom error message
    4. Create a Lambda function triggered by CloudTrail to stop non-compliant instances
  5. A company wants to implement a data perimeter strategy that controls both which principals can perform actions AND who can access their AWS resources. Which combination of AWS Organizations policies provides the most comprehensive data perimeter?
    1. SCPs and AWS Config rules
    2. SCPs and Resource Control Policies (RCPs)
    3. SCPs and VPC endpoint policies only
    4. IAM permission boundaries and SCPs
  6. An administrator needs to restrict EC2 actions to only 3 specific AWS Regions for all accounts. Previously this required both an Allow and a separate Deny statement. With recent SCP enhancements, what is the simplified approach?
    1. Use a Deny statement with StringNotEquals condition on aws:RequestedRegion
    2. Use an Allow statement with a Condition element specifying aws:RequestedRegion
    3. Create separate SCPs per region and attach them to respective OUs
    4. Use declarative policies to block EC2 access outside specific regions

AWS Data Pipeline – ETL Workflow Orchestration

⚠️ AWS Data Pipeline – Maintenance Mode (No Longer Available to New Customers)

AWS closed new customer access to AWS Data Pipeline effective July 25, 2024. The service is now in maintenance mode — no new features or region expansions are planned.

Existing customers can continue to use the service as normal, but AWS recommends migrating to modern alternatives.

Recommended Migration Alternatives:

  • AWS Glue – Serverless data integration service for ETL, Apache Spark applications, and data orchestration with visual editors and notebooks.
  • AWS Step Functions – Serverless orchestration service for building workflows integrating 250+ AWS services with visual designer and JSON-based workflow definitions.
  • Amazon MWAA – Managed Apache Airflow service for end-to-end data pipeline orchestration with Python-based DAGs and 1,000+ pre-built operators.

See Migrating workloads from AWS Data Pipeline for detailed migration guidance.

AWS Data Pipeline

  • AWS Data Pipeline is a web service that makes it easy to automate and schedule regular data movement and data processing activities in AWS
  • helps define data-driven workflows
  • integrates with on-premises and cloud-based storage systems
  • helps quickly define a pipeline, which defines a dependent chain of data sources, destinations, and predefined or custom data processing activities
  • supports scheduling where the pipeline regularly performs processing activities such as distributed data copy, SQL transforms, EMR applications, or custom scripts against destinations such as S3, RDS, or DynamoDB.
  • ensures that the pipelines are robust and highly available by executing the scheduling, retry, and failure logic for the workflows as a highly scalable and fully managed service.

AWS Data Pipeline features

  • Distributed, fault-tolerant, and highly available
  • Managed workflow orchestration service for data-driven workflows
  • Infrastructure management service, as it will provision and terminate resources as required
  • Provides dependency resolution
  • Can be scheduled
  • Supports Preconditions for readiness checks.
  • Grants control over retries, including frequency and number
  • Native integration with S3, DynamoDB, RDS, EMR, EC2 and Redshift
  • Support for both AWS-based and external on-premises resources
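Dependency resolution, listed among the features above, amounts to running activities in an order that respects what each one depends on. The sketch below shows the idea with Python's standard `graphlib` module (Python 3.9+). The activity names are hypothetical, and this is not how the service implements scheduling internally.

```python
from graphlib import TopologicalSorter

# Each activity maps to the set of activities it depends on (hypothetical names).
dependencies = {
    "copy_to_s3": set(),
    "transform": {"copy_to_s3"},
    "load_to_redshift": {"transform"},
    "notify": {"load_to_redshift"},
}

# A valid execution order: every activity appears after its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
```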

AWS Data Pipeline Concepts

Pipeline Definition

  • Pipeline definition helps the business logic to be communicated to the AWS Data Pipeline
  • Pipeline definition defines the location of data (Data Nodes), activities to be performed, the schedule, resources to run the activities, preconditions, and actions to be performed
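To give a feel for how these pieces fit together, here is a sketch of a pipeline definition written as a Python dictionary in the JSON "objects" style, with objects pointing at each other through `{"ref": ...}` fields. The object types and field names follow my recollection of the pipeline definition syntax and should be checked against the official reference; the bucket, topic ARN, and IDs are placeholders, and the definition is intentionally incomplete (for example, no IAM roles). The helper only verifies that every reference resolves to a defined object.

```python
# Illustrative only: field names should be verified against the AWS reference.
definition = {
    "objects": [
        {"id": "DailySchedule", "type": "Schedule",
         "period": "1 day", "startDateTime": "2024-01-01T00:00:00"},
        {"id": "InputReady", "type": "S3PrefixNotEmpty",
         "s3Prefix": "s3://example-bucket/input/"},
        {"id": "InputData", "type": "S3DataNode",
         "directoryPath": "s3://example-bucket/input/",
         "precondition": {"ref": "InputReady"},
         "schedule": {"ref": "DailySchedule"}},
        {"id": "OutputData", "type": "S3DataNode",
         "directoryPath": "s3://example-bucket/output/",
         "schedule": {"ref": "DailySchedule"}},
        {"id": "Worker", "type": "Ec2Resource",
         "terminateAfter": "1 Hour",
         "schedule": {"ref": "DailySchedule"}},
        {"id": "FailureAlarm", "type": "SnsAlarm",
         "topicArn": "arn:aws:sns:us-east-1:111122223333:example-topic",
         "subject": "Copy failed", "message": "CopyJob did not complete."},
        {"id": "CopyJob", "type": "CopyActivity",
         "input": {"ref": "InputData"}, "output": {"ref": "OutputData"},
         "runsOn": {"ref": "Worker"}, "onFail": {"ref": "FailureAlarm"},
         "maximumRetries": "3", "schedule": {"ref": "DailySchedule"}},
    ]
}

def unresolved_refs(definition):
    """List (object_id, field, target) for references to undefined objects."""
    ids = {obj["id"] for obj in definition["objects"]}
    missing = []
    for obj in definition["objects"]:
        for field, value in obj.items():
            if isinstance(value, dict) and "ref" in value and value["ref"] not in ids:
                missing.append((obj["id"], field, value["ref"]))
    return missing
```

Reading the objects top to bottom maps onto the bullet above: a schedule, a precondition, two data nodes, a resource, an action, and the activity that ties them together.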

Pipeline Components, Instances, and Attempts

  • Pipeline components represent the business logic of the pipeline and are represented by the different sections of a pipeline definition.
  • Pipeline components specify the data sources, activities, schedule, and preconditions of the workflow
  • When AWS Data Pipeline runs a pipeline, it compiles the pipeline components to create a set of actionable instances, each of which contains all the information needed to perform a specific task
  • Data Pipeline provides durable and robust data management, as it retries a failed operation according to the configured retry frequency and number of attempts
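The retry behavior described above can be pictured as a loop around each attempt. The sketch below is a generic retry helper with a maximum number of attempts and a fixed delay between them; the names and defaults are assumptions, and the real service tracks attempts and delays on its side rather than in user code.

```python
import time

def run_with_retries(task, max_attempts=3, delay_seconds=0.0):
    """Call task() until it succeeds; return (result, attempts_used)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            time.sleep(delay_seconds)

def make_flaky(failures):
    """Build a task that fails `failures` times, then succeeds (for testing)."""
    state = {"calls": 0}
    def task():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("transient failure")
        return "done"
    return task
```

A task that fails twice succeeds on the third attempt, while one that keeps failing raises after the attempt budget is used up.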

Task Runners

  • A task runner is an application that polls AWS Data Pipeline for tasks and then performs those tasks
  • When Task Runner is installed and configured,
    • it polls AWS Data Pipeline for tasks associated with activated pipelines
    • after a task is assigned to Task Runner, it performs that task and reports its status back to Pipeline.
  • A task is a discrete unit of work that the Pipeline service shares with a task runner; it differs from a pipeline, which defines activities and resources and usually yields several tasks
  • Tasks can be executed either on the AWS Data Pipeline managed or user-managed resources.
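
The poll, perform, and report cycle above can be sketched with an in-memory stand-in for the service. Everything here is hypothetical: FakePipelineService and its methods are loosely modelled on the idea of polling for a task and setting its status, and are not the real Task Runner or the AWS SDK.

```python
from collections import deque


class FakePipelineService:
    """In-memory stand-in for the Data Pipeline service (hypothetical)."""

    def __init__(self, tasks):
        self._queue = deque(tasks)
        self.statuses = {}

    def poll_for_task(self):
        # Hand out the next waiting task, or None when there is no work.
        return self._queue.popleft() if self._queue else None

    def set_task_status(self, task_id, status):
        self.statuses[task_id] = status


def run_task_runner(service, handlers, max_polls=10):
    """Poll for tasks, perform each one, and report its status back."""
    for _ in range(max_polls):
        task = service.poll_for_task()
        if task is None:
            break  # a real runner would wait and poll again instead of stopping
        try:
            handlers[task["type"]](task)
            service.set_task_status(task["id"], "FINISHED")
        except Exception:
            # Covers both a failing handler and an unknown task type.
            service.set_task_status(task["id"], "FAILED")
```

A short usage example: a service holding a copy task and a task of an unknown type, with a single handler registered, ends with the first task reported as FINISHED and the second as FAILED.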

Data Nodes

  • Data Node defines the location and type of data that a pipeline activity uses as source (input) or destination (output)
  • supports S3, Redshift, DynamoDB, and SQL data nodes

Databases

  • supports JDBC, RDS, and Redshift database

Activities

  • An activity is a pipeline component that defines the work to perform
  • Data Pipeline provides pre-defined activities for common scenarios like SQL transformation, data movement, Hive queries, etc.
  • Activities are extensible and can be used to run your own custom scripts to support endless combinations

Preconditions

  • Precondition is a pipeline component containing conditional statements that must be satisfied (evaluated to True) before an activity can run
  • A pipeline supports
    • System-managed preconditions
      • are run by the AWS Data Pipeline web service on your behalf and do not require a computational resource
      • Include source data and key checks, e.g. DynamoDB data exists, DynamoDB table exists, S3 key exists, or S3 prefix not empty
    • User-managed preconditions
      • run on user defined and managed computational resources
      • Can be defined as Exists check or Shell command
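
A minimal sketch of the gating idea above: an activity runs only when every precondition evaluates to True. The helper names and the set of "existing" S3 keys are hypothetical and stand in for real checks such as an S3 key lookup.

```python
def run_when_ready(preconditions, activity):
    """Run activity() only if every precondition callable returns True."""
    failed = [name for name, check in preconditions.items() if not check()]
    if failed:
        return {"ran": False, "failed_preconditions": failed}
    return {"ran": True, "result": activity()}


# Hypothetical stand-in for an "S3 key exists" check.
EXISTING_KEYS = {"s3://example-bucket/input/ready.flag"}


def s3_key_exists(key):
    return lambda: key in EXISTING_KEYS
```

For example, gating an activity on `s3_key_exists("s3://example-bucket/input/ready.flag")` lets it run, while gating it on a key that is not in the set reports that precondition as failed and skips the activity.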

Resources

  • A resource is a computational resource that performs the work that a pipeline activity specifies
  • supports AWS Data Pipeline-managed and self-managed resources
  • AWS Data Pipeline-managed resources include EC2 and EMR, which are launched by the Data Pipeline service only when they’re needed
  • Self managed on-premises resources can also be used, where a Task Runner package is installed which continuously polls the AWS Data Pipeline service for work to perform
  • Resources can run in the same Region as their working data set, even if that Region is different from the one AWS Data Pipeline runs in
  • Resources launched by AWS Data Pipeline are counted within the resource limits and should be taken into account

Actions

  • Actions are steps that a pipeline takes when a certain event, such as success or failure, occurs.
  • Pipeline supports SNS notifications and termination action on resources

Migration to Modern Alternatives

AWS recommends migrating existing Data Pipeline workloads to one of the following services based on your use case:

Migrate to AWS Glue

  • Best for serverless ETL workloads, Apache Spark-based processing, and data integration
  • Supports visual editors, notebooks, and crawlers for data discovery
  • Natively supports DynamoDB export/import (common Data Pipeline use case)
  • Includes data quality, sensitive data detection, and Data Catalog capabilities
  • Ideal when migrating pipelines built from pre-defined Data Pipeline templates (e.g., DynamoDB to S3 export)

Migrate to AWS Step Functions

  • Best for orchestrating multi-service workflows with visual designer
  • Integrates with 250+ AWS services and 11,000+ actions out-of-the-box
  • Uses Amazon States Language (JSON-based), similar to Data Pipeline’s JSON definitions
  • Cost-effective with per-task granularity pricing
  • Supports on-premises resources via AWS Systems Manager Run Command
  • Ideal for workloads requiring EC2, EMR, or Lambda orchestration
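
To give a feel for the JSON-based Amazon States Language mentioned above, here is a small sketch of a two-step state machine built as a Python dictionary, with a retry policy on the first task. The Lambda ARNs and state names are hypothetical, and the checker covers only Task states, so it is a simplification rather than a full validator.

```python
import json

# Hypothetical two-step workflow: export a table, then send a notification.
STATE_MACHINE = {
    "Comment": "Hypothetical export workflow",
    "StartAt": "ExportTable",
    "States": {
        "ExportTable": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:example-export",
            # Retry the task up to 3 times, doubling the wait after each failure.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 60,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Next": "NotifySuccess",
        },
        "NotifySuccess": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:111122223333:function:example-notify",
            "End": True,
        },
    },
}


def dangling_transitions(definition):
    """List problems with StartAt/Next targets (simplified: Task states only)."""
    states = definition["States"]
    problems = []
    if definition["StartAt"] not in states:
        problems.append(f"StartAt points to unknown state {definition['StartAt']!r}")
    for name, state in states.items():
        target = state.get("Next")
        if target is not None and target not in states:
            problems.append(f"{name}: Next points to unknown state {target!r}")
        if target is None and not state.get("End"):
            problems.append(f"{name}: has neither Next nor End")
    return problems


# The JSON text is what would be supplied as the state machine definition.
definition_json = json.dumps(STATE_MACHINE, indent=2)
```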

Migrate to Amazon MWAA (Managed Apache Airflow)

  • Best for complex workflow orchestration using Python-based DAGs
  • Provides 1,000+ pre-built operators covering AWS and non-AWS services
  • Rich UI for observability, restarts, backfills, and lineage tracking
  • Fully managed, open source (Apache Airflow) for maximum portability
  • Ideal for teams already using Airflow or needing advanced orchestration features

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  • Note: AWS Data Pipeline is in maintenance mode and unlikely to appear on newer exam versions. However, questions about data orchestration concepts may reference modern alternatives like AWS Glue, Step Functions, or Amazon MWAA.
  1. An International company has deployed a multi-tier web application that relies on DynamoDB in a single region. For regulatory reasons they need disaster recovery capability in a separate region with a Recovery Time Objective of 2 hours and a Recovery Point Objective of 24 hours. They should synchronize their data on a regular basis and be able to provision the web application rapidly using CloudFormation. The objective is to minimize changes to the existing web application, control the throughput of DynamoDB used for the synchronization of data and synchronize only the modified elements. Which design would you choose to meet these requirements?
    1. Use AWS data Pipeline to schedule a DynamoDB cross region copy once a day. Create a ‘Lastupdated’ attribute in your DynamoDB table that would represent the timestamp of the last update and use it as a filter. (Refer Blog Post)
    2. Use EMR and write a custom script to retrieve data from DynamoDB in the current region using a SCAN operation and push it to DynamoDB in the second region. (No Schedule and throughput control)
    3. Use AWS data Pipeline to schedule an export of the DynamoDB table to S3 in the current region once a day then schedule another task immediately after it that will import data from S3 to DynamoDB in the other region. (With AWS Data pipeline the data can be copied directly to other DynamoDB table)
    4. Send each item into an SQS queue in the second region; use an auto-scaling group behind the SQS queue to replay the write in the second region. (Not Automated to replay the write)

    Note: For new implementations, consider DynamoDB Global Tables for cross-region replication, or AWS Glue for scheduled ETL workloads.

  2. Your company produces customer commissioned one-of-a-kind skiing helmets combining high fashion with custom technical enhancements. Customers can show off their individuality on the ski slopes and have access to head-up-displays, GPS rear-view cams and any other technical innovation they wish to embed in the helmet. The current manufacturing process is data rich and complex including assessments to ensure that the custom electronics and materials used to assemble the helmets are to the highest standards. Assessments are a mixture of human and automated assessments. You need to add a new set of assessments to model the failure modes of the custom electronics using GPUs with CUDA across a cluster of servers with low latency networking. What architecture would allow you to automate the existing process using a hybrid approach and ensure that the architecture can support the evolution of processes over time?
    1. Use AWS Data Pipeline to manage movement of data & meta-data and assessments. Use an auto-scaling group of G2 instances in a placement group. (Involves mixture of human assessments)
    2. Use Amazon Simple Workflow (SWF) to manage assessments, movement of data & meta-data. Use an autoscaling group of G2 instances in a placement group. (Human and automated assessments with GPU and low latency networking)
    3. Use Amazon Simple Workflow (SWF) to manage assessments movement of data & meta-data. Use an autoscaling group of C3 instances with SR-IOV (Single Root I/O Virtualization). (C3 and SR-IOV won’t provide GPU as well as Enhanced networking needs to be enabled)
    4. Use AWS data Pipeline to manage movement of data & meta-data and assessments use auto-scaling group of C3 with SR-IOV (Single Root I/O virtualization). (Involves mixture of human assessments)

    Note: For modern implementations, AWS Step Functions has largely replaced SWF for workflow orchestration including human-in-the-loop tasks. Current GPU instances include P4, P5, and G5 families.

AWS Trusted Advisor

Trusted Advisor Categories

  • Trusted Advisor continuously evaluates the AWS environment using best practice checks and provides recommendations for cloud cost optimization, performance, resilience, security, operational excellence, and service limits.
  • Trusted Advisor checks the following six categories
    • Cost Optimization
      • Recommendations that can potentially save money by highlighting unused resources and opportunities to reduce the bill.
      • Integrates with AWS Cost Optimization Hub (since May 2025) for more accurate, personalized cost savings recommendations that account for specific commercial terms (RIs, Savings Plans).
    • Security
      • Identification of security settings and gaps, in line with best practices, that could make the AWS solution less secure.
      • Integrates with AWS Security Hub CSPM (Cloud Security Posture Management) controls for comprehensive security findings.
    • Resilience (previously known as Fault Tolerance)
      • Recommendations that help increase the resiliency and availability of the AWS solution by highlighting redundancy shortfalls, current service limits, and over-utilized resources.
      • Integrates with AWS Resilience Hub for application resiliency assessments.
    • Performance
      • Recommendations that can help improve the speed and responsiveness of applications.
      • Includes checks from AWS Compute Optimizer for right-sizing recommendations.
    • Operational Excellence (Added Oct 2023)
      • Checks that help apply AWS best practices to operate the AWS environment effectively and at scale.
      • Supports the AWS Well-Architected Framework Review, accelerating alignment with best practices.
      • Powered by AWS Config managed rules for continuous evaluation.
    • Service Limits
      • Checks for service usage that is more than 80% of the service limit.
      • Values are based on a snapshot, so the current usage might differ.
      • Limit and usage data can take up to 24 hours to reflect any changes.
  • Trusted Advisor currently offers 482 total checks across 56 AWS services.
    • 56 checks are available to all AWS account plans (Basic and above).
    • 482 checks (full set) are available with Business Support+ and above.
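
The Service Limits rule above (flag usage that is more than 80% of the limit) is simple arithmetic. The sketch below applies it to a hypothetical snapshot of usage figures; the limit names and numbers are made up for illustration.

```python
def limits_needing_attention(usage_by_limit, threshold=0.80):
    """Return (limit name, percent used) for usage above the threshold.

    usage_by_limit maps a limit name to a (used, limit) pair. The result is
    sorted with the most heavily used limit first.
    """
    flagged = []
    for name, (used, limit) in usage_by_limit.items():
        if limit > 0 and used / limit > threshold:
            flagged.append((name, round(100 * used / limit, 1)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical usage snapshot: (current usage, service limit).
SNAPSHOT = {
    "VPCs per Region": (5, 5),
    "Elastic IPs per Region": (3, 5),
    "EBS snapshots": (8700, 10000),
}
```

Because the values come from a snapshot, a result like this reflects usage at the time of the last refresh and not necessarily the current state.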

AWS Support Plan Access

⚠️ AWS Support Plan Restructuring (Effective Jan 1, 2027)

AWS has announced a simplified support portfolio (Dec 2025). The following plans are being discontinued on January 1, 2027:

  • Developer Support — Discontinued Jan 1, 2027
  • Business Support — Discontinued Jan 1, 2027
  • Enterprise On-Ramp — Customers auto-upgraded to Enterprise Support throughout 2026

New support plans: Basic, Business Support+, Enterprise Support, and Unified Operations.

  • AWS Basic support plan provides access to:
    • All checks in the Service Limits category
    • Selected checks in the Security and Resilience (Fault Tolerance) categories
    • Manual refresh only (no automatic check updates)
  • AWS Business Support+ (replacing Developer and Business plans) includes:
    • Full set of 482 checks across all categories
    • AWS Support API provides programmatic access to manage Support cases and Trusted Advisor check requests
    • Automatic weekly refresh of checks
    • Amazon EventBridge integration for automated monitoring and remediation
    • Starts at $29/month minimum per account
  • AWS Enterprise Support and Unified Operations plans additionally include:
    • Trusted Advisor Priority — provides prioritized and context-driven recommendations from your AWS account team as well as machine-generated checks
    • Enterprise Support minimum reduced from $15,000 to $5,000
    • Unified Operations offers 5-minute response times for mission-critical workloads

Trusted Advisor Key Features

AWS Config Integration

  • Trusted Advisor integrates with AWS Config managed rules to deliver best practice checks.
  • 64 checks powered by AWS Config were added in October 2023, including the new Operational Excellence category.
  • Provides continuous evaluation of resource configurations against desired settings.
  • Requires AWS Config to be enabled in the account.

AWS Security Hub Integration

  • Security Hub CSPM (Cloud Security Posture Management) controls automatically appear as checks in Trusted Advisor.
  • Requires the Foundational Security Best Practices security standard to be enabled in Security Hub.
  • Requires Business Support+ or higher plan.
  • Provides a consolidated view of security findings across both services.

Cost Optimization Hub Integration

  • 16 new cost optimization checks integrated from AWS Cost Optimization Hub (May 2025).
  • Legacy cost optimization checks (e.g., Low Utilization EC2, Underutilized EBS) were deprecated September 2025.
  • New checks provide more accurate savings estimates accounting for specific commercial terms (RIs, Savings Plans).
  • Provides actionable recommendations including right-sizing, Graviton migration, and idle resource detection.
  • Requires opt-in to Cost Optimization Hub and AWS Compute Optimizer (both free).

Amazon EventBridge Integration

  • Trusted Advisor emits events to Amazon EventBridge when check status changes (WARN or ERROR).
  • Enables automated remediation workflows using EventBridge rules + Lambda functions.
  • Can schedule automatic check refreshes using EventBridge Scheduler.
  • Requires Business Support+ or higher plan.
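
As a sketch of how such a rule might select events, the pattern below matches Trusted Advisor status changes to WARN or ERROR. The source and detail-type strings and the layout of the sample event are assumptions based on the event format as understood here and should be checked against the EventBridge documentation. The matcher is a deliberately simplified, exact-match stand-in for EventBridge's own pattern matching.

```python
# Assumed event pattern for Trusted Advisor status changes.
EVENT_PATTERN = {
    "source": ["aws.trustedadvisor"],
    "detail-type": ["Trusted Advisor Check Item Refresh Notification"],
    "detail": {"status": ["WARN", "ERROR"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must exist in the event, and the
    event value must be one of the listed values (nested dicts recurse)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical event, trimmed to the fields the pattern looks at.
SAMPLE_EVENT = {
    "source": "aws.trustedadvisor",
    "detail-type": "Trusted Advisor Check Item Refresh Notification",
    "detail": {"check-name": "Example Check", "status": "ERROR"},
}
```

In an actual rule, the pattern would be attached to an EventBridge rule whose target (for example a Lambda function) performs the remediation.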

Organizational View

  • Allows viewing Trusted Advisor checks for all accounts in AWS Organizations.
  • Generate consolidated reports with detailed check results across multiple accounts.
  • View high-level summary of check status within the console.
  • Helps optimize security posture, performance, and cost efficiency across multi-account environments.

Trusted Advisor Priority

  • Available to Enterprise Support and Unified Operations customers only.
  • Provides prioritized and context-driven recommendations from the AWS account team.
  • Combines machine-generated checks with human expertise.
  • Helps focus on the most important recommendations for cloud optimization, resilience, and security.
  • Integrates with operational workflows for actionable guidance.

AWS Support API

  • API provides two different groups of operations:
    • Support case management operations to manage the entire life cycle of AWS support cases, from creating a case to resolving it, and includes
      • Open a support case
      • Get a list and detailed information about recent support cases
      • Filter your search for support cases by dates and case identifiers, including resolved cases
      • Add communications and file attachments to cases, and add the email recipients for case correspondence
      • Resolve cases
    • AWS Trusted Advisor operations to access checks
      • Get the names and identifiers for the checks
      • Request that a check be run against the AWS account and resources
      • Get summaries and detailed information for check results
      • Refresh the checks
      • Get the status of each check
  • Requires Business Support+ or higher plan (previously Business/Enterprise On-Ramp/Enterprise).
  • Must use US East (N. Virginia) endpoint for Trusted Advisor API operations.
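
The Trusted Advisor operations listed above can be combined into a short report of checks that are not in the ok state. The sketch below takes the client as a parameter so it can be exercised without AWS access. The two method names and response keys are those of the boto3 Support client as understood here; the function name is hypothetical.

```python
def checks_not_ok(support_client, language="en"):
    """Return sorted (check name, status) pairs for checks whose status is not 'ok'."""
    # Get the names and identifiers for the checks.
    checks = support_client.describe_trusted_advisor_checks(language=language)["checks"]
    names = {check["id"]: check["name"] for check in checks}

    # Get the summary (including status) for each check.
    summaries = support_client.describe_trusted_advisor_check_summaries(
        checkIds=list(names)
    )["summaries"]

    return sorted(
        (names[summary["checkId"]], summary["status"])
        for summary in summaries
        if summary["status"] != "ok"
    )


# With boto3 installed, credentials configured, and a qualifying support plan,
# the client would be created against the US East (N. Virginia) endpoint:
#   import boto3
#   client = boto3.client("support", region_name="us-east-1")
#   print(checks_not_ok(client))
```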

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. The Trusted Advisor service provides insight regarding which categories of an AWS account?
    1. Security, fault tolerance, high availability, and connectivity
    2. Security, access control, high availability, and performance
    3. Performance, cost optimization, security, and fault tolerance (Note: Trusted Advisor now has 6 categories: Cost Optimization, Security, Resilience, Performance, Operational Excellence, and Service Limits)
    4. Performance, cost optimization, access control, and connectivity
  2. Which of the following are categories of AWS Trusted Advisor? (Select TWO.)
    1. Loose Coupling
    2. Disaster recovery
    3. Infrastructure as a Code
    4. Security
    5. Service limits
  3. Which AWS tool will identify security groups that grant unrestricted Internet access to a limited list of ports?
    1. AWS Organizations
    2. AWS Trusted Advisor
    3. AWS Usage Report
    4. Amazon EC2 dashboard
  4. A company wants to receive recommendations to optimize their AWS environment for cost, performance, security, and resilience. Which AWS service provides these recommendations?
    1. AWS Config
    2. AWS Security Hub
    3. AWS Trusted Advisor
    4. AWS Well-Architected Tool
  5. Which AWS Trusted Advisor category was added in October 2023, bringing the total to six categories?
    1. Governance
    2. Compliance
    3. Operational Excellence
    4. Sustainability
  6. A company wants to automate remediation when AWS Trusted Advisor identifies a security issue. Which AWS service integration should they use?
    1. AWS CloudTrail
    2. Amazon EventBridge
    3. Amazon CloudWatch Alarms
    4. AWS Systems Manager
  7. Which AWS Trusted Advisor feature provides prioritized recommendations from your AWS account team and is available only to Enterprise Support and Unified Operations customers?
    1. Trusted Advisor Organizational View
    2. Trusted Advisor Priority
    3. Trusted Advisor Notifications
    4. Trusted Advisor API
  8. A company needs to view Trusted Advisor recommendations for all accounts in their AWS Organization. Which feature should they use?
    1. Trusted Advisor Priority
    2. AWS Config Aggregator
    3. Trusted Advisor Organizational View
    4. AWS Security Hub cross-account

AWS Database Services Cheat Sheet – RDS, DynamoDB, Aurora

📋 Last Updated: June 2026

This cheat sheet has been updated to include Aurora DSQL, Aurora storage increase to 256 TiB, ElastiCache for Valkey, ElastiCache Serverless, Redshift Multi-AZ and Serverless, DynamoDB multi-Region strong consistency, zero-ETL integrations, RDS Multi-AZ DB Clusters with readable standbys, and RDS Extended Support.

Relational Database Service – RDS

  • provides Relational Database service
  • supports MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server, Amazon Aurora, and IBM Db2 (added in 2023) DB engines
  • as it is a managed service, shell (root ssh) access is not provided
  • manages backups, software patching, automatic failure detection, and recovery
  • supports user-initiated manual backups and snapshots
  • daily automated backups with database transaction logs enable Point in Time recovery up to the last five minutes of database usage
  • snapshots are user-initiated storage volume snapshots of the DB instance; they back up the entire DB instance, not just individual databases, and can be restored as an independent RDS instance
  • RDS Security
    • support encryption at rest using KMS as well as encryption in transit using SSL endpoints
    • supports IAM database authentication, which prevents the need to store static user credentials in the database, because authentication is managed externally using IAM.
    • supports Encryption only during creation of an RDS DB instance
    • existing unencrypted DB cannot be encrypted and you need to create a snapshot, create an encrypted copy of the snapshot and restore as encrypted DB
    • supports Secrets Manager for storing and rotating secrets
    • for encrypted database
      • logs, snapshots, backups, read replicas are all encrypted as well
      • cross region replicas and snapshots are supported for encrypted instances
  • Multi-AZ deployment
    • provides high availability and automatic failover support and is NOT a scaling solution
    • maintains a synchronous standby replica in a different AZ
    • transaction success is returned only if the commit is successful both on the primary and the standby DB
    • Oracle, PostgreSQL, MySQL, and MariaDB DB instances use Amazon technology, while SQL Server DB instances use SQL Server Always On Availability Groups
    • snapshots and backups are taken from standby & eliminate I/O freezes
    • during automatic failover, RDS seamlessly switches to the standby instance and updates the DNS record to point to the standby
    • failover can be forced with the Reboot with failover option
  • Multi-AZ DB Cluster (Readable Standbys)
    • provides a primary DB instance and two readable standby DB instances in different AZs
    • standby instances can serve read traffic, providing additional read capacity
    • uses semi-synchronous replication with transaction log-based replication
    • provides faster failover (typically under 35 seconds) compared to Multi-AZ instance deployment
    • supports MySQL and PostgreSQL engines
    • offers lower write latency compared to Multi-AZ instance deployments
  • Read Replicas
    • uses the PostgreSQL, MySQL, and MariaDB DB engines’ built-in replication functionality to create a separate Read Only instance
    • updates are asynchronously copied to the Read Replica, and data might be stale
    • can help scale applications and reduce read only load
    • requires automatic backups enabled
    • replicates all databases in the source DB instance
    • for disaster recovery, can be promoted to a full fledged database
    • can be created in a different region for disaster recovery, migration and low latency across regions
    • can’t create encrypted read replicas from unencrypted DB or read replica
  • RDS does not support all the features of underlying databases, and if required the database instance can be launched on an EC2 instance
  • RDS Components
    • DB parameter groups contain engine configuration values that can be applied to one or more DB instances of the same engine family, e.g. SSL, max connections, etc.
    • Default DB parameter group cannot be modified; create a custom one and attach it to the DB instance
    • Supports static and dynamic parameters
      • changes to dynamic parameters are applied immediately (irrespective of apply immediately setting)
      • changes to static parameters are NOT applied immediately and require a manual reboot.
  • RDS Monitoring & Notification
    • integrates with CloudWatch and CloudTrail
    • CloudWatch provides metrics about CPU utilization from the hypervisor for a DB instance, and Enhanced Monitoring gathers its metrics from an agent on the instance
    • Performance Insights is a database performance tuning and monitoring feature that helps illustrate the database’s performance and help analyze any issues that affect it
    • supports RDS Event Notification, which uses SNS to send a notification when an RDS event such as instance creation, deletion, or snapshot creation occurs
  • RDS Blue/Green Deployments
    • creates a staging (green) environment that mirrors the production (blue) environment
    • enables safer database updates, major version upgrades, and schema changes with minimal downtime (under 5 seconds)
    • supports Aurora MySQL, Aurora PostgreSQL, RDS for MySQL, RDS for MariaDB, and RDS for PostgreSQL
    • now supports Aurora Global Database (2025)
  • RDS Extended Support
    • allows running databases on a major engine version up to 3 years past its RDS end of standard support date at an additional cost
    • provides critical security and bug fixes after the community ends support for a major version
    • databases are automatically enrolled if not upgraded before the end of standard support date
  • Zero-ETL Integrations
    • RDS for MySQL and Aurora support zero-ETL integration with Amazon Redshift
    • enables near real-time analytics on transactional data without building ETL pipelines
    • data is automatically replicated to Amazon Redshift within seconds of being written
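
The procedure for encrypting an existing unencrypted DB instance, noted under RDS Security above (snapshot, encrypted copy of the snapshot, restore), can be sketched as three calls. The function takes the RDS client as a parameter so the order of operations can be checked without AWS access. The method and waiter names are those of the boto3 RDS client as understood here; the function name, the snapshot naming scheme, and the identifiers are hypothetical.

```python
def encrypt_existing_instance(rds, source_instance_id, encrypted_instance_id, kms_key_id):
    """Create an encrypted copy of an unencrypted DB instance via snapshots."""
    plain_snapshot = f"{source_instance_id}-plain"
    encrypted_snapshot = f"{source_instance_id}-encrypted"
    waiter = rds.get_waiter("db_snapshot_available")

    # 1. Snapshot the unencrypted instance.
    rds.create_db_snapshot(
        DBInstanceIdentifier=source_instance_id,
        DBSnapshotIdentifier=plain_snapshot,
    )
    waiter.wait(DBSnapshotIdentifier=plain_snapshot)

    # 2. Copy the snapshot; supplying a KMS key makes the copy encrypted.
    rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=plain_snapshot,
        TargetDBSnapshotIdentifier=encrypted_snapshot,
        KmsKeyId=kms_key_id,
    )
    waiter.wait(DBSnapshotIdentifier=encrypted_snapshot)

    # 3. Restore a new, encrypted instance from the encrypted snapshot.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=encrypted_instance_id,
        DBSnapshotIdentifier=encrypted_snapshot,
    )
    return encrypted_instance_id
```

The original instance is left untouched; switching the application over to the new instance and removing the old one are separate steps not shown here.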

⚠️ RDS Custom for Oracle – End of Support (March 31, 2027)

AWS will end support for Amazon RDS Custom for Oracle on March 31, 2027. After this date, you will no longer be able to access the RDS Custom for Oracle console or resources.

Migration Options: Migrate to Amazon RDS for Oracle (standard) or run Oracle on Amazon EC2 bare metal instances.

Aurora

  • is a relational database engine that combines the speed and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases
  • is a managed service and handles time-consuming tasks such as provisioning, patching, backup, recovery, failure detection and repair
  • is a proprietary technology from AWS (not open sourced)
  • provides PostgreSQL and MySQL compatibility
  • is “AWS cloud optimized” and claims 5x performance improvement over MySQL on RDS, over 3x the performance of PostgreSQL on RDS
  • scales storage automatically in increments of 10GB, up to 256 TiB (increased from 128 TiB in July 2025) with no impact to database performance. Storage is striped across 100s of volumes.
  • no need to provision storage in advance.
  • provides self-healing storage. Data blocks and disks are continuously scanned for errors and repaired automatically.
  • provides instantaneous failover
  • replicates each chunk of the database volume six ways across three Availability Zones i.e. 6 copies of the data across 3 AZ
    • requires 4 out of 6 copies for writes (write quorum)
    • requires 3 out of 6 copies for reads (read quorum)
  • costs more than RDS (20% more) – but is more efficient
  • Read Replicas
    • can have 15 replicas while MySQL has 5, and the replication process is faster (sub 10 ms replica lag)
    • share the same data volume as the primary instance in the same AWS Region, so there is virtually no replication lag
    • supports Automated failover for master in less than 30 seconds
    • supports Cross Region Replication using either physical or logical replication.
  • Security
    • supports Encryption at rest using KMS
    • supports Encryption in flight using SSL (same process as MySQL or Postgres)
    • Automated backups, snapshots and replicas are also encrypted
    • Possibility to authenticate using IAM token (same method as RDS)
    • supports protecting the instance with security groups
    • does not support SSH access to the underlying servers
  • Aurora I/O-Optimized
    • a cluster configuration that provides predictable pricing with no charges for I/O operations
    • ideal for I/O-intensive applications such as e-commerce, payment processing, and SaaS applications
    • can deliver up to 40% cost savings for I/O-intensive workloads
    • supports both Aurora Serverless and provisioned instances
    • can switch between I/O-Optimized and Standard configurations (once every 30 days to I/O-Optimized, back to Standard anytime)
  • Aurora Serverless
    • provides automated database instantiation and on-demand autoscaling based on actual usage
    • provides a relatively simple, cost-effective option for infrequent, intermittent, or unpredictable workloads
    • automatically starts up, shuts down, and scales capacity up or down based on the application’s needs. No capacity planning needed
    • Pay per second, can be more cost-effective
    • Aurora Serverless v1 reached end of life on March 31, 2025 – all clusters have been migrated to Aurora Serverless v2 (now simply called “Aurora Serverless”)
    • Aurora Serverless (v2) supports features like read replicas, Multi-AZ, Global Database, and logical replication that v1 did not
    • supports scale to zero capability and up to 30% better performance with smarter scaling (2026 enhancement)
  • Aurora Global Database
    • allows a single Aurora database to span multiple AWS regions.
    • provides Physical replication, which uses dedicated infrastructure that leaves the databases entirely available to serve the application
    • supports 1 Primary Region (read / write)
    • replicates across up to 5 secondary (read-only) regions, replication lag is less than 1 second
    • supports up to 16 Read Replicas per secondary region
    • recommended for low-latency global reads and disaster recovery with an RTO of < 1 minute
    • supports managed failover (Global Database Failover) which automates the cross-Region failover process, reducing operational overhead (introduced August 2023)
    • supports Blue/Green Deployments for Global Database (2025) for safer major version upgrades across all regions
    • supports a global writer endpoint for simplified application connectivity
  • Aurora Backtrack
    • Backtracking “rewinds” the DB cluster to the specified time
    • Backtracking performs in place restore and does not create a new instance. There is a minimal downtime associated with it.
  • Aurora Clone feature allows quick and cost-effective creation of Aurora Cluster duplicates
  • supports parallel or distributed query using Aurora Parallel Query, which refers to the ability to push down and distribute the computational load of a single query across thousands of CPUs in Aurora’s storage layer.
  • Aurora Optimized Reads
    • delivers up to 8x improved query latency for applications with datasets exceeding instance memory
    • uses local NVMe-based storage on Graviton-based instances to extend caching capacity
    • available for both PostgreSQL and MySQL compatible editions
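
The 4-of-6 write and 3-of-6 read requirements noted above for Aurora storage can be worked through with a little arithmetic. The sketch below assumes two copies per Availability Zone (6 copies across 3 AZs, as stated) and shows what remains possible after losing an AZ. The constant and function names are illustrative only.

```python
COPIES = 6          # copies of each chunk of the volume
COPIES_PER_AZ = 2   # 6 copies spread over 3 Availability Zones
WRITE_QUORUM = 4    # copies that must acknowledge a write
READ_QUORUM = 3     # copies that must be available for a read


def can_write(healthy_copies):
    return healthy_copies >= WRITE_QUORUM


def can_read(healthy_copies):
    return healthy_copies >= READ_QUORUM


def quorums_are_consistent():
    # Any read set overlaps any write set, and any two write sets overlap.
    return READ_QUORUM + WRITE_QUORUM > COPIES and 2 * WRITE_QUORUM > COPIES


after_az_loss = COPIES - COPIES_PER_AZ          # 4 copies left
after_az_plus_one_loss = after_az_loss - 1      # 3 copies left
```

With these numbers, losing a whole AZ leaves 4 copies, so writes and reads both continue. Losing an AZ plus one more copy leaves 3, which still satisfies the read quorum but not the write quorum.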

Amazon Aurora DSQL (New – GA May 2025)

  • a serverless, distributed SQL database optimized for transaction processing
  • the fastest serverless distributed SQL database with active-active high availability
  • provides PostgreSQL compatibility (subset of features)
  • designed for 99.99% availability in single-Region and 99.999% availability in multi-Region configurations
  • delivers strong consistency for all reads and writes to any Regional endpoint
  • provides virtually unlimited scalability with zero infrastructure management and zero downtime maintenance
  • offers the fastest distributed SQL reads and writes with 4x faster reads and writes compared to other popular distributed SQL databases
  • employs an active-active deployment model where all database resources function as peers capable of handling both read and write traffic
  • supports up to 256 TiB of storage per database cluster
  • ideal for globally distributed applications requiring strong consistency, such as financial transactions, gaming, and SaaS applications

DynamoDB

  • fully managed NoSQL database service
  • synchronously replicates data across three facilities in an AWS Region, giving high availability and data durability
  • runs exclusively on SSDs to provide high I/O performance
  • provides provisioned table reads and writes
  • automatically partitions, reallocates, and re-partitions the data and provisions additional server capacity as data or throughput changes
  • creates and maintains indexes for the primary key attributes for efficient access to data in the table
  • DynamoDB Table classes currently support
    • DynamoDB Standard table class is the default and is recommended for the vast majority of workloads.
    • DynamoDB Standard-Infrequent Access (DynamoDB Standard-IA) table class which is optimized for tables where storage is the dominant cost.
  • supports Secondary Indexes
    • allows querying attributes other than the primary key attributes without impacting performance.
    • are automatically maintained as sparse objects
  • Local secondary index vs Global secondary index
    • shares partition key + different sort key vs different partition + sort key
    • search limited to a single partition vs across all partitions
    • unique attributes vs non-unique attributes
    • linked to the base table vs independent separate index
    • only created during the base table creation vs can be created later
    • cannot be deleted after creation vs can be deleted
    • consumes provisioned throughput capacity of the base table vs independent throughput
    • returns all attributes for item vs only projected attributes
    • Eventually or Strongly vs Only Eventually consistent reads
    • size limited to 10 GB per partition key value vs unlimited
  • DynamoDB Consistency
    • provides Eventually consistent (by default) or Strongly Consistent option to be specified during a read operation
    • supports Strongly consistent reads for a few operations like Query, GetItem, and BatchGetItem using the ConsistentRead parameter
  • DynamoDB Throughput Capacity
    • supports On-demand and Provisioned read/write capacity modes
    • Provisioned mode requires specifying the number of reads and writes per second that the application needs
    • On-demand mode provides a flexible billing option capable of serving thousands of requests per second without capacity planning
    • On-demand pricing reduced by 50% in November 2024
    • supports switching from provisioned to on-demand up to 4 times in a rolling 24-hour period (2025 improvement)
  • DynamoDB Auto Scaling helps dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns.
  • DynamoDB Adaptive capacity is a feature that enables DynamoDB to run imbalanced workloads indefinitely.
  • DynamoDB Global Tables
    • provide multi-active, cross-region replication capability of DynamoDB to support data access locality and regional fault tolerance for database workloads.
    • provide up to 99.999% availability
    • Multi-Region Strong Consistency (MRSC) – GA June 2025
      • enables applications to always read the latest version of data from any Region in a global table
      • provides zero RPO (Recovery Point Objective) for the highest application resilience
      • removes the need to manage consistency across multiple Regions manually
      • slightly higher write latencies compared to eventually consistent (MREC) mode
    • Global tables pricing reduced by up to 67% in November 2024
  • DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table
  • DynamoDB Time to Live (TTL)
    • enables a per-item timestamp to determine when an item expires
    • expired items are deleted from the table without consuming any write throughput.
  • DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second.
  • DynamoDB Triggers (just like database triggers) are a feature that allows the execution of custom actions based on item-level updates on a table.
  • VPC Gateway Endpoints provide private access to DynamoDB from within a VPC without the need for an internet gateway or NAT gateway.
  • DynamoDB Zero-ETL Integrations
    • Zero-ETL with Amazon Redshift (GA October 2024) – automatically replicates DynamoDB tables into Redshift for SQL analytics without building ETL pipelines
    • Zero-ETL with Amazon OpenSearch Service – provides seamless, code-free data replication for vector search and near real-time analytics
    • enables analytics on DynamoDB data without impacting production workload performance
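
The Local vs Global secondary index comparison above can be made concrete with a table definition. The following is a minimal sketch, assuming a hypothetical Orders table; the table, attribute, and index names are illustrative only. The dictionary follows the parameter shape of the DynamoDB CreateTable API (for example boto3's client.create_table(**table_definition)), but nothing is sent to AWS here.

```python
# Hypothetical table: orders keyed by customer and sorted by order date.
table_definition = {
    "TableName": "Orders",              # illustrative name
    "BillingMode": "PAY_PER_REQUEST",   # on-demand capacity mode
    "AttributeDefinitions": [
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "order_total", "AttributeType": "N"},
        {"AttributeName": "order_status", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "order_date", "KeyType": "RANGE"},   # sort key
    ],
    # LSI: same partition key as the table, different sort key.
    # It can only be defined when the table is created.
    "LocalSecondaryIndexes": [
        {
            "IndexName": "by_total",
            "KeySchema": [
                {"AttributeName": "customer_id", "KeyType": "HASH"},
                {"AttributeName": "order_total", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    # GSI: different partition key, so queries span all partitions.
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "by_status",
            "KeySchema": [
                {"AttributeName": "order_status", "KeyType": "HASH"},
                {"AttributeName": "order_date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
}


def partition_key(key_schema):
    """Return the attribute name that plays the HASH (partition key) role."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


table_pk = partition_key(table_definition["KeySchema"])
lsi_pk = partition_key(table_definition["LocalSecondaryIndexes"][0]["KeySchema"])
gsi_pk = partition_key(table_definition["GlobalSecondaryIndexes"][0]["KeySchema"])
```

Here lsi_pk equals table_pk while gsi_pk differs, which is the "shares partition key vs different partition key" distinction from the comparison.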

ElastiCache

  • managed web service that provides in-memory caching to deploy and run Valkey, Redis OSS, or Memcached protocol-compliant cache clusters
  • ElastiCache for Valkey (Recommended – default since October 2024)
    • Valkey is an open-source fork of Redis OSS 7.2, maintained by the Linux Foundation with contributions from AWS, Google, Microsoft, and others
    • is a drop-in replacement for Redis OSS – supports the same data structures, commands, and protocols
    • all features available with Redis OSS 7.2 are available in Valkey 7.2 and above
    • AWS recommends Valkey for new deployments and offers migration paths from existing Redis OSS clusters
    • like Redis OSS, supports Multi-AZ, Read Replicas and Snapshots
    • supports cluster mode for horizontal scaling
  • ElastiCache with Redis OSS
    • available up to version 7.1 (the last BSD-licensed release); now a maintenance track with no active new feature development from AWS
    • Redis 8.0+ is licensed under AGPLv3, which is not supported by ElastiCache
    • Standard support for versions 4 and 5 ends January 31, 2026; clusters will be enrolled in Extended Support after that date
    • like RDS, supports Multi-AZ, Read Replicas and Snapshots
    • Read Replicas are created across AZ within same region using Redis’s asynchronous replication technology
    • Multi-AZ differs from RDS as there is no standby, but if the primary goes down a Read Replica is promoted as primary
    • allows snapshots for backup and restore
    • AOF can be enabled for recovery scenarios, to recover the data in case the node fails or service crashes. But it does not help in case the underlying hardware fails
    • enabling Redis Multi-AZ is a better approach to fault tolerance than relying on AOF
  • ElastiCache with Memcached
    • can be scaled up by increasing size and scaled out by adding nodes
    • nodes can span across multiple AZs within the same region
    • cached data is spread across the nodes, and a node failure will always result in some data loss from the cluster
    • supports auto discovery
    • every node should be homogeneous and of the same instance type
  • ElastiCache Valkey/Redis vs Memcached
    • complex data objects vs simple key value storage
    • persistent vs non persistent, pure caching
    • automatic failover with Multi-AZ vs Multi-AZ not supported
    • scaling using Read Replicas vs using multiple nodes
    • backup & restore supported vs not supported
  • ElastiCache Serverless (launched November 2023)
    • creates a cache in under a minute with zero capacity planning
    • instantly scales capacity based on application traffic patterns
    • provides zero infrastructure management and zero downtime maintenance
    • supports Valkey 7.2+, Redis OSS 7.0+, and Memcached 1.6+
    • pay-per-use pricing based on data stored and requests executed
    • automatically provisions resources across multiple AZs for high availability
  • can be used for state management to keep the web application stateless
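
The state management point above can be sketched as follows. This is a rough illustration, not ElastiCache client code: InMemoryCache is a hypothetical stand-in for a Valkey or Redis OSS client, and the 30-minute session lifetime is an assumed value. The idea is that session state lives in the shared cache, so any web server can handle any request.

```python
import time


class InMemoryCache:
    """Hypothetical stand-in for a shared cache; stores values with an expiry time."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self._data[key] = (value, now + ttl_seconds)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            # Expired entries behave as if they were never stored.
            del self._data[key]
            return None
        return value


SESSION_TTL_SECONDS = 30 * 60  # assumed session lifetime


def handle_request(cache, session_id, now=None):
    """Count requests per session; the web server itself keeps no state."""
    count = (cache.get(session_id, now) or 0) + 1
    cache.set(session_id, count, SESSION_TTL_SECONDS, now)
    return count


shared_cache = InMemoryCache()
first = handle_request(shared_cache, "session-abc", now=0)     # e.g. served by server A
second = handle_request(shared_cache, "session-abc", now=10)   # e.g. served by server B
after_expiry = handle_request(shared_cache, "session-abc", now=10 + SESSION_TTL_SECONDS)
```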

Redshift

  • fully managed, fast and powerful, petabyte scale data warehouse service
  • uses replication and continuous backups to enhance availability and improve data durability and can automatically recover from node and component failures
  • provides Massively Parallel Processing (MPP) by distributing & parallelizing queries across multiple physical resources
  • uses columnar data storage, which improves query performance and allows advanced compression techniques
  • now supports Multi-AZ deployments for RA3 clusters (GA 2024), running the data warehouse in two AZs simultaneously with 99.99% SLA
  • spot instances are NOT an option
  • Redshift Serverless
    • enables running and scaling analytics without provisioning or managing clusters
    • automatically scales compute up or down based on workload demands
    • AI-driven scaling and optimization (default for new workgroups since April 2026) uses machine learning to predict compute needs and automatically adjust resources
    • offers minimum capacity as low as 4 RPUs for cost-effective development workloads
    • supports Serverless Reservations (2025) for discounted pricing and cost predictability
    • pay-as-you-go pricing based on compute used
  • Zero-ETL Integrations
    • supports zero-ETL from Aurora MySQL, Aurora PostgreSQL, RDS for MySQL, DynamoDB, and self-managed databases
    • automatically replicates data from source to Redshift without building ETL pipelines
    • enables near real-time analytics on transactional data
  • Enhanced Security Defaults (2025)
    • new clusters default to public accessibility disabled, encryption enabled, and secure connections enforced

IAM Roles vs Resource-Based Policies – Comparison

AWS IAM Roles vs Resource-Based Policies

AWS allows granting cross-account access to AWS resources, which can be done using IAM Roles or Resource-Based Policies. Understanding the differences between these two mechanisms is critical for designing secure, multi-account architectures.

Cross-Account Access Methods

  • AWS provides four primary ways to grant cross-account access using resource-based policies:
    • Method 1: Grant access to a specific IAM role using the Principal element (most granular, but role deletion breaks access)
    • Method 2: Grant access to an entire account using the Principal element (delegates access control to the other account)
    • Method 3: Grant access to a specific IAM role using the aws:PrincipalArn condition key (balanced approach — survives role recreation)
    • Method 4: Grant access to an entire AWS Organizations organization using aws:PrincipalOrgId condition key
  • AWS recommends using IAM roles with temporary credentials for cross-account access instead of IAM users with long-term credentials (access keys).
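
The four methods above differ only in the Principal and Condition elements of the resource-based policy. The sketch below builds one S3 bucket policy per method; the bucket name, account ID, role name, and organization ID are placeholders, and the single s3:GetObject action is chosen only for illustration.

```python
import json

# Placeholder identifiers, not real resources.
BUCKET_ARN = "arn:aws:s3:::example-bucket"
ACCOUNT_ID = "111122223333"
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/ExampleRole"
ORG_ID = "o-exampleorgid"


def bucket_policy(principal, condition=None):
    """Build a one-statement bucket policy for the given principal and condition."""
    statement = {
        "Effect": "Allow",
        "Principal": principal,
        "Action": "s3:GetObject",
        "Resource": f"{BUCKET_ARN}/*",
    }
    if condition is not None:
        statement["Condition"] = condition
    return {"Version": "2012-10-17", "Statement": [statement]}


# Method 1: a specific role in the Principal element.
method_1 = bucket_policy({"AWS": ROLE_ARN})

# Method 2: the whole account in the Principal element.
method_2 = bucket_policy({"AWS": ACCOUNT_ID})

# Method 3: the account as principal, narrowed to one role by ARN string.
method_3 = bucket_policy(
    {"AWS": ACCOUNT_ID},
    {"ArnEquals": {"aws:PrincipalArn": ROLE_ARN}},
)

# Method 4: any principal, narrowed to the organization.
# AWS documents this key as aws:PrincipalOrgID; condition key names are not case-sensitive.
method_4 = bucket_policy(
    {"AWS": "*"},
    {"StringEquals": {"aws:PrincipalOrgID": ORG_ID}},
)

print(json.dumps(method_3, indent=2))
```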

IAM Roles

  • Roles can be created to act as a proxy to allow users or services to access resources.
  • Roles support
    • trust policy which helps determine who can access the resources and
    • permission policy which helps to determine what they can access.
  • Users who assume a role temporarily give up their own permissions and instead take on the permissions of the role. The original user permissions are restored when the user exits or stops using the role.
  • Roles can be used to provide access to almost all the AWS resources.
  • Permissions provided to the User through the Role can be further restricted per user by passing an optional session policy to the STS request. This session policy cannot be used to elevate privileges beyond what the assumed role is allowed to access.
  • When a role ARN is specified in a resource-based policy’s Principal element, AWS maps it to the role’s unique ID. If the role is deleted and recreated with the same name, the new role will have a different unique ID and will not have access — this is an intentional security feature.
  • Using the aws:PrincipalArn condition key in resource-based policies (instead of specifying the role in the Principal element) allows access to survive role recreation, as the condition compares by ARN string rather than unique ID.
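
A role's trust policy, permission policy, and optional session policy can be sketched as plain documents. The account IDs, role name, and bucket are placeholders, and allowed_actions is a deliberately naive helper written for this example (it ignores wildcards, resources, and Deny statements). The request dictionary uses the RoleArn, RoleSessionName, and Policy parameters of the STS AssumeRole API, but no call is made.

```python
import json

# Who may assume the role (placeholder trusted account).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": "sts:AssumeRole",
    }],
}

# What the role may do once assumed.
permission_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}

# Optional per-session restriction passed with the AssumeRole request.
# It asks for s3:DeleteObject, which the role itself does not allow.
session_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:DeleteObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}

assume_role_request = {
    "RoleArn": "arn:aws:iam::999988887777:role/CrossAccountRole",
    "RoleSessionName": "read-only-session",
    "Policy": json.dumps(session_policy),
}


def allowed_actions(policy):
    """Naive helper: collect the literal action names from Allow statements."""
    actions = set()
    for statement in policy["Statement"]:
        if statement["Effect"] == "Allow":
            listed = statement["Action"]
            actions.update([listed] if isinstance(listed, str) else listed)
    return actions


# Effective session permissions are the intersection of the two policies,
# so the session policy can restrict but never elevate.
effective = allowed_actions(permission_policy) & allowed_actions(session_policy)
```

In this sketch effective contains only s3:GetObject: s3:PutObject is removed by the session policy, and s3:DeleteObject is not granted because the role never allowed it.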

IAM Roles Anywhere

  • IAM Roles Anywhere extends the short-term credential model beyond the cloud, allowing on-premises and multi-cloud workloads to authenticate using X.509 certificates issued by your existing PKI (Public Key Infrastructure).
  • Eliminates the need for long-term access keys for on-premises workloads.
  • Supports credentials valid for up to 12 hours (extended from the original shorter duration).
  • Integrates with enterprise PKI so non-AWS workloads can use the same IAM policies and roles as AWS workloads.
  • Use cases include on-premises Kubernetes clusters, CI/CD pipelines running outside AWS, and hybrid cloud environments.

Confused Deputy Prevention

  • The confused deputy problem is a security issue where a less-privileged entity coerces a more-privileged service to perform actions on its behalf.
  • AWS recommends using the following global condition context keys in role trust policies and resource-based policies:
    • aws:SourceArn — restrict to a specific resource ARN (most effective)
    • aws:SourceAccount — restrict to a specific AWS account
    • aws:SourceOrgID — restrict to an AWS Organizations organization
    • aws:SourceOrgPaths — restrict to specific organizational units
  • These condition keys should always be used when granting service principals access to your resources.
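
As an illustration of these keys, the sketch below shows an SNS topic policy that lets the Amazon S3 service principal publish, but only on behalf of one bucket in one account. The ARNs and account ID are placeholders. The unguarded_service_statements helper is a hypothetical check written for this example; it compares key names exactly, although IAM itself treats condition key names as case-insensitive.

```python
# Placeholder identifiers.
ACCOUNT_ID = "111122223333"
TOPIC_ARN = f"arn:aws:sns:us-east-1:{ACCOUNT_ID}:example-topic"
BUCKET_ARN = "arn:aws:s3:::example-bucket"

topic_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "sns:Publish",
        "Resource": TOPIC_ARN,
        "Condition": {
            # Only requests made on behalf of this bucket, in this account.
            "ArnLike": {"aws:SourceArn": BUCKET_ARN},
            "StringEquals": {"aws:SourceAccount": ACCOUNT_ID},
        },
    }],
}

SOURCE_KEYS = {
    "aws:SourceArn",
    "aws:SourceAccount",
    "aws:SourceOrgID",
    "aws:SourceOrgPaths",
}


def unguarded_service_statements(policy):
    """Return Allow statements for service principals that lack any source condition key."""
    unguarded = []
    for statement in policy["Statement"]:
        principal = statement.get("Principal")
        is_service = isinstance(principal, dict) and "Service" in principal
        used_keys = {
            key
            for operator_block in statement.get("Condition", {}).values()
            for key in operator_block
        }
        if is_service and statement["Effect"] == "Allow" and not (used_keys & SOURCE_KEYS):
            unguarded.append(statement)
    return unguarded


# The same statement without its Condition block is flagged by the check.
unguarded_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {k: v for k, v in topic_policy["Statement"][0].items() if k != "Condition"}
    ],
}
```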

Resource-based Policies

  • Resource-based policy allows you to attach a policy directly to the resource you want to share, instead of using a role as a proxy.
  • Resource-based policy specifies the Principal, in the form of a list of AWS account ID numbers, IAM role ARNs, or IAM user ARNs, that can access that resource and what actions they can perform.
  • Using cross-account access with a resource-based policy, the User still works in the trusted account and does not have to give up their permissions in place of the role permissions.
  • Users can work on the resources from both accounts at the same time and this can be useful for scenarios e.g. copying objects from one bucket to the other bucket in a different AWS account.
  • For same-account access, an Allow in either the identity-based policy or the resource-based policy is sufficient (both are not required), provided no explicit Deny applies. For cross-account access, both an identity-based policy in the principal’s account and the resource-based policy on the resource must allow the request.
  • Resources that support resource-based policies include (but are not limited to):
    • Amazon S3 — Bucket policies for bucket and object access
    • Amazon SNS (Simple Notification Service)
    • Amazon SQS (Simple Queue Service)
    • Amazon S3 Glacier — Vault access policies
    • AWS Lambda — Function policies
    • AWS KMS — Key policies (required for KMS, every key must have one)
    • Amazon DynamoDB — Table, index, and stream policies (added 2024)
    • AWS Secrets Manager — Secret resource policies
    • Amazon EventBridge — Event bus policies
    • AWS Backup — Vault access policies
    • Amazon ECR — Repository policies
    • AWS CodeArtifact — Domain and repository policies
    • Amazon Bedrock AgentCore — Runtime and endpoint policies
  • Resource-based policies still need the trusted account to create users with permissions to access the resources in the trusting account.
  • Only permissions equivalent to, or less than, the permissions granted to your account by the resource owning account can be delegated.
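
The same-account vs cross-account evaluation rule described above can be written as a small truth function. This is a simplified model: it assumes only an identity-based policy and a resource-based policy are in play, and it ignores SCPs, RCPs, permission boundaries, and session policies. The function name and parameters are made up for this example.

```python
def request_allowed(same_account, identity_allows, resource_allows, explicit_deny=False):
    """Simplified policy evaluation for identity-based plus resource-based policies."""
    if explicit_deny:
        # An explicit Deny in any applicable policy always wins.
        return False
    if same_account:
        # Within one account, an Allow from either policy is enough.
        return identity_allows or resource_allows
    # Across accounts, both sides must allow the request.
    return identity_allows and resource_allows


# Same account: the bucket policy alone is sufficient.
same_account_result = request_allowed(True, identity_allows=False, resource_allows=True)

# Cross account: the bucket policy alone is not sufficient.
cross_account_result = request_allowed(False, identity_allows=False, resource_allows=True)
```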

Resource Control Policies (RCPs)

  • Resource Control Policies (RCPs) are a new type of authorization policy in AWS Organizations, launched at re:Invent 2024.
  • RCPs provide central control over the maximum available permissions on AWS resources across your entire organization.
  • RCPs complement Service Control Policies (SCPs):
    • SCPs — set maximum permissions for IAM principals (users and roles)
    • RCPs — set maximum permissions for AWS resources
  • RCPs help establish a data perimeter by centrally restricting external access to your resources at scale.
  • Supported services include: Amazon S3, AWS STS, AWS KMS, Amazon SQS, AWS Secrets Manager, Amazon Cognito, and Amazon CloudWatch Logs (expanding).
  • RCPs are applied organization-wide through AWS Organizations and can be attached to the organization root, OUs, or individual accounts.
  • AWS Sign-in now supports both resource-based policies and RCPs for the AWS Management Console, enabling restriction of console sign-in to expected networks.
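
A data perimeter RCP of the kind described above might look like the following sketch. The statement shape is modeled on the commonly published "deny principals outside my organization" pattern and should be verified against the AWS Organizations documentation before use; the organization ID and action list are placeholders. The rcp_denies function is a toy evaluator for this one statement, not an IAM policy engine.

```python
ORG_ID = "o-exampleorgid"  # placeholder organization ID

resource_control_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "EnforceOrgIdentities",
        "Effect": "Deny",
        "Principal": "*",
        "Action": ["s3:*", "sqs:*", "kms:*", "secretsmanager:*", "sts:*"],
        "Resource": "*",
        "Condition": {
            # True when the caller's organization differs or the key is absent.
            "StringNotEqualsIfExists": {"aws:PrincipalOrgID": ORG_ID},
            # True when the caller is not an AWS service principal or the key is absent.
            "BoolIfExists": {"aws:PrincipalIsAWSService": "false"},
        },
    }],
}


def rcp_denies(principal_org_id, is_aws_service):
    """Toy evaluation of the statement above; None means the key is absent from the request."""
    outside_org = principal_org_id is None or principal_org_id != ORG_ID
    not_a_service = is_aws_service is None or is_aws_service is False
    # Both condition operators must match for the Deny to apply.
    return outside_org and not_a_service


external_caller_denied = rcp_denies("o-someoneelse", False)
internal_caller_denied = rcp_denies(ORG_ID, False)
aws_service_denied = rcp_denies(None, True)
```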

AWS Resource Access Manager (RAM)

  • AWS RAM enables you to share resources with other AWS accounts or within your AWS Organization without using resource-based policies directly.
  • RAM eliminates the need to provision and manage duplicate resources in every account.
  • When sharing a resource, the receiving account’s IAM policies and permissions apply to the shared resource.
  • Supported resources include: VPC subnets, Transit Gateway, Route 53 Resolver rules, License Manager configurations, Aurora DB clusters, and many more.
  • RAM integrates with AWS Organizations to enable sharing without requiring individual account acceptance.

IAM Roles vs Resource-Based Policies – Key Differences

  • Permission Delegation: With IAM roles, the user gives up their original permissions and takes on role permissions. With resource-based policies, the user retains their original permissions.
  • Simultaneous Access: Resource-based policies allow users to work with resources in both accounts simultaneously. Roles do not.
  • Coverage: IAM roles can provide access to almost all AWS resources. Resource-based policies are limited to services that support them.
  • Session Policies: IAM roles support session policies for further restricting permissions. Resource-based policies do not support this concept.
  • Policy Evaluation: For cross-account access via roles, only the role’s identity-based policy determines effective permissions. For cross-account access via resource-based policies, both the caller’s identity-based policy and the resource policy must allow the action.

Best Practices for Cross-Account Access

  • Use IAM roles with temporary credentials instead of IAM users with long-term access keys.
  • Use the aws:PrincipalArn condition key in resource-based policies for a balance of security and availability.
  • Use the aws:PrincipalOrgId condition key to restrict access to your AWS Organization.
  • Use External ID in trust policies when granting access to third parties to prevent confused deputy attacks.
  • Implement the principle of least privilege in all cross-account policies.
  • Use RCPs to enforce organization-wide data perimeters on resources.
  • Regularly audit cross-account access using IAM Access Analyzer.
  • Consider using IAM Identity Center (formerly AWS SSO) with permission sets for centralized multi-account access management.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. What are the two permission types used by AWS?
    1. Resource-based and Product-based
    2. Product-based and Service-based
    3. Service-based
    4. User-based and Resource-based
  2. What’s the policy used for cross-account access? (Choose 2)
    1. Trust policy
    2. Permissions Policy
    3. Key policy
  3. A company has two AWS accounts – Account A and Account B. Account A has an S3 bucket that Account B needs to access. The security team wants to ensure that if the IAM role in Account B is accidentally deleted and recreated, access is maintained. Which approach should be used in the bucket policy?
    1. Specify the IAM role ARN in the Principal element
    2. Specify the account number in the Principal element with an aws:PrincipalArn condition
    3. Specify the account number in the Principal element without any condition
    4. Use a service control policy
  4. An organization wants to centrally restrict external access to their AWS resources across all accounts. Which policy type should they use?
    1. Service Control Policies (SCPs)
    2. Identity-based policies
    3. Resource Control Policies (RCPs)
    4. Permission boundaries
  5. A developer needs to copy objects from an S3 bucket in Account A to an S3 bucket in Account B, and needs to access both buckets simultaneously. Which cross-account access method should be used?
    1. IAM Role in Account A
    2. IAM Role in Account B
    3. Resource-based policy on the S3 bucket in Account A
    4. AWS Resource Access Manager
  6. Which condition keys should be used to prevent the confused deputy problem when granting a service principal access to your resources? (Choose 2)
    1. aws:SourceArn
    2. aws:SourceAccount
    3. aws:PrincipalOrgId
    4. aws:RequestedRegion
  7. An on-premises server needs to access AWS resources using temporary credentials without managing long-term access keys. Which service should be used?
    1. AWS STS AssumeRole
    2. IAM User with MFA
    3. IAM Roles Anywhere
    4. AWS Directory Service

AWS Simple Notification Service – SNS

SNS Delivery Protocols

Simple Notification Service – SNS

  • Simple Notification Service – SNS is a web service that coordinates and manages the delivery or sending of messages to subscribing endpoints or clients.
  • SNS provides the ability to create a Topic which is a logical access point and communication channel.
  • Each topic has a unique name that identifies the SNS endpoint for publishers to post messages and subscribers to register for notifications.
  • Producers (publishers) communicate asynchronously with consumers (subscribers) by producing and sending a message to a topic.
  • Producers push messages to a topic that they created or have access to, and SNS matches the topic to the list of subscribers who have subscribed to that topic and delivers the message to each of those subscribers.
  • Subscribers receive all messages published to the topics to which they subscribe, and all subscribers to a topic receive the same messages.
  • Subscribers (i.e., web servers, email addresses, SQS queues, AWS Lambda functions) consume or receive the message or notification over one of the supported protocols (i.e., SQS, HTTP/S, email, SMS, Lambda) when they are subscribed to the topic.
  • SNS supports two types of topics:
    • Standard topics – provide best-effort message ordering and at-least-once delivery. Support up to 100,000 topics and 12.5 million subscriptions per topic.
    • FIFO topics – provide strict message ordering, exactly-once message delivery, and message deduplication. Support up to 1,000 topics and 100 subscriptions per topic.
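
The publish/subscribe fan-out described above can be modeled in a few lines. This is an in-memory toy, not the SNS API: the Topic class and its methods are made up for illustration, and each subscriber is simply a callable that receives the message.

```python
class Topic:
    """Toy model of a topic: every subscriber receives every published message."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, deliver):
        """Register a callable that is invoked once per published message."""
        self._subscribers.append(deliver)

    def publish(self, message):
        """Push the message to all subscribers and return how many were notified."""
        for deliver in self._subscribers:
            deliver(message)
        return len(self._subscribers)


email_inbox = []   # stands in for an email endpoint
work_queue = []    # stands in for an SQS queue

topic = Topic("order-events")
topic.subscribe(email_inbox.append)
topic.subscribe(work_queue.append)

notified = topic.publish("order 42 placed")
```

The publisher only knows the topic; it never addresses the inbox or the queue directly, which is what decouples producers from consumers.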

Accessing SNS

  • AWS Management Console
    • AWS Management Console is the web-based user interface that can be used to manage SNS
  • AWS Command-line Interface (CLI)
    • Provides commands for a broad set of AWS products, and is supported on Windows, Mac, and Linux.
  • AWS Tools for Windows PowerShell
    • Provides commands for a broad set of AWS products for those who script in the PowerShell environment
  • AWS SNS Query API
    • Query API requests are HTTP or HTTPS requests that use the HTTP verbs GET or POST and a Query parameter named Action
  • AWS SDK libraries
    • AWS provides libraries in various languages which provide basic functions that automate tasks such as cryptographically signing your requests, retrying requests, and handling error responses
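
To show what a Query API request looks like, the sketch below assembles the URL for a Publish action. The topic ARN and Region are placeholders, and build_publish_url is a helper written for this example. The URL is unsigned, so SNS would reject it as written; real requests also need Signature Version 4 signing, which the SDKs and CLI handle.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_publish_url(region, topic_arn, message):
    """Build an unsigned SNS Query API URL for the Publish action."""
    params = {
        "Action": "Publish",       # the Query parameter that selects the operation
        "Version": "2010-03-31",   # SNS API version
        "TopicArn": topic_arn,
        "Message": message,
    }
    return f"https://sns.{region}.amazonaws.com/?{urlencode(params)}"


url = build_publish_url(
    "us-east-1",
    "arn:aws:sns:us-east-1:111122223333:example-topic",  # placeholder ARN
    "hello world",
)

# Parse the query string back to confirm the encoding round-trips.
query = parse_qs(urlsplit(url).query)
```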

SNS Supported Transport Protocols

  • HTTP, HTTPS – Subscribers specify a URL as part of the subscription registration; notifications will be delivered through an HTTP POST to the specified URL.
  • Email, Email-JSON – Messages are sent to registered addresses as email. Email-JSON sends notifications as a JSON object, while Email sends text-based email.
  • SQS – Users can specify an SQS queue as the endpoint; SNS will enqueue a notification message to the specified queue (which subscribers can then process using SQS APIs such as ReceiveMessage, DeleteMessage, etc.)
  • SMS – Messages are sent to registered phone numbers as SMS text messages.
    • Note: As of September 2024, Amazon SNS delivers SMS text messages via AWS End User Messaging. Existing SNS SMS APIs continue to work, but new phone numbers requested after Sept 24, 2024 require explicit permissions to be granted to Amazon SNS.
  • Lambda – SNS can invoke Lambda functions with the payload of the published message.
  • Amazon Data Firehose – Deliver events to delivery streams for archiving and analysis purposes (formerly known as Kinesis Data Firehose, renamed Feb 2024).

SNS Supported Endpoints

  • Email Notifications
    • SNS provides the ability to send Email notifications
  • Mobile Push Notifications
    • SNS provides an ability to send push notification messages directly to apps on mobile devices. Push notification messages sent to a mobile endpoint can appear in the mobile app as message alerts, badge updates, or even sound alerts
    • Supported push notification services
      • Amazon Device Messaging (ADM)
      • Apple Push Notification Service (APNs)
      • Firebase Cloud Messaging (FCM) – previously Google Cloud Messaging (GCM), which was deprecated April 2019. SNS added FCM HTTP v1 API support in January 2024. The legacy FCM API was removed by Google in June 2024.
      • Windows Push Notification Service (WNS) for Windows 8+ and Windows Phone 8.1+
      • Baidu Cloud Push for Android devices in China
    • Note: Microsoft Push Notification Service (MPNS) for Windows Phone 7+ has been deprecated and is no longer supported.
  • SQS Queues
    • SNS with SQS provides the ability for messages to be delivered to applications that require immediate notification of an event, and also persist in an SQS queue for other applications to process at a later time
    • SNS allows applications to send time-critical messages to multiple subscribers through a “push” mechanism, eliminating the need to periodically check or “poll” for updates.
    • SQS can be used by distributed applications to exchange messages through a polling model, and can be used to decouple sending and receiving components, without requiring each component to be concurrently available.
  • SMS Notifications
    • SNS provides the ability to send and receive Short Message Service (SMS) notifications to SMS-enabled mobile phones and smart phones
    • SMS delivery is now handled through AWS End User Messaging, providing enhanced features like SMS resource management, two-way messaging, granular resource permissions, and country block rules.
  • HTTP/HTTPS Endpoints
    • SNS provides the ability to send notification messages to one or more HTTP or HTTPS endpoints. When you subscribe an endpoint to a topic, you can publish a notification to the topic and Amazon SNS sends an HTTP POST request delivering the contents of the notification to the subscribed endpoint
  • Lambda
    • SNS and Lambda are integrated so Lambda functions can be invoked with SNS notifications.
    • When a message is published to an SNS topic that has a Lambda function subscribed to it, the Lambda function is invoked with the payload of the published message
  • Amazon Data Firehose
    • Deliver events to delivery streams for archiving and analysis purposes.
    • Through delivery streams, events can be delivered to AWS destinations like S3, Redshift, and OpenSearch Service, or to third-party destinations such as Datadog, New Relic, MongoDB, and Splunk.
    • Note: Amazon Kinesis Data Firehose was renamed to Amazon Data Firehose in February 2024.

SNS FIFO Topics

  • SNS FIFO (First-In-First-Out) topics provide strict message ordering and exactly-once message delivery combined with deduplication.
  • Message Ordering – Messages are delivered in the exact order in which they are published to the topic, using message group IDs.
  • Message Deduplication – Prevents duplicate messages from being delivered within a 5-minute deduplication interval using either content-based deduplication or a deduplication ID.
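  • Supported Subscriptions – FIFO topics deliver messages to SQS queues only. SQS FIFO queues preserve ordering and deduplication; SQS standard queues are also supported (since September 2023), but without those guarantees on the queue side.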
  • Message Filtering – FIFO topics support the same subscription filter policies as standard topics.
  • Message Archiving and Replay (launched Oct 2023) – Topic owners can set an archive policy with retention up to 365 days. Subscribers can set a replay policy to retrieve and redeliver archived messages using timestamps.
  • High Throughput Mode – Supports higher message throughput per message group with the FifoThroughputScope attribute.
  • Use Cases – Bank transaction logging, stock monitoring, flight tracking, inventory management, price update applications.
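
The deduplication behaviour above can be sketched as a small simulation. It assumes that content-based deduplication derives the deduplication ID from a SHA-256 hash of the message body, and it uses explicit timestamps in seconds instead of a real clock. FifoDeduplicator is a made-up class for illustration, not part of any AWS SDK.

```python
import hashlib

DEDUP_WINDOW_SECONDS = 5 * 60  # the 5-minute deduplication interval


class FifoDeduplicator:
    """Toy model: drop a message if the same dedup ID was accepted within the window."""

    def __init__(self):
        self._accepted_at = {}  # dedup ID -> time the message was last accepted

    def accept(self, body, now, dedup_id=None):
        # Content-based deduplication: fall back to a hash of the body.
        key = dedup_id or hashlib.sha256(body.encode("utf-8")).hexdigest()
        last = self._accepted_at.get(key)
        if last is not None and now - last < DEDUP_WINDOW_SECONDS:
            return False  # duplicate inside the window, not delivered again
        self._accepted_at[key] = now
        return True


dedup = FifoDeduplicator()
first = dedup.accept("price=10", now=0)
repeat_inside_window = dedup.accept("price=10", now=100)
repeat_after_window = dedup.accept("price=10", now=301)

# With an explicit deduplication ID, the body is not considered.
explicit_first = dedup.accept("body A", now=0, dedup_id="txn-1")
explicit_repeat = dedup.accept("body B", now=60, dedup_id="txn-1")
```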

SNS Message Filtering

  • SNS message filtering allows subscribers to receive only a subset of messages published to a topic by setting subscription filter policies.
  • Attribute-based filtering – Filter messages based on message attributes (original capability).
  • Payload-based filtering (launched Nov 2022) – Filter messages based on message body content, enabling filtering of events from 60+ AWS services that publish to SNS without message attributes.
  • Filter policy scope can be set to MessageAttributes or MessageBody.
  • Total combination of values in a filter policy must not exceed 150.
  • If no filter policy is set, the subscriber receives all messages published to the topic.
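
Attribute-based filtering can be illustrated with a simplified matcher. It covers exact string matching only; real filter policies also support operators such as prefix, numeric, and anything-but matching, which are omitted here. The attribute names and values are invented for the example.

```python
def matches(filter_policy, attributes):
    """True when every key in the policy is present with one of its allowed values."""
    return all(
        attributes.get(key) in allowed_values
        for key, allowed_values in filter_policy.items()
    )


# Hypothetical subscription filter policy: only order events from two stores.
filter_policy = {
    "event_type": ["order_placed", "order_cancelled"],
    "store": ["eu-west", "us-east"],
}

delivered = matches(filter_policy, {"event_type": "order_placed", "store": "eu-west"})
wrong_value = matches(filter_policy, {"event_type": "order_shipped", "store": "eu-west"})
missing_attribute = matches(filter_policy, {"event_type": "order_placed"})

# An empty policy behaves like no policy: the subscriber receives everything.
no_policy = matches({}, {"event_type": "anything"})
```

Values within one key are combined with OR, while the keys themselves are combined with AND.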

SNS Message Security and Encryption

  • Server-Side Encryption (SSE) – SNS supports encryption at rest using AWS KMS. Messages are stored in encrypted form and only decrypted when delivered.
  • Only the message body is encrypted; message attributes, resource metadata, and metrics remain unencrypted.
  • All requests to SNS topics with SSE activated must use HTTPS and Signature Version 4.
  • In-transit encryption – SNS API requests can be made over HTTPS; TLS 1.2 or later is recommended.

SNS Dead-Letter Queues

  • SNS supports dead-letter queues (DLQ) for capturing messages that cannot be delivered to subscribed endpoints.
  • Messages that fail delivery due to client errors or server errors are held in the DLQ for further analysis or reprocessing.
  • A DLQ is an Amazon SQS queue attached to an SNS subscription (not the topic itself).
  • Useful for debugging and recovering from delivery failures.
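
Because the DLQ is attached to the subscription, it is configured through the subscription's RedrivePolicy attribute. The sketch below only builds the request parameters, in the shape used by the SetSubscriptionAttributes API (for example boto3's set_subscription_attributes); the ARNs are placeholders and redrive_request is a helper written for this example. In practice the SQS queue policy must also allow SNS to send messages to the queue.

```python
import json


def redrive_request(subscription_arn, dlq_arn):
    """Build SetSubscriptionAttributes parameters that attach an SQS queue as the DLQ."""
    return {
        "SubscriptionArn": subscription_arn,
        "AttributeName": "RedrivePolicy",
        # The attribute value is itself a JSON document, passed as a string.
        "AttributeValue": json.dumps({"deadLetterTargetArn": dlq_arn}),
    }


request = redrive_request(
    "arn:aws:sns:us-east-1:111122223333:example-topic:11111111-2222-3333-4444-555555555555",
    "arn:aws:sqs:us-east-1:111122223333:example-dlq",
)
```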

SNS Message Batching

  • The PublishBatch API allows publishing up to 10 messages in a single API request.
  • Reduces the number of API calls required for high-volume publishers.
  • Supports both standard and FIFO topics.
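
Since a single PublishBatch request carries at most 10 messages, a high-volume publisher has to chunk its messages first. The sketch below does only the chunking and builds entries with the Id and Message fields used by PublishBatchRequestEntries; to_publish_batches is a made-up helper and no request is sent.

```python
MAX_BATCH_SIZE = 10  # PublishBatch limit per request


def to_publish_batches(messages, batch_size=MAX_BATCH_SIZE):
    """Split messages into lists of batch entries, each with an Id unique within its batch."""
    batches = []
    for start in range(0, len(messages), batch_size):
        chunk = messages[start:start + batch_size]
        batches.append([
            {"Id": str(start + offset), "Message": body}
            for offset, body in enumerate(chunk)
        ])
    return batches


messages = [f"event {n}" for n in range(25)]
batches = to_publish_batches(messages)
batch_sizes = [len(batch) for batch in batches]
```

With 25 messages this yields three requests instead of 25 individual Publish calls.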

SNS Cross-Region Delivery

  • SNS supports cross-region delivery of messages to SQS queues and Lambda functions in other AWS Regions.
  • As of July 2025, SNS enhanced cross-region delivery capabilities to support delivery from default-enabled Regions to opt-in Regions.

SNS Message Data Protection

⚠️ Feature No Longer Available to New Customers

Amazon SNS message data protection is no longer available to new customers effective April 30, 2026.

Existing customers with configured data protection policies can continue to use the feature, but no new enhancements will be introduced.

Recommended Alternative: An AWS Lambda-based architecture using Amazon Bedrock Guardrails for real-time sensitive data detection and protection. See the AWS Samples repository for implementation guidance.

  • SNS message data protection could scan messages in real time for PII/PHI data and provide audit reports.
  • Supported operations: Audit (log sensitive data findings), Deny (block messages with sensitive data), and Redact (mask sensitive data).

AWS Certification Exam Practice Questions

  1. Which of the following notification endpoints or clients does Amazon Simple Notification Service support? Choose 2 answers
    1. Email
    2. CloudFront distribution
    3. File Transfer Protocol
    4. Short Message Service
    5. Simple Network Management Protocol
  2. What happens when you create a topic on Amazon SNS?
    1. The topic is created, and it has the name you specified for it.
    2. An ARN (Amazon Resource Name) is created
    3. You can create a topic on Amazon SQS, not on Amazon SNS.
    4. This question doesn’t make sense.
  3. A user has deployed an application on his private cloud. The user is using his own monitoring tool. He wants to configure that whenever there is an error, the monitoring tool should notify him via SMS. Which of the below mentioned AWS services will help in this scenario?
    1. None because the user infrastructure is in the private cloud.
    2. AWS SNS
    3. AWS SES
    4. AWS SMS
  4. A user wants to set things up so that whenever the CPU utilization of the AWS EC2 instance is above 90%, the red light in his bedroom turns on. Which of the below mentioned AWS services is helpful for this purpose?
    1. AWS CloudWatch + AWS SES
    2. AWS CloudWatch + AWS SNS
    3. It is not possible to configure the light with the AWS infrastructure services
    4. AWS CloudWatch and a dedicated software turning on the light
  5. A user is trying to understand AWS SNS. To which of the below mentioned end points is SNS unable to send a notification?
    1. Email JSON
    2. HTTP
    3. AWS SQS
    4. AWS SES
  6. A user is running a webserver on EC2. The user wants to receive the SMS when the EC2 instance utilization is above the threshold limit. Which AWS services should the user configure in this case?
    1. AWS CloudWatch + AWS SES
    2. AWS CloudWatch + AWS SNS
    3. AWS CloudWatch + AWS SQS
    4. AWS EC2 + AWS CloudWatch
  7. A user is planning to host a mobile game on EC2 which sends notifications to active users on either high score or the addition of new features. The user should get this notification when he is online on his mobile device. Which of the below mentioned AWS services can help achieve this functionality?
    1. AWS Simple Notification Service
    2. AWS Simple Queue Service
    3. AWS Mobile Communication Service
    4. AWS Simple Email Service
  8. You are providing AWS consulting services for a company developing a new mobile application that will be leveraging Amazon SNS push for push notifications. In order to send direct notification messages to individual devices, each device registration identifier or token needs to be registered with SNS; however, the developers are not sure of the best way to do this. You advise them to:
    1. Bulk upload the device tokens contained in a CSV file via the AWS Management Console
    2. Let the push notification service (e.g. Amazon Device Messaging) handle the registration
    3. Implement a token vending service to handle the registration
    4. Call the CreatePlatformEndpoint API function to register multiple device tokens. (Refer documentation)
  9. A company is running a batch analysis every hour on their main transactional DB running on an RDS MySQL instance to populate their central Data Warehouse running on Redshift. During the execution of the batch their transactional applications are very slow. When the batch completes they need to update the top management dashboard with the new data. The dashboard is produced by another system running on-premises that is currently started when a manually-sent email notifies that an update is required. The on-premises system cannot be modified because it is managed by another team. How would you optimize this scenario to solve performance issues and automate the process as much as possible?
    1. Replace RDS with Redshift for the batch analysis and SNS to notify the on-premises system to update the dashboard
    2. Replace RDS with Redshift for the batch analysis and SQS to send a message to the on-premises system to update the dashboard
    3. Create an RDS Read Replica for the batch analysis and SNS to notify the on-premises system to update the dashboard
    4. Create an RDS Read Replica for the batch analysis and SQS to send a message to the on-premises system to update the dashboard.
  10. Which of the following are valid SNS delivery transports? Choose 2 answers.
    1. HTTP
    2. UDP
    3. SMS
    4. DynamoDB
    5. Named Pipes
  11. What is the format of structured notification messages sent by Amazon SNS?
    1. An XML object containing MessageId, UnsubscribeURL, Subject, Message and other values
    2. A JSON object containing MessageId, DuplicateFlag, Message and other values
    3. An XML object containing MessageId, DuplicateFlag, Message and other values
    4. A JSON object containing MessageId, unsubscribeURL, Subject, Message and other values
  12. Which of the following are valid arguments for an SNS Publish request? Choose 3 answers.
    1. TopicArn
    2. Subject
    3. Destination
    4. Format
    5. Message
    6. Language
  13. A company requires strict message ordering for their financial transaction processing system. Which SNS feature should they use?
    1. Standard topics with message attributes
    2. FIFO topics with message group IDs
    3. Standard topics with delivery policies
    4. FIFO topics with dead-letter queues only
  14. An application publishes thousands of events per second to an SNS topic. Subscribers only need to process events matching specific criteria. What is the most efficient approach?
    1. Have each subscriber receive all messages and filter locally
    2. Create separate topics for each message type
    3. Use SNS subscription filter policies to deliver only matching messages
    4. Use SQS queues with consumer-side filtering
  15. Which of the following statements about SNS FIFO topics are correct? Choose 2 answers.
    1. FIFO topics provide exactly-once message delivery
    2. FIFO topics support delivery to HTTP/HTTPS endpoints
    3. FIFO topics can deliver to up to 12.5 million subscriptions
    4. FIFO topics support message archiving and replay
    5. FIFO topics can deliver to Lambda functions directly
  16. A development team needs to filter SNS messages based on message body content from S3 event notifications. Which feature should they use?
    1. Message attributes filtering with attribute-based scope
    2. Payload-based message filtering with MessageBody scope
    3. Lambda function to filter before forwarding
    4. SQS message filtering
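
Questions 14 and 16 above both turn on subscription filter policies. As a rough illustration only, the sketch below builds a filter policy as JSON and applies a deliberately simplified exact-match check locally. The attribute name `event_type`, its values, and the `matches` helper are hypothetical; the real SNS matching engine runs inside the service and supports more operators (prefix, numeric ranges, anything-but) than this sketch does.

```python
import json

# Hypothetical filter policy: deliver only messages whose "event_type"
# attribute is one of the listed values.
filter_policy = {"event_type": ["order_placed", "order_cancelled"]}


def matches(policy: dict, attributes: dict) -> bool:
    """Simplified exact-match check (not the real SNS engine): every policy
    key must be present in the attributes with a value from the policy list."""
    return all(attributes.get(key) in allowed for key, allowed in policy.items())


# A filter policy is attached to a subscription as a JSON string.
policy_json = json.dumps(filter_policy)

print(matches(filter_policy, {"event_type": "order_placed"}))   # True
print(matches(filter_policy, {"event_type": "order_shipped"}))  # False
```

With payload-based filtering, a policy of the same shape is evaluated against the message body instead of the message attributes by setting the subscription's `FilterPolicyScope` to `MessageBody`.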

AWS EBS Performance

AWS EBS Performance Tips

  • EBS performance depends on several factors, including I/O characteristics and the configuration of instances and volumes. It can be improved using Provisioned IOPS (io2 Block Express), EBS-Optimized instances, proper volume type selection, and RAID configuration.

📢 Key Updates (2025-2026)

  • gp3 volumes enhanced (Sept 2025) – Now support up to 64 TiB size (4x increase), 80,000 IOPS (5x increase), and 2,000 MiB/s throughput (2x increase).
  • io2 Block Express – Delivers up to 256,000 IOPS, 4,000 MB/s throughput, 64 TiB capacity with sub-millisecond latency and 99.999% durability.
  • Instance Bandwidth Weighting – New feature allows shifting up to 25% of network bandwidth to EBS for I/O-intensive workloads.
  • io1 → io2 migration recommended – AWS recommends upgrading io1 to io2 for better performance and durability at the same cost.
  • gp2 → gp3 migration recommended – gp3 offers 20% lower cost with better baseline performance (3,000 IOPS, 125 MiB/s).
  • RAID 0 less necessary – With gp3’s increased limits (80,000 IOPS per volume), many workloads no longer require multi-volume striping.

EBS Volume Type Selection for Performance

  • Selecting the right volume type is the most impactful decision for EBS performance.
  • gp3 (General Purpose SSD) – Recommended default for most workloads.
    • Baseline: 3,000 IOPS and 125 MiB/s at any volume size (no burst credits needed)
    • Max: 80,000 IOPS and 2,000 MiB/s throughput
    • Size: up to 64 TiB
    • Performance is provisioned independently of storage capacity
    • 20% lower cost per GB than gp2
  • io2 Block Express (Provisioned IOPS SSD) – For mission-critical, latency-sensitive workloads.
    • Max: 256,000 IOPS, 4,000 MB/s throughput, 64 TiB capacity
    • Sub-millisecond latency (avg. under 500 microseconds)
    • 99.999% durability (vs 99.8-99.9% for gp3)
    • Up to 1,000 IOPS per GiB ratio
    • Multi-Attach support (up to 16 instances simultaneously)
    • Available on all Nitro-based EC2 instances
  • gp2 (Previous Generation General Purpose SSD) – Still available but migration to gp3 is recommended.
    • Max: 16,000 IOPS, 250 MB/s throughput, 16 TiB
    • IOPS scales with volume size at 3 IOPS/GiB
    • Burst credit model for volumes under 1 TiB
  • io1 (Previous Generation Provisioned IOPS SSD) – Migration to io2 Block Express is recommended.
    • Max: 64,000 IOPS, 1,000 MB/s throughput, 16 TiB
    • 50 IOPS per GiB ratio
    • 99.8-99.9% durability
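
The per-GiB ratios listed above can be turned into a quick sizing check. The sketch below is illustrative only: the function names are made up for this example, and the 100 IOPS floor for gp2 comes from the AWS documentation rather than from the list above.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floored at 100 and capped at 16,000."""
    return max(100, min(16_000, 3 * size_gib))


def max_provisionable_iops(volume_type: str, size_gib: int) -> int:
    """Upper bound on provisioned IOPS from the IOPS-per-GiB ratio and the
    per-volume maximum listed above (io1 and io2 Block Express only)."""
    limits = {"io1": (50, 64_000), "io2": (1_000, 256_000)}
    ratio, cap = limits[volume_type]
    return min(cap, ratio * size_gib)


print(gp2_baseline_iops(500))               # 1500
print(max_provisionable_iops("io1", 500))   # 25000
print(max_provisionable_iops("io2", 500))   # 256000
```

A 500 GiB gp2 volume therefore has a lower baseline (1,500 IOPS) than any gp3 volume (3,000 IOPS), which is one reason the gp2 → gp3 migration is recommended.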

EBS-Optimized or 10 Gigabit Network Instances

  • An EBS-Optimized instance uses an optimized configuration stack and provides additional, dedicated capacity for EBS I/O.
  • Optimization provides the best performance for the EBS volumes by minimizing contention between EBS I/O and other traffic from an instance.
  • EBS-Optimized instances deliver dedicated throughput to EBS depending on the instance type used.
  • All current-generation EC2 instance types are EBS-optimized by default at no additional cost.
  • Some previous-generation instance types support EBS-optimization as an optional feature with an additional hourly fee.
  • When attached to an EBS-optimized instance,
    • General Purpose (gp3) volumes are designed to deliver within 10% of their provisioned performance 99% of the time in a given year.
    • Provisioned IOPS (io2 Block Express) volumes are designed to deliver within 10% of their provisioned performance 99.9% of the time in a given year.
  • The maximum EBS throughput varies by instance type – for example, latest generation instances like C8gd/M8gd/R8gd provide up to 40 Gbps of EBS bandwidth.

Instance Bandwidth Weighting

  • EC2 instances on select Nitro-based instance types support configurable bandwidth weighting between EBS and VPC networking.
  • Using the ebs-1 bandwidth weighting option increases EBS bandwidth by up to 25%, which reduces VPC network bandwidth by the same amount.
  • This is beneficial for I/O-intensive workloads that require higher EBS throughput but have lower network requirements.
  • The total available baseline bandwidth for the instance remains the same; it only shifts the allocation.
  • Network PPS and EBS IOPS specifications are unaffected by bandwidth weighting changes.
  • Can be configured at launch time using launch templates or modified on running instances.
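
The trade-off can be worked through with simple arithmetic. The sketch below assumes the 25% shift is measured as a fraction of the EBS baseline and that VPC bandwidth drops by the same absolute amount; the function name and the example bandwidth figures are hypothetical and not taken from any specific instance type.

```python
def apply_ebs_weighting(ebs_gbps: float, vpc_gbps: float, shift: float = 0.25):
    """Return (ebs, vpc) baseline bandwidth after an ebs-1 style shift.

    Assumption: the shift is a fraction of the EBS baseline, and VPC
    bandwidth is reduced by the same absolute amount.
    """
    delta = ebs_gbps * shift
    return ebs_gbps + delta, vpc_gbps - delta


ebs, vpc = apply_ebs_weighting(20.0, 50.0)
print(ebs, vpc)        # 25.0 45.0
print(ebs + vpc)       # 70.0, the total is unchanged
```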

EBS Volume Initialization – Pre-warming

  • Empty EBS volumes receive their maximum performance the moment that they are available and DO NOT require initialization (pre-warming).
  • Previously, EBS volumes had to be pre-warmed before use to deliver maximum performance from the start. Pre-warming was done by writing zeros to the entire volume for new volumes, or by reading the entire volume for volumes restored from snapshots.
  • Storage blocks on volumes that were restored from snapshots must be initialized (pulled down from S3 and written to the volume) before the block can be accessed.
  • This preliminary action takes time and can cause a significant increase in the latency of an I/O operation the first time each block is accessed.
  • To avoid this initial performance hit in a production environment, the following options can be used:
    • Force the immediate initialization of the entire volume by using the dd or fio utilities to read from all of the blocks on a volume.
    • Enable Fast Snapshot Restore (FSR) on a snapshot to ensure that the EBS volumes created from it are fully-initialized at creation and instantly deliver all of their provisioned performance.
  • Fast Snapshot Restore (FSR) considerations:
    • FSR eliminates the latency of I/O operations on first access of snapshot-restored volumes.
    • Available in all commercial AWS regions (expanded to 6 additional regions in August 2024).
    • Not supported with AWS Outposts, Local Zones, and Wavelength Zones.
    • FSR is charged per AZ per hour per snapshot enabled, so cost should be considered.
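
The "read every block" approach that dd or fio perform can be sketched in a few lines. This is a minimal illustration, not a replacement for those tools: the `read_all_blocks` helper is hypothetical, and the demonstration uses a temporary file as a stand-in for a block device (on a real instance the path would be the attached device and the script would need root privileges).

```python
import os
import tempfile


def read_all_blocks(path: str, block_size: int = 1024 * 1024) -> int:
    """Sequentially read every block of `path` and return the bytes read.

    Reading each block once forces it to be pulled down from the snapshot.
    """
    total = 0
    with open(path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Demonstration on a temporary file standing in for a volume.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (3 * 1024 * 1024 + 5))
demo_path = tmp.name
print(read_all_blocks(demo_path))
os.remove(demo_path)
```

A sequential full read of a large volume can take hours, which is the gap Fast Snapshot Restore is meant to close.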

Elastic Volumes

  • Elastic Volumes allows modifying EBS volume size, type, IOPS, and throughput without detaching the volume or stopping the instance.
  • Supported on all current-generation instances and several previous-generation instances (C1, C3, C4, G2, I2, M1, M3, M4, R3, R4).
  • Modifications include:
    • Increasing volume size (cannot decrease)
    • Changing volume type (e.g., gp2 → gp3, io1 → io2)
    • Adjusting provisioned IOPS and throughput (gp3, io1, io2)
  • Size increases take effect once the modification reaches the “optimizing” state (usually seconds).
  • The file system must be extended within the OS after a size increase.
  • A volume can only be modified once every 6 hours.
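
A modification request can be sketched as follows. The helper below only builds and validates the parameters; the parameter names follow the EC2 ModifyVolume API, while the helper itself, the volume ID, and the chosen values are hypothetical placeholders.

```python
def build_modify_volume_request(volume_id, current_size_gib, *, size_gib=None,
                                volume_type=None, iops=None, throughput=None):
    """Build keyword arguments for an EC2 ModifyVolume call.

    Raises ValueError on a size decrease, which Elastic Volumes does not allow.
    """
    request = {"VolumeId": volume_id}
    if size_gib is not None:
        if size_gib < current_size_gib:
            raise ValueError("EBS volume size can only be increased")
        request["Size"] = size_gib
    if volume_type is not None:
        request["VolumeType"] = volume_type
    if iops is not None:
        request["Iops"] = iops
    if throughput is not None:
        request["Throughput"] = throughput
    return request


# Placeholder volume ID; migrate a 100 GiB gp2 volume to gp3 with extra performance.
params = build_modify_volume_request("vol-0123456789abcdef0", 100,
                                     volume_type="gp3", iops=6000, throughput=250)
print(params)
# With boto3 installed and credentials configured, the call would be roughly:
#   boto3.client("ec2").modify_volume(**params)
```

After a size increase, the file system still has to be extended inside the OS, as noted above.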

RAID Configuration

  • EBS volumes can be striped, if a single EBS volume does not meet the performance requirements.
  • Note: With gp3 volumes now supporting up to 80,000 IOPS and 2,000 MiB/s per volume, many workloads that previously required RAID 0 can now use a single volume, improving resiliency.
  • Striping volumes allows pushing tens of thousands of IOPS beyond single-volume limits.
  • EBS volumes are already replicated across multiple servers in an AZ for availability and durability, so AWS generally recommends striping for performance rather than durability.
  • For greater I/O performance than can be achieved with a single volume, RAID 0 can stripe multiple volumes together; for on-instance redundancy, RAID 1 can mirror two volumes together.
  • RAID 0 allows I/O distribution across all volumes in a stripe, allowing straight gains with each addition.
  • RAID 1 can mirror volumes for on-instance redundancy, but it requires more EC2-to-EBS bandwidth because data is written to multiple volumes simultaneously, so it should be used with EBS-optimization.
  • EBS volume data is replicated across multiple servers in an AZ to prevent the loss of data from the failure of any single component.
  • AWS doesn’t recommend RAID 5 and 6 because the parity write operations of these modes consume the IOPS available for the volumes and can result in 20-30% fewer usable IOPS than RAID 0.
  • A 2-volume RAID 0 config can outperform a 4-volume RAID 6 that costs twice as much.
  • Durability consideration: Each additional volume in a RAID 0 stripe reduces effective durability (e.g., 4 gp3 volumes in RAID 0 = ~99.6% effective durability vs. 99.9% for a single volume). With increased gp3 limits, fewer volumes are needed for the same performance.
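
The durability figure above follows from simple probability: a RAID 0 stripe survives only if every member volume survives. The sketch below assumes volume failures are independent, which is a simplification, and the function names are made up for this example.

```python
def raid0_durability(per_volume_durability: float, volumes: int) -> float:
    """A RAID 0 stripe survives only if every member volume survives
    (assumes independent failures)."""
    return per_volume_durability ** volumes


def raid0_iops(per_volume_iops: int, volumes: int) -> int:
    """Ideal aggregate IOPS, ignoring the instance's own EBS limits."""
    return per_volume_iops * volumes


print(f"{raid0_durability(0.999, 4):.4%}")  # roughly 99.6%
print(raid0_iops(16_000, 4))                # 64000
```

In practice the aggregate IOPS is also capped by the instance's EBS bandwidth and IOPS limits, so adding volumes beyond that point buys no extra performance while still lowering durability.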

EBS Performance Best Practices Summary

  • Use gp3 as the default volume type – provides better baseline performance than gp2 at 20% lower cost.
  • Use io2 Block Express for critical databases – sub-millisecond latency, 99.999% durability, up to 256K IOPS.
  • Right-size your instance – ensure the instance’s EBS bandwidth limit is not the bottleneck (use CloudWatch EBSIOBalance% and EBSByteBalance% metrics).
  • Use EBS bandwidth weighting for I/O-intensive workloads with lower network needs.
  • Prefer single larger volumes over RAID 0 when gp3 limits (80,000 IOPS, 2,000 MiB/s) are sufficient – simpler and more durable.
  • Enable Fast Snapshot Restore for production volumes restored from snapshots to avoid first-access latency.
  • Monitor with CloudWatch – track VolumeReadOps, VolumeWriteOps, VolumeQueueLength, and BurstBalance (gp2) metrics.
  • Migrate legacy volumes – upgrade gp2 → gp3 and io1 → io2 using Elastic Volumes (no downtime required).

📖 Related: AWS EBS Volume Types – gp3, io2, st1, sc1 Comparison

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed the question might not be updated.
  • Open to further feedback, discussion and correction.
  1. A user is trying to pre-warm a blank EBS volume attached to a Linux instance. Which of the below mentioned steps should be performed by the user?
    1. There is no need to pre-warm an EBS volume (with latest update no pre-warming is needed)
    2. Contact AWS support to pre-warm (This used to be the case before, but pre-warming is not necessary now)
    3. Unmount the volume before pre-warming
    4. Format the device
  2. A user has created an EBS volume of 10 GB and attached it to a running instance. The user is trying to access EBS for first time. Which of the below mentioned options is the correct statement with respect to a first time EBS access?
    1. The volume will show a size of 8 GB
    2. The volume will show a loss of the IOPS performance the first time (new volumes previously had to be wiped clean first; however, pre-warming is no longer needed)
    3. The volume will be blank
    4. If the EBS is mounted it will ask the user to create a file system
  3. You are running a database on an EC2 instance, with the data stored on Elastic Block Store (EBS) for persistence. At times throughout the day, you are seeing large variance in the response times of the database queries. Looking into the instance with the iostat command, you see a lot of wait time on the disk volume that the database’s data is stored on. What two ways can you improve the performance of the database’s storage while maintaining the current persistence of the data? Choose 2 answers
    1. Move to an SSD backed instance
    2. Move the database to an EBS-Optimized Instance
    3. Use Provisioned IOPs EBS
    4. Use the ephemeral storage on an m2.4xLarge Instance Instead
  4. You have launched an EC2 instance with four (4) 500 GB EBS Provisioned IOPS volumes attached. The EC2 instance is EBS-Optimized and supports 500 Mbps throughput between EC2 and EBS. The four EBS volumes are configured as a single RAID 0 device, and each Provisioned IOPS volume is provisioned with 4,000 IOPS (4,000 16KB reads or writes) for a total of 16,000 random IOPS on the instance. The EC2 instance initially delivers the expected 16,000 IOPS random read and write performance. Sometime later, in order to increase the total random I/O performance of the instance, you add an additional two 500 GB EBS Provisioned IOPS volumes to the RAID. Each volume is provisioned to 4,000 IOPS like the original four, for a total of 24,000 IOPS on the EC2 instance. Monitoring shows that the EC2 instance CPU utilization increased from 50% to 70%, but the total random IOPS measured at the instance level does not increase at all. What is the problem and a valid solution?
    1. Larger storage volumes support higher Provisioned IOPS rates: increase the provisioned volume storage of each of the 6 EBS volumes to 1TB.
    2. EBS-Optimized throughput limits the total IOPS that can be utilized; use an EBS-Optimized instance that provides larger throughput. (EC2 instance types have a limit on max throughput and would require larger instance types to provide 24,000 IOPS)
    3. Small block sizes cause performance degradation, limiting the I/O throughput; configure the instance device driver and file system to use 64KB blocks to increase throughput.
    4. RAID 0 only scales linearly to about 4 devices; use RAID 0 with 4 EBS Provisioned IOPS volumes but increase each Provisioned IOPS EBS volume to 6,000 IOPS.
    5. The standard EBS instance root volume limits the total IOPS rate; change the instance root volume to also be a 500GB 4,000 Provisioned IOPS volume
  5. A user has deployed an application on an EBS backed EC2 instance. For a better performance of application, it requires dedicated EC2 to EBS traffic. How can the user achieve this?
    1. Launch the EC2 instance as EBS provisioned with PIOPS EBS
    2. Launch the EC2 instance as EBS enhanced with PIOPS EBS
    3. Launch the EC2 instance as EBS dedicated with PIOPS EBS
    4. Launch the EC2 instance as EBS optimized with PIOPS EBS
  6. A company is running an I/O-intensive database on a gp2 EBS volume and experiencing inconsistent performance. The DBA wants to achieve consistent 50,000 IOPS with the lowest cost. Which approach should they use?
    1. Use multiple gp2 volumes in RAID 0 configuration
    2. Migrate to a single gp3 volume and provision 50,000 IOPS
    3. Use an io2 Block Express volume with 50,000 provisioned IOPS
    4. Use multiple gp3 volumes in RAID 0 to aggregate IOPS

    (gp3 now supports up to 80,000 IOPS per volume at a lower cost than io2. For 50,000 IOPS without needing 99.999% durability, gp3 is the most cost-effective choice.)

  7. An application requires 256,000 IOPS with sub-millisecond latency for a critical Oracle database. Which EBS configuration provides the required performance?
    1. Four gp3 volumes with 64,000 IOPS each in RAID 0
    2. Multiple io1 volumes in RAID 0 configuration
    3. A single io2 Block Express volume with 256,000 provisioned IOPS on a Nitro-based instance
    4. Eight gp3 volumes with 32,000 IOPS each in RAID 0

    (io2 Block Express supports up to 256,000 IOPS per volume with sub-millisecond latency. A single volume approach is simpler and provides higher durability (99.999%) than RAID configurations.)

  8. A team has restored an EBS volume from a snapshot and needs to serve production traffic immediately with full provisioned IOPS. What should they do?
    1. Pre-warm the volume by reading all blocks using the dd utility
    2. Wait for 24 hours for background initialization to complete
    3. Enable Fast Snapshot Restore (FSR) on the snapshot before creating the volume
    4. Attach the volume to an EBS-optimized instance to speed up initialization

    (Fast Snapshot Restore ensures volumes created from the snapshot are fully initialized at creation, eliminating first-access latency. This must be enabled before creating the volume.)

  9. An EC2 instance is running an I/O-heavy analytics workload with low network traffic requirements. The team wants to maximize EBS throughput without changing the instance type. What feature can help?
    1. Enable Enhanced Networking on the instance
    2. Configure the instance with ebs-1 bandwidth weighting to increase EBS bandwidth by 25%
    3. Enable placement groups for the instance
    4. Attach additional network interfaces to the instance

    (Instance bandwidth weighting allows reallocating up to 25% of VPC network bandwidth to EBS, beneficial for workloads with high I/O but low networking needs.)

  10. A company wants to migrate their existing io1 volumes to a newer volume type with better durability and performance without application downtime. Which approach is recommended? (Select TWO)
    1. Create new io2 volumes from snapshots and switch
    2. Use Elastic Volumes to modify the volume type from io1 to io2 without detaching
    3. The migration provides 99.999% durability (up from 99.8-99.9%) at the same cost
    4. io1 to io2 migration requires stopping the instance
    5. io2 volumes cost 50% more than io1 for the same IOPS

    (Elastic Volumes supports online type change from io1 to io2. io2 provides higher durability and performance (1,000 IOPS/GiB vs 50 IOPS/GiB) at the same storage and IOPS pricing.)

AWS Identity Services Cheat Sheet

AWS Identity and Security Services

IAM – Identity & Access Management

  • securely control access to AWS services and resources
  • helps create and manage user identities and grant permissions for those users to access AWS resources
  • helps create groups for multiple users with similar permissions
  • not appropriate for application authentication
  • is Global and does not need to be migrated to a different region
  • helps define Policies,
    • in JSON format
    • all permissions are implicitly denied by default
    • most restrictive policy wins
  • IAM Role
    • helps grant and delegate access to users and services without the need to create permanent credentials
    • IAM users or AWS services can assume a role to obtain temporary security credentials that can be used to make AWS API calls
    • needs Trust policy to define who and Permission policy to define what the user or service can access
    • used with Security Token Service (STS), a lightweight web service that provides temporary, limited privilege credentials for IAM users or for authenticated federated users
    • IAM role scenarios
      • Service access for e.g. EC2 to access S3 or DynamoDB
      • Cross Account access for users
        • with user within the same account
        • with user within an AWS account owned by the same owner
        • with user from a Third Party AWS account with External ID for enhanced security
      • Identity Providers & Federation
        • AssumeRoleWithWebIdentity – Web Identity Federation, where the user can be authenticated using external Identity Providers like Amazon, Google or any OpenID Connect compatible IdP
        • AssumeRoleWithSAML – Identity Provider using SAML 2.0, where the user can be authenticated using on-premises Active Directory, OpenLDAP or any SAML 2.0 compliant IdP
        • AssumeRole (recommended) or GetFederationToken – For other Identity Providers, use Identity Broker to authenticate and provide temporary Credentials
  • IAM MFA (Multi-Factor Authentication)
    • AWS supports FIDO2 passkeys, virtual MFA devices (authenticator apps), and hardware MFA tokens
    • SMS MFA has been discontinued – use FIDO2 passkeys or virtual/hardware MFA devices instead
    • AWS enforces MFA for root users across all account types (rolled out 2024-2025)
    • FIDO2 passkeys use public key cryptography for phishing-resistant authentication
    • Up to 8 MFA devices can be registered per IAM user
  • IAM Best Practices
    • Do not use Root account for anything other than billing
    • Create Individual IAM users
    • Use groups to assign permissions to IAM users
    • Grant least privilege
    • Use IAM roles for applications on EC2
    • Delegate using roles instead of sharing credentials
    • Rotate credentials regularly
    • Use Policy conditions for increased granularity
    • Use CloudTrail to keep a history of activity
    • Enforce a strong IAM password policy for IAM users
    • Remove all unused users and credentials
    • Enable MFA for all users, especially root accounts – use FIDO2 passkeys for strongest protection
    • Use IAM Access Analyzer to identify unused access and overly permissive policies
  • Increased IAM Quotas (May 2026)
    • Roles per account: up to 10,000
    • Managed policies per account: up to 10,000
    • Role trust policy size: up to 8,192 characters
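
The policy rules noted above (implicit deny by default, and the most restrictive policy winning) can be illustrated with a heavily simplified evaluator. This is a sketch, not the real IAM evaluation logic: it looks only at `Effect` and `Action`, ignores resources, conditions, principals, permission boundaries, and SCPs, and matches action names case-sensitively, which real IAM does not. The `evaluate` function and the sample statements are hypothetical.

```python
from fnmatch import fnmatchcase


def evaluate(statements, action):
    """Return "Allow" or "Deny" for `action` given a list of statements.

    Each statement is a dict with "Effect" and a list of "Action" patterns.
    """
    decision = "Deny"  # implicit deny by default
    for stmt in statements:
        if any(fnmatchcase(action, pattern) for pattern in stmt["Action"]):
            if stmt["Effect"] == "Deny":
                return "Deny"  # an explicit deny always wins
            decision = "Allow"
    return decision


statements = [
    {"Effect": "Allow", "Action": ["s3:*"]},
    {"Effect": "Deny", "Action": ["s3:DeleteObject"]},
]

print(evaluate(statements, "s3:GetObject"))      # Allow
print(evaluate(statements, "s3:DeleteObject"))   # Deny (explicit deny)
print(evaluate(statements, "ec2:RunInstances"))  # Deny (implicit deny)
```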

IAM Roles Anywhere

  • enables workloads running outside of AWS (on-premises, hybrid, multi-cloud) to access AWS resources using temporary credentials
  • eliminates the need for long-term AWS access keys for external workloads
  • uses X.509 certificates from your Certificate Authority (CA) for authentication
  • integrates with existing enterprise PKI infrastructure
  • key components:
    • Trust Anchor – establishes trust between IAM Roles Anywhere and your CA
    • Profile – specifies the IAM roles and session policies
    • Credential Helper – tool that runs on the workload to obtain temporary credentials
  • supports workloads on-premises, in containers, or in other cloud providers
  • uses the same IAM policies and roles as AWS workloads for consistent access control

IAM Access Analyzer

  • helps identify resources shared with external entities and validate IAM policies
  • provides External Access Analysis – identifies resources accessible from outside your account or organization
  • provides Unused Access Analysis – continuously monitors for:
    • Unused IAM roles
    • Unused access keys for IAM users
    • Unused passwords for IAM users
    • Unused services and actions for active roles/users
  • supports Custom Policy Checks – validates policies before deployment against best practices
  • generates policy recommendations based on access activity (least privilege)
  • integrates with AWS Security Hub for centralized findings
  • zone of trust can be set at account or organization level

AWS Organizations

  • is an account management service that enables consolidating multiple AWS accounts into an organization that can be centrally managed.
  • includes consolidated billing and account management capabilities that help better meet the budgetary, security, and compliance needs of the business.
  • As an administrator of an organization, you can create new accounts in the organization and invite existing accounts to join it.
  • enables you to
    • Automate AWS account creation and management, and provision resources with AWS CloudFormation StackSets.
    • Maintain a secure environment with policies and management of AWS security services
    • Govern access to AWS services, resources, and regions
    • Centrally manage policies across multiple AWS accounts
    • Audit your environment for compliance
    • View and manage costs with consolidated billing
    • Configure AWS services across multiple accounts
  • supports Service Control Policies – SCPs
    • offer central control over the maximum available permissions for all of the accounts in your organization, ensuring member accounts stay within the organization’s access control guidelines.
    • are available only in an organization that has all features enabled, and aren’t available if the organization has enabled only the consolidated billing features.
    • are NOT sufficient for granting access to the accounts in the organization.
    • define guardrails for what actions accounts within the organization root or OU can perform, but IAM policies still need to be attached to the users and roles in the organization’s accounts to grant permissions to them.
    • Effective permissions are the logical intersection between what is allowed by the SCP and what is allowed by the IAM and resource-based policies.
    • with an SCP attached to member accounts, identity-based and resource-based policies grant permissions to entities only if those policies and the SCP allow the action
    • don’t affect users or roles in the management account. They affect only the member accounts in your organization.
  • supports Resource Control Policies (RCPs) – launched Nov 2024
    • a new authorization policy type that sets the maximum available permissions on resources within the organization
    • complement SCPs – SCPs control what principals can do, RCPs control what can be done on resources
    • help centrally restrict external access to AWS resources at scale (establish data perimeters)
    • don’t affect resources in the management account – only affect resources in member accounts
    • work alongside SCPs to provide comprehensive authorization guardrails
    • supported by AWS Control Tower for managed preventive controls
  • supports Declarative Policies – launched Dec 2024 at re:Invent
    • a new management policy type that declares and enforces desired configuration for AWS services at scale
    • different from SCPs/RCPs – declarative policies enforce service configurations, not just permissions
    • configuration is always maintained even when the service adds new features or APIs
    • simplifies governance by defining durable intent for baseline service configurations
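
The "logical intersection" rule for SCPs described above can be pictured with plain sets. This is an illustrative sketch only: real evaluation also involves explicit denies, resource-based policies, and conditions, and the action names and the `effective_permissions` helper are chosen just for the example.

```python
def effective_permissions(scp_allowed: set, iam_allowed: set) -> set:
    """An action is usable only if both the SCP and the IAM policy allow it."""
    return scp_allowed & iam_allowed


scp = {"s3:GetObject", "s3:PutObject", "ec2:DescribeInstances"}
iam = {"s3:GetObject", "iam:CreateUser"}

print(effective_permissions(scp, iam))    # {'s3:GetObject'}
print(effective_permissions(scp, set()))  # set(): an SCP alone grants nothing
```

The second call shows why SCPs are not sufficient for granting access: without an IAM policy that allows an action, the intersection is empty.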

AWS Directory Services

  • gives applications in AWS access to Active Directory services
  • different from SAML + AD, where the access is granted to AWS services through Temporary Credentials
  • AWS Managed Microsoft AD
    • fully managed Microsoft Active Directory powered by Windows Server
    • available in Standard and Enterprise editions
    • supports self-service API-driven edition upgrades (Standard to Enterprise) – Oct 2025
    • supports dual-stack networking (IPv4 and IPv6) – Sep 2025
    • includes Directory Service Data API for built-in object management (users, groups, attributes) – Sep 2024
    • Hybrid Edition (Aug 2025) – extends your existing self-managed AD domain to AWS Managed Microsoft AD
      • automatically handles replication between on-premises AD and AWS
      • preserves existing identity and access infrastructure
      • simplifies migration of AD-dependent workloads to AWS
      • supports extending domains from on-premises, AWS, or multi-cloud
  • Simple AD
    • least expensive but does not support Microsoft AD advanced features
    • provides a Samba 4 Microsoft Active Directory compatible standalone directory service on AWS
    • does not provide a single point of authentication or authorization, as a separate copy is maintained
    • trust relationships cannot be set up between Simple AD and other Active Directory domains
    • should not be used if the requirement is to leverage access and control through a centralized authentication service
  • AD Connector
    • acts as a hosted proxy service for instances in AWS to connect to on-premises Active Directory
    • enables consistent enforcement of existing security policies, such as password expiration, password history, and account lockouts, whether users are accessing resources on-premises or in the AWS cloud
    • needs VPN connectivity (or Direct Connect)
    • integrates with existing RADIUS-based MFA solutions to enable multi-factor authentication
    • does not cache data, which might lead to latency
  • Read-only Domain Controllers (RODCs)
    • work as a read-only Active Directory
    • hold a copy of the Active Directory Domain Services (AD DS) database and respond to authentication requests
    • cannot be written to and are typically deployed in locations where physical security cannot be guaranteed
    • help maintain a single point of authentication and authorization control, but need to be kept in sync
  • Writable Domain Controllers
    • are expensive to set up
    • operate in a multi-master model; changes can be made on any writable server in the forest, and those changes are replicated to servers throughout the entire forest

AWS IAM Identity Center (formerly AWS Single Sign-On)

  • is the recommended service for managing workforce access to AWS accounts and applications (formerly known as AWS SSO, renamed July 2022)
  • provides centralized SSO access to all AWS accounts and cloud applications
  • helps manage access and permissions to commonly used third-party software as a service (SaaS) applications, AWS-integrated applications as well as custom applications that support SAML 2.0.
  • includes a user portal where end-users can find and access all their assigned AWS accounts, cloud applications, and custom applications in one place.
  • supports connecting external identity providers (Okta, Microsoft Entra ID, Ping Identity) or using the built-in directory
  • Trusted Identity Propagation
    • enables administrators to grant permissions based on user attributes (user ID, group associations) across AWS service boundaries
    • eliminates the need for service-specific identity mapping
    • supports services like Amazon Redshift, Amazon Q Business, Amazon EMR, and more
  • Multi-Region Replication (Feb 2026)
    • replicate identity configurations across multiple AWS Regions
    • provides active access portal endpoints in multiple Regions for improved availability
    • available for organization instances connected to external identity providers
    • currently available in 17 enabled-by-default commercial AWS Regions
  • supports customer managed policies and permission boundaries in permission sets

Amazon Cognito

  • Amazon Cognito provides authentication, authorization, and user management for web and mobile apps.
  • Users can sign in directly with a username and password, or through a third party such as Facebook, Amazon, Google, or Apple.
  • Cognito has two main components.
    • User pools are user directories that provide sign-up and sign-in options for the app users.
    • Identity pools enable you to grant the users access to other AWS services.
  • Feature Tiers (Nov 2024) – User pools now offer three tiers:
    • Lite – basic authentication features (existing user pools default to this)
    • Essentials – includes Managed Login, passwordless authentication (passkeys, email, SMS), access token customization, password reuse prevention (new user pools default to this)
    • Plus – adds advanced security features including adaptive authentication, threat protection, and compromised credentials detection
  • Managed Login (Nov 2024) – fully managed, hosted sign-in/sign-up experience with rich branding customization
  • Passwordless Authentication (Nov 2024)
    • supports passkeys (FIDO standards, public key cryptography) for phishing-resistant sign-in
    • supports email and SMS one-time passwords
    • available in the Essentials tier
  • Refresh Token Rotation (Apr 2025) – enables automatic rotation of OAuth 2.0 refresh tokens for improved security
  • Client Secret Management (Feb 2026) – custom client secrets, on-demand rotation, up to two active secrets per app client
  • Multi-Region Replication (2026) – replicate user pools across Regions for business continuity and reduced latency
  • Customer-Managed Keys – full control over data encryption at rest using your own KMS keys
  • Cognito Sync – Note: AWS recommends using AWS AppSync instead of Cognito Sync for new implementations. AppSync provides similar data synchronization with additional real-time and offline capabilities.

Amazon Verified Permissions

  • a fully managed, fine-grained authorization service for applications (GA 2023)
  • uses Cedar, an open-source policy language purpose-built for authorization
  • externalizes authorization logic from application code for consistent access control
  • supports both Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC)
  • key components:
    • Policy Store – container for Cedar policies, logically isolated from other stores
    • Policies and Templates – define who can do what on which resources
    • Schema – defines entity types, actions, and their relationships
    • Authorization Requests – real-time evaluation of user access against policies
  • integrates natively with Amazon Cognito for identity context
  • aligns with Zero Trust principles – least privilege and continuous verification
  • supports multi-tenant authorization with multiple identity providers
  • enables security teams to audit and analyze application-level access centrally
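
How an authorization request is evaluated against a set of policies can be sketched with a toy model. The Python below is neither Cedar syntax nor the Verified Permissions API; it only mimics the decision rule Cedar uses, in which a request is denied by default, any matching permit policy allows it, and any matching forbid policy overrides the permits. The class names, attribute names (role, mfa, owner, classification), and sample policies are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    principal: dict  # e.g. {"id": "alice", "role": "editor", "mfa": True}
    action: str      # e.g. "edit"
    resource: dict   # e.g. {"owner": "bob", "classification": "public"}


@dataclass
class Policy:
    effect: str                         # "permit" or "forbid"
    matches: Callable[[Request], bool]  # condition the request must satisfy


def is_authorized(policies: list, request: Request) -> str:
    """Toy evaluator: default deny, and forbid overrides permit."""
    effects = [p.effect for p in policies if p.matches(request)]
    if "forbid" in effects:
        return "DENY"
    return "ALLOW" if "permit" in effects else "DENY"


policies = [
    # RBAC-style rule: editors may edit.
    Policy("permit", lambda r: r.principal.get("role") == "editor" and r.action == "edit"),
    # ABAC-style rule: owners may view their own resources.
    Policy("permit", lambda r: r.action == "view" and r.resource.get("owner") == r.principal.get("id")),
    # Guardrail: confidential resources require MFA.
    Policy("forbid", lambda r: r.resource.get("classification") == "confidential"
           and not r.principal.get("mfa", False)),
]

editor = {"id": "alice", "role": "editor", "mfa": False}
public_doc = {"owner": "bob", "classification": "public"}
secret_doc = {"owner": "bob", "classification": "confidential"}

print(is_authorized(policies, Request(editor, "edit", public_doc)))  # ALLOW
print(is_authorized(policies, Request(editor, "edit", secret_doc)))  # DENY
print(is_authorized(policies, Request(editor, "view", public_doc)))  # DENY
```

The second request is denied even though the editor rule permits it, because the forbid rule also matches. The third is denied because no permit rule matches at all. Keeping rules like these in a policy store rather than in application code is what makes central auditing possible.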