AWS AI & ML Services Cheat Sheet

AWS AI & ML Services Cheat Sheet

  • AWS provides a comprehensive suite of AI and Machine Learning services spanning generative AI, ML platforms, AI services, and responsible AI.
  • Services range from pre-trained APIs requiring no ML expertise to fully managed platforms for custom model training and deployment.
  • This cheat sheet covers services relevant to the AWS AI Practitioner (AIF-C01), ML Engineer Associate (MLA-C01), and Solutions Architect certifications.
🎓 Build AI Skills with Google
Learn practical AI skills and earn a Google Certificate. No experience required – learn at your own pace.
Start the Google AI Essentials Learning Path →

Generative AI Services

Amazon Bedrock

  • Fully managed service to build generative AI applications using foundation models (FMs).
  • Access models from AI21 Labs, Anthropic (Claude), Cohere, Meta (Llama), Mistral, Stability AI, and Amazon (Titan).
  • No infrastructure to manage – serverless API access to foundation models.
  • Knowledge Bases – implement RAG (Retrieval Augmented Generation) by connecting FMs to your data sources (S3, databases).
  • Agents – create AI agents that can plan, execute multi-step tasks, and call APIs/Lambda functions.
  • Guardrails – control model outputs with content filters, denied topics, PII redaction, and word filters.
  • Model Evaluation – evaluate and compare FM performance on your specific tasks.
  • Fine-tuning – customize models with your data (continued pre-training or instruction fine-tuning).
  • Provisioned Throughput – reserve model capacity for consistent performance.
  • Data is not used to train base models – data privacy by default.

Amazon Q

  • Amazon Q Business – AI assistant for enterprise that connects to company data (S3, SharePoint, Confluence, Salesforce, etc.).
  • Amazon Q Developer – AI coding assistant for IDEs with code generation, debugging, transformation, and security scanning.
  • Amazon Q in QuickSight – natural language queries for BI dashboards.
  • Amazon Q in Connect – AI-powered agent assistance for contact centers.
  • Respects existing access controls and permissions – users only see answers from data they can access.

Amazon Titan Models

  • Titan Text – text generation, summarization, classification, Q&A.
  • Titan Embeddings – convert text to numerical vectors for search, RAG, and recommendations.
  • Titan Image Generator – generate and edit images from text prompts.
  • Titan Multimodal Embeddings – embeddings for both text and images.
  • All Titan models include built-in watermarking for generated content.

ML Platform

Amazon SageMaker

  • Fully managed ML platform for building, training, and deploying models at scale.
  • SageMaker Studio – integrated IDE for ML development (notebooks, experiments, pipelines).
  • Built-in algorithms – XGBoost, Linear Learner, K-Means, Image Classification, Object Detection, etc.
  • Training – managed training infrastructure with spot instances (up to 90% savings).
  • SageMaker Pipelines – CI/CD for ML (MLOps) with automated workflow orchestration.
  • Model Registry – catalog, version, and manage trained models.
  • Endpoints – real-time inference, batch transform, async inference, serverless inference.
  • SageMaker Canvas – no-code ML for business analysts (visual interface).
  • SageMaker JumpStart – pre-trained foundation models and ML solutions ready to deploy.
  • SageMaker Clarify – detect bias in data/models and explain model predictions (SHAP values).
  • SageMaker Data Wrangler – visual data preparation and feature engineering.
  • SageMaker Feature Store – centralized repository for ML features (online + offline store).
  • SageMaker Ground Truth – data labeling with human annotators and active learning.
  • SageMaker Model Monitor – detect data drift, model quality drift, and bias drift in production.

AI Services (Pre-trained APIs)

Natural Language Processing (NLP)

  • Amazon Comprehend – NLP service for sentiment analysis, entity recognition, key phrases, language detection, PII detection, topic modeling.
  • Amazon Comprehend Medical – extract medical entities (conditions, medications, dosages) from clinical text.
  • Amazon Translate – neural machine translation for 75+ languages with custom terminology support.
  • Amazon Transcribe – speech-to-text (ASR) with speaker identification, custom vocabulary, PII redaction.
  • Amazon Transcribe Medical – medical speech-to-text for clinical documentation.

Vision

  • Amazon Rekognition – image and video analysis (object/scene detection, face analysis, text in images, content moderation, celebrity recognition, custom labels).
  • Amazon Textract – extract text, tables, and forms from documents (beyond basic OCR). Supports invoices, receipts, ID documents.

Speech

  • Amazon Polly – text-to-speech with neural and standard voices, SSML support, speech marks for lip-sync.
  • Amazon Lex – build conversational chatbots with automatic speech recognition (ASR) and natural language understanding (NLU). Powers Alexa technology.

Search & Recommendations

  • Amazon Kendra – intelligent enterprise search powered by ML with natural language queries and document ranking.
  • Amazon Personalize – real-time personalized recommendations (similar to Amazon.com) without ML expertise.

Forecasting & Other

  • Amazon Forecast – time-series forecasting using ML (demand planning, resource planning).
  • Amazon Fraud Detector – identify potentially fraudulent online activities using ML.
  • Amazon CodeWhisperer (now Amazon Q Developer) – AI-powered code suggestions in IDEs.

Data & Analytics for ML

  • AWS Glue – serverless ETL with built-in ML transforms (FindMatches for deduplication).
  • Amazon Athena ML – run ML inference from SQL queries using SageMaker models.
  • Amazon Redshift ML – create, train, and deploy ML models using SQL (uses SageMaker Autopilot).
  • Amazon Kinesis – real-time data streaming for ML inference on streaming data.
  • AWS Lake Formation – build secure data lakes as training data sources.

Responsible AI

  • Amazon Bedrock Guardrails – content filters, denied topics, PII redaction, hallucination reduction (grounding checks).
  • SageMaker Clarify – pre-training bias detection (CI, DPL, KL metrics) and post-training bias detection (DPPL, DI, AD).
  • SageMaker Model Monitor – continuous monitoring for data quality, model quality, bias drift, and feature attribution drift.
  • Model Explainability – SHAP values for feature importance and individual prediction explanations.
  • Amazon Titan watermarking – invisible watermarks in generated images for content authenticity.
  • AWS AI Service Cards – transparency documentation for AWS AI services.
  • Human-in-the-loop – Amazon Augmented AI (A2I) for human review of ML predictions.

Infrastructure for AI/ML

  • AWS Trainium – custom chip optimized for deep learning training (used in EC2 Trn1 instances).
  • AWS Inferentia – custom chip optimized for inference (used in EC2 Inf2 instances). Up to 40% better price-performance than GPU.
  • Amazon EC2 P5/P4d instances – NVIDIA GPU instances for training and inference.
  • Amazon EC2 G5/G6 instances – GPU instances for graphics and ML inference.
  • AWS Neuron SDK – compile and optimize models for Trainium and Inferentia chips.
  • Amazon S3 – primary storage for training data, model artifacts, and outputs.
  • FSx for Lustre – high-throughput file system for training data (integrates with S3).

Key Concepts for Certification

ML Workflow

  • Data CollectionData Preparation (cleaning, feature engineering) → Model TrainingEvaluationDeploymentMonitoring

Model Types

  • Supervised Learning – labeled data (classification, regression). Examples: fraud detection, price prediction.
  • Unsupervised Learning – no labels (clustering, anomaly detection). Examples: customer segmentation, topic modeling.
  • Reinforcement Learning – agent learns through rewards (robotics, game playing, recommendations).
  • Foundation Models – large pre-trained models fine-tuned or used via prompting (GPT, Claude, Llama, Titan).

RAG (Retrieval Augmented Generation)

  • Combines a foundation model with external knowledge retrieval to provide accurate, up-to-date, and cited answers.
  • AWS implementation: Bedrock Knowledge Bases + vector database (OpenSearch Serverless, Aurora PostgreSQL, Pinecone).
  • Process: Query → Retrieve relevant chunks from knowledge base → Augment prompt with context → Generate answer.

Prompt Engineering

  • Zero-shot – ask directly without examples.
  • Few-shot – provide examples in the prompt.
  • Chain-of-thought – instruct the model to reason step by step.
  • System prompts – set behavior, persona, and constraints.

AWS Certification Exam Practice Questions

  1. A company wants to build a chatbot that answers questions using their internal documentation stored in S3 and Confluence. The answers must cite sources. Which AWS service and feature combination is most appropriate?
    1. Amazon Lex with Lambda
    2. Amazon Bedrock with Knowledge Bases (RAG)
    3. Amazon Kendra with Lex
    4. Amazon Comprehend with Q Business
  2. A team needs to detect if their ML model exhibits bias against a protected demographic group before deploying to production. Which service should they use?
    1. Amazon Bedrock Guardrails
    2. Amazon Rekognition
    3. SageMaker Clarify
    4. Amazon Comprehend
  3. An application needs to extract structured data (tables, key-value pairs) from scanned invoices and receipts. Which service is purpose-built for this?
    1. Amazon Rekognition
    2. Amazon Comprehend
    3. Amazon Textract
    4. Amazon Bedrock
  4. A generative AI application must prevent the model from discussing competitor products and must redact any PII in responses. Which feature provides these controls?
    1. SageMaker Model Monitor
    2. Amazon Bedrock Guardrails
    3. Amazon Comprehend PII detection
    4. AWS WAF
  5. A company needs the lowest cost per inference for deploying a trained deep learning model at high throughput. Which AWS hardware is optimized for this?
    1. EC2 P5 instances (NVIDIA GPU)
    2. EC2 G5 instances
    3. EC2 Inf2 instances (AWS Inferentia2)
    4. EC2 Trn1 instances (AWS Trainium)

Related Posts

References

Amazon Bedrock User Guide

Amazon SageMaker Developer Guide

Amazon Q Business User Guide

AWS AI/ML Services Overview

AWS Transit Gateway vs VPC Peering vs PrivateLink

AWS Transit Gateway vs VPC Peering vs PrivateLink

  • AWS provides multiple VPC connectivity options, each designed for different network topologies and use cases.
  • VPC Peering is point-to-point, Transit Gateway is a hub for many-to-many connectivity, and PrivateLink is for private service access without network exposure.
  • Choice depends on number of VPCs, routing requirements, security posture, and cost.

Transit Gateway vs VPC Peering vs PrivateLink Comparison

Feature VPC Peering Transit Gateway PrivateLink
Topology Point-to-point (1:1) Hub-and-spoke (many:many) Service endpoint (consumer:provider)
Transitive Routing No Yes No (service access only)
Scale 125 peering connections per VPC 5,000 attachments per TGW Unlimited endpoints
Cross-Region Yes (inter-region peering) Yes (inter-region peering) Yes (with inter-region support)
Cross-Account Yes Yes (RAM sharing) Yes
CIDR Overlap Not allowed Not allowed (per attachment) Allowed (uses ENI in consumer VPC)
Network Exposure Full VPC network visible to peer Full VPC network via route tables Only the service endpoint exposed
Bandwidth No limit (same as inter-AZ) Up to 50 Gbps per attachment Up to 100 Gbps per endpoint
Cost Data transfer only (no hourly charge) Hourly per attachment + data processing Hourly per endpoint + data processing
Use Case Few VPCs, simple connectivity Many VPCs, centralized routing, VPN/DX aggregation Expose service privately, SaaS connectivity, zero-trust
Route Management Update route tables in both VPCs Centralized route tables on TGW No route table changes needed
Security Security groups + NACLs Security groups + NACLs + TGW route tables Minimum exposure (only service port)

When to Choose Which

  • Choose VPC Peering – Small number of VPCs (2-5), simple point-to-point connectivity, lowest cost, no transitive routing needed.
  • Choose Transit Gateway – Many VPCs needing full mesh connectivity, centralized VPN/Direct Connect, shared services VPC, network segmentation with route tables.
  • Choose PrivateLink – Expose a specific service to other accounts/VPCs without full network access, overlapping CIDRs, SaaS service consumption, zero-trust architecture.
  • Combine TGW + PrivateLink – Transit Gateway for general connectivity between VPCs, PrivateLink for specific service access with minimal exposure.

AWS Certification Exam Practice Questions

  1. A company has 50 VPCs that all need to communicate with a shared services VPC and a centralized Direct Connect connection. Which connectivity solution scales best?
    1. VPC Peering (50 connections)
    2. Transit Gateway
    3. PrivateLink
    4. VPN to each VPC
  2. A SaaS provider needs to expose their service to customers in different AWS accounts without exposing their entire VPC network. The customer VPCs have overlapping CIDR ranges. Which solution works?
    1. VPC Peering
    2. Transit Gateway
    3. PrivateLink
    4. Site-to-Site VPN
  3. Two VPCs in the same region need connectivity. The traffic volume is minimal, cost is a priority, and no transitive routing is needed. What is the most cost-effective solution?
    1. Transit Gateway
    2. VPC Peering
    3. PrivateLink
    4. AWS Cloud WAN
  4. An organization needs VPC A to route traffic through VPC B to reach VPC C. Which service supports this transitive routing?
    1. VPC Peering
    2. PrivateLink
    3. Transit Gateway
    4. Internet Gateway

Related Posts

References

AWS Transit Gateway Guide

VPC Peering Guide

AWS PrivateLink Guide

AWS CloudFront vs Global Accelerator

AWS CloudFront vs Global Accelerator

  • Both CloudFront and Global Accelerator use the AWS global edge network to improve performance, but they serve different purposes.
  • CloudFront is a Content Delivery Network (CDN) that caches content at edge locations.
  • Global Accelerator routes traffic over the AWS backbone to the optimal regional endpoint without caching.

CloudFront vs Global Accelerator Comparison

Feature CloudFront Global Accelerator
Purpose Content caching and delivery (CDN) Network traffic acceleration (no caching)
Layer Layer 7 (HTTP/HTTPS/WebSocket) Layer 4 (TCP/UDP)
Caching Yes – caches content at 400+ edge locations No caching – proxies all requests to origin
Static IP No (uses DNS-based routing) Yes – 2 anycast static IPs
IP Whitelisting Difficult (IPs change) Easy (fixed anycast IPs)
Protocols HTTP, HTTPS, WebSocket TCP, UDP
DNS Propagation Affected by DNS TTL during failover Instant failover (no DNS change, same IPs)
Failover Origin failover (primary/secondary) Automatic endpoint failover (<30 seconds)
DDoS Protection AWS Shield Standard (Shield Advanced optional) AWS Shield Standard (Shield Advanced optional)
WAF Integration Yes (AWS WAF) No
Origins/Endpoints S3, ALB, EC2, custom HTTP origins ALB, NLB, EC2, Elastic IP
Client Affinity No (stateless caching) Yes (source IP based)
Best For Static/dynamic web content, APIs, video streaming Gaming, IoT, VoIP, non-HTTP apps needing fixed IPs

When to Choose Which

  • Choose CloudFront – Web applications, static asset delivery, video streaming, API acceleration with caching, need WAF integration.
  • Choose Global Accelerator – Non-HTTP protocols (TCP/UDP gaming, IoT), need static IP for whitelisting, instant failover without DNS propagation, multi-region active-active.
  • Use Both Together – CloudFront for cacheable content + Global Accelerator for the origin endpoint requiring fixed IPs and fast failover.

AWS Certification Exam Practice Questions

  1. A gaming company needs to route UDP traffic to the nearest regional server with sub-30-second failover and static IPs for firewall whitelisting. Which service should they use?
    1. CloudFront
    2. Route 53 latency-based routing
    3. Global Accelerator
    4. ALB with multi-region
  2. A media company wants to serve video content and static assets globally with lowest latency and reduce origin load. Which service is most appropriate?
    1. Global Accelerator
    2. CloudFront
    3. S3 Transfer Acceleration
    4. Route 53 geolocation routing
  3. An application needs both fast failover (no DNS propagation delay) and static anycast IP addresses for a TCP-based service. The content is dynamic and cannot be cached. Which service fits?
    1. CloudFront with cache disabled
    2. Global Accelerator
    3. Route 53 with health checks
    4. NLB with Elastic IPs

Related Posts

References

Amazon CloudFront Developer Guide

AWS Global Accelerator Developer Guide

AWS S3 vs EBS vs EFS vs FSx

AWS S3 vs EBS vs EFS vs FSx

  • AWS offers four primary storage services, each designed for different access patterns and use cases.
  • S3 is object storage, EBS is block storage for EC2, EFS is managed NFS file storage, and FSx is managed file systems for specific workloads (Windows, Lustre, NetApp, OpenZFS).
  • Choice depends on access patterns, performance needs, sharing requirements, and protocol compatibility.

S3 vs EBS vs EFS vs FSx Comparison

Feature S3 EBS EFS FSx
Type Object storage Block storage File storage (NFS) File storage (multiple protocols)
Access HTTP/HTTPS API Attached to single EC2 (multi-attach for io2) Concurrent access from multiple EC2/ECS/Lambda Concurrent access from multiple instances
Protocol REST API, S3 API Block device (no protocol) NFSv4.1 SMB, NFS, Lustre, iSCSI
Max Size Unlimited (5TB per object) 64TB per volume Unlimited (petabyte scale) Varies (up to petabytes)
Durability 99.999999999% (11 nines) 99.999% (within AZ) 99.999999999% (across AZs) 99.999999999% (Multi-AZ option)
Availability Zone Regional (across 3+ AZs) Single AZ Regional (Multi-AZ) Single AZ or Multi-AZ
Performance High throughput, higher latency Low latency (sub-ms for io2) Low latency, scales with file system Sub-ms latency (Lustre/OpenZFS)
Use Case Backup, data lake, static hosting, archive Databases, boot volumes, transactional workloads Shared content, CMS, home dirs, containers HPC, Windows shares, ML training, ONTAP
Pricing Model Per GB stored + requests + transfer Per GB provisioned + IOPS (io2) Per GB used (no provisioning) Per GB provisioned + throughput
Lifecycle/Tiering Yes (Intelligent-Tiering, Glacier) No Yes (EFS IA, Archive) Limited (FSx Lustre S3 integration)
Encryption SSE-S3, SSE-KMS, SSE-C AES-256 (KMS) AES-256 (KMS) AES-256 (KMS)
Backup Versioning, Cross-Region Replication Snapshots (to S3) AWS Backup AWS Backup, automatic backups

Amazon S3

  • Object storage – stores data as objects (key-value) with unlimited capacity.
  • Best for data lakes, backups, static website hosting, media storage, and archive.
  • Multiple storage classes: Standard, IA, One Zone-IA, Intelligent-Tiering, Glacier Instant/Flexible/Deep Archive.
  • Not a file system – cannot be mounted directly (use S3 Mountpoint for read-heavy workloads).
  • Supports event notifications to Lambda, SQS, SNS, EventBridge for event-driven processing.

Amazon EBS

  • Block-level storage designed for single EC2 instance attachment (io2 supports multi-attach).
  • Best for databases, boot volumes, and applications requiring low-latency block access.
  • Volume types: gp3 (general purpose), io2 Block Express (high IOPS), st1 (throughput), sc1 (cold).
  • Single AZ – volume and instance must be in the same AZ; use snapshots for cross-AZ/region copies.
  • Supports live volume resizing and type changes without downtime.

Amazon EFS

  • Fully managed NFS file system – multiple EC2, ECS, and Lambda can access simultaneously.
  • Best for shared content repositories, CMS, home directories, container storage, and ML training data.
  • Elastic – automatically grows and shrinks; pay only for storage used.
  • Performance modes: General Purpose (latency-sensitive) and Max I/O (high throughput).
  • EFS Infrequent Access and Archive tiers reduce costs for rarely accessed files.
  • Supports cross-region replication for DR.

Amazon FSx

  • Managed file systems for specialized workloads – four options available.
  • FSx for Windows File Server – fully managed Windows-native (SMB) with Active Directory integration.
  • FSx for Lustre – high-performance parallel file system for HPC, ML training; integrates with S3.
  • FSx for NetApp ONTAP – multi-protocol (NFS, SMB, iSCSI) with NetApp features (SnapMirror, FlexClone).
  • FSx for OpenZFS – high-performance NFS with snapshots, cloning, and data compression.

When to Choose Which

  • Choose S3 – Unstructured data, backups, data lake, web assets, archive, serverless data processing.
  • Choose EBS – Database storage, single-instance applications requiring low-latency block I/O.
  • Choose EFS – Shared Linux file system, containers needing shared storage, Lambda file access.
  • Choose FSx for Windows – Windows applications, SMB shares, Active Directory integration.
  • Choose FSx for Lustre – HPC, ML training, video processing needing sub-ms latency with S3 integration.
  • Choose FSx for ONTAP – Multi-protocol access, hybrid cloud, applications needing NetApp features.

AWS Certification Exam Practice Questions

  1. A web application running on multiple EC2 instances across AZs needs shared access to uploaded user files. Which storage service is most appropriate?
    1. S3 with Mountpoint
    2. EBS Multi-Attach
    3. EFS
    4. FSx for OpenZFS
  2. A database workload requires consistent sub-millisecond latency with 64,000 IOPS on a single EC2 instance. Which storage should be used?
    1. EFS Max I/O mode
    2. S3 Express One Zone
    3. EBS io2 Block Express
    4. FSx for Lustre
  3. A company needs to migrate Windows file shares with Active Directory permissions to AWS. Which service maintains full compatibility?
    1. EFS
    2. S3
    3. FSx for OpenZFS
    4. FSx for Windows File Server
  4. An ML training job needs to process a 50TB dataset stored in S3 with the highest possible throughput. Which storage approach minimizes training time?
    1. Copy data to EBS gp3
    2. Access directly from S3
    3. FSx for Lustre linked to S3 bucket
    4. Copy to EFS

Related Posts

References

Amazon S3 User Guide

Amazon EBS User Guide

Amazon EFS User Guide

Amazon FSx for Windows File Server User Guide

AWS Lambda vs Fargate vs App Runner

AWS Lambda vs Fargate vs App Runner

  • AWS offers multiple serverless/managed compute options for running application code without managing servers.
  • Lambda is for event-driven functions, Fargate is for containerized workloads, and App Runner is for web applications and APIs with zero configuration.
  • All three eliminate server management but differ in execution model, duration limits, and use cases.

Lambda vs Fargate vs App Runner Comparison

Feature Lambda Fargate App Runner
Execution Model Event-driven functions Long-running containers Always-running web service
Max Duration 15 minutes Unlimited Unlimited
Scale to Zero Yes (no charge when idle) No (minimum 1 task running) Yes (pause instances when no traffic)
Cold Start Yes (mitigated with Provisioned Concurrency) Container startup time (seconds) Minimal (keeps warm instances)
Memory 128MB – 10GB 512MB – 120GB 512MB – 12GB
vCPU Proportional to memory (up to 6 vCPU) 0.25 – 16 vCPU 0.25 – 4 vCPU
Deployment ZIP or container image Container image Source code (GitHub) or container image
Auto Scaling Automatic per-request concurrency Task-level with ECS/EKS Auto Scaling Automatic based on concurrent requests
Networking Optional VPC access Full VPC integration VPC Connector (optional)
Pricing Per request + per GB-second Per vCPU/hour + per GB/hour Per vCPU/hour + per GB/hour (pause charge lower)
Configuration Moderate (triggers, IAM, layers) High (task definitions, clusters, services) Minimal (source → running service)
Load Balancer Not needed (API Gateway or Function URL) ALB/NLB required Built-in (managed HTTPS endpoint)
Custom Runtime Yes (custom runtime or container) Any container Limited runtimes (Python, Node, Java, Go, .NET, Ruby) or container

AWS Lambda

  • Event-driven, function-as-a-service – runs code in response to events (S3, API Gateway, SQS, DynamoDB, etc.).
  • Scales automatically to thousands of concurrent executions per second.
  • Pay only for execution time – billed per millisecond with a generous free tier (1M requests/month).
  • 15-minute maximum execution – not suitable for long-running processes.
  • Supports Provisioned Concurrency to eliminate cold starts for latency-sensitive workloads.
  • Lambda@Edge and CloudFront Functions for edge computing.
  • Best for: event processing, API backends, data transformations, scheduled tasks, and glue logic.

AWS Fargate

  • Serverless compute for containers – runs Docker containers without managing EC2 instances.
  • Works with ECS or EKS as the orchestration layer.
  • No duration limit – suitable for long-running services, background workers, batch jobs.
  • Full VPC networking – each task gets its own ENI with security group control.
  • Supports persistent storage via EFS for shared file systems.
  • Fargate Spot for up to 70% savings on fault-tolerant workloads.
  • Best for: microservices, APIs needing persistent connections, background processing, and containerized applications.

AWS App Runner

  • Fully managed service – goes from source code or container to running web service in minutes.
  • Zero infrastructure configuration – no load balancer, no cluster, no task definition to manage.
  • Provides a managed HTTPS endpoint with automatic TLS certificate.
  • Auto-deploys from GitHub repository or ECR on code push.
  • Can pause instances when no traffic – lower cost than Fargate minimum.
  • VPC Connector for private resource access (RDS, ElastiCache, etc.).
  • Best for: web applications, REST APIs, and teams that want the simplest deployment experience.

When to Choose Which

  • Choose Lambda – Short-lived event processing (<15 min), API backends with variable traffic, integrations between AWS services, cost optimization for sporadic workloads.
  • Choose Fargate – Long-running containers, persistent connections (WebSocket, gRPC), complex multi-container applications, need full ECS/EKS ecosystem.
  • Choose App Runner – Simple web apps/APIs, want fastest time-to-deploy, team unfamiliar with containers/orchestration, predictable HTTP request-response workloads.

AWS Certification Exam Practice Questions

  1. A startup needs to deploy a REST API that handles unpredictable traffic spikes (0 to thousands of requests per second) and they want to pay nothing during idle periods. Which service is most cost-effective?
    1. App Runner
    2. Fargate with ECS
    3. Lambda with API Gateway
    4. EC2 with Auto Scaling
  2. A team needs to run a containerized web application that maintains WebSocket connections and requires persistent EFS storage. Which compute option supports this?
    1. Lambda
    2. Fargate with ECS
    3. App Runner
    4. Lambda with container image
  3. A developer wants to deploy a Python web application directly from a GitHub repository with automatic HTTPS, auto-scaling, and zero infrastructure management. Which service requires the least configuration?
    1. Lambda with API Gateway
    2. Fargate with ALB
    3. App Runner
    4. Elastic Beanstalk

Related Posts

References

AWS Lambda Developer Guide

AWS Fargate Documentation

AWS App Runner Developer Guide

AWS Aurora vs RDS – Performance & Cost Comparison

AWS Aurora vs RDS – MySQL & PostgreSQL

  • Both Amazon Aurora and Amazon RDS provide managed relational database services, but Aurora is AWS’s cloud-native redesign with significantly different architecture.
  • Aurora is compatible with MySQL and PostgreSQL but delivers up to 5x throughput of MySQL and 3x of PostgreSQL on the same hardware.
  • Choice depends on performance requirements, cost tolerance, and need for advanced features like global databases or serverless scaling.

Aurora vs RDS Comparison

Feature Amazon Aurora Amazon RDS (MySQL/PostgreSQL)
Architecture Cloud-native, distributed storage (6 copies across 3 AZs) Traditional architecture with EBS-based storage
Performance 5x MySQL, 3x PostgreSQL throughput Standard MySQL/PostgreSQL performance
Storage Auto-scales 10GB to 128TB, no pre-provisioning Provision up to 64TB (gp3/io1/io2)
Replication Up to 15 read replicas, millisecond lag Up to 15 read replicas, seconds to minutes lag
Failover <30 seconds (shared storage, no data sync needed) 60-120 seconds (Multi-AZ DNS failover)
Multi-AZ Built-in (storage spans 3 AZs by default) Synchronous standby in separate AZ
Global Database Yes – cross-region replication <1 second lag Cross-region read replicas (minutes lag)
Serverless Aurora Serverless v2 (scales to 0 ACU, instant scaling) Not available
Backtrack Yes – rewind DB to any point in seconds (MySQL only) Not available (use point-in-time restore)
Cloning Fast clone using copy-on-write (seconds, no storage cost initially) Snapshot restore (minutes to hours)
Blue/Green Deployments Yes Yes
Cost ~20-30% more than RDS for same instance size Lower base cost
I/O Cost Standard (pay per I/O) or I/O-Optimized (included) Included in storage (gp3) or provisioned (io1/io2)
Max Connections Higher (optimized connection handling) Based on instance memory
Engines MySQL-compatible, PostgreSQL-compatible only MySQL, PostgreSQL, MariaDB, Oracle, SQL Server

When to Choose Aurora

  • High availability is critical – built-in 6-way replication across 3 AZs, <30s failover.
  • Read-heavy workloads – up to 15 read replicas with millisecond replication lag.
  • Variable workloads – Aurora Serverless v2 scales compute automatically (even to zero).
  • Global applications – Aurora Global Database provides <1 second cross-region replication.
  • Development/testing – fast cloning creates copies in seconds without additional storage cost.
  • Need to undo mistakes quickly – Backtrack rewinds the database without restoring from backup.

When to Choose RDS

  • Cost-sensitive workloads – 20-30% cheaper for equivalent instance sizes.
  • Non-MySQL/PostgreSQL engines – need Oracle, SQL Server, or MariaDB.
  • Simple workloads – don’t need Aurora’s advanced features (global DB, serverless, cloning).
  • Lift-and-shift migrations – exact MySQL/PostgreSQL compatibility without Aurora-specific behavior.
  • Predictable I/O costs – gp3 storage includes I/O in the storage price.

AWS Certification Exam Practice Questions

  1. A company needs a database that automatically scales storage and provides sub-second failover with no data loss. Which service meets these requirements?
    1. RDS MySQL with Multi-AZ
    2. Amazon Aurora
    3. RDS PostgreSQL with read replicas
    4. DynamoDB
  2. A developer accidentally ran a DELETE query on a production Aurora MySQL database 5 minutes ago. What is the fastest recovery method?
    1. Restore from automated backup
    2. Point-in-time recovery
    3. Aurora Backtrack
    4. Promote a read replica
  3. A global application needs a relational database with cross-region read access and less than 1 second replication lag for disaster recovery. Which solution is appropriate?
    1. RDS with cross-region read replicas
    2. Aurora Global Database
    3. DynamoDB Global Tables
    4. RDS Multi-AZ with manual failover
  4. A startup has unpredictable traffic – sometimes zero users, sometimes thousands. They need a relational database that scales to zero during idle periods to minimize costs. What should they use?
    1. RDS with Scheduled Scaling
    2. Aurora Provisioned with Auto Scaling
    3. Aurora Serverless v2
    4. DynamoDB On-Demand

Related Posts

References

Amazon Aurora User Guide

Amazon RDS User Guide

AWS ECS vs EKS vs Fargate

AWS ECS vs EKS vs Fargate

  • AWS provides multiple container orchestration options: ECS (AWS-native), EKS (managed Kubernetes), and Fargate (serverless compute engine for containers).
  • Fargate is not an orchestrator – it is a launch type/compute engine that works with both ECS and EKS to eliminate server management.
  • Choice depends on Kubernetes requirement, operational complexity tolerance, and existing expertise.

ECS vs EKS vs Fargate Comparison

Feature ECS EKS Fargate
Type Container orchestrator Managed Kubernetes Serverless compute engine
Orchestration AWS-proprietary Kubernetes (open source) Works with ECS or EKS
Server Management EC2 launch type: you manage; Fargate: serverless Managed nodes, self-managed nodes, or Fargate No servers to manage
Learning Curve Low (AWS-native concepts) High (Kubernetes knowledge required) Lowest (just define task/pod)
Portability AWS-only Multi-cloud, on-premises (EKS Anywhere) AWS-only
Pricing No control plane charge; pay for EC2/Fargate $0.10/hour per cluster + compute Pay per vCPU and memory per second
Scaling Service Auto Scaling HPA, VPA, Karpenter, Cluster Autoscaler Automatic (no capacity planning)
Networking awsvpc, bridge, host modes VPC CNI (pod-level IPs) awsvpc (ENI per task/pod)
Service Mesh AWS App Mesh, VPC Lattice Istio, Linkerd, App Mesh, VPC Lattice Supported with both
CI/CD CodePipeline, CodeDeploy (blue/green) ArgoCD, Flux, CodePipeline Same as orchestrator used
Persistent Storage EFS, EBS (EC2 only) EBS CSI, EFS CSI, FSx EFS only (no EBS)
GPU Support EC2 launch type only Yes (managed/self-managed nodes) No
Windows Containers Yes Yes Yes (ECS Fargate, limited on EKS)

Amazon ECS – Elastic Container Service

  • AWS-native container orchestrator – deeply integrated with AWS services (IAM, CloudWatch, ALB, VPC).
  • Task Definition defines containers, resources, networking, and IAM roles (similar to a pod spec).
  • Two launch types: EC2 (you manage instances) and Fargate (serverless).
  • No control plane cost – unlike EKS, there is no hourly charge for the ECS cluster.
  • Native blue/green deployments with CodeDeploy integration.
  • Service Connect for simplified service-to-service communication.
  • Capacity Providers for automatic EC2 scaling and Fargate Spot integration.
  • Best for teams that want container orchestration without Kubernetes complexity.

Amazon EKS – Elastic Kubernetes Service

  • Fully managed Kubernetes – runs upstream Kubernetes, certified conformant.
  • Three compute options: Managed Node Groups, Self-Managed Nodes, and Fargate.
  • Supports the full Kubernetes ecosystem – Helm, Karpenter, Istio, ArgoCD, Prometheus, etc.
  • EKS Anywhere – run Kubernetes on-premises with AWS management tools.
  • EKS Auto Mode – AWS manages nodes, scaling, and upgrades automatically.
  • Multi-cluster management with EKS Connector for hybrid environments.
  • Control plane costs $0.10/hour ($72/month) per cluster.
  • Best for teams with Kubernetes expertise or multi-cloud/portability requirements.

AWS Fargate

  • Serverless compute engine for containers – no EC2 instances to provision or manage.
  • Works with both ECS and EKS as the compute layer.
  • Per-task/pod pricing – pay only for vCPU and memory resources consumed per second.
  • No over-provisioning – no idle EC2 capacity to pay for.
  • Each task/pod runs in its own isolated kernel runtime environment (microVM via Firecracker).
  • Fargate Spot – up to 70% discount for fault-tolerant workloads.
  • Limitations: no GPU support, no EBS volumes, no daemon sets, no privileged containers.
  • Best for variable workloads, teams wanting zero infrastructure management, and security-sensitive workloads needing isolation.

When to Choose Which

  • Choose ECS + EC2 – Need GPU, full control over instances, cost optimization with Reserved Instances/Savings Plans, or Windows containers with EBS.
  • Choose ECS + Fargate – Want simplest container experience, no Kubernetes, variable workloads, minimal ops overhead.
  • Choose EKS – Need Kubernetes compatibility, multi-cloud portability, rich ecosystem (Istio, ArgoCD, Helm), or team already knows Kubernetes.
  • Choose EKS + Fargate – Kubernetes workloads that don’t need GPU or DaemonSets and want serverless node management.

AWS Certification Exam Practice Questions

  1. A startup wants to run containers with minimal operational overhead and no server management. They don’t need Kubernetes and have variable traffic patterns. Which combination is recommended?
    1. EKS with Managed Node Groups
    2. ECS with Fargate
    3. ECS with EC2
    4. EKS with Fargate
  2. A company needs to run the same containerized workloads both on AWS and in their on-premises data center using the same tooling. Which service supports this?
    1. ECS Anywhere
    2. EKS Anywhere
    3. Fargate
    4. AWS Outposts with ECS
  3. An ML training workload requires GPU instances for container-based processing. Which option supports this?
    1. ECS with Fargate
    2. EKS with Fargate
    3. EKS with Managed Node Groups (GPU AMI)
    4. Fargate Spot
  4. A team needs to minimize container orchestration costs for development environments with intermittent usage. Which approach is most cost-effective?
    1. EKS with On-Demand Managed Node Groups
    2. ECS with Reserved EC2 instances
    3. ECS with Fargate Spot
    4. EKS with self-managed Spot instances

Related Posts

References

Amazon ECS Developer Guide

Amazon EKS User Guide

AWS Fargate Documentation

AWS SQS vs SNS vs EventBridge

AWS SQS vs SNS vs EventBridge

  • AWS provides multiple messaging and event-driven services for decoupling application components.
  • SQS is a message queue for point-to-point communication, SNS is a pub/sub notification service, and EventBridge is a serverless event bus for event-driven architectures.
  • These services are often used together but serve different purposes.

SQS vs SNS vs EventBridge Comparison

Feature SQS SNS EventBridge
Pattern Queue (point-to-point) Pub/Sub (fan-out) Event Bus (event-driven)
Delivery Pull-based (consumers poll) Push-based (pushes to subscribers) Push-based (routes to targets)
Consumers Single consumer per message Multiple subscribers Multiple targets per rule
Filtering No native filtering Message attribute filtering Content-based filtering (event patterns)
Retention 1 min to 14 days (default 4 days) No retention (immediate delivery) No retention (replay via archive up to indefinite)
Ordering FIFO queue guarantees order FIFO topic with SQS FIFO No ordering guarantee
Throughput Standard: unlimited; FIFO: 3,000 msg/sec (batching) Standard: unlimited; FIFO: 300 msg/sec Default: varies by region, scalable
Dead Letter Queue Yes Yes (for failed deliveries) Yes (DLQ on target failures)
Targets/Subscribers Consumer applications SQS, Lambda, HTTP/S, Email, SMS, Kinesis Firehose 200+ AWS services, APIs, SaaS apps
Event Sources Producers send messages Producers publish messages 90+ AWS services, custom apps, SaaS partners
Schema No schema enforcement No schema enforcement Schema Registry with discovery
Replay No (message deleted after processing) No Yes (Event Archive and Replay)
Cross-account Yes (resource policy) Yes (resource policy) Yes (cross-account event bus)
Scheduling Delay queues (up to 15 min) No Yes (EventBridge Scheduler)

Amazon SQS – Simple Queue Service

  • Fully managed message queue for decoupling producers from consumers.
  • Standard Queue – at-least-once delivery, best-effort ordering, unlimited throughput.
  • FIFO Queue – exactly-once processing, strict ordering, up to 3,000 msg/sec with batching.
  • Messages are retained up to 14 days – acts as a buffer for traffic spikes.
  • Supports visibility timeout to prevent multiple consumers processing the same message.
  • Supports long polling to reduce empty receives and costs.
  • Dead Letter Queue (DLQ) for messages that fail processing after max retries.
  • Integrates natively with Lambda (event source mapping) for serverless processing.

Amazon SNS – Simple Notification Service

  • Fully managed pub/sub service for fan-out messaging to multiple subscribers.
  • A single message published to a topic is delivered to all subscribers simultaneously.
  • Supports multiple protocols – SQS, Lambda, HTTP/S, Email, SMS, Kinesis Data Firehose, mobile push.
  • Message filtering – subscribers can set filter policies on message attributes to receive only relevant messages.
  • FIFO topics – strict ordering and deduplication when paired with SQS FIFO queues.
  • Fan-out pattern – SNS + multiple SQS queues for parallel processing of the same event.
  • Supports message encryption (SSE-KMS) and cross-account subscriptions.

Amazon EventBridge

  • Serverless event bus for building event-driven architectures at scale.
  • Receives events from 90+ AWS services automatically (no configuration needed).
  • Content-based filtering with event patterns – filter on any field in the event JSON body.
  • Routes events to 200+ AWS service targets including Lambda, Step Functions, API Gateway, SQS, SNS.
  • Schema Registry – automatically discovers and stores event schemas for code generation.
  • Event Archive and Replay – store events indefinitely and replay them for debugging or reprocessing.
  • EventBridge Scheduler – create one-time or recurring schedules (replaces CloudWatch Events cron).
  • EventBridge Pipes – point-to-point integration between sources and targets with filtering, enrichment, and transformation.
  • SaaS partner integrations – receive events from Zendesk, Datadog, Auth0, Shopify, etc.
  • Global endpoints – automatic failover to a secondary region for high availability.

When to Choose Which

  • Choose SQS – Decouple a producer from a single consumer, buffer traffic spikes, guarantee message processing with retries, maintain message ordering (FIFO).
  • Choose SNS – Fan-out a message to multiple subscribers simultaneously, send notifications (email/SMS), simple pub/sub without complex routing.
  • Choose EventBridge – React to AWS service events, route events based on content to different targets, integrate with SaaS applications, need schema discovery, event replay, or scheduling.
  • Combine SNS + SQS – Fan-out pattern where each subscriber needs independent processing with buffering and retry.
  • Combine EventBridge + SQS – Route events to SQS for buffered, reliable processing with backpressure handling.

AWS Certification Exam Practice Questions

  1. A company needs to process orders where each order must be processed exactly once and in the order received. Which service and configuration is most appropriate?
    1. SNS Standard topic
    2. SQS FIFO queue
    3. EventBridge with ordering
    4. SQS Standard queue
  2. An application needs to fan out a single event to three different microservices for parallel processing, each requiring independent retry logic. Which architecture is recommended?
    1. EventBridge with three targets
    2. SQS with three consumers
    3. SNS topic with three SQS queue subscriptions
    4. Three separate SQS queues with direct publishing
  3. A team needs to automatically trigger a Lambda function whenever an S3 object is created, an EC2 instance changes state, or a CodePipeline deployment fails. Which service requires the least configuration?
    1. SNS with S3 event notifications
    2. SQS with CloudWatch Events
    3. EventBridge (receives AWS events automatically)
    4. Lambda with direct triggers
  4. A SaaS application needs to react to events from Shopify and route them to different Lambda functions based on the event type (order_created vs order_cancelled). Which service is best suited?
    1. SNS with message filtering
    2. SQS with message attributes
    3. EventBridge with event pattern rules
    4. API Gateway with Lambda
  5. After a production incident, a team needs to replay all events from the past 7 days to reprocess failed orders. Which service supports this natively?
    1. SQS (messages already consumed)
    2. SNS (no retention)
    3. EventBridge (Archive and Replay)
    4. Kinesis Data Streams

Related Posts

References

Amazon SQS Developer Guide

Amazon SNS Developer Guide

Amazon EventBridge User Guide

AWS ELB – ALB vs NLB vs GWLB Comparison

AWS ELB – ALB vs NLB vs GWLB

  • AWS Elastic Load Balancing (ELB) automatically distributes incoming application traffic across multiple targets.
  • AWS offers four types of load balancers: Application Load Balancer (ALB), Network Load Balancer (NLB), Gateway Load Balancer (GWLB), and Classic Load Balancer (CLB – deprecated).
  • Choosing the right load balancer depends on the use case – Layer 7 routing, ultra-low latency, or third-party appliance integration.

ALB vs NLB vs GWLB Comparison

Feature ALB NLB GWLB
OSI Layer Layer 7 (HTTP/HTTPS) Layer 4 (TCP/UDP/TLS) Layer 3 (IP)
Use Case Web apps, microservices, content-based routing Ultra-low latency, static IP, TCP/UDP traffic Third-party virtual appliances (firewalls, IDS/IPS)
Routing Host, path, header, query string, HTTP method, source IP Port-based Transparent (GENEVE encapsulation)
Performance Handles millions of requests/sec Millions of requests/sec with ultra-low latency High throughput for appliance traffic
Static IP No (use Global Accelerator for fixed IPs) Yes (one static IP per AZ, Elastic IP supported) No
Preserve Source IP Via X-Forwarded-For header Yes (natively preserved) Yes (GENEVE encapsulation)
SSL/TLS Termination Yes Yes (TLS listener) No
WebSocket Yes Yes No
Target Types Instance, IP, Lambda Instance, IP, ALB Instance, IP
Health Checks HTTP, HTTPS TCP, HTTP, HTTPS TCP, HTTP, HTTPS
Cross-zone LB Enabled by default Disabled by default Disabled by default
Sticky Sessions Yes (cookie-based) Yes (source IP based) Yes (5-tuple/3-tuple/2-tuple)
PrivateLink Support No Yes Yes (via GWLB endpoints)
Mutual TLS (mTLS) Yes No No

Application Load Balancer – ALB

  • Operates at Layer 7 (HTTP/HTTPS) and is best suited for web applications.
  • Supports content-based routing – routes requests based on URL path, host header, HTTP headers, query strings, HTTP method, and source IP.
  • Supports multiple target groups per listener with weighted routing for blue/green and canary deployments.
  • Native integration with AWS WAF for web application security.
  • Supports authentication – integrates with Amazon Cognito and OIDC-compliant identity providers.
  • Supports Lambda functions as targets for serverless architectures.
  • Provides detailed access logs and integration with CloudWatch metrics.
  • Supports HTTP/2 and gRPC protocols.
  • Supports mutual TLS (mTLS) for client certificate authentication.
  • Supports fixed response actions and redirect actions at the listener level.

Network Load Balancer – NLB

  • Operates at Layer 4 (TCP/UDP/TLS) and handles millions of requests per second with ultra-low latency.
  • Provides a static IP address per Availability Zone and supports Elastic IP assignment.
  • Preserves source IP natively – no X-Forwarded-For header needed for TCP traffic.
  • Supports AWS PrivateLink – expose services to other VPCs or AWS accounts privately.
  • Can target ALB as a target – combines NLB’s static IP with ALB’s Layer 7 routing.
  • Supports TLS termination and centralized certificate management via ACM.
  • Handles volatile workloads and sudden traffic spikes without pre-warming.
  • Supports long-lived TCP connections – ideal for IoT, gaming, and real-time applications.
  • Supports UDP for DNS, SIP, and IoT protocols.
  • Cross-zone load balancing disabled by default – enable for even distribution across AZs.

Gateway Load Balancer – GWLB

  • Operates at Layer 3 (IP packets) using GENEVE protocol for transparent network traffic inspection.
  • Designed for deploying third-party virtual appliances – firewalls, IDS/IPS, deep packet inspection.
  • Creates a single entry/exit point for all traffic using Gateway Load Balancer Endpoints (GWLBe).
  • Traffic is transparently routed through appliances – source and destination IPs preserved.
  • Uses 5-tuple flow stickiness (source IP, dest IP, protocol, source port, dest port) by default.
  • Scales horizontally – automatically distributes traffic across multiple appliance instances.
  • Supports cross-VPC inspection via AWS PrivateLink (GWLBe in service consumer VPC).
  • Integrates with AWS Marketplace appliances from vendors like Palo Alto, Fortinet, Check Point.

When to Choose Which

  • Choose ALB – Web applications, microservices needing URL-based routing, gRPC, Lambda targets, OIDC authentication, WAF integration.
  • Choose NLB – TCP/UDP applications requiring ultra-low latency, static IPs, PrivateLink exposure, gaming/IoT, volatile traffic patterns.
  • Choose GWLB – Network traffic inspection via third-party appliances, centralized firewall deployments, compliance-driven packet inspection.
  • Combine NLB + ALB – When you need both static IPs (NLB) and content-based routing (ALB), use NLB with ALB as a target.

AWS Certification Exam Practice Questions

  1. A company needs to expose a microservices application that routes traffic based on URL paths and requires integration with AWS WAF. Which load balancer should they use?
    1. Network Load Balancer
    2. Application Load Balancer
    3. Gateway Load Balancer
    4. Classic Load Balancer
  2. An application requires a static IP address for whitelisting by partner organizations while maintaining ultra-low latency for TCP traffic. Which load balancer is most appropriate?
    1. Application Load Balancer with Global Accelerator
    2. Network Load Balancer
    3. Gateway Load Balancer
    4. Classic Load Balancer
  3. A security team needs to route all VPC traffic through a centralized fleet of third-party firewall appliances for deep packet inspection. Which AWS service should they use?
    1. Network Load Balancer
    2. AWS Network Firewall
    3. Gateway Load Balancer
    4. Application Load Balancer with AWS WAF
  4. A company needs to provide a service to multiple AWS accounts privately, with clients connecting using a static IP. Which combination is required?
    1. ALB + VPC Peering
    2. NLB + AWS PrivateLink
    3. GWLB + Transit Gateway
    4. ALB + Global Accelerator
  5. An architect needs to implement a blue/green deployment strategy with weighted routing between two versions of an application. Which load balancer feature supports this?
    1. NLB with multiple target groups
    2. ALB with weighted target groups
    3. GWLB with flow stickiness
    4. NLB with cross-zone load balancing

Related Posts

References

AWS Elastic Load Balancing User Guide

AWS ELB Features Comparison

Amazon DynamoDB Backup and Restore

DynamoDB Backup and Restore

  • DynamoDB Backup and Restore provides fully automated on-demand backup, restore, and point-in-time recovery for data protection and archiving.
  • On-demand backup allows the creation of full backups of DynamoDB table for data archiving, helping you meet corporate and governmental regulatory requirements.
  • Point-in-time recovery (PITR) provides continuous backups of your DynamoDB table data with per-second granularity.
  • All backups are automatically encrypted, cataloged, and easily discoverable.
  • Backups can be created for tables from a few megabytes to hundreds of terabytes of data, with no impact on performance and availability of production applications.

On-demand Backups

  • DynamoDB on-demand backup helps create full backups of the tables for long-term retention, and archiving for regulatory compliance needs.
  • On-demand backups create a snapshot of the table that DynamoDB stores and manages.
  • Backup and restore actions run with no impact on table performance or availability.
  • Backups process in seconds regardless of the size of the tables.
  • Backups are preserved regardless of table deletion and retained until they are explicitly deleted.
  • On-demand backups are cataloged, and discoverable.
  • Charged based on the size and duration of the backups.
  • Can restore the entire DynamoDB table to the exact state it was in when the backup was created.

Creating On-demand Backups

  • On-demand backups can be created using two methods:

DynamoDB Native Backup

  • Can be used to backup and restore DynamoDB tables.
  • Create backups via AWS Management Console, AWS CLI, or API.
  • Limitation: DynamoDB on-demand backups cannot be copied to a different account or Region.
  • Suitable for simple backup and restore within the same account and region.

AWS Backup (Recommended)

  • AWS Backup is a fully managed data protection service that makes it easy to centralize and automate backups across AWS services, in the cloud, and on-premises.
  • Provides enhanced backup features beyond native DynamoDB backups.
  • Key Advantages:
    • Centralized Management: Configure backup schedules & policies and monitor activity for AWS resources and on-premises workloads in one place.
    • Cross-Region Backup: Copy on-demand backups across AWS Regions.
    • Cross-Account Backup: Copy on-demand backups across AWS accounts (requires enabling advanced features).
    • Independent Encryption: Encryption using an AWS KMS key that is independent of the DynamoDB table encryption key.
    • Vault Lock (WORM): Apply write-once-read-many (WORM) setting for backups using AWS Backup Vault Lock policy for compliance.
    • Cost Allocation Tags: Add cost allocation tags to on-demand backups for better cost tracking.
    • Cold Storage Tier: Transition on-demand backups to cold storage for lower costs (requires opting in to advanced features).
    • Automated Backup Plans: Create scheduled backup plans with retention policies.

Cross-Region and Cross-Account Restore

  • DynamoDB table data can be restored across AWS Regions such that the restored table is created in a different Region from where the source table resides.
  • Cross-Region restores are supported between:
    • AWS commercial Regions
    • AWS China Regions
    • AWS GovCloud (US) Regions
  • Cross-Account Backup and Restore: Using AWS Backup, backups can be copied across AWS accounts for disaster recovery or data migration scenarios.
  • Pricing: Pay for data transfer out of the source Region and for restoring to a new table in the destination Region.

PITR – Point-In-Time Recovery

  • DynamoDB point-in-time recovery – PITR enables automatic, continuous, incremental backup of the table with per-second granularity.
  • PITR backups are fully managed by DynamoDB.
  • PITR helps protect against accidental writes and deletes.
  • PITR can back up tables with hundreds of terabytes of data with no impact on the performance or availability of the production applications.

Configurable Recovery Period (January 2025)

  • Announced in January 2025, DynamoDB now supports a configurable recovery period for PITR.
  • Recovery period can be set to any value between 1 and 35 days on a per-table basis.
  • Default: Recovery period is 35 days if not explicitly configured.
  • Can restore to any given second from within the configured recovery period.
  • Use Cases:
    • Shorter retention (e.g., 7 days) for cost optimization when long-term recovery is not needed.
    • Compliance requirements that mandate specific retention periods.
    • Development/test environments where shorter recovery windows are acceptable.
  • Pricing Impact: Shortening the recovery period has no impact on PITR pricing because the price is based on the size of table and local secondary indexes, not the retention period.

PITR Restore Capabilities

  • Can restore to any point in time between EarliestRestorableDateTime and LatestRestorableDateTime.
  • LatestRestorableDateTime is typically five minutes before the current time.
  • PITR-enabled tables that were deleted can be recovered in the preceding 35 days (or configured retention period) and restored to their state just before they were deleted.
  • Restored table is created as a new, independent table (not part of the original global table if applicable).

PITR with Global Tables

  • Can enable point-in-time recovery on each local replica of a global table.
  • When restoring a global table replica, the backup restores to an independent table that is not part of the global table.
  • If using Global Tables version 2019.11.21 (Current), a new global table can be created from the restored table.

PITR Considerations

  • If PITR is disabled and later re-enabled on a table, the start time for recovery is reset.
  • After re-enabling, can only immediately restore using the LatestRestorableDateTime.
  • AWS CloudTrail logs all console and API actions for PITR for auditing and compliance.
  • PITR can be enabled or disabled at any time without impacting table performance.

Backup and Restore Best Practices

  • Use AWS Backup for Production: Leverage AWS Backup for centralized management, cross-region/cross-account capabilities, and advanced features.
  • Enable PITR for Critical Tables: Always enable PITR for production tables to protect against accidental data loss.
  • Configure Appropriate Retention: Set PITR retention period based on recovery requirements and compliance needs.
  • Test Restore Procedures: Regularly test backup restoration to ensure recovery processes work as expected.
  • Use Vault Lock for Compliance: Apply AWS Backup Vault Lock for immutable backups when required by regulations.
  • Implement Cross-Region Backups: Copy critical backups to another region for disaster recovery.
  • Tag Backups: Use cost allocation tags to track backup costs by project, environment, or team.
  • Automate Backup Plans: Create scheduled backup plans with AWS Backup for consistent data protection.
  • Monitor Backup Status: Use CloudWatch and AWS Backup monitoring to track backup success and failures.
  • Consider Cold Storage: Transition long-term backups to cold storage tier for cost savings.

AWS Certification Exam Practice Questions

  • Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).
  • AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated
  • Open to further feedback, discussion and correction.
  1. A sysops engineer must create nightly backups of an Amazon DynamoDB table. Which backup methodology should the database specialist use to MINIMIZE management overhead?
    1. Install the AWS CLI on an Amazon EC2 instance. Write a CLI command that creates a backup of the DynamoDB table. Create a scheduled job or task that runs the command on a nightly basis.
    2. Create an AWS Lambda function that creates a backup of the DynamoDB table. Create an Amazon CloudWatch Events rule that runs the Lambda function on a nightly basis.
    3. Create a backup plan using AWS Backup, specify a backup frequency of every 24 hours, and give the plan a nightly backup window.
    4. Configure DynamoDB backup and restore for an on-demand backup frequency of every 24 hours.
  2. A company needs to copy DynamoDB table backups to a different AWS account for disaster recovery purposes. What is the BEST solution?
    1. Use DynamoDB native backup and manually export/import data to the other account.
    2. Use AWS Backup to create backups and copy them across accounts after enabling advanced features and cross-account backup.
    3. Enable PITR and restore the table in the other account.
    4. Use AWS Data Pipeline to copy data between accounts.
  3. A company wants to protect a DynamoDB table against accidental deletions with the ability to recover data from any point in the last 7 days. What should a solutions architect recommend?
    1. Create daily on-demand backups and retain them for 7 days.
    2. Enable PITR with a recovery period configured to 7 days.
    3. Use AWS Backup with a 7-day retention policy.
    4. Enable DynamoDB Streams and store data in S3 for 7 days.
  4. A company needs to restore a DynamoDB table to a different AWS Region. The table is currently in us-east-1 and needs to be restored to eu-west-1. What is the correct approach?
    1. Enable PITR and restore directly to eu-west-1.
    2. Use DynamoDB native backup and restore to eu-west-1.
    3. Create a backup and perform a cross-Region restore to eu-west-1.
    4. Create a Global Table with a replica in eu-west-1.
  5. A company has enabled PITR on a DynamoDB table with a 35-day retention period. They want to reduce costs by shortening the retention to 14 days. What will be the impact on PITR pricing?
    1. PITR costs will be reduced by approximately 60%.
    2. PITR costs will be reduced proportionally to the retention period.
    3. There will be no impact on PITR pricing as it is based on table size, not retention period.
    4. PITR costs will increase due to more frequent backup cycles.
  6. Which of the following are advantages of using AWS Backup over DynamoDB native backups? (Select THREE)
    1. Cross-account backup and restore capabilities
    2. Faster backup creation time
    3. Ability to transition backups to cold storage tier
    4. Lower backup storage costs
    5. Centralized backup management across multiple AWS services
    6. Automatic PITR enablement
  7. A DynamoDB table with PITR enabled was accidentally deleted. How long does the company have to recover the table?
    1. 7 days from deletion
    2. 24 hours from deletion
    3. Up to 35 days (or the configured retention period) from deletion
    4. PITR cannot recover deleted tables

References