Amazon Bedrock – Generative AI Service

🎓 Build AI Skills with Google
Learn practical AI skills and earn a Google Certificate. No experience required – learn at your own pace.
Start the Google AI Essentials Learning Path →

Amazon Bedrock Overview

  • Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI companies through a single API.
  • Bedrock enables building and scaling generative AI applications without managing infrastructure or training models from scratch.
  • All data remains private — Bedrock does NOT use customer data to train or improve base models.
  • Supports text generation, image generation, embeddings, chat, and multi-modal use cases.

Foundation Models

  • Amazon — Nova (Micro, Lite, Pro, Premier), Titan (Text, Embeddings, Image, Multimodal)
  • Anthropic — Claude (Haiku, Sonnet, Opus) family
  • Meta — Llama 3.x and Llama 4 models
  • Mistral AI — Mistral Large, Mistral Small
  • Cohere — Command R, Command R+, Embed
  • AI21 Labs — Jamba models
  • Stability AI — Stable Diffusion (image generation)
  • DeepSeek — DeepSeek-R1 (reasoning model)
  • Models are accessed via the InvokeModel API — no need to provision instances.
  • Cross-Region Inference — automatically routes requests to available regions for higher throughput.
  • Inference Profiles — predefined configurations for consistent model behavior.

Amazon Bedrock Agents

  • Build autonomous AI agents that can plan, orchestrate, and execute multi-step tasks.
  • Agents can invoke APIs, query databases, and interact with enterprise systems.
  • Action Groups — define what actions an agent can take (Lambda functions, API schemas).
  • Knowledge Bases — give agents access to company data for RAG (Retrieval-Augmented Generation).
  • Multi-agent collaboration — agents can delegate tasks to other specialized agents.
  • Return of control — pause agent execution and return control to the application for human-in-the-loop workflows.
  • Code Interpreter — agents can generate and execute code to perform calculations and data analysis.
  • Memory — agents retain context across conversations for personalized interactions.

Amazon Bedrock Knowledge Bases

  • Implements Retrieval-Augmented Generation (RAG) — connects FMs to company data sources.
  • Data sources: S3, Confluence, SharePoint, Salesforce, Web Crawler, custom connectors.
  • Vector stores: OpenSearch Serverless, Aurora PostgreSQL, Pinecone, Redis Enterprise, MongoDB Atlas, Neptune Analytics.
  • Chunking strategies: Fixed-size, semantic, hierarchical, no chunking.
  • Parsing: Built-in parsers for PDF, Word, HTML, Markdown, CSV, Excel.
  • Advanced RAG:
    • Metadata filtering — filter results by document attributes
    • Hybrid search — combines semantic + keyword search
    • Re-ranking — uses a re-ranker model to improve result relevance
    • Query decomposition — breaks complex queries into sub-queries
  • GraphRAG — uses knowledge graphs (Neptune) for relationship-aware retrieval.

Amazon Bedrock Guardrails

  • Implement safeguards for generative AI applications — works with any FM on Bedrock or custom models.
  • Content filters — block harmful content categories (hate, insults, sexual, violence, misconduct) with configurable thresholds.
  • Denied topics — define topics the model should refuse to discuss.
  • Word filters — block specific words, phrases, or profanity.
  • Sensitive information filters (PII) — detect and redact/mask PII (names, SSN, credit cards, etc.).
  • Contextual grounding checks — detect hallucinations by verifying responses against source material.
  • Automated Reasoning checks — uses formal logic to validate factual accuracy.
  • Guardrails can be applied to both inputs and outputs.
  • Works with Bedrock Agents, Knowledge Bases, and direct InvokeModel calls.
  • ApplyGuardrail API — apply guardrails to any text, even outside Bedrock.

Amazon Bedrock Model Customization

  • Fine-tuning — train a model on your specific data to improve performance for your use case.
  • Continued Pre-training — train a model on domain-specific unlabeled data for deeper domain knowledge.
  • Model Distillation — transfer capabilities from a larger teacher model to a smaller, faster student model.
  • Custom models are private — only accessible in your account.
  • Training data stored in S3, encrypted with KMS.
  • Provisioned Throughput — purchase dedicated capacity for custom or base models for consistent performance.

Amazon Bedrock Model Evaluation

  • Compare model performance using automatic evaluation (built-in metrics) or human evaluation (human reviewers).
  • Automatic metrics: accuracy, robustness, toxicity, BERTScore, ROUGE.
  • LLM-as-a-judge — use a foundation model to evaluate outputs of other models.
  • Compare multiple models side-by-side for your specific use case.
  • Results stored in S3 for analysis.

Amazon Bedrock Flows

  • Visual workflow builder for creating generative AI pipelines.
  • Chain prompts, knowledge bases, agents, guardrails, and Lambda functions into workflows.
  • Supports conditional branching, parallel execution, and iterative loops.
  • Version and deploy flows independently.

Amazon Bedrock Prompt Management

  • Create, version, and manage prompts centrally.
  • Prompt variables — use placeholders for dynamic content.
  • Prompt Caching — cache context for frequently used long prompts to reduce latency and cost.
  • Intelligent Prompt Routing — automatically routes requests to the optimal model based on prompt complexity.

Amazon Bedrock Studio

  • Web-based playground for non-technical users to build and test generative AI applications.
  • Create projects with shared resources (agents, knowledge bases, guardrails).
  • SSO integration via IAM Identity Center for team collaboration.

Amazon Nova Models

  • Amazon’s own family of foundation models, purpose-built for Bedrock.
  • Nova Micro — text-only, lowest latency, lowest cost (ideal for simple tasks).
  • Nova Lite — multimodal (text, image, video input), fast and cost-effective.
  • Nova Pro — multimodal, best balance of accuracy, speed, and cost.
  • Nova Premier — most capable, best for complex reasoning and agentic workflows.
  • Nova Canvas — image generation with watermark detection.
  • Nova Reel — video generation (up to 6 seconds).
  • Nova Sonic — speech-to-speech model for natural conversations.

Bedrock Security

  • Data privacy — customer data is NOT used to train base models; model inputs/outputs are not shared.
  • Encryption — data encrypted in transit (TLS 1.2+) and at rest (KMS). Customer-managed keys supported.
  • VPC connectivity — access Bedrock via VPC endpoints (PrivateLink) for private network traffic.
  • IAM integration — fine-grained access control with IAM policies, resource-based policies.
  • Model access control — explicitly enable which models are available in your account.
  • CloudTrail logging — all API calls logged for auditing.
  • Model Invocation Logging — log prompts and responses to S3/CloudWatch for compliance.
  • Service Control Policies — restrict Bedrock usage at the organization level.

Bedrock Pricing

  • On-Demand — pay per input/output token (no commitment, most flexible).
  • Batch Inference — up to 50% cheaper for non-time-sensitive workloads.
  • Provisioned Throughput — reserved capacity with committed model units (1-month or 6-month terms).
  • Model Customization — charged per training token processed.
  • Knowledge Bases — charged per storage and retrieval query.
  • Guardrails — charged per 1,000 text units processed.

AWS Certification Exam Practice Questions

  1. A company wants to build a customer support chatbot that can access company documentation stored in S3 to answer questions accurately. Which Bedrock feature should they use?
    1. Fine-tuning
    2. Knowledge Bases (RAG)
    3. Guardrails
    4. Model Evaluation
  2. An organization needs to ensure their generative AI application never discusses competitor products and redacts any PII from responses. Which Bedrock feature provides this?
    1. Knowledge Bases
    2. Model Customization
    3. Guardrails (denied topics + PII filters)
    4. Prompt Management
  3. A startup wants to reduce the latency and cost of their Bedrock application that uses Claude for simple classification tasks. Which approach is most cost-effective?
    1. Provisioned Throughput for Claude
    2. Fine-tune Claude on classification data
    3. Use Intelligent Prompt Routing or switch to Nova Micro
    4. Use Batch Inference
  4. A financial services company requires that all Bedrock API traffic stays within their private network and all prompts/responses are logged for regulatory compliance. Which features should they enable?
    1. CloudTrail + S3 encryption
    2. VPC endpoints (PrivateLink) + Model Invocation Logging
    3. Guardrails + Knowledge Bases
    4. IAM policies + Batch Inference
  5. A company wants an AI agent that can look up customer orders in DynamoDB, check shipping status via an API, and send email notifications. Which Bedrock feature enables this?
    1. Knowledge Bases
    2. Bedrock Flows
    3. Bedrock Agents with Action Groups
    4. Model Customization
  6. An enterprise needs to transfer the capabilities of a large, expensive model to a smaller model for production use to reduce inference costs while maintaining quality. Which feature supports this?
    1. Fine-tuning
    2. Continued Pre-training
    3. Model Distillation
    4. Provisioned Throughput

Related Posts

References

Amazon Bedrock User Guide

Amazon Bedrock Agents

Amazon Bedrock Knowledge Bases

Amazon Bedrock Guardrails

Amazon Bedrock Pricing

AWS WAF vs Shield vs Firewall Manager

AWS WAF vs Shield vs Firewall Manager

  • AWS provides three complementary security services for protecting web applications and AWS resources from attacks.
  • WAF filters web traffic at Layer 7, Shield protects against DDoS attacks at Layer 3/4, and Firewall Manager centrally manages security policies across accounts.
  • These services work together — Shield protects the infrastructure, WAF filters application-layer attacks, and Firewall Manager enforces policies at scale.

WAF vs Shield vs Firewall Manager Comparison

Feature AWS WAF AWS Shield AWS Firewall Manager
Purpose Web application firewall (Layer 7) DDoS protection (Layer 3/4/7) Central security policy management
Protection Layer Layer 7 (HTTP/HTTPS) Standard: L3/4; Advanced: L3/4/7 Manages WAF, Shield, SG, Network Firewall, Route 53 DNS Firewall
Attacks Blocked SQL injection, XSS, bot traffic, rate limiting, geo blocking SYN flood, UDP reflection, volumetric attacks Policy violations across accounts
Scope Per resource (CloudFront, ALB, API Gateway, AppSync, Cognito, App Runner, Verified Access) Per resource (Standard) or account-wide (Advanced) AWS Organization-wide
Pricing Per web ACL + per rule + per million requests Standard: Free; Advanced: $3,000/month + data transfer Per policy per region + per resource
Automatic Rules must be configured Standard: automatic; Advanced: automatic + DRT team Auto-applies policies to new resources
Managed Rules AWS + Marketplace (Fortinet, F5, Imperva) N/A (automatic detection) Enforces WAF rule groups across accounts
Cost Protection No Advanced: credits for scaling costs during DDoS No
Response Team No Advanced: 24/7 Shield Response Team (SRT) No
Prerequisite None Standard: automatic; Advanced: subscription AWS Organizations + WAF/Shield Advanced

AWS WAF (Web Application Firewall)

  • Layer 7 firewall that filters HTTP/HTTPS requests based on configurable rules.
  • Web ACLs contain rules that inspect requests (headers, body, URI, query strings, IP).
  • Rule types:
    • Rate-based rules – block IPs exceeding request threshold (per 5-min window)
    • IP set rules – allow/block specific IP ranges
    • Geo match – allow/block by country
    • String/regex match – inspect request components
    • SQL injection / XSS detection – built-in detection statements
    • Size constraints – block oversized requests
  • Managed Rule Groups:
    • AWS Managed Rules – Core Rule Set, Known Bad Inputs, SQL/Linux/Windows/PHP, Bot Control, Account Takeover Prevention, Account Creation Fraud Prevention
    • Marketplace rules – Fortinet, F5, Imperva, Trend Micro
  • Bot Control – identify and manage bot traffic (common bots, targeted bots, AI scrapers).
  • Account Takeover Prevention (ATP) – detect credential stuffing on login pages.
  • Account Creation Fraud Prevention (ACFP) – prevent fake account creation.
  • CAPTCHA and Challenge – silent browser challenges or visible CAPTCHA.
  • Applies to: CloudFront, ALB, API Gateway, AppSync, Cognito User Pools, App Runner, Verified Access.
  • Best for: Protecting web applications from SQL injection, XSS, bot abuse, brute force, and application-layer attacks.

AWS Shield

  • DDoS protection service – defends against volumetric, protocol, and application-layer attacks.
  • Shield Standard (free, automatic):
    • Automatically protects ALL AWS customers at no extra cost
    • Protects against common Layer 3/4 DDoS attacks (SYN/UDP floods, reflection attacks)
    • Applied to CloudFront, Route 53, and Global Accelerator automatically
  • Shield Advanced ($3,000/month):
    • Enhanced detection for EC2, ELB, CloudFront, Global Accelerator, Route 53
    • DDoS cost protection – credits for Auto Scaling, CloudFront, ELB scaling costs during attacks
    • Shield Response Team (SRT) – 24/7 expert team to assist during attacks
    • Advanced metrics and reporting – real-time visibility into attacks
    • Automatic application-layer mitigation – creates WAF rules automatically during L7 attacks
    • Health-based detection – uses Route 53 health checks for faster detection
    • Proactive engagement – SRT contacts you when health checks trigger
    • Protection applies to all resources in the account (and org with Firewall Manager)
  • Best for: High-profile applications needing DDoS protection guarantees, SLA credits, and 24/7 expert support.

AWS Firewall Manager

  • Central security policy management across AWS Organizations accounts and resources.
  • Manages policies for:
    • AWS WAF – deploy WAF rules across all accounts/resources
    • AWS Shield Advanced – enable protection organization-wide
    • Security Groups – audit and enforce SG rules
    • AWS Network Firewall – deploy firewall rules across VPCs
    • Route 53 Resolver DNS Firewall – enforce DNS filtering
    • Third-party firewalls – Palo Alto, Fortigate via Marketplace
  • Auto-remediation – automatically applies policies to new resources/accounts as they’re created.
  • Compliance dashboard – view which resources are non-compliant across the organization.
  • Prerequisites: AWS Organizations, AWS Config enabled in all accounts.
  • Best for: Multi-account organizations needing consistent security policies, compliance auditing, and automatic enforcement.

How They Work Together

  • Shield Standard (always on) protects infrastructure from volumetric L3/4 DDoS.
  • Shield Advanced adds L7 protection and automatically creates WAF rules during application-layer attacks.
  • WAF handles ongoing application-layer threats (bots, injections, rate limiting).
  • Firewall Manager ensures WAF + Shield Advanced policies are consistently deployed across all accounts and resources.
  • Typical stack: Firewall Manager → Shield Advanced → WAF → Application.

When to Choose Which

  • Every application gets Shield Standard – it’s free and automatic.
  • Add WAF when – you need to filter application-layer traffic (SQL injection, bot management, rate limiting, geo-blocking).
  • Add Shield Advanced when – you need DDoS cost protection, 24/7 SRT support, enhanced detection, and SLA guarantees for business-critical applications.
  • Add Firewall Manager when – you manage multiple AWS accounts and need consistent security policies automatically applied across the organization.

AWS Certification Exam Practice Questions

  1. A company experiences a sudden spike in traffic that appears to be a DDoS attack targeting their ALB. They want AWS experts to help mitigate the attack in real-time. Which service provides this?
    1. AWS WAF with rate-based rules
    2. AWS Shield Advanced (Shield Response Team)
    3. AWS Firewall Manager
    4. AWS Network Firewall
  2. An e-commerce site needs to block SQL injection attacks and limit login attempts to 100 per IP per 5 minutes. Which service provides these capabilities?
    1. AWS Shield Advanced
    2. Security Groups
    3. AWS WAF (SQL injection rule + rate-based rule)
    4. Network ACLs
  3. A large enterprise with 50 AWS accounts needs to ensure every ALB in every account has the same WAF rules applied, including new ALBs created in the future. Which service automates this?
    1. AWS WAF with central web ACL
    2. AWS Shield Advanced
    3. AWS Config Rules
    4. AWS Firewall Manager
  4. After a DDoS attack, a company’s CloudFront and Auto Scaling costs spiked significantly due to the attack traffic. Which service provides credits for these scaling costs?
    1. AWS WAF
    2. AWS Shield Advanced (DDoS cost protection)
    3. AWS Firewall Manager
    4. AWS Trusted Advisor
  5. A company wants to detect and block AI web scrapers from crawling their content while allowing legitimate search engine bots. Which WAF feature addresses this?
    1. Rate-based rules
    2. Geo match rules
    3. AWS WAF Bot Control (AI Content Scraper category)
    4. IP reputation lists

Related Posts

References

AWS WAF Developer Guide

AWS Shield Developer Guide

AWS Firewall Manager Developer Guide

AWS Route 53 Routing Policies Comparison

AWS Route 53 Routing Policies Comparison

  • Amazon Route 53 supports 7 routing policies that determine how DNS queries are answered.
  • Choosing the right routing policy depends on whether you need failover, latency optimization, geographic restrictions, or traffic distribution.
  • Multiple policies can be combined using alias records and health checks for complex routing architectures.

Route 53 Routing Policies Comparison

Policy Use Case How It Works Health Checks
Simple Single resource, no special routing Returns all values in random order No (can’t attach)
Weighted Traffic distribution, blue/green, A/B testing Routes based on assigned weights (0-255) Yes
Latency-based Best performance for global users Routes to region with lowest latency Yes
Failover Active-passive disaster recovery Primary until unhealthy, then secondary Yes (required for primary)
Geolocation Content localization, compliance, restrictions Routes based on user’s geographic location Yes
Geoproximity Route based on resource location + bias Routes to nearest resource; bias expands/shrinks coverage Yes
Multivalue Answer Simple load balancing with health checks Returns up to 8 healthy records randomly Yes
IP-based Route by client IP/CIDR (ISP optimization) Routes based on client subnet CIDR mapping Yes

Simple Routing

  • Routes traffic to a single resource (or multiple values returned in random order).
  • Cannot attach health checks to simple routing records.
  • If multiple values are returned, client chooses one randomly (client-side load balancing).
  • Can only have one record per name with simple routing.
  • Best for: Single server, single resource behind a load balancer.

Weighted Routing

  • Routes traffic based on weights assigned to records (0-255).
  • Traffic proportion = record weight / sum of all weights for the same name.
  • Setting weight to 0 stops traffic to that resource (useful for maintenance).
  • If all records have weight 0, traffic is distributed equally.
  • Supports health checks – unhealthy records removed from responses.
  • Best for: Blue/green deployments (90/10 split), A/B testing, gradual migrations, load distribution across regions.

Latency-based Routing

  • Routes traffic to the region with the lowest network latency for the user.
  • Latency is measured between the user’s DNS resolver and AWS regions.
  • Requires resources in multiple AWS regions.
  • Supports health checks – if lowest-latency resource is unhealthy, routes to next-best.
  • Latency data is updated periodically by AWS (not real-time per request).
  • Best for: Global applications deployed in multiple regions needing best user experience.

Failover Routing

  • Routes traffic to primary resource when healthy, secondary when primary fails health check.
  • Active-passive configuration – only one designation per record set (primary or secondary).
  • Health check required on primary record; optional on secondary.
  • Secondary can point to a static S3 website (maintenance page) or another resource.
  • Can be combined with other routing policies using alias records.
  • Best for: Disaster recovery, maintenance pages, active-passive HA architectures.

Geolocation Routing

  • Routes traffic based on geographic location of the user (continent, country, or US state).
  • Most specific match wins – state > country > continent > default.
  • A default record is recommended – users from unmapped locations get this response.
  • If no default and no match, Route 53 returns “no answer”.
  • Does NOT route to closest resource – routes to the location you configure (use geoproximity for nearest).
  • Best for: Content localization (language), compliance (restrict access by country), serving region-specific content.

Geoproximity Routing

  • Routes traffic based on geographic distance between user and resources.
  • Bias values (-99 to +99) expand or shrink the geographic area that routes to a resource.
  • Positive bias = attracts more traffic (expands coverage area).
  • Negative bias = repels traffic (shrinks coverage area).
  • Supports both AWS resources (auto-detects region) and non-AWS resources (specify latitude/longitude).
  • Requires Route 53 Traffic Flow to use geoproximity routing.
  • Best for: Routing to nearest resource with ability to shift traffic between regions using bias.

Multivalue Answer Routing

  • Returns up to 8 healthy records in response to each DNS query.
  • Similar to simple routing but supports health checks – only healthy resources returned.
  • Not a substitute for a load balancer but provides basic DNS-level load balancing with health checking.
  • Each record can have its own health check.
  • Best for: Basic load distribution with health checking when you don’t need ELB.

IP-based Routing

  • Routes traffic based on client’s source IP address mapped to CIDR blocks.
  • Create CIDR collections with locations, then map records to locations.
  • Useful when you know the IP ranges of your users (corporate networks, ISPs).
  • More precise than geolocation – routes based on actual network, not estimated location.
  • Best for: ISP-specific routing, enterprise users with known IP ranges, optimizing costs by routing to specific endpoints.

Combining Routing Policies

  • Alias records can point to other Route 53 record sets, enabling policy combinations.
  • Example: Latency → Weighted (route to nearest region, then split between blue/green within region).
  • Example: Failover → Latency (primary is latency-based across regions, secondary is S3 static page).
  • Example: Geolocation → Failover (per-country routing with DR fallback).
  • Traffic Flow – visual editor for building complex routing trees with multiple policies.

AWS Certification Exam Practice Questions

  1. A company wants to gradually migrate traffic from an on-premises data center to AWS by sending 10% of traffic to AWS initially, increasing over time. Which routing policy supports this?
    1. Latency-based
    2. Weighted
    3. Failover
    4. Geolocation
  2. An application deployed in us-east-1 and eu-west-1 should route users to whichever region provides the fastest response. Which routing policy is appropriate?
    1. Geolocation
    2. Geoproximity
    3. Latency-based
    4. Weighted (50/50)
  3. A streaming service must serve different content libraries to users in different countries due to licensing restrictions. Which routing policy enforces this?
    1. Latency-based
    2. Geolocation
    3. Geoproximity
    4. IP-based
  4. A company needs to route traffic to the nearest data center but temporarily shift more traffic to a new region during a migration. Which routing policy allows adjusting the geographic coverage area?
    1. Geolocation with failover
    2. Weighted with latency
    3. Geoproximity with bias
    4. Multivalue answer
  5. An architect needs DNS-level health checking where unhealthy endpoints are automatically removed from DNS responses, but a full load balancer is not required. Which policy provides this with multiple IPs?
    1. Simple routing
    2. Weighted routing
    3. Failover routing
    4. Multivalue answer routing

Related Posts

References

Route 53 Routing Policies

Route 53 Traffic Flow

AWS DynamoDB vs DocumentDB vs Neptune

AWS DynamoDB vs DocumentDB vs Neptune

  • AWS offers multiple purpose-built NoSQL database services, each optimized for different data models and access patterns.
  • DynamoDB is a key-value/document database for high-scale low-latency workloads, DocumentDB is MongoDB-compatible for document workloads, and Neptune is a graph database for highly connected data.
  • Choice depends on data model, query patterns, scale requirements, and existing application compatibility.

DynamoDB vs DocumentDB vs Neptune Comparison

Feature DynamoDB DocumentDB Neptune
Data Model Key-value + Document Document (JSON) Graph (property graph + RDF)
Compatibility AWS proprietary API MongoDB 3.6/4.0/5.0 compatible Gremlin, SPARQL, openCypher
Architecture Serverless, fully managed Cluster-based (primary + replicas) Cluster-based (primary + replicas)
Scaling Automatic, unlimited (horizontal) Vertical (instance size) + read replicas (up to 15) Vertical (instance size) + read replicas (up to 15)
Serverless Option Yes (On-Demand or Provisioned) Yes (DocumentDB Elastic Clusters) Yes (Neptune Serverless)
Latency Single-digit milliseconds Low milliseconds Milliseconds for traversals
Max Item/Document Size 400KB 16MB N/A (graph edges/vertices)
Query Flexibility Limited (partition key + sort key, GSI/LSI) Rich (MongoDB query language, aggregation pipelines) Graph traversals (multi-hop relationships)
Transactions Yes (up to 100 items, 4MB) Yes (multi-document ACID) Yes (ACID)
Global Replication Global Tables (multi-region active-active) Global Clusters (up to 5 regions, read replicas) Global Database (up to 5 read regions)
Change Streams DynamoDB Streams / Kinesis Data Streams Change Streams (MongoDB compatible) Neptune Streams
Caching DAX (microsecond reads) No built-in (use ElastiCache) No built-in (use ElastiCache)
Full-Text Search No (integrate OpenSearch) Basic text indexes Neptune Analytics (vector + full-text)
Vector Search No No Yes (Neptune Analytics)
Pricing Per request (on-demand) or per RCU/WCU (provisioned) Per instance-hour + storage + I/O Per instance-hour + storage + I/O

Amazon DynamoDB

  • Fully serverless key-value and document database – single-digit millisecond latency at any scale.
  • Capacity modes: On-Demand (pay per request, zero capacity planning) or Provisioned (with Auto Scaling).
  • Designed for massive scale – handles 10+ trillion requests per day, peaks above 100 million requests/second.
  • Global Tables – multi-region, multi-active replication with Multi-Region Strong Consistency (MRSC, GA 2025).
  • DynamoDB Accelerator (DAX) – in-memory cache for microsecond read latency.
  • DynamoDB Streams – capture item-level changes for event-driven processing (Lambda integration).
  • TTL – automatic item expiration at no cost.
  • Zero-ETL to Redshift – replicate data to Redshift for analytics without pipelines.
  • Limitations: 400KB item size, limited query flexibility (must know partition key), no joins, no aggregations.
  • Best for: High-scale applications with known access patterns – gaming leaderboards, session stores, IoT, e-commerce carts, serverless backends.

Amazon DocumentDB

  • MongoDB-compatible document database – supports MongoDB 3.6, 4.0, and 5.0 API compatibility.
  • Purpose-built storage – separates compute from storage (similar to Aurora); storage auto-scales to 128TB.
  • Rich queries – full MongoDB query language, aggregation pipelines, secondary indexes, geospatial queries.
  • Elastic Clusters – shard collections across multiple nodes for horizontal scaling (millions of reads/writes per second).
  • Global Clusters – cross-region disaster recovery with up to 5 read regions.
  • Change Streams – MongoDB-compatible change data capture for event-driven architectures.
  • 16MB document size – suitable for complex nested documents.
  • Not 100% MongoDB compatible – some features differ (check compatibility matrix).
  • Best for: MongoDB workloads migrating to AWS, content management, catalogs, user profiles, applications needing flexible schemas with rich querying.

Amazon Neptune

  • Fully managed graph database – purpose-built for storing and querying highly connected data.
  • Supports three query languages: Apache TinkerPop Gremlin (property graphs), SPARQL (RDF/linked data), and openCypher (declarative graph queries).
  • Neptune Analytics – analyze graph data with vector search, graph algorithms, and full-text search.
  • Neptune Serverless – automatically scales compute based on workload.
  • Neptune ML – machine learning predictions on graph data using GNNs (Graph Neural Networks) via SageMaker.
  • Global Database – cross-region read replicas for low-latency reads and disaster recovery.
  • Neptune Streams – capture graph changes for downstream processing.
  • Up to 15 read replicas – scale reads across multiple instances.
  • Best for: Relationship-heavy data – social networks, recommendation engines, fraud detection, knowledge graphs, network topology, identity graphs, supply chain.

When to Choose Which

  • Choose DynamoDB when:
    • You need extreme scale with single-digit ms latency
    • Access patterns are well-defined (key-value lookups)
    • You want fully serverless with zero management
    • Use cases: session stores, gaming, IoT, e-commerce, serverless apps
  • Choose DocumentDB when:
    • You’re migrating from MongoDB or need MongoDB compatibility
    • Documents are complex/nested and need flexible querying
    • You need aggregation pipelines and secondary indexes
    • Use cases: content management, catalogs, user profiles
  • Choose Neptune when:
    • Data is highly connected with complex relationships
    • Queries involve traversing relationships (multi-hop)
    • You need graph algorithms (shortest path, centrality, community detection)
    • Use cases: social networks, fraud detection, knowledge graphs, recommendations

AWS Certification Exam Practice Questions

  1. A social media application needs to find “friends of friends” and recommend connections based on mutual relationships. Which database is purpose-built for this query pattern?
    1. DynamoDB with GSI
    2. DocumentDB with aggregation
    3. Neptune (graph traversal)
    4. RDS with JOIN queries
  2. A company is migrating a MongoDB application to AWS. They use aggregation pipelines, geospatial queries, and change streams extensively. Which service provides the best compatibility?
    1. DynamoDB with Document model
    2. DocumentDB
    3. Neptune
    4. ElastiCache for MongoDB
  3. A gaming application needs a leaderboard that handles 50,000 writes per second with single-digit millisecond latency, using simple key-value access patterns. Which database fits?
    1. DocumentDB
    2. Neptune
    3. Aurora
    4. DynamoDB
  4. A fraud detection system needs to analyze transaction patterns by traversing relationships between accounts, devices, IP addresses, and merchants to find suspicious clusters. Which database is best suited?
    1. DynamoDB with Streams
    2. DocumentDB with aggregation
    3. Neptune with graph algorithms
    4. Redshift for analytics
  5. An e-commerce application stores product catalogs with deeply nested attributes (variations, specifications, reviews) and needs to query by any attribute with aggregation. Documents average 2MB. Which database fits?
    1. DynamoDB (400KB limit would be exceeded)
    2. DocumentDB (16MB limit, rich queries)
    3. Neptune
    4. S3 with Athena

Related Posts

References

Amazon DynamoDB Developer Guide

Amazon DocumentDB Developer Guide

Amazon Neptune User Guide

AWS Step Functions vs EventBridge

AWS Step Functions vs EventBridge

  • Both Step Functions and EventBridge are serverless services for coordinating workflows, but they serve fundamentally different purposes.
  • Step Functions orchestrates multi-step workflows with state management and error handling.
  • EventBridge routes events between services based on content-based rules without maintaining state.
  • They are often used together – EventBridge triggers Step Functions workflows based on events.

Step Functions vs EventBridge Comparison

Feature Step Functions EventBridge
Pattern Orchestration (centralized control) Choreography (decoupled routing)
State Management Yes – tracks execution state, input/output between steps No – stateless event routing
Execution Model Sequential, parallel, branching, looping Fire-and-forget event delivery
Duration Standard: up to 1 year; Express: up to 5 minutes Near real-time delivery (no duration concept)
Error Handling Built-in Retry, Catch, Fallback states Dead-letter queue on target delivery failure
Visibility Visual workflow graph, step-by-step execution history Rule match metrics, limited execution visibility
Targets/Integrations 200+ AWS service integrations (direct SDK calls) 200+ AWS service targets per rule
Event Sources Triggered by API call, EventBridge, API Gateway, Lambda 90+ AWS services, SaaS partners, custom apps
Filtering Choice state (conditions on input data) Content-based filtering on event body (event patterns)
Parallelism Parallel state, Distributed Map (millions of items) Multiple targets per rule (fan-out)
Human Approval Yes – Task tokens with callback pattern No native support
Scheduling Wait state (delay steps) EventBridge Scheduler (cron/rate/one-time)
Replay Redrive failed executions (2024) Event Archive and Replay
Pricing Standard: per state transition; Express: per request + duration Per event published ($1/million)

AWS Step Functions

  • Serverless workflow orchestration – coordinates multiple AWS services into visual workflows.
  • Standard Workflows – up to 1 year, exactly-once execution, full execution history, ideal for long-running processes.
  • Express Workflows – up to 5 minutes, at-least-once, high-volume event processing (100K+ executions/second).
  • States: Task, Choice, Parallel, Map, Wait, Pass, Succeed, Fail.
  • Direct SDK integrations – call 200+ AWS services without Lambda (DynamoDB PutItem, SQS SendMessage, ECS RunTask, Bedrock InvokeModel).
  • Distributed Map – process millions of items from S3 in parallel (up to 10,000 concurrent executions).
  • Callback pattern – pause workflow, wait for external system/human approval via task token.
  • Error handling – Retry with exponential backoff, Catch with fallback states, per-step timeout.
  • Redrive (2024) – restart failed executions from the point of failure without re-running completed steps.
  • Variables and JSONata (2024) – workflow-level variables and powerful data transformation expressions.
  • Best for: Multi-step processes needing coordination, error handling, human approval, long-running workflows, batch processing.

Amazon EventBridge

  • Serverless event bus – routes events between decoupled services based on rules.
  • Receives events from 90+ AWS services automatically without configuration.
  • Content-based filtering – event patterns match on any field in the event JSON body.
  • Multiple targets per rule – fan-out a single event to up to 5 targets.
  • EventBridge Scheduler – millions of one-time or recurring schedules (replaces CloudWatch Events).
  • EventBridge Pipes – point-to-point with filtering, enrichment, and transformation between source and target.
  • Event Archive and Replay – store events indefinitely for reprocessing or debugging.
  • Schema Registry – auto-discover event schemas for code generation.
  • Global endpoints – automatic failover to secondary region.
  • SaaS integrations – receive events from Zendesk, Datadog, Shopify, Auth0, etc.
  • Best for: Event-driven architectures, reacting to AWS service changes, decoupled microservices, SaaS integration, scheduling.

When to Choose Which

  • Choose Step Functions when:
    • You need to coordinate multiple steps in a specific order
    • Workflow requires error handling with retries and fallbacks
    • You need visibility into which step succeeded/failed
    • Process requires human approval or external callbacks
    • Long-running processes (minutes to months)
    • Batch processing of millions of items (Distributed Map)
  • Choose EventBridge when:
    • You need to react to events from AWS services or SaaS apps
    • Services should be decoupled (producers don’t know about consumers)
    • Routing based on event content to different targets
    • You need scheduling (cron jobs, one-time future events)
    • Fan-out: one event triggers multiple independent actions
    • Cross-account or cross-region event routing
  • Use Both Together: EventBridge detects an event (e.g., S3 upload) → triggers Step Functions workflow → orchestrates multi-step processing (validate → transform → load → notify).

AWS Certification Exam Practice Questions

  1. An order processing system requires validating payment, checking inventory, reserving items, charging the card, and sending confirmation – each step depends on the previous one succeeding. If payment fails, the reserved items must be released. Which service handles this?
    1. EventBridge with multiple rules
    2. Step Functions with error handling (Catch/compensating actions)
    3. SQS with multiple queues
    4. SNS with filter policies
  2. A company wants to automatically trigger different Lambda functions when EC2 instances change state (running, stopped, terminated) – each state routes to a different function. Which service is most appropriate?
    1. Step Functions with Choice state
    2. CloudWatch Alarms
    3. EventBridge with content-based rules
    4. SNS with message filtering
  3. A data pipeline processes millions of S3 objects in parallel, with each object needing 3 transformation steps. The pipeline must track progress and retry individual failures. Which approach is recommended?
    1. EventBridge Pipes with SQS
    2. Lambda triggered by S3 events
    3. Step Functions Distributed Map
    4. EventBridge with Lambda targets
  4. A workflow requires pausing execution until a human reviews and approves a document via an external web application (may take hours or days). Which feature supports this?
    1. EventBridge wait pattern
    2. Step Functions callback pattern with task token
    3. SQS visibility timeout
    4. Lambda with DynamoDB polling
  5. A company needs to schedule 2 million one-time reminder notifications to be sent at specific future times (each different). Which service handles this at scale?
    1. Step Functions Wait state
    2. CloudWatch Events cron
    3. EventBridge Scheduler
    4. SQS delay queues

Related Posts

References

AWS Step Functions Developer Guide

Amazon EventBridge User Guide

EventBridge Scheduler User Guide

AWS CloudWatch vs CloudTrail vs Config

AWS CloudWatch vs CloudTrail vs Config

  • AWS provides three core monitoring and governance services that are often confused but serve distinct purposes.
  • CloudWatch monitors performance and operational health, CloudTrail records API activity (who did what), and Config tracks resource configuration changes and compliance.
  • All three work together for a complete observability and governance strategy.

CloudWatch vs CloudTrail vs Config Comparison

Feature CloudWatch CloudTrail Config
Purpose Performance monitoring & observability API audit trail & activity logging Resource configuration tracking & compliance
Answers “How is it performing?” “Who did what and when?” “What changed and is it compliant?”
Data Type Metrics, logs, traces, events API call records (events) Resource configuration snapshots
Scope Resources, applications, services AWS account API activity AWS resource inventory & state
Retention Metrics: 15 months; Logs: configurable (forever) 90 days (console) or S3 (indefinite) Indefinite (configuration history)
Alerting Yes (Alarms on metrics and logs) Via EventBridge or CloudWatch Logs Yes (Config Rules – non-compliant triggers SNS)
Automation Auto Scaling, EC2 actions, Lambda EventBridge rules trigger actions Auto-remediation via SSM Automation
Cross-account Cross-account dashboards, metric sharing Organization trail Aggregator (multi-account, multi-region)
Pricing Per metric, log ingestion, dashboard Free (management events, 1 copy); data events paid Per rule evaluation + per configuration item recorded
Example CPU > 80% for 5 minutes → alarm User X deleted S3 bucket at 3:42pm Security group changed to allow 0.0.0.0/0 → non-compliant

Amazon CloudWatch

  • Monitoring and observability service for AWS resources and applications.
  • Metrics – collect and track standard (free) and custom metrics; 1-second resolution available.
  • Alarms – trigger actions (Auto Scaling, SNS, EC2 stop/terminate/reboot) when metrics cross thresholds.
  • Logs – centralized log collection with Logs Insights for SQL-like querying.
  • Dashboards – create visualizations across accounts and regions.
  • CloudWatch Agent – collect OS-level metrics (memory, disk) and application logs from EC2.
  • Anomaly Detection – ML-based bands to detect unusual metric behavior.
  • Composite Alarms – combine multiple alarms with AND/OR logic to reduce noise.
  • Synthetics – canary scripts to monitor endpoints and APIs proactively.
  • Application Signals – automatic application monitoring with SLOs (GA 2024).
  • Internet Monitor – monitor internet connectivity to your application.
  • Database Insights – unified database monitoring across RDS, Aurora, and self-managed databases.

AWS CloudTrail

  • Records all API calls made in your AWS account – who, what, when, from where.
  • Management events – control plane operations (CreateBucket, RunInstances, etc.) – free, 1 copy per region.
  • Data events – data plane operations (S3 GetObject, Lambda Invoke, DynamoDB GetItem) – paid.
  • Insights events – detect unusual API activity patterns (e.g., spike in API calls).
  • Trail delivery – send events to S3 (long-term storage) and/or CloudWatch Logs (real-time alerting).
  • Organization trail – single trail for all accounts in AWS Organizations.
  • CloudTrail Lake – managed data lake for querying events with SQL (replaces Athena queries on S3).
  • Event history – 90-day free lookup in the console (management events only).
  • Integrity validation – digest files prove logs haven’t been tampered with.
  • Network activity events (2024) – track VPC endpoint API calls for data perimeter monitoring.

AWS Config

  • Tracks resource configuration changes and evaluates compliance over time.
  • Configuration recorder – captures current state of resources as configuration items.
  • Configuration history – timeline of how a resource’s configuration changed.
  • Config Rules – evaluate resources against desired configurations (400+ AWS managed rules + custom Lambda rules).
  • Conformance Packs – collection of Config Rules and remediation actions packaged as a single entity.
  • Auto-remediation – automatically fix non-compliant resources via SSM Automation documents.
  • Aggregator – centralized view across multiple accounts and regions.
  • Advanced Query – SQL queries on current configuration state of all resources.
  • Proactive compliance (2024) – evaluate CloudFormation templates BEFORE deployment.
  • Service-linked rules – Config Rules managed by other AWS services (Security Hub, Control Tower).
  • Resource timeline – view config changes, compliance changes, and CloudTrail events together.

How They Work Together

  • Security incident investigation: Config shows WHAT changed → CloudTrail shows WHO changed it → CloudWatch shows the IMPACT on performance.
  • Compliance automation: Config Rule detects non-compliant resource → triggers SNS → Auto-remediation fixes it → CloudTrail logs the remediation → CloudWatch tracks the metric.
  • Proactive monitoring: CloudWatch alarm fires on high error rate → CloudTrail reveals recent deployment → Config shows configuration change that caused it.

When to Choose Which

  • Use CloudWatch – Monitor CPU/memory/disk, set alarms for thresholds, centralize application logs, create dashboards, track SLOs.
  • Use CloudTrail – Audit API calls, investigate security incidents, meet compliance requirements for activity logging, detect unusual API patterns.
  • Use Config – Track resource configuration drift, enforce compliance rules, audit resource history, auto-remediate non-compliant resources.
  • Use all three together – Complete governance: monitoring (CloudWatch) + auditing (CloudTrail) + compliance (Config).

AWS Certification Exam Practice Questions

  1. A security team needs to determine who deleted an S3 bucket last Tuesday and from which IP address. Which service provides this information?
    1. CloudWatch Logs
    2. CloudTrail
    3. AWS Config
    4. VPC Flow Logs
  2. A company needs to ensure all Security Groups in their account never allow SSH (port 22) from 0.0.0.0/0. If a non-compliant Security Group is detected, it should be automatically remediated. Which service provides this?
    1. CloudWatch Alarm with Lambda
    2. CloudTrail with EventBridge rule
    3. AWS Config Rule with auto-remediation
    4. GuardDuty
  3. An operations team wants to receive an alert when EC2 CPU utilization exceeds 90% for more than 5 minutes and automatically add instances to the fleet. Which service and feature enables this?
    1. CloudTrail with SNS
    2. Config Rule with remediation
    3. CloudWatch Alarm with Auto Scaling action
    4. EventBridge with Step Functions
  4. A compliance auditor needs to see the complete configuration history of an RDS instance over the past 6 months, including every change to its configuration. Which service provides this timeline view?
    1. CloudTrail event history
    2. CloudWatch Logs
    3. AWS Config (configuration timeline)
    4. RDS event notifications
  5. An organization wants to detect when an unusually high number of API calls are made to IAM (potential credential compromise). Which service and feature is purpose-built for this?
    1. CloudWatch Anomaly Detection
    2. Config Rule
    3. CloudTrail Insights
    4. GuardDuty

Related Posts

References

Amazon CloudWatch User Guide

AWS CloudTrail User Guide

AWS Config Developer Guide

AWS KMS vs CloudHSM vs Secrets Manager vs Parameter Store

AWS KMS vs CloudHSM vs Secrets Manager vs Parameter Store

  • AWS provides multiple services for managing encryption keys and secrets, each designed for different security requirements and use cases.
  • KMS is managed key management, CloudHSM is dedicated hardware security modules, Secrets Manager is for rotating secrets, and Systems Manager Parameter Store is for configuration and secrets storage.
  • Choice depends on compliance requirements (FIPS 140-2 Level 3), key control needs, rotation requirements, and cost.

KMS vs CloudHSM vs Secrets Manager vs Parameter Store Comparison

Feature KMS CloudHSM Secrets Manager Parameter Store
Purpose Managed encryption key service Dedicated HSM for key management Secret storage with automatic rotation Configuration & secret storage
Key Control AWS manages HSM, you manage keys You manage everything (single-tenant HSM) Uses KMS for encryption Uses KMS for encryption (SecureString)
FIPS 140-2 Level 3 (since 2023) Level 3 N/A (uses KMS) N/A (uses KMS)
Multi-tenancy Multi-tenant (shared infrastructure) Single-tenant (dedicated hardware) Multi-tenant Multi-tenant
Automatic Rotation Yes (annual for AWS-managed, configurable 90-365 days for customer-managed) Manual (you control rotation) Yes (built-in for RDS, Redshift, DocumentDB; Lambda for custom) No built-in rotation
Cross-account Yes (key policy + IAM) No (same VPC/account) Yes (resource policy) Yes (resource policy, Advanced tier)
Cross-region Multi-Region keys Cluster in single region Multi-Region secret replication No native replication
Max Secret Size 4KB (symmetric key operations) Unlimited (HSM capacity) 64KB 4KB (Standard) / 8KB (Advanced)
Pricing $1/month per key + API calls ~$1.50/hour per HSM ($1,095/month) $0.40/secret/month + API calls Free (Standard) / $0.05/parameter/month (Advanced)
Versioning Automatic (rotation creates new version) Manual Yes (staging labels: AWSCURRENT, AWSPREVIOUS) Yes (up to 100 versions)
Audit CloudTrail CloudTrail + HSM audit logs CloudTrail CloudTrail
AWS Integration 100+ services natively Custom integration required RDS, Redshift, DocumentDB, ECS, Lambda ECS, Lambda, CloudFormation, CodeDeploy
Key Types Symmetric (AES-256), Asymmetric (RSA, ECC), HMAC Symmetric, Asymmetric, HMAC, custom algorithms N/A (stores secrets, not keys) N/A (stores values)

AWS KMS (Key Management Service)

  • Fully managed encryption key service integrated with 100+ AWS services.
  • Three key types: AWS owned (free, AWS-managed), AWS managed (auto-created per service), Customer managed (full control).
  • Envelope encryption – generates data keys for encrypting data locally; KMS never stores data keys.
  • Multi-Region keys – replicate keys across regions for cross-region encryption/decryption.
  • Key policies + IAM – fine-grained access control; grants for temporary access.
  • Automatic key rotation – configurable 90-365 days for customer-managed keys (was annual only before 2024).
  • External Key Store (XKS) – use keys stored in your own HSM outside AWS.
  • FIPS 140-2 Level 3 validated since March 2023.
  • Best for: Most encryption use cases – S3, EBS, RDS, DynamoDB, Lambda, and 100+ other AWS services.

AWS CloudHSM

  • Dedicated, single-tenant HSM instances in your VPC – you own and manage the keys.
  • FIPS 140-2 Level 3 validated hardware – required for certain regulatory compliance.
  • Full key control – AWS cannot access your keys; AWS manages hardware only.
  • Supports PKCS#11, JCE, CNG, and OpenSSL interfaces for custom applications.
  • Cluster-based – deploy across multiple AZs for HA; keys automatically replicated.
  • Custom key store for KMS – back KMS keys with CloudHSM for compliance + service integration.
  • SSL/TLS offloading – use CloudHSM for web server private keys.
  • Code signing, certificate authority – custom crypto operations not available in KMS.
  • Best for: Regulatory compliance (PCI-DSS, HIPAA requiring dedicated HSM), custom cryptographic operations, SSL offloading, certificate authorities.

AWS Secrets Manager

  • Purpose-built for managing secrets (database credentials, API keys, tokens).
  • Automatic rotation – built-in for RDS (MySQL, PostgreSQL, Oracle, SQL Server, MariaDB), Redshift, DocumentDB; Lambda-based for custom secrets.
  • Multi-Region replication – replicate secrets across regions for DR and multi-region applications.
  • Versioning with staging labels – AWSCURRENT, AWSPREVIOUS, AWSPENDING during rotation.
  • Resource-based policies – share secrets cross-account.
  • Integration – ECS/Fargate (inject as environment variables), Lambda, RDS Proxy.
  • Batch retrieval – retrieve up to 20 secrets in a single API call.
  • Best for: Database credentials that need automatic rotation, API keys, OAuth tokens, any secret requiring lifecycle management.

Systems Manager Parameter Store

  • Hierarchical configuration storage for both configuration data and secrets.
  • Two tiers: Standard (free, 4KB, 10K params) and Advanced ($0.05/month, 8KB, 100K params).
  • Parameter types: String, StringList, SecureString (encrypted with KMS).
  • Hierarchy and tagging – organize parameters like /prod/db/password, /dev/api/key.
  • No built-in rotation – use EventBridge + Lambda for custom rotation.
  • Parameter policies (Advanced tier) – expiration notifications, no-change notifications.
  • Public parameters – AWS provides latest AMI IDs, ECS-optimized AMI, etc.
  • CloudFormation integration – resolve parameters dynamically during stack creation.
  • Best for: Application configuration, feature flags, non-rotating secrets, AMI IDs, and cost-sensitive use cases where rotation isn’t needed.

When to Choose Which

  • Choose KMS – Encrypting data in AWS services (S3, EBS, RDS), envelope encryption, most standard encryption needs.
  • Choose CloudHSM – Regulatory requirement for dedicated HSM (FIPS 140-2 Level 3 single-tenant), custom cryptographic operations, SSL offloading, running your own CA.
  • Choose Secrets Manager – Database credentials needing automatic rotation, API keys with lifecycle management, cross-region secret replication.
  • Choose Parameter Store – Application configuration, feature flags, non-rotating secrets, cost-sensitive (free tier), hierarchical organization of config data.
  • Combine KMS + Secrets Manager – Secrets Manager uses KMS for encryption; use customer-managed KMS key for additional control.
  • Combine CloudHSM + KMS – Use CloudHSM as a custom key store backing KMS keys (compliance + service integration).

AWS Certification Exam Practice Questions

  1. A company needs to store database credentials that automatically rotate every 30 days and are accessible from ECS tasks as environment variables. Which service is most appropriate?
    1. KMS with custom rotation
    2. Parameter Store SecureString
    3. Secrets Manager
    4. CloudHSM
  2. A financial institution must use dedicated hardware security modules (not shared) for key management to satisfy PCI-DSS Level 1 compliance. Which service meets this requirement?
    1. KMS with customer-managed keys
    2. CloudHSM
    3. KMS with external key store
    4. Secrets Manager with KMS
  3. A development team needs to store application configuration values (non-sensitive) and sensitive database passwords together in a hierarchical structure with minimal cost. Which approach is recommended?
    1. Secrets Manager for all values
    2. Parameter Store (String for config, SecureString for passwords)
    3. KMS encrypted S3 bucket
    4. DynamoDB with encryption
  4. An application needs encryption keys that work identically across 3 AWS regions for cross-region data encryption/decryption without re-encrypting. Which feature enables this?
    1. CloudHSM cluster replication
    2. Secrets Manager multi-region secrets
    3. KMS Multi-Region keys
    4. KMS key import in each region
  5. A company wants to use AWS KMS for service integrations but needs their keys to remain in their on-premises HSM that they fully control. Which KMS feature supports this?
    1. CloudHSM custom key store
    2. KMS imported key material
    3. KMS External Key Store (XKS)
    4. KMS with VPN connection

Related Posts

References

AWS KMS Developer Guide

AWS CloudHSM User Guide

AWS Secrets Manager User Guide

AWS Systems Manager Parameter Store

AWS Container Services Cheat Sheet

AWS Container Services Cheat Sheet

  • AWS provides a full container stack: orchestration (ECS, EKS), compute (Fargate, EC2), registry (ECR), and supporting services (App Mesh, Cloud Map, Proton).
  • Containers package applications with dependencies for consistent deployment across environments.

Container Orchestration

Amazon ECS (Elastic Container Service)

  • AWS-native container orchestrator – deeply integrated with IAM, CloudWatch, ALB, VPC.
  • Task Definition – blueprint for containers (image, CPU, memory, ports, IAM role, volumes).
  • Service – maintains desired count of tasks, integrates with load balancers, handles rolling updates.
  • Launch types: EC2 (you manage instances) or Fargate (serverless).
  • No control plane cost – free; pay only for EC2 or Fargate compute.
  • Capacity Providers – automatic EC2 Auto Scaling and Fargate/Fargate Spot management.
  • Service Connect – simplified service-to-service communication (built-in service mesh).
  • ECS Exec – interactive shell into running containers for debugging.
  • ECS Anywhere – run ECS tasks on on-premises servers.
  • Blue/Green deployments – native CodeDeploy integration.

Amazon EKS (Elastic Kubernetes Service)

  • Managed Kubernetes – certified conformant, runs upstream K8s.
  • Control plane: fully managed by AWS ($0.10/hour per cluster).
  • Compute options: Managed Node Groups, Self-Managed Nodes, Fargate, EKS Auto Mode.
  • EKS Auto Mode – AWS manages nodes, scaling, upgrades, and security patches automatically.
  • Add-ons: CoreDNS, kube-proxy, VPC CNI, EBS CSI, managed via EKS.
  • EKS Anywhere – run Kubernetes on-premises with EKS management.
  • EKS Connector – register external Kubernetes clusters to the EKS console.
  • Full Kubernetes ecosystem: Helm, Karpenter, Istio, ArgoCD, Prometheus, etc.

Compute

AWS Fargate

  • Serverless compute for ECS and EKS – no EC2 to manage.
  • Per-task pricing: vCPU-hour + GB-hour (per second billing, 1-min minimum).
  • Isolation: each task/pod runs in its own Firecracker microVM.
  • Fargate Spot: up to 70% discount; tasks can be interrupted with 2-min warning.
  • Limitations: no GPU, no EBS, no daemonsets, no privileged containers.
  • Storage: 20GB ephemeral per task (configurable up to 200GB) + EFS supported.

Container Registry

Amazon ECR (Elastic Container Registry)

  • Fully managed Docker container registry – stores, manages, and deploys container images.
  • Private repositories with IAM-based access control.
  • Public repositories (ECR Public Gallery) for open-source images.
  • Image scanning – automatic vulnerability scanning (Basic with Clair, or Enhanced with Inspector).
  • Lifecycle policies – automatically clean up old/untagged images.
  • Cross-region and cross-account replication.
  • Image immutability – prevent image tags from being overwritten.
  • OCI support – stores OCI images and Helm charts.

Networking & Service Discovery

AWS App Mesh

  • Service mesh using Envoy proxy for traffic management, observability, and security between services.
  • Supports ECS, EKS, and EC2 workloads.
  • Features: traffic routing, retries, timeouts, circuit breaking, mutual TLS.

Amazon VPC Lattice

  • Application-layer networking – connect, secure, and monitor services across VPCs and accounts.
  • Simpler than App Mesh – no sidecar proxies needed.
  • Supports ECS, EKS, Lambda, and EC2 targets.

AWS Cloud Map

  • Service discovery – register and discover services using DNS or API.
  • Health checking for registered instances.
  • Used by ECS Service Connect and App Mesh.

CI/CD & DevOps

  • AWS CodePipeline – CI/CD pipeline automation for container deployments.
  • AWS CodeBuild – build container images (docker build + push to ECR).
  • AWS CodeDeploy – blue/green deployments for ECS services.
  • AWS Proton – managed delivery service for container and serverless application templates.
  • AWS Copilot – CLI for building, releasing, and operating containerized apps on ECS.

Monitoring & Logging

  • CloudWatch Container Insights – metrics and logs for ECS and EKS (CPU, memory, network, disk per task/pod).
  • AWS X-Ray – distributed tracing for containerized microservices.
  • FireLens – ECS log router using Fluent Bit/Fluentd to send logs to CloudWatch, S3, Splunk, Datadog.
  • CloudWatch Logs – awslogs driver for ECS; Fluent Bit DaemonSet for EKS.

AWS Certification Exam Practice Questions

  1. A team wants to run containers without managing any infrastructure and needs the lowest operational overhead. They don’t use Kubernetes. Which combination is correct?
    1. EKS with Fargate
    2. ECS with Fargate
    3. ECS with EC2
    4. EKS with Managed Node Groups
  2. A company needs to automatically scan container images for vulnerabilities when pushed to the registry. Which service and feature provides this?
    1. ECS image scanning
    2. ECR Enhanced Scanning (with Amazon Inspector)
    3. GuardDuty container protection
    4. AWS Config rules
  3. An application needs service-to-service communication with mutual TLS, traffic routing, and retry policies across ECS and EKS services. Which service provides this?
    1. VPC Lattice
    2. Cloud Map
    3. AWS App Mesh
    4. ECS Service Connect
  4. A company needs to run Kubernetes on-premises while managing it with the same tools used for their AWS EKS clusters. Which service supports this?
    1. ECS Anywhere
    2. EKS Anywhere
    3. EKS Connector
    4. AWS Outposts

Related Posts

References

Amazon ECS Developer Guide

Amazon EKS User Guide

Amazon ECR User Guide

AWS Storage Services Cheat Sheet

AWS Storage Services Cheat Sheet

  • AWS provides storage services across four categories: Object (S3), Block (EBS), File (EFS, FSx), and Hybrid/Edge (Storage Gateway, Snow Family).
  • Each is optimized for different access patterns, latency requirements, and cost profiles.

Object Storage

Amazon S3

  • Unlimited object storage with 99.999999999% (11 nines) durability.
  • Max object size: 5TB; multipart upload for objects >100MB.
  • Storage Classes:
    • S3 Standard – frequently accessed, low latency, high throughput.
    • S3 Intelligent-Tiering – automatic cost optimization with access pattern monitoring.
    • S3 Standard-IA – infrequent access, lower storage cost, retrieval fee.
    • S3 One Zone-IA – single AZ, 20% cheaper than Standard-IA.
    • S3 Glacier Instant Retrieval – archive with millisecond access.
    • S3 Glacier Flexible Retrieval – archive, 1-12 hour retrieval.
    • S3 Glacier Deep Archive – lowest cost, 12-48 hour retrieval.
    • S3 Express One Zone – single-digit millisecond, single AZ, for analytics.
  • Lifecycle Policies – automatically transition objects between classes or expire them.
  • Versioning – keep multiple versions; protect against accidental deletes.
  • Replication – Cross-Region (CRR) or Same-Region (SRR) replication.
  • Object Lock – WORM (Write Once Read Many) for compliance (Governance or Compliance mode).
  • S3 Event Notifications – trigger Lambda, SQS, SNS, EventBridge on object events.
  • S3 Transfer Acceleration – faster uploads using CloudFront edge locations.
  • S3 Select / Glacier Select – retrieve subset of data using SQL.
  • Encryption: SSE-S3 (default), SSE-KMS, SSE-C, client-side.
  • Access Control: Bucket policies, IAM policies, ACLs (legacy), Access Points, Block Public Access.

Block Storage

Amazon EBS

  • Persistent block storage for EC2 instances.
  • Volume Types:
    • gp3 – general purpose SSD, 3,000 IOPS baseline, up to 16,000 IOPS. Cost-effective default.
    • gp2 – general purpose SSD, burst up to 3,000 IOPS (legacy, prefer gp3).
    • io2 Block Express – highest performance SSD, up to 256,000 IOPS, sub-ms latency. For databases.
    • io1 – provisioned IOPS SSD, up to 64,000 IOPS.
    • st1 – throughput-optimized HDD, up to 500 MB/s. For big data, data warehouses.
    • sc1 – cold HDD, lowest cost, up to 250 MB/s. For infrequent access.
  • Single AZ – must be in same AZ as EC2 instance.
  • Snapshots – point-in-time backups to S3 (incremental); can copy cross-region.
  • Multi-Attach – io2 volumes can attach to up to 16 Nitro instances in same AZ.
  • Encryption – AES-256 via KMS; encrypt at rest, in transit, and snapshots.
  • Elastic Volumes – resize, change type, or adjust IOPS without downtime.

EC2 Instance Store

  • Ephemeral block storage physically attached to host – highest IOPS/throughput.
  • Data lost on instance stop/terminate/failure.
  • Use for: temporary buffers, caches, scratch data.

File Storage

Amazon EFS

  • Managed NFS (NFSv4.1) – concurrent access from multiple EC2, ECS, Lambda.
  • Elastic – grows/shrinks automatically; pay only for what you use.
  • Performance modes: General Purpose (latency-sensitive) and Max I/O (high parallelism).
  • Throughput modes: Elastic (auto), Bursting, Provisioned.
  • Storage classes: Standard, Infrequent Access (IA), Archive – with lifecycle management.
  • Regional – data stored across multiple AZs; One Zone option available at lower cost.
  • Supports cross-region replication for DR.

Amazon FSx

  • FSx for Windows File Server – managed Windows SMB with Active Directory, DFS, VSS.
  • FSx for Lustre – high-performance parallel file system (HPC, ML). Integrates with S3.
  • FSx for NetApp ONTAP – multi-protocol (NFS, SMB, iSCSI) with snapshots, clones, tiering.
  • FSx for OpenZFS – high-performance NFS with snapshots and data compression.

Hybrid & Edge Storage

AWS Storage Gateway

  • Hybrid cloud storage connecting on-premises to AWS.
  • S3 File Gateway – NFS/SMB access to S3 objects.
  • FSx File Gateway – local cache for FSx for Windows File Server.
  • Volume Gateway – iSCSI block storage backed by S3 (Cached or Stored mode).
  • Tape Gateway – virtual tape library (VTL) backed by S3/Glacier.

AWS Snow Family

  • Snowcone – 8-14TB, portable edge computing and data transfer.
  • Snowball Edge – 80-210TB, Storage Optimized or Compute Optimized.
  • Snowmobile – 100PB, exabyte-scale data migration (truck).
  • Use for: offline data migration, edge computing where connectivity is limited.

AWS DataSync

  • Automated data transfer – on-premises to AWS (S3, EFS, FSx) or between AWS services.
  • Up to 10x faster than open-source tools; built-in scheduling, integrity validation.

AWS Transfer Family

  • Managed SFTP, FTPS, FTP, and AS2 transfers directly to/from S3 or EFS.

AWS Certification Exam Practice Questions

  1. A company needs to store 500TB of data that is rarely accessed (once per quarter) but must be retrievable within milliseconds. Which storage class is most cost-effective?
    1. S3 Standard
    2. S3 Standard-IA
    3. S3 Glacier Instant Retrieval
    4. S3 Glacier Deep Archive
  2. A database requires 100,000 IOPS with sub-millisecond latency on a single EC2 instance. Which EBS volume type should be used?
    1. gp3
    2. io1
    3. io2 Block Express
    4. st1
  3. Multiple EC2 instances across AZs need shared access to a POSIX-compliant file system with automatic capacity scaling. Which service fits?
    1. EBS Multi-Attach
    2. S3
    3. EFS
    4. FSx for Lustre
  4. A company needs to migrate 50TB of on-premises data to S3, but their internet connection would take 2 weeks. Which service provides faster physical transfer?
    1. S3 Transfer Acceleration
    2. DataSync
    3. Snowball Edge
    4. Storage Gateway
  5. An on-premises application uses NFS to access files that must be stored in S3 for durability. Which service provides this transparent NFS-to-S3 bridge?
    1. EFS
    2. FSx for ONTAP
    3. S3 File Gateway
    4. DataSync

Related Posts

References

Amazon S3 User Guide

Amazon EBS User Guide

Amazon EFS User Guide

AWS Storage Gateway User Guide

AWS Serverless Services Cheat Sheet

AWS Serverless Services Cheat Sheet

  • AWS serverless services allow running applications without provisioning or managing servers.
  • Services automatically scale, provide built-in high availability, and use pay-per-use pricing.
  • Core serverless stack: Lambda (compute) + API Gateway (API) + DynamoDB (database) + S3 (storage) + EventBridge (events) + Step Functions (orchestration).

Compute

AWS Lambda

  • Run code without provisioning servers – event-driven, function-as-a-service.
  • Triggers: 200+ event sources (S3, DynamoDB, SQS, API Gateway, EventBridge, Kinesis, etc.).
  • Duration: max 15 minutes per invocation.
  • Memory: 128MB – 10GB (CPU proportional).
  • Concurrency: 1,000 default (can increase); Reserved and Provisioned Concurrency available.
  • Pricing: per request ($0.20/million) + per GB-second of compute.
  • Deployment: ZIP package (250MB unzipped) or container image (10GB).
  • Layers: share code/libraries across functions (up to 5 layers).
  • Versions & Aliases: immutable versions with aliases for traffic shifting (canary/linear).
  • Lambda@Edge: run at CloudFront edge locations (viewer/origin request/response).
  • SnapStart: reduce cold starts for Java functions (caches initialized snapshot).

AWS Fargate

  • Serverless compute for containers – works with ECS and EKS.
  • No EC2 instances to manage – per-task pricing (vCPU + memory per second).
  • Each task runs in isolated microVM (Firecracker).
  • Fargate Spot: up to 70% savings for fault-tolerant workloads.

AWS App Runner

  • Fully managed – deploy from source code (GitHub) or container image to running web service in minutes.
  • Auto-scales based on traffic; can pause when idle.
  • Built-in HTTPS endpoint, load balancing, and certificate management.

API & Integration

Amazon API Gateway

  • Create, publish, and manage REST, HTTP, and WebSocket APIs.
  • REST API: full-featured (caching, request validation, WAF, usage plans, API keys).
  • HTTP API: lower latency, lower cost, simpler (JWT authorizers, OIDC).
  • WebSocket API: real-time two-way communication.
  • Integrates with Lambda, HTTP backends, AWS services, and Mock integrations.
  • Throttling: 10,000 requests/second default (account-level), burst 5,000.
  • Stages: dev/staging/prod with stage variables and canary deployments.

AWS Step Functions

  • Serverless orchestration – coordinate multiple AWS services into workflows.
  • Standard Workflows: up to 1 year, exactly-once execution, audit history.
  • Express Workflows: up to 5 minutes, at-least-once, high-volume event processing.
  • States: Task, Choice, Parallel, Map, Wait, Pass, Succeed, Fail.
  • Direct 200+ AWS service integrations (DynamoDB, SQS, SNS, ECS, Bedrock, etc.).
  • Distributed Map: process millions of items in parallel from S3.
  • Built-in error handling with Retry and Catch.

Amazon EventBridge

  • Serverless event bus for event-driven architectures.
  • 90+ AWS service events automatically; SaaS partner events; custom events.
  • Content-based filtering, event archive/replay, schema registry.
  • EventBridge Scheduler: one-time and recurring schedules (replaces CloudWatch Events cron).
  • EventBridge Pipes: point-to-point integration with filtering and enrichment.

Data & Storage

Amazon DynamoDB

  • Serverless NoSQL database – single-digit millisecond latency at any scale.
  • Capacity modes: On-Demand (pay per request) or Provisioned (with Auto Scaling).
  • DynamoDB Streams: capture item-level changes for event-driven processing.
  • Global Tables: multi-region, multi-active replication.
  • DAX: in-memory cache for microsecond read latency.
  • TTL: automatic item expiration at no cost.

Amazon S3

  • Serverless object storage – unlimited capacity, 11 nines durability.
  • Event notifications to Lambda, SQS, SNS, EventBridge.
  • S3 Object Lambda: transform data on retrieval using Lambda.

Amazon Aurora Serverless v2

  • Serverless relational database – scales instantly from 0.5 to 256 ACUs.
  • MySQL and PostgreSQL compatible.
  • Scales to zero (when paused) for development workloads.

Messaging

Amazon SQS

  • Serverless message queue – Standard (unlimited throughput) or FIFO (ordering + exactly-once).
  • Retention up to 14 days. Visibility timeout, delay queues, dead-letter queues.

Amazon SNS

  • Serverless pub/sub messaging – fan out to SQS, Lambda, HTTP, email, SMS.
  • Message filtering on attributes. FIFO topics for ordered fan-out.

Other Serverless Services

  • AWS AppSync – managed GraphQL and Pub/Sub API with real-time data sync.
  • Amazon Cognito – serverless user authentication, authorization, and user management.
  • AWS SAM (Serverless Application Model) – framework for building serverless applications (extends CloudFormation).
  • Amazon Kinesis Data Firehose – serverless streaming ETL to S3, Redshift, OpenSearch.
  • AWS Glue – serverless ETL for data preparation.
  • Amazon OpenSearch Serverless – serverless search and analytics.

Serverless Architecture Patterns

  • Synchronous API: API Gateway → Lambda → DynamoDB
  • Asynchronous Processing: S3 Event → SQS → Lambda → DynamoDB
  • Fan-out: SNS → multiple SQS queues → Lambda consumers
  • Orchestration: API Gateway → Step Functions → (Lambda + DynamoDB + SNS)
  • Event-driven: EventBridge → Lambda / Step Functions / SQS
  • Streaming: Kinesis → Lambda → DynamoDB / S3

AWS Certification Exam Practice Questions

  1. A serverless application needs to coordinate a multi-step order processing workflow that includes payment, inventory check, and shipping. Each step may take variable time and needs error handling with retries. Which service orchestrates this?
    1. Amazon SQS with multiple queues
    2. Lambda calling Lambda
    3. AWS Step Functions
    4. EventBridge rules
  2. A Lambda function needs to process messages from an SQS queue but should handle no more than 5 messages concurrently to avoid overwhelming a downstream API. What controls this?
    1. SQS visibility timeout
    2. Lambda Reserved Concurrency set to 5
    3. SQS MaximumBatchSize
    4. Lambda timeout
  3. An application needs a serverless relational database that automatically scales based on load and has zero cost when there are no connections. Which service provides this?
    1. DynamoDB On-Demand
    2. RDS with Auto Scaling
    3. Aurora Serverless v2 (with pause)
    4. ElastiCache Serverless
  4. A company wants to create a REST API with caching, request validation, API keys for rate limiting, and AWS WAF integration. Which API Gateway type should they use?
    1. HTTP API
    2. REST API
    3. WebSocket API
    4. AppSync

Related Posts

References

AWS Serverless Overview

AWS Lambda Developer Guide

AWS Step Functions Developer Guide