What is the difference between Kinesis Data Streams and Amazon Data Firehose?

Kinesis Data Streams is a real-time data streaming service where you build custom consumer applications (KCL, Lambda, SDK) with sub-second latency, data retention up to 365 days, and replay capability. Amazon Data Firehose is a fully managed delivery service that automatically loads streaming data into destinations like S3, Redshift, and OpenSearch with near real-time latency (seconds to minutes) and zero operational overhead — no custom code required.

When should I use Amazon Managed Service for Apache Flink instead of Kinesis Data Streams?

Use Managed Apache Flink when you need complex stream processing such as windowed aggregations, stream-to-stream joins, pattern detection (CEP), or stateful computations with exactly-once guarantees. Kinesis Data Streams is the ingestion layer — Flink reads from it to perform analytics. Use KDS alone when simple consumers (Lambda, KCL) can handle your processing logic without windowing or joins.

Can Kinesis Data Streams, Firehose, and Managed Flink be used together?

Yes, they are complementary. A common pattern: Kinesis Data Streams ingests data from producers, Managed Apache Flink reads from the stream for real-time analytics and complex processing, and Amazon Data Firehose consumes from the same stream to deliver raw data to S3 or a data lake for batch analytics.

What is Kinesis Data Streams On-Demand Advantage mode?

On-Demand Advantage (launched November 2025) provides instant scaling up to 10 GB/s via warm throughput capability, 60% lower per-GB pricing ($0.032/GB ingest, $0.016/GB retrieval), no per-stream hourly fee, support for up to 50 enhanced fan-out consumers, and 77% lower extended retention pricing. It requires a 25 MB/s minimum aggregate commitment across all on-demand streams in the account.

What happened to Amazon Kinesis Data Analytics?

Amazon Kinesis Data Analytics was renamed to Amazon Managed Service for Apache Flink in August 2023. The SQL-based version (Kinesis Data Analytics for SQL) was discontinued on January 27, 2026. Users should migrate to Managed Service for Apache Flink which supports Java, Scala, Python (PyFlink), and SQL via Flink Studio notebooks.

What is the difference between IAM Users and IAM Roles?

IAM Users have long-term credentials (passwords, access keys) and represent a specific person or application. IAM Roles have no long-term credentials; they provide temporary security credentials via AWS STS when assumed by a trusted entity (AWS service, user, or external identity). Roles are preferred for most use cases because temporary credentials are more secure and don't need manual rotation.

What is the difference between identity-based policies and resource-based policies?

Identity-based policies are attached to IAM users, groups, or roles and define what that identity can do. Resource-based policies are attached directly to resources (like S3 buckets or SQS queues) and define who can access that resource. Key difference: for cross-account access, resource-based policies allow the caller to retain their original permissions, while assuming a role means temporarily giving up original permissions.

What are permission boundaries in IAM?

Permission boundaries are advanced IAM features that set the maximum permissions an IAM entity (user or role) can have. They don't grant permissions themselves — they only limit what identity-based policies can grant. Effective permissions equal the intersection of the identity-based policy and the permission boundary. They're commonly used for delegation: allowing developers to create roles while ensuring those roles never exceed predefined permission limits.

How do SCPs differ from IAM policies?

Service Control Policies (SCPs) are organization-level guardrails that set maximum permissions for all IAM users and roles in member accounts. Unlike IAM policies, SCPs do NOT grant permissions — they only restrict. Even if an IAM policy allows an action, it will be denied if the SCP doesn't allow it. SCPs don't affect the management account and apply to all principals in member accounts, including the account root user (for service actions).

AWS VPC Lattice – Service-to-Service Networking

July 1, 2026 ~ Last updated on : July 3, 2026 ~ Kiro Agent

AWS VPC Lattice – Application-Layer Service Networking

Amazon VPC Lattice is a fully managed application networking service that connects, secures, and monitors service-to-service and service-to-resource communication across multiple VPCs and AWS accounts — without requiring VPC peering, Transit Gateway, or complex networking configurations. VPC Lattice operates at the application layer (Layer 7), providing built-in service discovery, traffic management, access controls, and observability for modern distributed architectures.

VPC Lattice is the AWS-recommended replacement for AWS App Mesh (which reaches End of Life on September 30, 2026) and eliminates the need for sidecar proxies, simplifying service mesh patterns significantly.

Key Benefits

No network complexity — Connect services across VPCs and accounts without VPC peering, Transit Gateway, or CIDR coordination.
Application-layer routing — Route based on HTTP path, headers, methods, and query parameters.
Built-in security — IAM-based auth policies for service-to-service authorization without code changes.
Compute agnostic — Works with EC2, ECS, EKS, Fargate, and Lambda targets in a single service.
No sidecars required — Unlike App Mesh or Istio, VPC Lattice operates as infrastructure — no Envoy proxy injection needed.
Overlapping CIDR support — Services in VPCs with identical IP ranges can communicate seamlessly.

VPC Lattice — Cross-VPC Service Networking
VPC A (Account 1)
ECS Service
Lambda Function
→
VPC Lattice Service Network
IAM Auth | Routing Rules | Observability
Service A
Service B
→
VPC B (Account 2)
EKS Pods
EC2 Instances
No VPC Peering needed • No Transit Gateway • Layer 7 with weighted routing

VPC Lattice Architecture

VPC Lattice introduces a layered architecture with distinct components that separate concerns between network administrators and service owners.

Service Network

A service network is a logical boundary — a collection of services and resource configurations that can communicate with each other.
VPCs are associated with a service network to enable client connectivity.
Clients in an associated VPC can automatically discover and connect to all services/resources in the service network.
Service networks can be shared across accounts using AWS Resource Access Manager (RAM).
Auth policies can be attached at the service network level for coarse-grained access control.
Multiple service networks can be associated with a single VPC using VPC endpoints of type “service network.”
Service networks are Regional resources.

Services

A service represents an independently deployable application unit (microservice).
Similar to an Application Load Balancer — consists of listeners, rules, and target groups.
Each service gets a unique DNS name (FQDN) automatically registered in Route 53.
Custom domain names are supported with SSL/TLS certificates via ACM.
Services can be associated with one or more service networks.
Fine-grained auth policies can be attached at the individual service level.

Target Groups

A target group is a collection of compute resources that run your application.
Supported target types:
- EC2 instances — Instance IDs
- IP addresses — For any IP-addressable resource
- Lambda functions — Serverless targets
- Application Load Balancers — Existing ALB targets
- Amazon ECS tasks — Direct ECS integration
- Kubernetes Pods — Via AWS Gateway API Controller
Health checks are supported to route traffic only to healthy targets.
Targets can span multiple Availability Zones for high availability.

Listeners

A listener checks for incoming connection requests using a specified protocol and port.
Supported protocols: HTTP, HTTPS, TLS Passthrough (for end-to-end encryption).
gRPC is supported over HTTP/2.
Each listener has a default action and can have multiple rules.

Rules

Rules define how the listener routes requests to target groups.
Conditions can match on: HTTP method, path pattern, headers, and query parameters.
Actions include: forward to target group(s), with optional weighted routing for traffic splitting.
Rules have priorities — evaluated in order from lowest to highest number.
A default rule handles requests that don’t match any other rule.

Resource Configurations and Resource Gateways (GA at re:Invent 2024)

A resource configuration represents a TCP-based resource (e.g., RDS database, IP address, domain name) that can be shared across VPCs/accounts.
A resource gateway is a point of ingress into the VPC where the resource resides, spanning multiple Availability Zones.
Enables private cross-account access to databases and other TCP resources without NLB or PrivateLink Endpoint Services.
Resource configurations can be shared via AWS RAM and associated with service networks.

Key Features

Cross-VPC and Cross-Account Communication Without Peering

VPC Lattice enables service-to-service connectivity across VPCs and accounts without VPC peering, Transit Gateway, or PrivateLink endpoint services.
No CIDR coordination required — supports overlapping IP address ranges between VPCs.
Network address translation is handled transparently between IPv4 and IPv6 address spaces.
Services are shared across accounts using AWS Resource Access Manager (RAM).
VPC association with a service network is the only requirement for connectivity.

Weighted Routing (Traffic Splitting)

Forward rules support weighted target groups for traffic distribution.
Assign weights (0-999) to multiple target groups within a single rule to split traffic proportionally.
Use cases:
- Blue/green deployments — Shift traffic gradually from old to new version.
- Canary releases — Send a small percentage (e.g., 5%) to the new version for validation.
- A/B testing — Split traffic between different service implementations.
Instant rollback by adjusting weights back to 100% on the stable target group.

Auth Policies with IAM (Authorization)

VPC Lattice uses IAM resource policies (auth policies) for service-to-service authorization.
Supports the standard IAM Principal-Action-Resource-Condition (PARC) model.
Auth policies can be applied at two levels:
- Service network level — Coarse-grained (e.g., “only authenticated requests from my AWS Organization”).
- Service level — Fine-grained (e.g., “only service A in account X can invoke POST /orders”).
Callers authenticate using AWS SigV4 (Signature Version 4) — the same signing protocol used for AWS API calls.
Supports conditions on: source VPC, source account, organization ID, request method, path.
Both layers (service network + service) must independently allow the request — defense in depth.

Mutual TLS (mTLS) and Encryption

VPC Lattice automatically generates and manages TLS certificates for each service via AWS Certificate Manager (ACM).
HTTPS listeners terminate TLS at the VPC Lattice data plane — callers do not need to manage certificates.
TLS Passthrough listeners (launched January 2025) enable end-to-end encryption — TLS is not terminated by VPC Lattice.
TLS Passthrough routes traffic based on the Server Name Indicator (SNI) field.
Supports mutual TLS (mTLS) for bidirectional authentication between client and service.
Client-side authentication uses SigV4 by default; mTLS adds certificate-based identity.

Observability (CloudWatch, Access Logs, X-Ray)

Access Logs — Detailed per-request logs including source/destination, latency, status codes, and error reasons.
- Can be sent to: Amazon CloudWatch Logs, Amazon S3, or Amazon Data Firehose.
- Available at service level, resource level, and service network level.
CloudWatch Metrics — Automatically published metrics for services and target groups (request count, latency, HTTP status codes, healthy/unhealthy targets).
AWS X-Ray — Distributed tracing integration for end-to-end request tracking across services.
VPC Flow Logs — Can capture network-level traffic to/from VPC Lattice endpoints.
No additional inter-AZ data transfer charges — all cross-AZ traffic is included in data processing charges.

Service Discovery (DNS-Based)

Each VPC Lattice service gets an auto-generated FQDN in an AWS-managed Route 53 public hosted zone.
When a VPC is associated with a service network, DNS resolution routes traffic to VPC Lattice data plane endpoints.
Custom domain names are supported — configure CNAME/Alias records in your own hosted zones.
No service mesh sidecar or agent required for discovery.

Availability Zone Affinity

VPC Lattice preferentially routes traffic to targets in the same Availability Zone as the client.
If the local AZ is unhealthy, traffic is automatically distributed to other AZs.
Reduces latency and avoids cross-AZ data transfer costs.

On-Premises Access

VPC endpoints of type “service network” (powered by AWS PrivateLink) enable on-premises clients to access VPC Lattice services.
Traffic can flow over AWS Direct Connect or Site-to-Site VPN → VPC endpoint → VPC Lattice service network.
Also supports access via VPC Peering or Transit Gateway through the VPC endpoint.

Integration with Compute Services

Amazon ECS

ECS tasks can be registered directly as VPC Lattice target group targets.
Supports both Fargate and EC2 launch types.
VPC Lattice replaces the need for internal ALBs or service discovery with Cloud Map.
AWS provides a migration guide from App Mesh to VPC Lattice for ECS workloads.
Works alongside ECS Service Connect — choose VPC Lattice for cross-account/cross-VPC scenarios.

Amazon EKS (Kubernetes)

Native integration via the AWS Gateway API Controller — an implementation of the Kubernetes Gateway API.
Define VPC Lattice services using Kubernetes-native Gateway, HTTPRoute, and GRPCRoute resources.
The controller automatically maps Kubernetes resources to VPC Lattice services, target groups, and listeners.
Supports EKS Pod Identity for simplified IAM authentication — pods can sign requests with SigV4.
Works with self-managed Kubernetes clusters (not just EKS).
No sidecar injection required — unlike Istio or App Mesh.

AWS Lambda

Lambda functions can be registered as targets in VPC Lattice target groups.
Enables serverless backends to participate in the same service network as container/instance workloads.
VPC Lattice invokes Lambda synchronously when requests are routed to Lambda targets.
Lambda functions can also act as clients — calling other VPC Lattice services using their DNS names.

Amazon EC2

EC2 instances can be registered by instance ID or IP address in target groups.
Supports Auto Scaling Group integration for dynamic target registration.
Applications on EC2 access VPC Lattice services via DNS — no SDK or agent installation required.
Health checks validate target availability before routing traffic.

Mixed Compute Environments

A single VPC Lattice service can have multiple target groups with different compute types.
Example: Route 80% of traffic to EKS pods, 20% to Lambda for canary testing.
Enables gradual migration between compute platforms without client-side changes.

VPC Lattice vs App Mesh vs API Gateway vs PrivateLink vs ALB

Feature	VPC Lattice	App Mesh (EOL Sept 2026)	API Gateway	PrivateLink	ALB
Primary Use Case	Service-to-service networking across VPCs/accounts	Service mesh with Envoy sidecars	External/internal API management	Private service exposure to consumers	Load balancing within a VPC
Layer	Layer 7 (HTTP/HTTPS/gRPC/TCP)	Layer 7 (HTTP/gRPC/TCP)	Layer 7 (REST/HTTP/WebSocket)	Layer 4 (TCP/UDP)	Layer 7 (HTTP/HTTPS/gRPC)
Cross-VPC	Yes (native, no peering needed)	Requires VPC connectivity (peering/TGW)	Yes (via public/private endpoints)	Yes (endpoint service model)	No (single VPC only)
Cross-Account	Yes (via AWS RAM)	Limited (shared mesh)	Yes (resource policies)	Yes (allow-listed accounts)	No
Sidecar Required	No	Yes (Envoy proxy)	No	No	No
IAM Auth Policies	Yes (SigV4, PARC model)	No (mTLS only)	Yes (IAM, Cognito, Lambda authorizers)	No (network-level only)	No (security groups only)
Weighted Routing	Yes	Yes	Yes (canary deployments)	No	Yes (target group weights)
Service Discovery	Built-in DNS (auto-generated FQDN)	Cloud Map integration	Custom domain + API endpoint	DNS name of VPC endpoint	DNS name of load balancer
Overlapping CIDRs	Supported	Not supported	N/A	Supported	N/A (single VPC)
Provider Requirement	None (service registration only)	Envoy sidecar per service	None	NLB or GWLB required	None
Pricing Model	Per service/hour + data + requests	Free (pay for Envoy compute)	Per request + data transfer	Per endpoint/hour + data	Per hour + LCU
Status	GA (active development)	EOL September 30, 2026	GA (active development)	GA (active)	GA (active)

When to Choose Which

VPC Lattice — Service-to-service communication across VPCs/accounts with IAM-based authorization. Best for internal microservice architectures.
API Gateway — External-facing APIs, rate limiting, API keys, request/response transformation, developer portal. Not designed for east-west traffic.
PrivateLink — Exposing a specific service to consumers (SaaS model) or accessing AWS services privately. Requires NLB on provider side.
ALB — Load balancing within a single VPC. Use with VPC Lattice when ALB is a target group target.
App Mesh — Legacy only. Migrate to VPC Lattice or ECS Service Connect before September 2026.

Use Cases

Microservices Communication

Connect hundreds of microservices running on mixed compute (EC2, ECS, EKS, Lambda) without managing load balancers per service.
Apply consistent security policies across all service-to-service traffic.
Use weighted routing for safe deployments (blue/green, canary).
Centralized observability for all inter-service communication.

Multi-Account Architectures

Share services and resources across organizational units using AWS RAM.
Central platform team manages service networks; application teams own their services.
Enforce organization-wide auth policies at the service network level.
No need to manage VPC peering connections or Transit Gateway attachments between accounts.

Service Mesh Replacement (App Mesh Migration)

Replace AWS App Mesh (EOL September 30, 2026) without application code changes.
Eliminate Envoy sidecar proxies — reduces compute costs and operational complexity.
Migration pattern: Create VPC Lattice services → register targets → shift traffic → remove App Mesh resources.
For ECS workloads, AWS also offers ECS Service Connect as an alternative for intra-cluster communication.

Zero Trust Networking

Implement defense-in-depth with multiple security layers:
1. VPC/service network association (network boundary)
2. Security groups on VPC-to-service-network associations
3. Service network auth policies (coarse-grained)
4. Service-level auth policies (fine-grained)
Every request must be authenticated (SigV4) and authorized (IAM policy evaluation) — no implicit trust based on network position.

Multi-Tenant SaaS Applications

Isolate tenant resources in separate VPCs while maintaining inter-service connectivity through VPC Lattice.
Use auth policies to enforce tenant-specific access controls.
Resource configurations enable secure, private access to shared databases across tenant accounts.

Hybrid and On-Premises Connectivity

On-premises applications access VPC Lattice services via VPC endpoints (type: service network) over Direct Connect or VPN.
Consolidate hybrid connectivity through a single VPC endpoint rather than multiple PrivateLink endpoints.

Pricing

VPC Lattice pricing has three dimensions for services and separate pricing for resource access:

Service Pricing (US East – N. Virginia)

Dimension	Price	Notes
Service hourly charge	$0.025/hour (~$18.25/month)	Per provisioned service
Data processing	$0.025/GB	Request + response data combined; includes cross-AZ
HTTP requests	$0.10 per 1M requests/hour (after first 300K free)	First 300,000 requests/hour are free per service
TCP connections (TLS listeners)	$0.10 per 1M connections/hour (after first 300K free)	For TLS Passthrough listeners only

Resource Access Pricing

Dimension	Price
Resource configuration hourly	$0.10/resource/hour (consumer pays)
Data processed (consumer)	$0.01/GB (first 1 PB), $0.006/GB (next 4 PB), $0.004/GB (5+ PB)
Data processed (provider)	$0.006/GB

Key Pricing Notes

No inter-AZ charges — Cross-AZ data transfer is included in the data processing charge.
VPC associations are free — No charge for associating VPCs with service networks.
VPC endpoints (type: service network) are free — No additional charge for service network endpoints.
Free tier for requests — First 300,000 HTTP requests (or TCP connections) per hour per service are free.
Prices vary by Region — check the VPC Lattice pricing page for current rates.

AWS Certification Exam Practice Questions

Question 1: A company has microservices running in multiple AWS accounts across different VPCs with overlapping CIDR ranges. They need service-to-service communication with IAM-based authorization and no infrastructure changes to existing VPCs. Which solution meets these requirements with the LEAST operational overhead?

Set up VPC peering between all VPCs and use security groups for authorization
Deploy AWS Transit Gateway and configure route tables for inter-VPC communication
Use Amazon VPC Lattice with a service network shared via AWS RAM and auth policies
Create PrivateLink endpoint services with NLBs in each provider VPC

Show Answer

Answer: C –

VPC Lattice supports overlapping CIDRs, provides native cross-VPC/cross-account connectivity without peering or Transit Gateway, and offers IAM-based auth policies. VPC peering (A) doesn’t support overlapping CIDRs. Transit Gateway (B) requires CIDR coordination and doesn’t provide application-layer authorization. PrivateLink (D) requires NLBs and doesn’t offer IAM-based service-to-service auth policies.

Question 2: A development team wants to perform a canary deployment where 5% of traffic goes to a new version of their service running on Lambda, while 95% continues to the existing version on ECS Fargate. The services are in the same VPC. Which VPC Lattice feature enables this?

Create separate services for each version with different DNS names
Configure a listener rule with weighted target groups — 95% to ECS target group, 5% to Lambda target group
Use auth policies to restrict 95% of callers to the old version
Deploy two service networks and associate the VPC with both

Show Answer

Answer: B –

VPC Lattice supports weighted routing across target groups within a single listener rule. You can assign weights to multiple target groups containing different compute types (ECS and Lambda), enabling canary deployments without changing client code or DNS configuration.

Question 3: A service network owner wants to enforce that only authenticated requests from their AWS Organization can access any service in the service network, while individual service owners can apply more specific access controls. How should this be configured?

Apply an auth policy at the service network level requiring aws:PrincipalOrgID condition, and let service owners apply service-level auth policies
Configure security groups on each service to allow only organization IP ranges
Use AWS WAF rules attached to the service network
Create IAM roles in each account with cross-account trust policies

Show Answer

Answer: A –

VPC Lattice supports auth policies at both the service network level (coarse-grained) and the service level (fine-grained). Both policies must independently allow the request — this implements defense in depth. The service network policy can enforce organization-wide requirements while service owners add specific conditions for their services.

Question 4: A company is migrating from AWS App Mesh (reaching EOL September 2026) to VPC Lattice for their ECS microservices. Which statements about this migration are TRUE? (Select TWO)

VPC Lattice requires Envoy sidecar proxies like App Mesh
VPC Lattice eliminates the need for sidecar proxies, reducing compute overhead
VPC Lattice provides IAM-based authorization that App Mesh did not offer
VPC Lattice requires VPC peering for cross-VPC communication
VPC Lattice only supports EKS workloads, not ECS

Show Answer

Answer: B, C

VPC Lattice operates as infrastructure without sidecar proxies (B is correct) — this reduces compute costs and eliminates proxy management. VPC Lattice provides IAM auth policies using SigV4 for service-to-service authorization (C is correct), which App Mesh did not offer (App Mesh relied on mTLS only). VPC Lattice does NOT require sidecars (A wrong), does NOT require VPC peering (D wrong), and supports ECS, EKS, Lambda, and EC2 (E wrong).

Question 5: An organization needs to provide private access to an Amazon RDS database from application VPCs in multiple AWS accounts without using VPC peering or Transit Gateway. Which VPC Lattice components are required? (Select THREE)

VPC Lattice service with listeners and target groups
Resource gateway in the VPC where the RDS database resides
Resource configuration defining the RDS database endpoint
Service network with VPC associations in consumer accounts
Network Load Balancer in front of the RDS database
AWS App Mesh virtual nodes for the database

Show Answer

Answer: B, C, D

For TCP resource access (like RDS), VPC Lattice uses resource gateways (B) as ingress points in the resource VPC, resource configurations (C) to define the resource, and service networks (D) for consumer VPC associations. This does NOT require a VPC Lattice service with listeners (A — that’s for HTTP services), an NLB (E — VPC Lattice removes this PrivateLink requirement), or App Mesh (F — which is deprecated).

Important Points for Certification Exams

VPC Lattice is a fully managed, Regional service — no infrastructure to deploy or manage.
It operates at Layer 7 (application layer) — not Layer 4 like PrivateLink.
Supports overlapping CIDRs — a key differentiator from VPC peering and Transit Gateway.
Auth policies use IAM and SigV4 — not API keys or Cognito tokens.
No sidecar proxies required — unlike App Mesh or Istio.
VPC Lattice is the recommended replacement for App Mesh (EOL September 30, 2026).
Resource configurations (GA 2024) extend VPC Lattice to TCP resources like databases.
Free tier includes 300,000 requests/hour per service.
No additional cross-AZ data transfer charges.
On-premises access is enabled via VPC endpoints of type “service network” over Direct Connect/VPN.

Frequently Asked Questions

What is AWS VPC Lattice?

VPC Lattice is an application-layer networking service that connects services across VPCs and accounts without VPC peering or transit gateways. It handles service discovery, routing, access control, and observability at Layer 7.

How is VPC Lattice different from PrivateLink?

PrivateLink provides one-way private connectivity to a specific endpoint. VPC Lattice enables bidirectional service-to-service communication with built-in routing rules, IAM auth policies, weighted targets, and cross-account service mesh capabilities.

Does VPC Lattice replace App Mesh?

Yes, AWS recommends VPC Lattice as the successor to App Mesh. VPC Lattice is simpler (no sidecar proxies needed), supports cross-VPC/account natively, and integrates with IAM for access control.

References

Kinesis Streams vs Firehose vs Flink – Compared

June 30, 2026 ~ Last updated on : July 3, 2026 ~ Kiro Agent

AWS Kinesis Data Streams vs Firehose vs Managed Apache Flink

AWS offers three core streaming services under the Kinesis umbrella, each designed for different use cases in real-time data processing pipelines. Understanding when to use Amazon Kinesis Data Streams, Amazon Data Firehose (formerly Kinesis Data Firehose), or Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) is critical for AWS certification exams and real-world architecture decisions.

📢 Service Naming Updates:

February 2024: Amazon Kinesis Data Firehose renamed to Amazon Data Firehose
August 2023: Amazon Kinesis Data Analytics renamed to Amazon Managed Service for Apache Flink
January 2026: Kinesis Data Analytics for SQL discontinued — migrate to Managed Service for Apache Flink

Kinesis Data Streams vs Firehose vs Managed Flink
Data Streams
Producers
↓
Shards
(retain 24h-365d)
↓
KCL / Lambda / EFO
Real-time (~70ms)
Firehose
Sources (Streams/Direct)
↓
Buffer + Transform
(Lambda/format)
↓
S3 / Redshift / OpenSearch
Near real-time (60s-900s)
Managed Flink
Streams / MSK / S3
↓
Apache Flink
(SQL/Java/Python)
↓
Any destination
Complex processing (ms)

Service Overview

Amazon Kinesis Data Streams

A real-time data streaming service that enables you to collect, process, and analyze streaming data with sub-second latency. You build custom consumer applications using KCL, AWS Lambda, or the SDK. It provides full control over data retention (up to 365 days), ordering guarantees via partition keys, and supports multiple simultaneous consumers.

Amazon Data Firehose

A fully managed delivery service that captures, transforms, and loads streaming data into AWS destinations (S3, Redshift, OpenSearch, Apache Iceberg tables, Snowflake) and third-party HTTP endpoints. It requires zero administration — no shards, no capacity planning, and no custom consumer code.

Amazon Managed Service for Apache Flink

A fully managed stream processing engine that runs Apache Flink applications for complex event processing, real-time analytics, and stateful computations. Supports Java, Scala, Python (PyFlink), and SQL. Provides exactly-once processing semantics, windowed aggregations, and joins across multiple streams.

Architecture Comparison

Kinesis Data Streams Architecture

Shard-based model: Each shard provides 1 MB/s write (1,000 records/s) and 2 MB/s read capacity
Producers → Stream (Shards) → Consumers — fully decoupled
Data is stored in shards and replicated across 3 Availability Zones
Records are ordered within a shard by sequence number
Three capacity modes: Provisioned, On-Demand Standard, On-Demand Advantage
Consumers pull data via GetRecords (shared throughput) or SubscribeToShard (enhanced fan-out with push model)

Amazon Data Firehose Architecture

Serverless pipeline: No shards, no partitions, fully managed infrastructure
Producers → Firehose Stream → Buffer → Transform (optional) → Destination
Automatically scales to match incoming data throughput (up to GB/s)
Buffers data by size (1–128 MB) or time (0–900 seconds) before delivery
Can source directly from producers (Direct PUT) or from Kinesis Data Streams
Supports inline transformation via AWS Lambda functions

Managed Apache Flink Architecture

Application-based model: Deploy Apache Flink applications (JAR/ZIP) to a managed cluster
Sources → Flink Application (operators, state) → Sinks
Uses Kinesis Processing Units (KPUs) — each KPU = 1 vCPU + 4 GB memory
Supports parallel processing across multiple KPUs with automatic scaling
Maintains application state with checkpointing and savepoints
Integrates with Kinesis Data Streams, MSK, S3, DynamoDB, and custom connectors as sources/sinks

Data Retention

Feature	Kinesis Data Streams	Amazon Data Firehose	Managed Apache Flink
Data Storage	Yes — stores data in stream	No — transit/delivery only	Application state only (checkpoints)
Default Retention	24 hours	N/A (buffered up to 15 min max)	N/A (state stored in checkpoints)
Extended Retention	Up to 7 days (additional charge)	N/A	N/A
Long-term Retention	Up to 365 days	N/A	Snapshots stored in S3 (indefinite)
Replay Capability	✅ Yes — within retention window	❌ No	✅ Yes — via savepoints/snapshots

Scaling Model

Kinesis Data Streams — Shards & On-Demand

Provisioned Mode: Manually add/remove shards. Each shard = 1 MB/s in, 2 MB/s out. You pay per shard-hour ($0.015/shard-hour in us-east-1).
On-Demand Standard Mode: Auto-scales up to 200 MB/s write (expandable to 1 GB/s). Per-stream hourly charge + per-GB pricing ($0.08/GB ingest, $0.04/GB retrieval).
On-Demand Advantage Mode (Nov 2025): Instant scaling up to 10 GB/s or 10M events/s via warm throughput. 60% lower per-GB pricing ($0.032/GB ingest, $0.016/GB retrieval). No per-stream hourly fee. Requires 25 MB/s minimum aggregate commitment.

Amazon Data Firehose — Fully Automatic

No capacity units to manage — scales automatically from KB/s to GB/s
No shard splitting, merging, or resharding operations
Pricing based purely on data volume ingested (per GB)
Scales buffer size and parallelism transparently

Managed Apache Flink — KPU-based Auto-scaling

Uses Kinesis Processing Units (KPUs): 1 KPU = 1 vCPU + 4 GB memory
Default limit: 64 KPUs per application (can request increase)
Auto-scaling: Scales out when CPU utilization exceeds 75% for 15 minutes
Parallelism configurable per operator (default 1, max 8 per KPU)
Also supports metric-based scaling and scheduled scaling policies

Consumers & Processing

Kinesis Data Streams Consumers

Shared Throughput (Classic): Multiple consumers share 2 MB/s per shard. ~200 ms latency via GetRecords polling.
Enhanced Fan-Out (EFO): Dedicated 2 MB/s per consumer per shard. Push-based via HTTP/2 SubscribeToShard. ~70 ms latency.
Consumer Options:
- KCL (Kinesis Client Library): Managed checkpointing, load balancing, shard assignment. KCL 1.x EOL January 30, 2026 — migrate to KCL 3.x.
- AWS Lambda: Event source mapping with automatic scaling, batching, and error handling.
- SDK: Low-level GetRecords/SubscribeToShard API calls for custom consumers.
- Amazon Data Firehose: As a consumer for delivery to destinations.
- Managed Apache Flink: As a consumer for complex stream processing.
Fan-out limits: Up to 50 EFO consumers per stream (On-Demand Advantage), 20 for other modes.

Amazon Data Firehose Consumers

No custom consumers — Firehose IS the consumer; it delivers to pre-defined destinations
Supported Destinations:
- Amazon S3 (including S3 Tables)
- Amazon Redshift (via S3 COPY)
- Amazon OpenSearch Service / OpenSearch Serverless
- Apache Iceberg Tables in S3
- Snowflake (via Snowpipe Streaming)
- Splunk
- HTTP endpoints (Datadog, Dynatrace, New Relic, MongoDB, Coralogix, Elastic, etc.)
Data Transformation: Inline via AWS Lambda (up to 3 min timeout per invocation)

Managed Apache Flink Processing

Sources: Kinesis Data Streams, Amazon MSK, Amazon S3, DynamoDB Streams, JDBC, custom Apache Flink connectors
Sinks: Kinesis Data Streams, Amazon Data Firehose, S3, DynamoDB, OpenSearch, JDBC, custom connectors
Languages: Java, Scala, Python (PyFlink), SQL (via Flink Studio notebooks)
Processing: Windowed aggregations, joins, pattern detection (CEP), stateful transformations

Latency Comparison

Service	Latency Type	Typical Latency	Best For
Kinesis Data Streams	Real-time	~70 ms (EFO) / ~200 ms (shared)	Sub-second event processing, real-time dashboards
Amazon Data Firehose	Near real-time / Micro-batch	~5 seconds (zero buffering) / 60–900 seconds (standard)	Log delivery, data lake loading, ETL pipelines
Managed Apache Flink	Real-time (continuous processing)	Milliseconds to seconds (application-dependent)	Complex event processing, windowed analytics, real-time ML inference

Pricing Models (US East – N. Virginia)

Kinesis Data Streams Pricing

Mode	Ingestion	Retrieval	Other Charges
Provisioned	$0.014 per 1M PUT payload units	Included (2 MB/s per shard)	$0.015/shard-hour
On-Demand Standard	$0.08/GB	$0.04/GB	$0.04/stream-hour
On-Demand Advantage	$0.032/GB	$0.016/GB	No per-stream fee; 25 MB/s min commitment

Additional charges: Extended retention (24h–7d): $0.020/shard-hour (Provisioned) or $0.023/GB-month (Advantage). Long-term retention (7d–365d): $0.023/GB-month (Advantage) vs $0.10/GB-month (Standard). Enhanced Fan-Out: $0.013/shard-hour + $0.016/GB (Advantage has no EFO surcharge).

Amazon Data Firehose Pricing

Ingestion: Tiered pricing starting at ~$0.029/GB for first 500 TB/month (billed in 5 KB increments)
Format conversion: $0.018/GB (Parquet/ORC conversion)
VPC delivery: Additional charges for VPC endpoints
Dynamic partitioning: $0.020/GB
No charges for: Delivery to destinations, scaling, or idle streams

Managed Apache Flink Pricing

KPU-hour: $0.11/KPU-hour (1 KPU = 1 vCPU + 4 GB RAM)
Running application storage: $0.10/GB-month (for application state/checkpoints)
Durable application backups (snapshots): $0.023/GB-month
Orchestration: $0.011/orchestration-hour per KPU
Billed per second with a 1-minute minimum
Auto-scaling adjusts KPUs automatically — you pay only for KPUs consumed

Data Transformation Capabilities

Capability	Kinesis Data Streams	Amazon Data Firehose	Managed Apache Flink
Built-in Transformation	❌ None (requires consumer code)	✅ Lambda-based transformation	✅ Full Apache Flink operators
Format Conversion	❌ Not built-in	✅ JSON → Parquet/ORC (via Glue schema)	✅ Any format (programmatic)
Compression	❌ Client-side only	✅ GZIP, Snappy, Zip, Hadoop Snappy	✅ Programmatic
Aggregation/Windowing	❌ Not built-in	❌ No (only batching by buffer)	✅ Tumbling, Sliding, Session windows
Joins	❌ Not built-in	❌ No	✅ Stream-stream & stream-table joins
Stateful Processing	❌ Consumer must manage state	❌ Stateless delivery	✅ Managed state with checkpointing
Complex Event Processing	❌ Custom code required	❌ No	✅ Pattern detection via Flink CEP

Ordering & Delivery Guarantees

Guarantee	Kinesis Data Streams	Amazon Data Firehose	Managed Apache Flink
Ordering	✅ Per-shard ordering via partition key	⚠️ Best-effort ordering (no strict guarantee)	✅ Per-key ordering with watermarks and event-time processing
Delivery Semantics	At-least-once (exactly-once with KCL deduplication)	At-least-once	Exactly-once for application state (with checkpointing)
Deduplication	Consumer responsibility (sequence numbers)	Not guaranteed (possible duplicates on retry)	Built-in via Flink’s checkpoint mechanism
Data Durability	3 AZ replication, up to 365 days retention	Retry with S3 backup for failed deliveries	Checkpoints stored in durable storage, snapshots in S3

Comprehensive Comparison Table

Feature	Kinesis Data Streams	Amazon Data Firehose	Managed Apache Flink
Type	Data streaming & ingestion	Data delivery & loading	Stream processing & analytics
Management	Semi-managed (shard management in Provisioned mode)	Fully managed (zero admin)	Fully managed (auto-scaling KPUs)
Scaling	Shards (manual) or On-Demand (auto up to 10 GB/s)	Automatic (unlimited, up to GB/s)	KPU auto-scaling (default max 64 KPUs)
Latency	~70 ms (EFO) to ~200 ms (shared)	~5s (zero buffer) to 60–900s	Milliseconds (continuous processing)
Data Retention	24 hours – 365 days	None (pass-through)	State in checkpoints/snapshots
Max Record Size	10 MiB (since Oct 2025)	1 MiB	Limited by available memory
Consumers	KCL, Lambda, SDK, EFO (up to 50)	Built-in delivery (no custom consumers)	Flink application sinks
Destinations	Any (custom consumer code)	S3, Redshift, OpenSearch, Iceberg, Snowflake, Splunk, HTTP endpoints	Kinesis, S3, DynamoDB, OpenSearch, MSK, JDBC, custom connectors
Transformation	None (consumer responsibility)	Lambda functions, format conversion	Full Flink operators (map, filter, join, window, CEP)
Ordering	Per-shard (partition key)	Best-effort	Per-key with event-time semantics
Delivery Semantics	At-least-once	At-least-once	Exactly-once (state)
Replay	✅ Yes	❌ No	✅ Yes (savepoints)
Pricing Model	Per shard-hour OR per GB (on-demand)	Per GB ingested (5 KB increments)	Per KPU-hour + storage
Languages/SDK	Any (via SDK/KCL)	N/A (configuration-based)	Java, Scala, Python, SQL
Use Case	Custom real-time apps, multiple consumers, replay	Simple data delivery/ETL to storage	Complex analytics, CEP, ML inference, joins

When to Choose Each Service

Choose Kinesis Data Streams When:

You need sub-second latency (real-time processing with ~70 ms via EFO)
You need multiple consumers reading the same stream independently
You require data replay capability (reprocess historical data)
You need ordering guarantees per partition key
You need data retention beyond delivery (audit, compliance — up to 365 days)
You want to build custom processing logic with KCL, Lambda, or SDK
You need large record support (up to 10 MiB)
Your architecture needs a durable message bus between producers and multiple downstream services

Choose Amazon Data Firehose When:

You need the simplest path to load data into S3, Redshift, OpenSearch, or Iceberg
You want zero operational overhead — no code, no capacity planning
Near real-time delivery (seconds to minutes) is acceptable
You need built-in format conversion (JSON to Parquet/ORC)
You need automatic compression and batching before storage
You’re building a data lake ingestion pipeline with minimal complexity
You need to deliver to third-party destinations (Datadog, Splunk, Snowflake, etc.)
You don’t need replay, multiple consumers, or strict ordering

Choose Managed Apache Flink When:

You need complex stream processing — windowed aggregations, joins, pattern detection
You need exactly-once processing guarantees for application state
You’re building real-time analytics with SQL or programmatic queries
You need to join multiple streams or enrich streams with reference data
You need event-time processing with watermarks (handling late/out-of-order events)
You’re implementing real-time ML scoring or anomaly detection
You need stateful computations (running totals, session tracking, deduplication)
You want an interactive notebook environment for stream exploration (Flink Studio)

Common Architecture Pattern: Using All Three Together

These services are complementary, not mutually exclusive. A common production pattern:

Kinesis Data Streams — Ingests and buffers high-volume event data from producers
Managed Apache Flink — Reads from the stream, performs real-time analytics (windowed counts, anomaly detection, enrichment), writes results to another stream or DynamoDB
Amazon Data Firehose — Consumes from Kinesis Data Streams and delivers raw/processed data to S3/Redshift data lake for batch analytics

AWS Certification Exam Practice Questions

Question 1:

A company collects clickstream data from its website at 500,000 events per second. The data must be processed in real-time to update a fraud detection dashboard with sub-second latency, while also being stored in S3 in Parquet format for batch analytics. Which architecture meets these requirements?

Send data directly to Amazon Data Firehose with Lambda transformation and zero buffering enabled
Send data to Kinesis Data Streams, use a Lambda consumer for real-time fraud detection, and configure Amazon Data Firehose as a second consumer to deliver to S3 in Parquet format
Send data to Amazon Data Firehose with two delivery destinations: one for the dashboard and one for S3
Send data directly to Amazon Managed Service for Apache Flink which writes to both the dashboard and S3

Show Answer

Answer: B –

Explanation: Kinesis Data Streams supports multiple consumers — Lambda provides sub-second processing for real-time fraud detection, while Firehose as a second consumer handles S3 delivery with Parquet format conversion. Firehose alone (A, C) cannot provide sub-second latency for the dashboard. Option D is possible but adds unnecessary complexity when Lambda suffices for the fraud detection logic.

Question 2:

A data engineering team needs to continuously aggregate IoT sensor readings into 5-minute tumbling windows, join the aggregated data with a reference table in DynamoDB, and detect anomalous patterns across multiple sensors. Which AWS service is BEST suited for this requirement?

Amazon Kinesis Data Streams with KCL consumers
Amazon Data Firehose with Lambda transformation
Amazon Managed Service for Apache Flink
AWS Lambda with Kinesis Data Streams event source mapping

Show Answer

Answer: C –

Explanation: Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) is purpose-built for windowed aggregations, stream-table joins, and complex event processing (CEP). KCL (A) and Lambda (D) would require custom state management for windows. Firehose (B) cannot perform windowed aggregations or joins.

Question 3:

A company wants to stream application logs to Amazon OpenSearch Service with minimal operational overhead and within 60 seconds of generation. They do not need to reprocess or replay the data. Which is the MOST operationally efficient solution?

Kinesis Data Streams with a KCL consumer that writes to OpenSearch
Amazon Data Firehose with OpenSearch as the destination
Amazon Managed Service for Apache Flink reading from Kinesis Data Streams and writing to OpenSearch
Kinesis Data Streams with an AWS Lambda consumer writing to OpenSearch

Show Answer

Answer: B –

Explanation: Amazon Data Firehose provides the most operationally efficient solution — it natively supports OpenSearch as a destination, requires no custom code, auto-scales, and can deliver within 60 seconds with standard buffering (or ~5 seconds with zero buffering). Since replay is not needed, Kinesis Data Streams (A, D) adds unnecessary complexity. Managed Flink (C) is overkill for simple log delivery.

Question 4:

A company uses Kinesis Data Streams with 5 enhanced fan-out consumers. During peak traffic, Consumer A is experiencing high latency while Consumer B processes normally. What is the MOST likely cause?

Enhanced fan-out consumers share throughput, so Consumer A is being throttled by Consumer B’s usage
Consumer A has a processing bottleneck in its application code, causing its iterator to fall behind
The stream has insufficient shards to support 5 enhanced fan-out consumers
Enhanced fan-out only supports up to 2 consumers per stream

Show Answer

Answer: B –

Explanation: Enhanced fan-out provides dedicated 2 MB/s throughput per consumer per shard — consumers do NOT share throughput (eliminating A). Since Consumer B is fine, the stream has adequate shards (eliminating C). Enhanced fan-out supports up to 20 consumers (Standard/Provisioned) or 50 (Advantage mode), not 2 (eliminating D). The issue is Consumer A’s own processing performance.

Question 5:

A company is evaluating pricing for their streaming workload: 50 GB/hour of data ingested continuously, single consumer, 24-hour retention, in us-east-1. They want the lowest-cost Kinesis Data Streams option. Which capacity mode should they choose?

Provisioned mode with calculated shard count
On-Demand Standard mode
On-Demand Advantage mode
Amazon Data Firehose instead of Kinesis Data Streams

Show Answer

Answer: C –

Explanation: At 50 GB/hour (~13.9 MB/s), the workload exceeds On-Demand Advantage’s 25 MB/s minimum aggregate commitment threshold (when considering retrieval). On-Demand Advantage costs $0.032/GB ingest + $0.016/GB retrieval = ~$57.60/day, vs On-Demand Standard at $0.08/GB + $0.04/GB + stream-hour fees = ~$145/day+. Provisioned mode (A) could be cheaper for very predictable loads but requires operational overhead for shard management. Firehose (D) is a different service that doesn’t provide retention or replay capabilities the company may need.

References

AWS VPC Explained – Beginner’s Guide to Networking

June 30, 2026 ~ Last updated on : July 3, 2026 ~ Kiro Agent

What is AWS VPC? (AWS VPC Explained)

Amazon Virtual Private Cloud (VPC) is your own private, isolated section of the AWS cloud where you launch resources like EC2 instances, databases, and load balancers. Think of it as your own private office building within a massive business park (AWS).

Real-World Analogy: VPC as a Private Office Building

Imagine AWS as a giant business park with thousands of buildings. When you create a VPC, you get your own building with:

Your own address range (CIDR block) — like having your own floor numbers and room numbers
Your own rooms (subnets) — different departments on different floors
Your own security guards (security groups & NACLs) — controlling who enters and exits
Your own reception desk (internet gateway) — managing visitors from outside
Your own internal hallways (route tables) — directing people to the right rooms

Without a VPC, your AWS resources would be exposed to everyone — like working in an open field. A VPC gives you walls, doors, and locks.

┌─────────────────────────────────────────────────────────┐
│                     AWS CLOUD                            │
│                                                         │
│   ┌─────────────────────────────────────────────┐       │
│   │            YOUR VPC (10.0.0.0/16)           │       │
│   │                                             │       │
│   │   ┌──────────────┐  ┌──────────────┐       │       │
│   │   │Public Subnet │  │Private Subnet│       │       │
│   │   │ 10.0.1.0/24  │  │ 10.0.2.0/24 │       │       │
│   │   │  [Web Server]│  │  [Database]  │       │       │
│   │   └──────────────┘  └──────────────┘       │       │
│   │                                             │       │
│   └─────────────────────────────────────────────┘       │
│                                                         │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐

│ AWS CLOUD │

│ │

│ ┌─────────────────────────────────────────────┐ │

│ │ YOUR VPC (10.0.0.0/16) │ │

│ │ │ │

│ │ ┌──────────────┐ ┌──────────────┐ │ │

│ │ │Public Subnet │ │Private Subnet│ │ │

│ │ │ 10.0.1.0/24 │ │ 10.0.2.0/24 │ │ │

│ │ │ [Web Server]│ │ [Database] │ │ │

│ │ └──────────────┘ └──────────────┘ │ │

│ │ │ │

│ └─────────────────────────────────────────────┘ │

│ │

└─────────────────────────────────────────────────────────┘

AWS VPC Architecture
VPC (10.0.0.0/16)
Public Subnet (AZ-1)
EC2 (Web Server)
NAT Gateway
↕ Route to IGW
Private Subnet (AZ-1)
EC2 (App Server)
RDS (Database)
↕ Route to NAT GW
Public Subnet (AZ-2)
EC2 (Web Server)
NAT Gateway
↕ Route to IGW
Private Subnet (AZ-2)
EC2 (App Server)
RDS (Standby)
↕ Route to NAT GW
⬆⬇
Internet Gateway (IGW)
⬆⬇
Internet / Users

CIDR Notation Basics

Before diving deeper, you need to understand CIDR (Classless Inter-Domain Routing) notation — it’s how you define the size of your network.

How CIDR Works

A CIDR block looks like this: 10.0.0.0/16

10.0.0.0 — the starting IP address
/16 — how many bits are “locked” (the network portion)

The smaller the number after the slash, the MORE IP addresses you get:

CIDR Block	Number of IPs	Use Case
/16	65,536	Large VPC (maximum size)
/20	4,096	Medium subnet
/24	256	Standard subnet (most common)
/28	16	Smallest allowed (minimum size)

Simple Rule: Each step from /16 to /17 to /18… cuts the number of addresses in half.

AWS-specific note: AWS reserves 5 IP addresses in each subnet (first 4 and last 1). So a /24 subnet gives you 251 usable IPs, not 256.

Common Private IP Ranges for VPCs

10.0.0.0/16 — Most popular choice (10.0.0.0 to 10.0.255.255)
172.16.0.0/16 — Alternative range
192.168.0.0/16 — Familiar if you’ve used home routers

Default VPC vs. Custom VPC

Every AWS account comes with a default VPC in each Region. It’s like a starter apartment — convenient but limited.

Feature	Default VPC	Custom VPC
Created automatically?	Yes (one per Region)	No (you create it)
CIDR block	172.31.0.0/16	You choose (/16 to /28)
Subnets	One public subnet per AZ	You design the layout
Internet access	Yes (by default)	Only if you configure it
Best for	Quick testing, learning	Production workloads
Security posture	Open (all subnets are public)	Locked down by design

Best Practice: Use the default VPC for experimentation. Create custom VPCs for any real workload — you get full control over security, IP addressing, and network design.

Subnets: Public vs. Private

A subnet is a smaller segment of your VPC’s IP address range, placed in a specific Availability Zone (AZ). Think of subnets as individual rooms in your office building.

Public Subnet

Has a route to the Internet Gateway
Resources CAN have public IP addresses
Used for: Web servers, load balancers, bastion hosts
Analogy: The lobby of your building — visitors (internet traffic) can reach it

Private Subnet

NO direct route to the Internet Gateway
Resources cannot be reached from the internet
Used for: Databases, application servers, internal services
Analogy: The server room — only authorized internal staff can access it

┌─────────────────────────────────────────────────────────────┐
│                        YOUR VPC                              │
│                                                             │
│  ┌─────────────────────┐      ┌─────────────────────┐      │
│  │   PUBLIC SUBNET     │      │   PRIVATE SUBNET    │      │
│  │                     │      │                     │      │
│  │  [Web Server]       │      │  [Database]         │      │
│  │  [Load Balancer]    │      │  [App Server]       │      │
│  │                     │      │                     │      │
│  │  → Route to IGW ✓   │      │  → No IGW route ✗   │      │
│  └─────────────────────┘      └─────────────────────┘      │
│                                                             │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐

│ YOUR VPC │

│ │

│ ┌─────────────────────┐ ┌─────────────────────┐ │

│ │ PUBLIC SUBNET │ │ PRIVATE SUBNET │ │

│ │ │ │ │ │

│ │ [Web Server] │ │ [Database] │ │

│ │ [Load Balancer] │ │ [App Server] │ │

│ │ │ │ │ │

│ │ → Route to IGW ✓ │ │ → No IGW route ✗ │ │

│ └─────────────────────┘ └─────────────────────┘ │

│ │

└─────────────────────────────────────────────────────────────┘

Key Point: A subnet is public or private based on its route table, not a toggle switch. If the route table has a route to an Internet Gateway, it’s public.

Route Tables

A route table contains rules (routes) that determine where network traffic is directed. Every subnet must be associated with a route table.

Analogy: Route tables are like signs in a building directing you — “Lobby → left,” “Server room → right,” “Exit → through reception.”

How Route Tables Work

Destination	Target	Meaning
10.0.0.0/16	local	Traffic within VPC stays local
0.0.0.0/0	igw-xxxxx	All other traffic → Internet Gateway

Key Rules:

Every VPC has a main route table (default for all subnets)
You can create custom route tables for specific subnets
The local route (VPC internal traffic) cannot be removed
Most specific route wins (longest prefix match)

Internet Gateway (IGW)

An Internet Gateway is a horizontally scaled, redundant, and highly available VPC component that allows communication between your VPC and the internet.

Analogy: The front door of your office building. Without it, nobody from outside can enter, and nobody inside can leave to the internet.

Key Characteristics

Only one IGW per VPC
Supports both IPv4 and IPv6
Highly available — no bandwidth constraints
Must be attached to the VPC AND referenced in a route table to work

Making a Subnet Public (3 Steps)

Create and attach an Internet Gateway to your VPC
Add a route in the subnet’s route table: 0.0.0.0/0 → IGW
Ensure instances have public IP addresses (or Elastic IPs)

NAT Gateway

A NAT (Network Address Translation) Gateway allows instances in private subnets to access the internet (for software updates, API calls) WITHOUT allowing the internet to initiate connections back to them.

Analogy: A one-way mail slot. Your private servers can send letters out (make requests), but nobody outside can push mail back in unless it’s a reply.

How NAT Gateway Works

Private Subnet Instance → NAT Gateway (in Public Subnet) → Internet Gateway → Internet
         ↑                                                                        │
         └────────────────── Response comes back same path ───────────────────────┘

Private Subnet Instance → NAT Gateway (in Public Subnet) → Internet Gateway → Internet

↑ │

└────────────────── Response comes back same path ───────────────────────┘

Key Points

NAT Gateway lives in a public subnet
You add a route in the private subnet’s route table: 0.0.0.0/0 → nat-xxxxx
Managed by AWS — no patching required
Supports up to 45 Gbps bandwidth (scales automatically)
Charged per hour + per GB of data processed
Zonal NAT Gateway: operates in a single AZ (traditional)
Regional NAT Gateway (New 2025): automatically expands across AZs for high availability without manual setup

NAT Gateway vs. NAT Instance

Feature	NAT Gateway	NAT Instance
Managed by	AWS	You
Availability	Highly available in AZ	Depends on your setup
Bandwidth	Up to 45 Gbps	Depends on instance type
Maintenance	None required	You patch/update
Cost	Higher	Lower (but more effort)
Recommendation	✓ Use this	Only for cost savings

Security Groups vs. NACLs

AWS gives you two layers of network security. Understanding the difference is critical for the exam and real-world usage.

Security Groups (Instance-Level Firewall)

Analogy: A bodyguard assigned to each person (instance). The bodyguard decides who can talk to that person.

Operates at the instance level (attached to ENI)
Stateful — if inbound traffic is allowed, the response is automatically allowed
Supports ALLOW rules only (no deny rules)
All rules evaluated before deciding
Default: denies all inbound, allows all outbound

Example Security Group for a Web Server:

Type	Protocol	Port	Source	Description
Inbound	TCP	80	0.0.0.0/0	Allow HTTP from anywhere
Inbound	TCP	443	0.0.0.0/0	Allow HTTPS from anywhere
Inbound	TCP	22	203.0.113.0/32	Allow SSH from my IP only
Outbound	All	All	0.0.0.0/0	Allow all outbound

Network ACLs (Subnet-Level Firewall)

Analogy: A security checkpoint at each floor’s entrance. Everyone passing through that floor gets checked, regardless of which room they’re going to.

Operates at the subnet level
Stateless — inbound and outbound rules are evaluated independently
Supports both ALLOW and DENY rules
Rules evaluated in order by rule number (lowest first)
Default NACL: allows all inbound and outbound traffic

Example NACL for a Public Subnet:

Rule #	Type	Protocol	Port Range	Source/Dest	Allow/Deny
100	Inbound	TCP	80	0.0.0.0/0	ALLOW
110	Inbound	TCP	443	0.0.0.0/0	ALLOW
120	Inbound	TCP	1024-65535	0.0.0.0/0	ALLOW
*	Inbound	All	All	0.0.0.0/0	DENY
100	Outbound	TCP	80	0.0.0.0/0	ALLOW
110	Outbound	TCP	443	0.0.0.0/0	ALLOW
120	Outbound	TCP	1024-65535	0.0.0.0/0	ALLOW
*	Outbound	All	All	0.0.0.0/0	DENY

Security Groups vs. NACLs Comparison

Feature	Security Group	Network ACL
Level	Instance (ENI)	Subnet
Stateful/Stateless	Stateful	Stateless
Rules	Allow only	Allow AND Deny
Rule evaluation	All rules evaluated	Rules processed in order
Default (custom)	Deny all inbound	Deny all traffic
Default (default)	Allow internal	Allow all traffic
Applies to	Only if associated	All instances in subnet

Best Practice: Use Security Groups as your primary defense (easier to manage). Use NACLs as an additional layer for subnet-wide rules, like blocking a specific IP range.

VPC Peering

A VPC Peering Connection is a networking connection between two VPCs that enables traffic routing between them using private IP addresses. Instances in either VPC can communicate as if they are in the same network.

Analogy: Building a private bridge between two office buildings. Employees (resources) can walk between buildings directly without going outside (through the internet).

Key Rules

Works across different AWS accounts and different Regions
CIDR blocks must NOT overlap between peered VPCs
Not transitive — if VPC-A peers with VPC-B, and VPC-B peers with VPC-C, VPC-A CANNOT reach VPC-C through VPC-B
You must update route tables in BOTH VPCs
Security groups can reference the peered VPC’s security groups

    VPC-A (10.0.0.0/16)  ←──── Peering Connection ────→  VPC-B (172.16.0.0/16)
         │                                                      │
         │                      NOT transitive                  │
         │                                                      │
         ↓                                                      ↓
    VPC-C (192.168.0.0/16)  ← Must create separate peering →  VPC-B

VPC-A (10.0.0.0/16) ←──── Peering Connection ────→ VPC-B (172.16.0.0/16)

│ │

│ NOT transitive │

│ │

↓ ↓

VPC-C (192.168.0.0/16) ← Must create separate peering → VPC-B

When to Use VPC Peering

Connecting a development VPC to a production VPC
Sharing resources across AWS accounts
Simple one-to-one VPC connectivity

For complex multi-VPC architectures, consider AWS Transit Gateway instead — it acts as a central hub connecting multiple VPCs and on-premises networks.

VPC Endpoints

VPC Endpoints allow you to privately connect your VPC to supported AWS services without requiring an Internet Gateway, NAT Gateway, VPN, or AWS Direct Connect. Traffic never leaves the AWS network.

Analogy: Instead of leaving your building to visit the bank (AWS service), the bank opens a private counter inside your building. Faster, safer, and cheaper.

Types of VPC Endpoints

Type	How it Works	Supported Services	Cost
Gateway Endpoint	Route table entry pointing to the endpoint	S3, DynamoDB only	Free
Interface Endpoint (PrivateLink)	ENI with private IP in your subnet	Most AWS services (100+)	Per hour + per GB

Why Use VPC Endpoints?

Security: Traffic stays within AWS network (never traverses the internet)
Performance: Lower latency, more reliable
Cost savings: No NAT Gateway data processing charges for AWS service traffic
Compliance: Keep sensitive data off the public internet

Example: S3 Gateway Endpoint

Without endpoint: EC2 → NAT Gateway → Internet Gateway → S3 (over the internet)

With endpoint: EC2 → S3 Gateway Endpoint → S3 (private AWS network)

Step-by-Step: Creating a VPC with Public and Private Subnets

Here’s how to create a production-ready VPC from scratch using the AWS Console:

Step 1: Create the VPC

Go to VPC Console → “Create VPC”
Choose “VPC and more” (creates subnets, route tables, and gateways automatically) OR “VPC only” for manual setup
Name: my-app-vpc
IPv4 CIDR: 10.0.0.0/16 (65,536 addresses — plenty of room)
Click “Create VPC”

Step 2: Create Subnets

Create a Public Subnet: 10.0.1.0/24 in AZ us-east-1a
Create a Private Subnet: 10.0.2.0/24 in AZ us-east-1a
Create a Public Subnet: 10.0.3.0/24 in AZ us-east-1b (for high availability)
Create a Private Subnet: 10.0.4.0/24 in AZ us-east-1b

Step 3: Create and Attach an Internet Gateway

Create an Internet Gateway: my-app-igw
Attach it to your VPC

Step 4: Configure Route Tables

Public Route Table: Add route 0.0.0.0/0 → igw-xxxxx
Associate public subnets (10.0.1.0/24, 10.0.3.0/24) with this route table
Private Route Table: Keep only the local route (or add NAT Gateway route)
Associate private subnets (10.0.2.0/24, 10.0.4.0/24) with this route table

Step 5: Create a NAT Gateway (for private subnet internet access)

Create a NAT Gateway in one of the public subnets
Allocate an Elastic IP for the NAT Gateway
Update the Private Route Table: add 0.0.0.0/0 → nat-xxxxx

Step 6: Configure Security Groups

Web-SG: Allow inbound HTTP (80), HTTPS (443) from 0.0.0.0/0
App-SG: Allow inbound from Web-SG only on app port
DB-SG: Allow inbound from App-SG only on database port (3306/5432)

FINAL ARCHITECTURE:

Internet
    │
    ▼
┌─── Internet Gateway ───┐
│                        │
▼                        ▼
┌──────────────┐  ┌──────────────┐
│ Public Sub   │  │ Public Sub   │
│ (AZ-1a)     │  │ (AZ-1b)     │
│ [Web Server] │  │ [Web Server] │
│ [NAT GW]    │  │              │
└──────┬───────┘  └──────────────┘
       │
       ▼
┌──────────────┐  ┌──────────────┐
│ Private Sub  │  │ Private Sub  │
│ (AZ-1a)     │  │ (AZ-1b)     │
│ [App Server] │  │ [App Server] │
│ [Database]  │  │ [Database]  │
└──────────────┘  └──────────────┘

FINAL ARCHITECTURE:

Internet

│

▼

┌─── Internet Gateway ───┐

│ │

▼ ▼

┌──────────────┐ ┌──────────────┐

│ Public Sub │ │ Public Sub │

│ (AZ-1a) │ │ (AZ-1b) │

│ [Web Server] │ │ [Web Server] │

│ [NAT GW] │ │ │

└──────┬───────┘ └──────────────┘

│

▼

┌──────────────┐ ┌──────────────┐

│ Private Sub │ │ Private Sub │

│ (AZ-1a) │ │ (AZ-1b) │

│ [App Server] │ │ [App Server] │

│ [Database] │ │ [Database] │

└──────────────┘ └──────────────┘

Quick Reference Table

Component	What It Does	Key Facts
VPC	Isolated virtual network	Max size /16, min /28. Up to 5 CIDRs per VPC (adjustable to 50).
Subnet	Segment of VPC in one AZ	Public = has IGW route. AWS reserves 5 IPs per subnet.
Internet Gateway	Connects VPC to internet	One per VPC. Highly available. No bandwidth limit.
NAT Gateway	Private subnet → internet (outbound only)	Lives in public subnet. Up to 45 Gbps. Charged per hour + data.
Route Table	Directs traffic	Local route always present. Most specific route wins.
Security Group	Instance-level firewall	Stateful. Allow rules only. All rules evaluated.
Network ACL	Subnet-level firewall	Stateless. Allow + Deny. Rules processed in order.
VPC Peering	Connect two VPCs privately	Non-transitive. No overlapping CIDRs. Cross-account/region OK.
VPC Endpoint	Private access to AWS services	Gateway (S3, DynamoDB – free). Interface (most services – paid).
Elastic IP	Static public IPv4 address	Free when attached to a running instance. Charged when unused.

Practice Questions

Question 1

You have an EC2 instance in a private subnet that needs to download software updates from the internet. Which combination enables this?

Attach an Internet Gateway and assign a public IP to the instance
Create a NAT Gateway in a public subnet and add a route in the private subnet’s route table
Create a VPC Endpoint for the update server
Add a route to 0.0.0.0/0 in the private subnet pointing to the Internet Gateway

Show Answer

Answer: B – A NAT Gateway in a public subnet allows private instances to access the internet for outbound traffic without being directly accessible from the internet. Option A would make it a public subnet. Option D would not work without a public IP on the instance.

Question 2

What makes a subnet “public” in AWS?

It has “public” in its name tag
Auto-assign public IP is enabled
Its route table has a route to an Internet Gateway
It is in the default VPC

Show Answer

Answer: C – A subnet is public when its associated route table contains a route directing internet-bound traffic (0.0.0.0/0) to an Internet Gateway. The name and auto-assign IP settings don’t determine this.

Question 3

Your security team wants to block all traffic from a specific IP range (203.0.113.0/24) at the subnet level. Which should you use?

Security Group with a deny rule
Network ACL with a deny rule
Route table blackhole route
AWS WAF rule

Show Answer

Answer: B – Network ACLs support both ALLOW and DENY rules and operate at the subnet level. Security Groups only support ALLOW rules, so you cannot explicitly deny specific IPs with them.

Question 4

VPC-A (10.0.0.0/16) is peered with VPC-B (172.16.0.0/16). VPC-B is peered with VPC-C (192.168.0.0/16). Can instances in VPC-A communicate with VPC-C through VPC-B?

Yes, VPC peering is transitive
Yes, if route tables are configured correctly in VPC-B
No, VPC peering is NOT transitive
Yes, but only for ICMP traffic

Show Answer

Answer: C – VPC peering is non-transitive. VPC-A must create a direct peering connection with VPC-C to communicate. Traffic cannot pass through VPC-B as an intermediary.

Question 5

An application in a private subnet needs to access S3 without traversing the internet. What’s the most cost-effective solution?

Create a NAT Gateway and access S3 over the internet
Create an S3 Gateway Endpoint
Create an S3 Interface Endpoint (PrivateLink)
Peer the VPC with the S3 VPC

Show Answer

Answer: B – S3 Gateway Endpoints are free and provide private access to S3 without requiring a NAT Gateway. Interface Endpoints (Option C) would also work but incur hourly and data charges. NAT Gateway (Option A) would add unnecessary cost.

What’s New in AWS VPC (2025-2026)

VPC Encryption Controls (Nov 2025): Enforce encryption in transit for all traffic within and across VPCs using monitor and enforce modes — no application changes needed.
Regional NAT Gateways (Nov 2025): A single NAT Gateway that automatically expands across Availability Zones based on your workload, providing automatic high availability without manual multi-AZ setup.
Amazon VPC Lattice: A fully managed service for connecting, securing, and monitoring service-to-service communication across VPCs and accounts — ideal for microservices architectures.
Amazon VPC IPAM: Centrally plan, track, and monitor IP addresses across your AWS organization.

Summary

AWS VPC is the foundation of cloud networking. Every resource you deploy sits inside a VPC. Here’s the key takeaway:

VPC = Your private network in AWS (an isolated building)
Subnets = Rooms in your building (public-facing or private)
Route Tables = Signs directing traffic to the right destination
Internet Gateway = Your front door to the internet
NAT Gateway = One-way outbound access for private resources
Security Groups = Personal bodyguards (stateful, allow-only)
NACLs = Floor-level security checkpoints (stateless, allow + deny)
VPC Peering = Private bridge between two VPCs
VPC Endpoints = Private counter for AWS services inside your VPC

Start with the default VPC for learning, then build custom VPCs for production. Always place databases in private subnets and web servers in public subnets. Use security groups as your primary defense, and add NACLs for subnet-wide rules.

Frequently Asked Questions

What is a VPC in AWS?

A Virtual Private Cloud (VPC) is your own isolated network within AWS where you launch resources. Think of it like renting a private floor in an office building — you control the layout (subnets), doors (gateways), and security (security groups/NACLs).

What is the difference between a public and private subnet?

A public subnet has a route to an Internet Gateway, allowing resources with public IPs to communicate directly with the internet. A private subnet has no internet route — resources can only access the internet through a NAT Gateway for outbound traffic.

Do I need to create a VPC to use AWS?

No, every AWS account comes with a default VPC in each region with public subnets, an internet gateway, and default security groups. However, for production workloads, creating a custom VPC with public and private subnets is recommended for better security.

IAM Roles vs Policies vs Users – Complete Guide

June 30, 2026 ~ Last updated on : July 3, 2026 ~ Kiro Agent

AWS IAM Roles vs Policies vs Users – Complete Guide

AWS Identity and Access Management (IAM) is the foundation of security in AWS. Understanding the differences between IAM Users, IAM Roles, and IAM Policies is critical for both real-world AWS security and AWS certification exams. This guide provides a comprehensive comparison of these three core IAM components, explains how they work together, and covers best practices for 2025-2026.

IAM — How Users, Roles & Policies Work Together
AWS Account
IAM User
Long-term credentials
(Access Key / Password)
IAM Role
Temporary credentials
(AssumeRole → STS)
IAM Group
Collection of Users
(attach policies to group)
↓ All get permissions from ↓
Identity Policy
Attached to User/Role/Group
Resource Policy
Attached to S3/SQS/KMS
Permission Boundary
Max permissions cap
SCP
Organization guardrail

IAM Overview – The Big Picture

IAM controls who (authentication) can do what (authorization) on which resources in your AWS account. Every single AWS API call passes through IAM for evaluation before reaching the target service.

The three core building blocks are:

IAM Users – Identities representing people or applications with long-term credentials
IAM Roles – Identities with temporary credentials that can be assumed by trusted entities
IAM Policies – JSON documents that define permissions (allow/deny actions on resources)

Component	What It Is	Credentials	Primary Use Case
IAM User	An identity (person or app)	Long-term (password, access keys)	Legacy human access, service accounts
IAM Role	An assumable identity	Temporary (STS tokens, 1-12 hrs)	Services, cross-account, federation
IAM Policy	A permissions document	N/A (attached to identities/resources)	Define what actions are allowed/denied

IAM Users

What Are IAM Users?

An IAM User is an identity within your AWS account that represents a person or application. Each IAM user has:

A unique name within the AWS account
Long-term credentials – either a password (for console access) or access keys (for programmatic access)
A unique ARN – e.g., arn:aws:iam::123456789012:user/jayendra

IAM User Credentials

Credential Type	Usage	Rotation
Password	AWS Console sign-in	Configurable via password policy
Access Key ID + Secret Access Key	CLI, SDK, API calls	Manual (max 2 active keys per user)
MFA Device	Additional authentication factor	Virtual (TOTP) or hardware

When to Use IAM Users

Third-party integrations that cannot use IAM roles or OIDC federation
Break-glass emergency access when federation is unavailable
Very small teams (1-2 people) without AWS Organizations

⚠️ Best Practice (2025-2026): AWS strongly recommends using IAM Identity Center (formerly AWS SSO) instead of IAM users for human access. IAM Identity Center provides temporary credentials, centralized management, and integrates with external identity providers (Okta, Entra ID, Google Workspace). Reserve IAM users for legacy workloads and specific programmatic use cases.

IAM Groups

IAM Groups are collections of IAM users that simplify permission management:

Attach policies to a group; all users in the group inherit those permissions
A user can belong to up to 10 groups
Groups cannot be nested (no group within a group)
Groups are not identities – they cannot be referenced in resource-based policies or assume roles

IAM Roles

What Are IAM Roles?

An IAM Role is an identity with specific permissions that can be assumed by trusted entities. Unlike users, roles do not have long-term credentials. When an entity assumes a role, AWS STS (Security Token Service) provides temporary security credentials consisting of:

Access Key ID
Secret Access Key
Session Token
Expiration timestamp (default 1 hour, configurable 15 min to 12 hours)

Role Components

Component	Purpose	Example
Trust Policy	Defines WHO can assume the role	EC2 service, another account, OIDC provider
Permissions Policy	Defines WHAT the role can do	Read S3, write DynamoDB
Permission Boundary	Sets maximum permissions ceiling	Restrict to specific services only
Session Duration	How long credentials are valid	1 hour (default), max 12 hours

Trust Policy Example

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

{

"Version": "2012-10-17",

"Statement": [

{

"Effect": "Allow",

"Principal": {

"Service": "ec2.amazonaws.com"

"Action": "sts:AssumeRole"

}

]

}

This trust policy allows the EC2 service to assume this role, enabling EC2 instances to use the role’s permissions.

Types of IAM Roles

Role Type	Trusted Entity	Use Case
Service Role	AWS Service (EC2, Lambda, ECS)	EC2 instance accessing S3, Lambda accessing DynamoDB
Cross-Account Role	Another AWS account	Account A accessing resources in Account B
Federation Role	Identity Provider (SAML/OIDC)	Corporate SSO users accessing AWS
Service-Linked Role	Specific AWS service	Pre-defined by AWS, cannot modify permissions

Cross-Account Access with Roles

Cross-account roles enable secure access between AWS accounts without sharing long-term credentials:

Account B (resource owner) creates a role with a trust policy allowing Account A
Account A (requester) grants its users/roles permission to call sts:AssumeRole on Account B’s role
User in Account A calls AssumeRole and receives temporary credentials for Account B

// Trust policy in Account B's role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "unique-external-id"
        }
      }
    }
  ]
}

// Trust policy in Account B's role

{

"Version": "2012-10-17",

"Statement": [

{

"Effect": "Allow",

"Principal": {

"AWS": "arn:aws:iam::111111111111:root"

"Action": "sts:AssumeRole",

"Condition": {

"StringEquals": {

"sts:ExternalId": "unique-external-id"

}

]

}

Role Chaining

Role chaining occurs when a role assumes another role. Key limitations:

Maximum session duration is limited to 1 hour (regardless of role’s max session setting)
CloudTrail logs each AssumeRole call in the chain
The final role’s permissions apply (not a union of all roles in the chain)

IAM Roles Anywhere

Launched in 2022 and enhanced through 2025, IAM Roles Anywhere extends IAM roles to workloads outside AWS using X.509 certificates. On-premises servers, IoT devices, and other non-AWS workloads can obtain temporary AWS credentials without long-term access keys.

IAM Policies

What Are IAM Policies?

IAM Policies are JSON documents that define permissions. They specify which actions are allowed or denied on which resources under what conditions. Policies do not grant access on their own – they must be attached to an identity (user, group, role) or a resource.

Policy Structure

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3ReadAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "192.168.1.0/24"
        }
      }
    }
  ]
}

{

"Version": "2012-10-17",

"Statement": [

{

"Sid": "AllowS3ReadAccess",

"Effect": "Allow",

"Action": [

"s3:GetObject",

"s3:ListBucket"

"Resource": [

"arn:aws:s3:::my-bucket",

"arn:aws:s3:::my-bucket/*"

"Condition": {

"IpAddress": {

"aws:SourceIp": "192.168.1.0/24"

}

]

}

Policy Elements

Element	Required	Description
Version	Yes	Always use “2012-10-17” (current version)
Statement	Yes	Array of permission statements
Sid	No	Statement identifier (optional description)
Effect	Yes	Allow or Deny
Principal	Resource policies only	Who the policy applies to
Action	Yes	API actions (e.g., s3:GetObject)
Resource	Yes	ARN of resources the policy applies to
Condition	No	When the policy is in effect

Types of IAM Policies

Identity-Based Policies vs Resource-Based Policies

Aspect	Identity-Based Policies	Resource-Based Policies
Attached to	Users, Groups, Roles	Resources (S3 bucket, SQS queue, KMS key)
Principal element	Not used (implied by attachment)	Required (specifies who gets access)
Cross-account	Requires role assumption	Can grant access directly to another account
Types	Managed and Inline	Inline only (embedded in resource)
Examples	AmazonS3ReadOnlyAccess	S3 Bucket Policy, SQS Queue Policy, KMS Key Policy

💡 Key Difference for Cross-Account Access: With resource-based policies, the principal retains permissions from their original account (no need to give up permissions). With IAM roles, the assuming entity temporarily gives up their original permissions and takes on only the role’s permissions.

AWS Managed vs Customer Managed vs Inline Policies

Type	Created By	Reusable	Versioning	Use When
AWS Managed	AWS	Yes (across accounts)	Yes (AWS updates)	Standard permissions for common use cases
Customer Managed	You	Yes (within account)	Yes (up to 5 versions)	Custom, reusable permissions specific to your org
Inline	You	No (1:1 with identity)	No	Strict 1:1 relationship, deleted with identity

Permission Boundaries

A permission boundary is an advanced feature that sets the maximum permissions an IAM entity (user or role) can have. It acts as a guardrail:

The effective permissions = intersection of identity-based policies AND permission boundary
Even if an identity-based policy grants s3:*, if the permission boundary only allows s3:GetObject, the effective permission is only s3:GetObject
Used for delegation – allow developers to create roles but only within defined boundaries
Permission boundaries do NOT grant permissions on their own

Service Control Policies (SCPs)

SCPs are organization-level policies in AWS Organizations that set permission guardrails across all accounts:

Apply to all IAM users and roles in member accounts (including root user for service actions)
Do NOT grant permissions – they only restrict what’s allowed
Effective permissions = SCP ∩ Identity-based policies ∩ Permission boundaries
Can be attached at Organization root, OU, or account level
Do NOT affect the management account
As of 2025, SCPs support the full IAM policy language including conditions in Allow statements and NotAction

Resource Control Policies (RCPs)

Introduced in 2024, Resource Control Policies (RCPs) complement SCPs by controlling access to resources rather than identities:

Restrict which principals (including external accounts) can access resources in your organization
Applied at Organization root, OU, or account level
Complement SCPs (identity guardrails) with resource-level guardrails

IAM Policy Evaluation Logic

When an AWS API call is made, IAM evaluates all applicable policies in a specific order:

Explicit Deny – If ANY policy explicitly denies the action → DENIED (always wins)
Resource Control Policies (RCPs) – Organization-level resource guardrails
Service Control Policies (SCPs) – Organization-level identity guardrails
Resource-Based Policies – If the resource policy allows the caller → may ALLOW (same account)
Permission Boundaries – Must allow the action for it to proceed
Session Policies – If using assumed role with session policy, must allow
Identity-Based Policies – Must explicitly allow the action

💡 Key Rule: By default, all requests are implicitly denied. An explicit Allow is required (from the appropriate policy type), and an explicit Deny always overrides any Allow.

Same-Account vs Cross-Account Evaluation

Scenario	Requirements
Same account	Either identity-based OR resource-based policy can grant access
Cross-account	BOTH the identity-based policy (in caller’s account) AND resource-based policy (in resource’s account) must allow access

How Users, Roles, and Policies Work Together

Visual Hierarchy

AWS Organization
├── Service Control Policies (SCPs) ──── Guardrails for all accounts
│
├── AWS Account
│   ├── IAM Users ─────────────── Long-term identities
│   │   ├── Identity-based Policies (attached)
│   │   ├── Group Policies (inherited)
│   │   └── Permission Boundary (ceiling)
│   │
│   ├── IAM Roles ─────────────── Assumable identities
│   │   ├── Trust Policy (who can assume)
│   │   ├── Permissions Policies (what it can do)
│   │   └── Permission Boundary (ceiling)
│   │
│   └── Resources ─────────────── S3 buckets, KMS keys, etc.
│       └── Resource-based Policies (who can access this resource)
│
└── Resource Control Policies (RCPs) ── Guardrails for resources

AWS Organization

├── Service Control Policies (SCPs) ──── Guardrails for all accounts

│

├── AWS Account

│ ├── IAM Users ─────────────── Long-term identities

│ │ ├── Identity-based Policies (attached)

│ │ ├── Group Policies (inherited)

│ │ └── Permission Boundary (ceiling)

│ │

│ ├── IAM Roles ─────────────── Assumable identities

│ │ ├── Trust Policy (who can assume)

│ │ ├── Permissions Policies (what it can do)

│ │ └── Permission Boundary (ceiling)

│ │

│ └── Resources ─────────────── S3 buckets, KMS keys, etc.

│ └── Resource-based Policies (who can access this resource)

│

└── Resource Control Policies (RCPs) ── Guardrails for resources

Common Patterns

Pattern 1: EC2 Instance Accessing S3

Create an IAM Role with a trust policy for ec2.amazonaws.com
Attach an identity-based policy granting S3 permissions to the role
Attach the role to the EC2 instance (Instance Profile)
Application on EC2 uses temporary credentials automatically via instance metadata

Pattern 2: Lambda Function Accessing DynamoDB

Create an IAM Role with a trust policy for lambda.amazonaws.com
Attach policies granting DynamoDB access
Assign the role as the Lambda execution role
Lambda automatically assumes the role on each invocation

Pattern 3: Cross-Account S3 Access

Option A (Role-based): Create a role in Account B, trust Account A, grant S3 access
Option B (Resource-based): Add a bucket policy in Account B allowing Account A’s principal

Pattern 4: Developer Permission Delegation

Admin creates a permission boundary policy (e.g., only allow S3 and DynamoDB actions)
Admin grants developer permission to create roles WITH the permission boundary
Developer creates roles freely, but those roles can never exceed the boundary’s permissions

Best Practices (2025-2026)

1. Use IAM Identity Center for Human Access

Replace IAM users with IAM Identity Center (formerly AWS SSO)
Connect your existing IdP (Okta, Entra ID, Google Workspace)
Provides temporary credentials with automatic token refresh
Single sign-on to all AWS accounts and applications
No additional AWS charge for IAM Identity Center

2. Enforce Least Privilege

Start with minimal permissions and add as needed
Use IAM Access Analyzer to generate policies based on actual usage (CloudTrail)
Review and remove unused permissions regularly
Use last accessed information to identify unused services
Prefer specific actions over wildcards (s3:GetObject not s3:*)

3. Require MFA

Enable MFA for all IAM users (especially those with console access)
Use policy conditions to require MFA for sensitive operations: "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
Prefer FIDO2/WebAuthn security keys over TOTP virtual MFA

4. Use Roles Over Users

EC2 instances → use Instance Profiles (IAM Roles)
Lambda functions → use Execution Roles
Cross-account access → use IAM Roles with AssumeRole
On-premises workloads → use IAM Roles Anywhere
CI/CD pipelines → use OIDC federation (GitHub Actions, GitLab)

5. Implement Guardrails at Scale

Use SCPs to prevent dangerous actions organization-wide (e.g., deny leaving organization, deny disabling CloudTrail)
Use Permission Boundaries for delegation (developers creating their own roles)
Use RCPs to restrict external access to your resources
Tag-based access control (ABAC) for scalable permissions management

6. Monitor and Audit

Enable CloudTrail in all accounts and regions
Use IAM Access Analyzer to detect external and unused access
Review IAM credential reports regularly
Set up alerts for root account usage

When to Use Each – Decision Guide

Scenario	Use	Why
Human accessing AWS Console	IAM Identity Center	Centralized, temporary credentials, SSO
EC2 instance needs AWS API access	IAM Role (Instance Profile)	Temporary credentials, auto-rotated
Lambda function accessing DynamoDB	IAM Role (Execution Role)	Service role, no static credentials
Account A accessing Account B resources	IAM Role (Cross-Account)	Secure delegation without sharing keys
Third-party SaaS needing AWS access	IAM Role (External ID)	Prevents confused deputy, no shared secrets
CI/CD pipeline (GitHub Actions)	IAM Role (OIDC Federation)	No stored AWS secrets in CI system
On-premises server accessing S3	IAM Roles Anywhere	X.509 certs, temporary credentials
Restricting max permissions for developers	Permission Boundary	Delegation without privilege escalation
Organization-wide security guardrails	SCPs	Prevent dangerous actions across all accounts
Allow external account to access S3 bucket	Resource-Based Policy	Principal retains own permissions
Emergency break-glass access	IAM User (with MFA)	When federation/SSO is unavailable

AWS Certification Exam Tips

🎯 Exam Tips for SAA-C03, SAP-C02, SCS-C02, DVA-C02:

Explicit Deny always wins – No matter what allows exist, a single deny overrides everything
Roles for services, users for legacy – If the question mentions EC2/Lambda needing access, the answer involves a role, not access keys
Cross-account = role OR resource policy – Know when each is appropriate. Resource-based policies let the caller keep their original permissions
Permission boundaries don’t grant permissions – They only restrict. Effective = identity policy ∩ boundary
SCPs don’t affect management account – Common exam trick question
External ID prevents confused deputy – Required for third-party cross-account access
Role chaining limits session to 1 hour – Even if role’s max is 12 hours
Groups are NOT identities – Cannot be principal in a resource policy, cannot assume roles
IAM is global – Not region-specific (but STS endpoints can be regional)
AssumeRoleWithWebIdentity vs AssumeRoleWithSAML – Web identity for OIDC (Cognito, GitHub), SAML for enterprise IdPs

IAM Roles vs Policies vs Users – Practice Questions

Question 1

A company wants its EC2 instances to securely access objects in an S3 bucket without storing credentials on the instances. What is the recommended approach?

Create an IAM user, generate access keys, and store them in environment variables on the EC2 instance
Create an IAM role with S3 permissions and associate it with the EC2 instance via an instance profile
Add a bucket policy allowing all EC2 instances in the VPC to access the bucket
Enable public access on the S3 bucket for the specific objects needed

Show Answer

Answer: B –

Explanation: IAM roles attached to EC2 instances (via instance profiles) provide temporary credentials that are automatically rotated. This eliminates the need to store long-term credentials. Option A uses long-term credentials which is a security risk. Option C is not how bucket policies work (VPC endpoint policies could restrict, but not via bucket policy to EC2). Option D removes security entirely.

Question 2

An organization uses AWS Organizations with multiple accounts. They want to ensure NO user in any member account can disable CloudTrail, regardless of their IAM permissions. What should they implement?

An IAM policy attached to all users denying cloudtrail:StopLogging
A permission boundary on all roles denying cloudtrail:StopLogging
A Service Control Policy (SCP) denying cloudtrail:StopLogging on member accounts
A resource-based policy on the CloudTrail trail denying StopLogging

Show Answer

Answer: C –

Explanation: SCPs provide organization-wide guardrails that cannot be overridden by any IAM policy within member accounts. Option A requires attaching to every user and can be removed by admins. Option B requires applying to every role and doesn’t cover users. Option D is not supported by CloudTrail. Only SCPs provide mandatory, centralized enforcement across accounts.

Question 3

A developer in Account A (111111111111) needs to access a DynamoDB table in Account B (222222222222). The solution must NOT require the developer to give up their Account A permissions while accessing Account B’s table. What approach should be used?

Create a cross-account IAM role in Account B and have the developer assume it
Create an IAM user in Account B for the developer with DynamoDB permissions
Add a resource-based policy on the DynamoDB table granting access to Account A’s developer
Use VPC peering between the accounts to access DynamoDB

Show Answer

Answer: C –

Explanation: Resource-based policies allow cross-account access while the principal retains their original account permissions. When you assume a role (Option A), you temporarily give up your original permissions and can only use the assumed role’s permissions. DynamoDB does not support resource-based policies directly, but this question tests the concept. In practice, for DynamoDB cross-account, you would use a role. However, for the exam concept being tested: resource-based policies = keep original permissions; role assumption = adopt role’s permissions only.

Note: In real-world scenarios, DynamoDB does NOT support resource-based policies, so cross-account IAM roles would be the correct approach for DynamoDB specifically. This question tests conceptual understanding of the difference between roles and resource-based policies.

Question 4

A company wants to allow its development team to create IAM roles for their Lambda functions, but ensure those roles can never have more permissions than a predefined set (S3 and DynamoDB access only). What should the security team implement?

An SCP restricting the development account to S3 and DynamoDB only
A permission boundary policy that allows only S3 and DynamoDB actions, required when developers create roles
AWS managed policies that only include S3 and DynamoDB permissions
Inline policies on each developer’s IAM user restricting role creation

Show Answer

Answer: B –

Explanation: Permission boundaries are designed for this exact delegation pattern. The security team creates a permission boundary policy allowing only S3 and DynamoDB. They then grant developers permission to create roles only if those roles have this permission boundary attached (using iam:PermissionsBoundary condition key). This ensures developers can self-service while preventing privilege escalation. Option A would restrict the entire account, not just developer-created roles. Options C and D don’t prevent developers from attaching broader policies.

Question 5

A company’s security policy requires that all API calls from IAM users must use Multi-Factor Authentication (MFA). They want users who haven’t authenticated with MFA to only be able to manage their own MFA device. Which policy approach achieves this?

Attach a deny policy that denies all actions except MFA management unless aws:MultiFactorAuthPresent is true
Remove all permissions from users and only grant them through roles that require MFA to assume
Use an SCP to deny all actions unless MFA is present
Configure the AWS account password policy to require MFA

Show Answer

Answer: A –

Explanation: The standard pattern uses a policy with two statements: (1) Allow MFA self-management actions always, and (2) Deny all other actions with a condition "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}. This forces users to set up and authenticate with MFA before they can perform any other actions. Option B works but is overly complex. Option C would affect all principals including service roles which don’t use MFA. Option D only enforces MFA at console sign-in, not for API/CLI calls.

AWS Security Services Cheat Sheet

References

Global Accelerator vs CloudFront – When to Use Each

June 30, 2026 ~ Last updated on : July 3, 2026 ~ Kiro Agent

AWS Global Accelerator vs CloudFront

AWS Global Accelerator and Amazon CloudFront both leverage the AWS global edge network to improve application performance for distributed users, but they solve fundamentally different problems. Understanding when to use each — or both together — is critical for AWS certification exams and real-world architecture decisions.

CloudFront is a Content Delivery Network (CDN) that caches content at edge locations and serves it directly to users, reducing latency and origin load for HTTP/HTTPS workloads.
Global Accelerator is a network-layer traffic accelerator that uses anycast static IP addresses to route TCP/UDP traffic over the AWS backbone to the optimal regional endpoint — without caching.

The critical distinction: CloudFront optimizes what is delivered (content caching and edge compute). Global Accelerator optimizes how packets travel (network path optimization via the AWS backbone).

CloudFront (CDN)
User Request (HTTP/S)
↓
750+ Edge Locations
↓ Cache HIT → Response
↓ Cache MISS ↓
Regional Edge Cache
↓
Origin (ALB/S3/EC2)
Layer 7 | Caches content | HTTP/S only
Global Accelerator
User Request (TCP/UDP)
↓
Anycast Static IPs
↓
Nearest AWS Edge
↓ AWS Private Backbone
Endpoint (ALB/NLB/EC2)
↕ Health checks → Failover
Layer 4 | No caching | TCP/UDP/HTTP

Architecture Comparison

CloudFront Architecture: Edge Caching

Uses 750+ Points of Presence (PoPs) in 100+ cities across 50+ countries, plus 1,140+ Embedded PoPs within ISP networks.
Operates at Layer 7 (Application layer) — understands HTTP/HTTPS, headers, cookies, query strings.
Multi-tier caching: Edge Locations → Regional Edge Caches (RECs) → Origin Shield → Origin.
Users connect to the nearest edge location via DNS-based routing (anycast DNS). CloudFront resolves to the optimal edge IP.
If content is cached (cache hit), it’s served directly from the edge — origin is never contacted.
If content is not cached (cache miss), CloudFront fetches from origin over the AWS backbone, caches it, then serves it.
Supports edge compute via CloudFront Functions (lightweight, sub-ms) and Lambda@Edge (full Node.js/Python).
Since November 2024, supports Anycast Static IPs for allowlisting and apex domain support (up to 21 IPs).

Global Accelerator Architecture: Anycast IP + AWS Backbone

Uses 130 PoPs in 95 cities across 53 countries.
Operates at Layer 4 (Transport layer) — works with TCP and UDP packets regardless of application protocol.
Provides two static anycast IPv4 addresses (optionally two IPv6 for dual-stack) per accelerator, serviced by independent network zones.
Users connect to the nearest edge location via anycast IP routing — traffic enters the AWS global network at the closest PoP.
Traffic then travels over the AWS private backbone (not the public internet) to the optimal endpoint in the target AWS Region.
No caching — every request is proxied to the backend endpoint.
Supports Custom Routing accelerators for deterministic routing to specific EC2 instances (e.g., gaming matchmaking).
Health checks continuously monitor endpoints and failover happens in under 30 seconds without DNS changes.

Detailed Feature Comparison Table

Feature	Amazon CloudFront	AWS Global Accelerator
Service Type	Content Delivery Network (CDN)	Network traffic accelerator (anycast routing)
OSI Layer	Layer 7 (Application)	Layer 4 (Transport)
Protocol Support	HTTP, HTTPS (TLSv1.3), WebSocket, gRPC	TCP, UDP (any application protocol)
Caching	Yes — Edge + Regional Edge Caches + Origin Shield	No — proxies all requests to endpoint
Edge Locations	750+ PoPs + 1,140+ Embedded PoPs	130 PoPs in 95 cities
Static IP Addresses	Yes — Anycast Static IPs (Nov 2024), up to 21 IPs; BYOIP via IPAM	Yes — 2 anycast IPv4 + 2 IPv6 (dual-stack); BYOIP supported
Performance Approach	Serve cached content from edge (eliminate round-trips to origin)	Route traffic over AWS backbone (reduce internet hops and congestion)
Health Checks	Origin failover (GET/HEAD only) via origin groups	Continuous TCP/HTTP/HTTPS health checks with configurable thresholds
Failover	DNS-based; origin failover for GET/HEAD; subject to DNS TTL	Instant (<30 seconds); no DNS change needed (same static IPs)
DDoS Protection	AWS Shield Standard (auto); Shield Advanced optional; built-in bot management	AWS Shield Standard (auto); Shield Advanced optional; rate-limit mitigations based on endpoint capacity
WAF Integration	Yes — AWS WAF with managed rules, rate limiting, bot control	No — WAF not supported
Client Affinity	No (stateless edge caching); sticky sessions via cookies at ALB origin	Yes — Source IP affinity (NONE or SOURCE_IP) to maintain endpoint stickiness
Origins/Endpoints	S3, ALB, NLB, EC2, API Gateway, MediaStore, custom HTTP origins; VPC Origins (private ALB/NLB/EC2)	ALB, NLB, EC2 instances, Elastic IP addresses (in any Region)
Multi-Region	Single origin or origin group (primary + secondary)	Multiple endpoint groups across Regions with traffic dials (0-100%)
Edge Compute	CloudFront Functions + Lambda@Edge	No
Custom Routing	No	Yes — deterministic routing to specific EC2 instances (port mapping)
IPv6 Support	Yes (dual-stack distributions); Anycast Static IPs IPv6 (Nov 2025)	Yes (dual-stack accelerators with NLB endpoints)

Pricing Comparison

Pricing Component	Amazon CloudFront	AWS Global Accelerator
Fixed Cost	No fixed fee (pay-as-you-go); Flat-rate plans available from Nov 2025	$0.025/hour per accelerator (~$18/month)
Data Transfer	$0.085/GB (first 10TB, US/EU); tiered pricing down to $0.020/GB at 5PB+	Standard EC2 data transfer out + DT-Premium fee ($0.015–$0.105/GB depending on source/destination)
Request Charges	$0.0075–$0.016 per 10,000 HTTP requests (varies by region)	No per-request charges
Free Tier	1 TB data transfer out + 10M requests/month (always free)	No free tier
Billing Model	Charged on all outbound data + requests; or flat-rate plan	Charged only on dominant direction (inbound OR outbound, whichever is higher per hour)
Public IPv4 Charges	N/A (DNS-based routing by default)	Standard public IPv4 address charges apply
Cost Optimization	Caching reduces origin fetches; Price Class selection limits expensive regions; Flat-rate plans for predictability	Traffic dials to shift traffic between regions; dominant-direction billing saves on bidirectional traffic

Use Cases — When to Use Each

Choose CloudFront When:

Serving static content globally — images, CSS, JavaScript, video, software downloads. Caching at 750+ edge locations dramatically reduces latency and origin load.
API acceleration with caching — cacheable API responses (GET requests), dynamic site delivery, and personalization at the edge.
Video streaming — live and on-demand streaming with Embedded PoPs in ISP networks for large-scale delivery.
Web application security — need AWS WAF integration for rate limiting, geo-blocking, bot management, or SQL injection/XSS protection.
Edge compute requirements — A/B testing, URL rewrites, header manipulation, authentication at edge via CloudFront Functions or Lambda@Edge.
Cost-sensitive workloads — free tier (1TB/month), and caching eliminates repeated origin fetches, reducing overall data transfer costs.
S3 origin delivery — serving S3 content with Origin Access Control (OAC) for security.

Choose Global Accelerator When:

Non-HTTP protocols — gaming (UDP), IoT (MQTT over TCP), VoIP (SIP/RTP over UDP), custom TCP protocols.
Static IP addresses required — firewall allowlisting, DNS-independent addressing, compliance requirements for fixed IPs.
Instant failover needed — multi-region active-active or active-passive with <30 second failover, no DNS propagation delay.
Dynamic, uncacheable content — every request must reach the origin (financial transactions, real-time data).
Client affinity (session stickiness) — route the same client to the same endpoint for stateful connections.
Multi-region traffic management — traffic dials to gradually shift traffic between regions (blue/green deployments, disaster recovery).
Custom routing to specific instances — gaming matchmaking, multiplayer session routing to specific EC2 instances.
Consistent performance for TCP workloads — eliminate internet congestion and variable routing for any TCP/UDP application.

Use Both Together When:

You need cacheable content delivery (CloudFront) AND fixed IPs with instant failover for the origin infrastructure (Global Accelerator as the origin for CloudFront).
A multi-region application where CloudFront caches static assets and Global Accelerator handles the dynamic API layer requiring TCP-level optimization.
Gaming platforms: CloudFront delivers game patches/updates while Global Accelerator handles real-time multiplayer (UDP).

Decision Flowchart Guidance

Step-by-Step Decision Process

Step 1: What protocol does your application use?

→ If UDP (gaming, VoIP, IoT) → Global Accelerator
→ If TCP (non-HTTP) (custom protocols, MQTT, database proxying) → Global Accelerator
→ If HTTP/HTTPS → Continue to Step 2

Step 2: Is your content cacheable?

→ If Yes (static assets, cacheable API responses, video) → CloudFront
→ If No (fully dynamic, uncacheable) → Continue to Step 3

Step 3: Do you need static IP addresses for firewall allowlisting?

→ If Yes → Global Accelerator (or CloudFront Anycast Static IPs if HTTP-only)
→ If No → Continue to Step 4

Step 4: Do you need instant failover without DNS propagation delay?

→ If Yes → Global Accelerator (failover <30 seconds, no DNS change)
→ If No → Continue to Step 5

Step 5: Do you need WAF, edge compute, or bot management?

→ If Yes → CloudFront
→ If No → Continue to Step 6

Step 6: Do you need client affinity (same client → same endpoint)?

→ If Yes → Global Accelerator
→ If No → CloudFront (better global coverage with 750+ PoPs, lower cost with free tier)

Integration with ALB, NLB, and EC2

Integration	CloudFront	Global Accelerator
ALB	Yes — as custom origin (public or private via VPC Origins). Full Layer 7 features (path-based routing, host-based routing).	Yes — as endpoint in an endpoint group. Supports multiple ALBs across regions with weighted traffic.
NLB	Yes — as custom origin (public or private via VPC Origins). Useful for TCP/TLS passthrough to origin.	Yes — as endpoint. Supports dual-stack NLB endpoints. Ideal for TCP/UDP workloads behind NLB.
EC2	Yes — as custom origin (public IP or private via VPC Origins). Must run a web server (HTTP/HTTPS).	Yes — as endpoint (via Elastic IP). Custom Routing accelerators can map to specific EC2 instance ports.
S3	Yes — native S3 origin with Origin Access Control (OAC).	No — S3 is not a supported endpoint type.
API Gateway	Yes — as custom origin for caching API responses.	No — API Gateway is not a supported endpoint.
Multi-Region	Origin groups (primary + secondary) in different regions; failover for GET/HEAD only.	Multiple endpoint groups across any number of regions with traffic dials and automatic health-check failover for all traffic types.

DDoS Protection Deep Dive

Both services automatically include AWS Shield Standard at no additional cost, protecting against common Layer 3/4 DDoS attacks.
Both support AWS Shield Advanced ($3,000/month + DRT support) for enhanced detection, mitigation, and cost protection against DDoS-related scaling costs.
CloudFront additionally provides:
- AWS WAF integration for Layer 7 (application-layer) DDoS mitigation (HTTP flood protection, rate limiting).
- Built-in bot management and geographic restrictions.
- Distributed architecture with 750+ PoPs absorbs volumetric attacks closer to the source.
Global Accelerator provides:
- Shield mitigations enforce rate limits based on endpoint capacity — only valid traffic reaches listeners.
- Anycast distribution across 130 PoPs absorbs DDoS traffic at the edge.
- Particularly effective for TCP/UDP DDoS attacks (SYN floods, UDP reflection) targeting non-HTTP workloads.
- Shield Advanced health check integration enables proactive DDoS response.

Performance Optimization Approaches

Optimization	CloudFront	Global Accelerator
Latency Reduction	Eliminates round-trip to origin via cached content; persistent connections to origin for cache misses	Routes traffic off public internet at nearest PoP; AWS backbone provides consistent low-latency path (up to 60% improvement)
Throughput	High throughput via distributed caching; TCP optimizations (congestion window tuning)	AWS backbone provides higher, more consistent throughput than public internet paths
Connection Optimization	TLS termination at edge; persistent connections to origin; HTTP/2 and HTTP/3 (QUIC) support	TCP termination at edge PoP; optimized TCP connections to endpoints over AWS backbone
Availability	Origin failover (GET/HEAD); cached content continues serving during origin outages	Instant multi-region failover (<30s); traffic dials for gradual migration; health checks on all traffic types

AWS Certification Exam Practice Questions

Question 1:

A gaming company runs a multiplayer online game that communicates via UDP. Players are distributed globally and experience high latency due to internet routing inefficiencies. The company needs static IP addresses for players to configure in their game clients. Which AWS service should be used to improve performance?

Amazon CloudFront
AWS Global Accelerator
Amazon Route 53 with latency-based routing
Application Load Balancer with cross-zone load balancing

Show Answer

Answer: B –

Explanation: Global Accelerator is the correct choice for UDP workloads requiring static IP addresses. CloudFront only supports HTTP/HTTPS/WebSocket/gRPC (not UDP). Global Accelerator provides two static anycast IPs and routes UDP traffic over the AWS backbone to the nearest healthy regional endpoint, reducing latency. Route 53 provides DNS-based routing but doesn’t optimize the network path or provide static IPs for non-DNS traffic.

Question 2:

A media streaming company serves video content to millions of viewers globally. During peak events, origin servers experience heavy load. The company wants to reduce origin load and serve content with minimal latency. They also need AWS WAF to block malicious requests. Which solution meets these requirements?

AWS Global Accelerator with ALB endpoints and AWS WAF on the ALB
Amazon CloudFront with S3 origin and AWS WAF associated with the distribution
AWS Global Accelerator with CloudFront as an endpoint
Amazon Route 53 with geolocation routing to regional ALBs

Show Answer

Answer: B –

Explanation: CloudFront is designed for content delivery with edge caching, which reduces origin load during peak events. It natively integrates with AWS WAF at the distribution level for Layer 7 protection. Global Accelerator doesn’t cache content (wouldn’t reduce origin load) and doesn’t support WAF integration. Option A places WAF on ALB but doesn’t solve the origin load problem since every request reaches the ALB.

Question 3:

A financial services company runs a trading application across us-east-1 and eu-west-1. The application handles real-time transactions over HTTPS that cannot be cached. They require failover between regions in under 30 seconds without DNS propagation delay, and the application needs fixed IP addresses for partner firewall rules. Which architecture meets these requirements?

Amazon CloudFront with origin groups configured for both regions
AWS Global Accelerator with endpoint groups in both regions and health checks
Amazon Route 53 failover routing with health checks and low TTL
Amazon CloudFront with Anycast Static IPs and Lambda@Edge for routing

Show Answer

Answer: B –

Explanation: Global Accelerator provides static anycast IPs (for firewall rules), instant failover under 30 seconds (no DNS change needed since the same IPs are used), and works well with uncacheable HTTPS content. CloudFront origin failover only works for GET/HEAD requests (not suitable for financial transactions using POST). Route 53 failover depends on DNS TTL propagation. While CloudFront now offers Anycast Static IPs, it cannot provide deterministic sub-30-second failover for POST requests.

Question 4:

A company wants to deploy a global web application that serves both static assets (images, CSS, JS) and dynamic API requests. Static assets are cacheable, but API responses are personalized and uncacheable. They need the lowest possible latency for both content types and instant multi-region failover for the API layer. Which combination of services should be used?

CloudFront for everything with cache behaviors (cache static, bypass cache for API)
Global Accelerator for everything with ALB endpoints in multiple regions
CloudFront for static assets + Global Accelerator for the dynamic API endpoints
CloudFront for static assets + Route 53 latency-based routing for API

Show Answer

Answer: C –

Explanation: Using both services together provides the best of both worlds. CloudFront caches static assets at 750+ edge locations for lowest latency delivery. Global Accelerator handles the dynamic API layer by routing traffic over the AWS backbone with instant failover across regions. Option A would work for caching but doesn’t provide instant failover for the API. Option B doesn’t cache static content, increasing origin load and latency. Option D depends on DNS TTL for failover.

Question 5:

A company operates an IoT platform that receives MQTT messages (over TCP) from 500,000 devices worldwide. They need to ensure devices always connect to the nearest healthy regional endpoint, with automatic failover if a region becomes unavailable. Devices have hardcoded IP addresses and cannot resolve DNS. Which solution is most appropriate?

Amazon CloudFront with WebSocket support for MQTT
Network Load Balancer with Elastic IPs in each region
AWS Global Accelerator with Custom Routing to endpoint groups across regions
AWS Global Accelerator with standard routing and NLB endpoints in multiple regions

Show Answer

Answer: D –

Explanation: Global Accelerator is ideal because: (1) it provides static anycast IPs that devices can hardcode without DNS dependency, (2) it routes TCP traffic (MQTT) to the nearest healthy regional endpoint, (3) it provides automatic failover when a region’s health check fails. CloudFront doesn’t support raw TCP/MQTT (only HTTP/HTTPS/WebSocket). Option B requires devices to know which regional IP to use and doesn’t provide cross-region failover. Custom Routing (Option C) is for deterministic routing to specific instances, not for nearest-region routing.

Key Takeaways for AWS Certification Exams

CloudFront = CDN = Caching = HTTP/HTTPS = Layer 7. Think web content, APIs, video streaming, edge compute.
Global Accelerator = Network Optimization = No Caching = TCP/UDP = Layer 4 = Static IPs. Think gaming, IoT, VoIP, instant failover.
If the question mentions UDP or non-HTTP protocols → Global Accelerator.
If the question mentions static IPs for firewall allowlisting + instant failover → Global Accelerator (though CloudFront now has Anycast Static IPs for HTTP workloads).
If the question mentions caching, WAF, edge compute, or reducing origin load → CloudFront.
If the question mentions both cached content AND non-HTTP/instant failover needs → Use both together.
Global Accelerator failover is instant (no DNS); CloudFront/Route 53 failover depends on DNS TTL propagation.

Frequently Asked Questions

What is the difference between Global Accelerator and CloudFront?

CloudFront is a CDN that caches content at 750+ edge locations for HTTP/HTTPS traffic. Global Accelerator uses anycast IPs to route TCP/UDP traffic over AWS’s private backbone to the nearest healthy endpoint, without caching.

When should I use Global Accelerator instead of CloudFront?

Use Global Accelerator for non-HTTP protocols (TCP/UDP gaming, IoT, VoIP), when you need static IP addresses, instant failover between regions, or for applications that can’t benefit from caching like real-time APIs.

Can I use Global Accelerator and CloudFront together?

Yes, you can place CloudFront behind Global Accelerator to get both static IPs and edge caching. This is useful when you need deterministic IPs for firewall allowlisting plus CDN benefits.

References

SQS Standard vs FIFO Queue – Complete Comparison

June 30, 2026 ~ Last updated on : July 18, 2026 ~ Kiro Agent

AWS SQS Standard vs FIFO Queue – Differences & Use Cases

Amazon Simple Queue Service (SQS) offers two queue types — Standard and FIFO (First-In-First-Out) — each designed for different messaging requirements. Choosing the wrong queue type leads to either unnecessary complexity or data integrity issues. This comprehensive comparison covers every dimension: ordering, deduplication, throughput, pricing, delivery guarantees, and real-world use cases to help you make the right choice.

🎯 Quick Decision Rule:

Need maximum throughput, can handle duplicates & out-of-order? → Standard Queue
Need strict ordering & exactly-once processing? → FIFO Queue
Need strict ordering AND high throughput (70K+ TPS)? → FIFO Queue with High Throughput Mode

SQS Standard vs FIFO — Message Flow
Standard Queue
Producer → Msg A, B, C
↓ Best-effort ordering
Queue
∞ throughput
↓ At-least-once
Consumer → B, A, C, A
(possible duplicates & reorder)
FIFO Queue (.fifo)
Producer → Msg A, B, C
↓ Strict ordering
Queue
3,000 msg/s (batch)
↓ Exactly-once
Consumer → A, B, C
(guaranteed order, no duplicates)

SQS Standard vs FIFO – Detailed Comparison Table

Feature	Standard Queue	FIFO Queue
Message Ordering	Best-effort ordering; messages may arrive out of order	Strict first-in-first-out ordering within each message group
Delivery Guarantee	At-least-once delivery; duplicates possible	Exactly-once processing; no duplicates within 5-minute deduplication window
Throughput (Default)	Nearly unlimited TPS	300 TPS per API action (3,000 msg/sec with batching)
Throughput (High Throughput Mode)	N/A – already unlimited	Up to 70,000 TPS per API action (700,000 msg/sec with batching) in select regions
Deduplication	No built-in deduplication; application must handle idempotency	Built-in 5-minute deduplication via MessageDeduplicationId or content-based deduplication (SHA-256 hash)
Message Groups	Optional (Fair Queue feature); not required	Required MessageGroupId; ordering guaranteed within each group
In-Flight Messages	~120,000 messages	120,000 messages (increased from 20,000 in Nov 2024)
Pricing (US East)	$0.40 per million requests (first 100B); tiered pricing down to $0.24/million	$0.50 per million requests (first 100B); tiered pricing down to $0.35/million
Free Tier	1 million requests/month free	1 million requests/month free (separate from Standard free tier)
Batching	Up to 10 messages per batch (max 256 KB total payload)	Up to 10 messages per batch (max 256 KB total payload); multiplies effective throughput 10x
Dead-Letter Queue (DLQ)	Supported; DLQ must be a Standard queue; redrive to source supported	Supported; DLQ must be a FIFO queue; redrive to source supported (since Nov 2023)
Queue Name	Any valid name (up to 80 characters)	Must end with `.fifo` suffix
Message Retention	1 minute to 14 days (default 4 days)	1 minute to 14 days (default 4 days)
Per-Message Delay	Supported (0–900 seconds per message)	Not supported per-message; only queue-level delay
AWS Service Integration	All AWS services (S3 events, SNS, Lambda, etc.)	SNS FIFO, Lambda, EventBridge; S3 Event Notifications NOT directly supported
Visibility Timeout	0 seconds to 12 hours (default 30 seconds)	0 seconds to 12 hours (default 30 seconds)
Long Polling	Supported (1–20 seconds)	Supported (1–20 seconds)

Message Ordering

Standard Queue – Best-Effort Ordering

Messages are generally delivered in the order they are sent, but this is not guaranteed
The highly distributed architecture optimizes for throughput, which may cause occasional out-of-order delivery
If ordering matters, the application must include sequence information in the message body and handle reordering at the consumer level

FIFO Queue – Strict Ordering

Messages are delivered in exactly the order they are sent within a message group
MessageGroupId is required — messages with the same group ID are delivered in strict FIFO order
Different message groups can be processed in parallel by different consumers, enabling concurrent processing while maintaining per-group ordering
Only one consumer can process messages from a given message group at a time (until the message is deleted or the visibility timeout expires)

Deduplication

Standard Queue – No Built-in Deduplication

At-least-once delivery means the same message may be delivered more than once
Applications must implement idempotent consumers — processing the same message twice should produce the same result
Common patterns: use a unique message identifier stored in DynamoDB or a database to track processed messages

FIFO Queue – Built-in Deduplication

Provides exactly-once processing within a 5-minute deduplication interval
Two deduplication mechanisms:
- Content-based deduplication — Enable on the queue; SQS generates a SHA-256 hash of the message body as the deduplication ID. Messages with identical bodies within 5 minutes are treated as duplicates.
- MessageDeduplicationId — Explicitly provide a unique token per message. If a message with the same deduplication ID is sent within 5 minutes, it is accepted but not delivered again.
If both are configured, the explicit MessageDeduplicationId takes precedence over content-based deduplication
After the 5-minute window expires, sending the same message again is treated as a new message

Throughput & Scaling

Standard Queue

Nearly unlimited throughput — supports unlimited TPS for SendMessage, ReceiveMessage, and DeleteMessage
Can handle any spike without pre-provisioning or configuration changes
Ideal for applications with unpredictable or extremely high message volumes

FIFO Queue

Default mode: 300 TPS per API action (SendMessage, ReceiveMessage, DeleteMessage)
With batching (10 messages/request): effectively 3,000 messages/second
High Throughput Mode (enable via SQS console or API):
- Up to 70,000 TPS per API action without batching in US East (N. Virginia), US West (Oregon), and Europe (Ireland)
- Up to 700,000 messages/second with batching in those regions
- Up to 18,000 TPS in other supported regions
- Uses message group-level partitioning — distribute messages across multiple message group IDs for maximum throughput
In-flight message limit: 120,000 messages (increased from 20,000 in November 2024)

⚡ Throughput Tip: To maximize FIFO throughput, use unique MessageGroupId values for messages that don’t require relative ordering. Each message group is processed independently, enabling parallel processing across groups.

Pricing

Monthly Requests	Standard (per million)	FIFO (per million)
First 1 million	Free	Free
1M – 100 Billion	$0.40	$0.50
100B – 200 Billion	$0.30	$0.40
Over 200 Billion	$0.24	$0.35

FIFO queues cost approximately 25% more than Standard queues at every tier
Each 64 KB chunk of payload counts as one request (a 256 KB message = 4 requests)
Batch operations (up to 10 messages) count as a single request — use batching to reduce costs
No additional cost for enabling High Throughput Mode on FIFO queues
AWS KMS encryption adds KMS API call charges (if server-side encryption is enabled)

Delivery Guarantees

Standard Queue – At-Least-Once Delivery

A message is delivered at least once, but occasionally more than once
Duplicate delivery occurs because SQS stores messages redundantly across multiple servers for high availability
Applications must be designed to handle duplicates gracefully (idempotent processing)
Best suited when occasional duplicates are acceptable and throughput is the priority

FIFO Queue – Exactly-Once Processing

Each message is delivered exactly once and remains available until processed and deleted
SQS uses the deduplication mechanism to prevent duplicate delivery within the 5-minute window
If a consumer receives a message but fails to delete it before the visibility timeout, the message becomes available again to the same or another consumer — but only within the same message group ordering constraints
Best suited when duplicate processing would cause business logic errors (financial transactions, order processing)

Batching

Both queue types support batch operations: SendMessageBatch, ReceiveMessage (up to 10 messages), and DeleteMessageBatch
Maximum total payload per batch: 256 KB
A single batch API call counts as one request for billing purposes — batching reduces cost by up to 10x
Standard queues: Batching is purely a cost optimization; throughput is already unlimited
FIFO queues: Batching is critical for throughput — it effectively multiplies the TPS limit by 10 (e.g., 300 TPS → 3,000 messages/sec; 70,000 TPS → 700,000 messages/sec)
Note: FIFO queues are NOT compatible with the SQS Buffered Asynchronous Client (which batches messages client-side). Use the standard SDK SendMessageBatch API instead.

Dead-Letter Queues (DLQ)

Both Standard and FIFO queues support DLQs for isolating messages that fail processing
Configure maxReceiveCount (1–1,000) — after this many receives without deletion, the message moves to the DLQ
DLQ must match the source queue type: Standard queue → Standard DLQ; FIFO queue → FIFO DLQ
DLQ redrive to source (moving messages back to the original queue) is supported for both types
FIFO DLQ redrive was launched in November 2023 and expanded to GovCloud in April 2024
Messages retain their original message ID when moved to a DLQ, enabling tracking

💡 DLQ Best Practice: Don’t use a DLQ with a FIFO queue if breaking the exact order of messages/operations would cause issues. When a message is moved to the DLQ, subsequent messages in the same group can proceed, potentially breaking the intended sequence.

Use Cases

When to Use Standard Queue

High-volume event processing — website clickstream analytics, IoT telemetry ingestion
Background job processing — image/video transcoding, report generation, email sending
Decoupling microservices — when services need to communicate asynchronously at high throughput
Fan-out with SNS — receiving messages from SNS Standard topics at scale
Batch data processing — ETL pipelines where order doesn’t matter
Buffer for traffic spikes — absorbing burst traffic between web tier and backend services
Log aggregation — collecting logs from distributed systems for centralized processing

When to Use FIFO Queue

Financial transactions — ensuring debits and credits are processed in sequence
E-commerce order processing — order placed → payment → fulfillment must happen in sequence
Inventory management — stock updates must be applied in order to maintain accuracy
Command execution — ensuring commands are executed in the exact order submitted
Price updates — displaying the correct current price by processing changes in order
Event sourcing — maintaining event sequence for aggregate reconstruction
Ticketing systems — first-come-first-served ticket allocation
Registration workflows — ensuring account creation precedes enrollment actions

Decision Guidance – When to Choose Each

Choose Standard Queue when:

Your application can tolerate occasional duplicate messages
Message order is not critical to business logic
You need the highest possible throughput without any limits
You’re integrating with AWS services that don’t support FIFO (e.g., S3 event notifications directly)
Cost optimization is a priority (25% cheaper than FIFO)
Your consumer is already idempotent

Choose FIFO Queue when:

Message ordering is critical to correctness (financial, ordering, sequential workflows)
Duplicate processing would cause data corruption or business logic errors
You need exactly-once semantics without building custom deduplication logic
Throughput requirements are within FIFO limits (up to 70K TPS with High Throughput Mode)
You can group messages logically using MessageGroupId for parallel processing within ordering constraints

Migration Considerations

You cannot convert an existing Standard queue to FIFO or vice versa — you must create a new queue
FIFO queue names must end with .fifo suffix
When migrating from Standard to FIFO, you must add MessageGroupId (required) and either enable content-based deduplication or provide MessageDeduplicationId
Test throughput requirements before migrating — ensure FIFO limits (even with High Throughput Mode) meet your peak load

AWS Certification Exam Tips (SAA-C03 / DVA-C02)

📝 Key Points for the Exam:

FIFO = ordering + exactly-once. Any question mentioning “strict order,” “sequence,” “exactly-once,” or “no duplicates” → FIFO queue.
Standard = high throughput + at-least-once. Questions about “unlimited throughput” or “maximum scalability” → Standard queue.
MessageGroupId is required for FIFO queues. Messages with the same group ID are ordered; different groups can be processed in parallel.
FIFO queue names must end in .fifo — this is a common trick in exam questions.
S3 event notifications cannot directly target FIFO queues. Use EventBridge as an intermediary.
DLQ must match queue type — Standard DLQ for Standard queue, FIFO DLQ for FIFO queue.
High Throughput Mode eliminates the throughput objection — FIFO can now handle 70K+ TPS. Questions about “ordering AND high throughput” → FIFO with High Throughput Mode.
Deduplication window is 5 minutes. Content-based deduplication uses SHA-256 hash of message body.
Per-message delay is NOT supported in FIFO queues — only queue-level DelaySeconds.
DVA-C02 focus: Understand how to implement idempotent consumers for Standard queues and how MessageDeduplicationId works for FIFO queues.

AWS Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

AWS services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

AWS exam questions are not updated to keep up the pace with AWS updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

A financial services company processes stock trade orders. Each trade must be processed exactly once and in the precise order it was submitted. The system handles approximately 500 trades per second during peak hours. Which Amazon SQS configuration meets these requirements with the LEAST operational overhead?
1. Use a Standard queue with application-level deduplication using DynamoDB
2. Use a Standard queue with message sequencing in the message body
3. Use a FIFO queue with MessageGroupId set to the stock ticker symbol
4. Use Amazon Kinesis Data Streams with one shard per stock symbol
Show Answer

Answer: c) – FIFO queues provide exactly-once processing and strict ordering within each message group. Using the stock ticker as the MessageGroupId ensures trades for the same stock are processed in order while allowing parallel processing across different stocks. 500 TPS is well within FIFO default limits (300 TPS per group, but distributed across many groups = much higher aggregate).
A developer is building an application that sends S3 event notifications to an SQS queue for ordered processing. The team requires FIFO delivery of these notifications. What should the developer do?
1. Configure S3 to send events directly to an SQS FIFO queue
2. Configure S3 to send events to Amazon EventBridge, then create a rule to forward events to an SQS FIFO queue
3. Configure S3 to send events to an SNS Standard topic, then subscribe the SQS FIFO queue
4. Configure S3 to send events to a Standard queue and use Lambda to forward them to a FIFO queue
Show Answer

Answer: b) – S3 Event Notifications do not support SQS FIFO queues as a direct destination. Amazon EventBridge receives S3 events and can route them to SQS FIFO queues as a rule target, providing the required FIFO delivery with minimal operational overhead.
An e-commerce platform needs to process over 100,000 order update messages per second with strict per-customer ordering. The solutions architect wants to use Amazon SQS. Which approach meets BOTH the throughput and ordering requirements?
1. Use a Standard queue with application-level ordering per customer
2. Use multiple FIFO queues with default settings and distribute customers across them
3. Use a FIFO queue with High Throughput Mode enabled and a unique MessageGroupId per customer
4. Use Amazon Kinesis Data Streams with customer ID as the partition key
Show Answer

Answer: c) – FIFO High Throughput Mode supports up to 70,000 TPS per API action (700,000 messages/sec with batching) in select regions. Using a unique MessageGroupId per customer ensures per-customer ordering while distributing load across message group partitions. With batching of 10 messages per request, 100,000+ messages/second is achievable.
A development team is using an SQS FIFO queue for processing payment events. They notice that after enabling content-based deduplication, some messages with different payment amounts but identical message bodies (due to a serialization bug) are being silently dropped. What is the MOST appropriate fix?
1. Disable deduplication entirely on the FIFO queue
2. Switch to a Standard queue to avoid deduplication issues
3. Provide an explicit MessageDeduplicationId (e.g., a unique payment transaction ID) for each message instead of relying on content-based deduplication
4. Increase the deduplication interval beyond 5 minutes
Show Answer

Answer: c) – Content-based deduplication uses a SHA-256 hash of the message body. If messages have identical bodies (due to a bug), they get the same deduplication ID and are treated as duplicates. Providing an explicit MessageDeduplicationId (like a unique transaction ID) ensures each legitimate message is treated as unique regardless of body content. The deduplication interval cannot be changed from 5 minutes.
A company has a Standard SQS queue receiving 50,000 messages per second. They are experiencing issues with duplicate message processing causing double charges to customers. The team wants to prevent duplicates with minimal code changes. What should they do?
1. Increase the visibility timeout to prevent reprocessing
2. Migrate to a FIFO queue with content-based deduplication enabled, using customer account ID as the MessageGroupId
3. Add a DynamoDB table for deduplication tracking in the consumer application
4. Enable long polling to reduce duplicate receives
Show Answer

Answer: b) – For “minimal code changes” and preventing duplicates, migrating to FIFO with content-based deduplication is the most effective solution. FIFO provides exactly-once processing natively without custom deduplication logic. However, verify that 50,000 TPS is achievable with High Throughput Mode enabled (70K TPS in select regions). Option c is valid but requires more code changes. Option a doesn’t prevent duplicates — it only reduces the window.

Frequently Asked Questions

What is the difference between SQS Standard and FIFO?

SQS Standard offers unlimited throughput with best-effort ordering and at-least-once delivery. FIFO guarantees exact message ordering and exactly-once processing but is limited to 3,000 messages/second with batching (300 without).

When should I use SQS FIFO queue?

Use FIFO when message ordering matters (e.g., financial transactions, e-commerce order processing) or when you need exactly-once processing to prevent duplicate actions. Use Standard for high-throughput workloads where occasional duplicates or out-of-order messages are acceptable.

Can I convert an SQS Standard queue to FIFO?

No, you cannot convert between queue types. You must create a new FIFO queue and migrate your application. FIFO queue names must end with the .fifo suffix.

References

RDS vs Aurora vs DynamoDB – Database Selection Guide

June 29, 2026 ~ Last updated on : July 3, 2026 ~ Kiro Agent

RDS vs Aurora vs DynamoDB – AWS Database Selection Guide

Choosing the right AWS database service is one of the most impactful architectural decisions you’ll make. Amazon RDS, Amazon Aurora, and Amazon DynamoDB serve fundamentally different needs, and selecting the wrong one leads to performance bottlenecks, cost overruns, or painful migrations. This guide provides a structured decision framework — not just feature comparisons — to help you choose correctly the first time.

This post is designed as a selection/decision guide with clear criteria, decision flowcharts, and tradeoff analysis for each service.

🎯 Quick Decision Rule:

Need SQL, complex joins, existing relational schema? → RDS or Aurora
Need SQL + high availability + auto-scaling + performance? → Aurora
Need unlimited scale, single-digit ms latency, simple access patterns? → DynamoDB

Decision Framework: When to Choose Each

Choose Amazon RDS When:

You need a specific database engine not supported by Aurora (Oracle, SQL Server, MariaDB)
Your workload is predictable and steady with well-understood capacity needs
You want the lowest cost for a managed relational database with moderate performance needs
You’re doing a lift-and-shift migration from on-premises with minimal changes
Your application requires engine-specific features (e.g., Oracle RAC alternatives, SQL Server Always On)
Storage needs are under 64 TB and you want direct control over IOPS provisioning

Choose Amazon Aurora When:

You need MySQL or PostgreSQL compatibility with significantly better performance
Your workload requires high availability with fast automated failover (<30 seconds)
You need auto-scaling storage up to 128 TB without manual provisioning
Your traffic is variable or unpredictable (Aurora Serverless v2 scales to zero)
You need cross-region disaster recovery with <1 second replication lag (Global Database)
You need horizontal write scaling for relational data (Aurora Limitless Database)
Performance requirements exceed what standard RDS can deliver (5x MySQL, 3x PostgreSQL throughput)

Choose Amazon DynamoDB When:

Your access patterns are well-defined and predictable (key-value lookups, simple queries)
You need single-digit millisecond latency at any scale (or microseconds with DAX)
Your application must scale to millions of requests per second without capacity planning
You want zero infrastructure management — no instances, no patching, no maintenance windows
You need active-active multi-region writes with Global Tables
Your data model is denormalized or fits key-value/document patterns
You need event-driven architectures with DynamoDB Streams triggering Lambda

Architecture Differences

Amazon RDS – Traditional Managed Architecture

Compute + Storage coupled — EC2 instance with attached EBS volumes (gp3 or io2)
Storage limited to 64 TB (gp3) with manual IOPS provisioning
Multi-AZ: synchronous standby replica for failover (30-60 second failover)
Multi-AZ DB Clusters: 1 writer + 2 readable standbys, ~35 second failover
Read Replicas: asynchronous, up to 15 (MySQL/MariaDB) or 5 (PostgreSQL/Oracle/SQL Server)
Supports 6 engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Db2
RDS Custom: Full OS/database access for Oracle and SQL Server customization
RDS Proxy: Connection pooling for serverless/Lambda workloads

Amazon Aurora – Cloud-Native Relational Architecture

Compute and storage decoupled — storage is a shared distributed volume across 3 AZs
6 copies of data across 3 AZs; writes acknowledged with 4/6 quorum
Tolerates loss of 2 copies for writes, 3 copies for reads — without interruption
Storage auto-scales from 10 GB to 128 TB, no provisioning needed
Up to 15 read replicas sharing the same storage (near-zero replication lag)
Failover to replica in <30 seconds (no data copy required — shared storage)
Aurora Serverless v2: scales in ACUs, scales to zero, up to 30% better performance (2026 platform v4)
Aurora Global Database: cross-region with <1 second replication, RPO <1 second
Aurora Limitless Database (GA Oct 2024): automated horizontal write scaling via sharding, millions of writes/sec
Aurora DSQL (GA May 2025): distributed SQL, active-active multi-region, 99.999% availability
Supports: MySQL and PostgreSQL only

Amazon DynamoDB – Serverless Distributed NoSQL

Fully serverless — no instances, no storage provisioning, no maintenance windows
Data automatically replicated across 3 AZs
Horizontally partitioned by partition key — unlimited scaling
Supports key-value and document data models
Global Tables: active-active multi-region with multi-region strong consistency (MRSC, 2025)
DynamoDB Streams: ordered change data capture for event-driven patterns
DAX: in-memory cache providing microsecond read latency
Zero-ETL with Redshift: real-time analytics without data movement
Standard and Standard-IA table classes for cost optimization

Comprehensive Comparison Table

Criteria	Amazon RDS	Amazon Aurora	Amazon DynamoDB
Database Type	Relational (SQL)	Relational (SQL) – cloud-native	NoSQL (Key-Value / Document)
Engines	MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Db2	MySQL-compatible, PostgreSQL-compatible	Proprietary (API + PartiQL)
Max Storage	64 TB (gp3)	128 TB (auto-scaling)	Unlimited (per table)
Performance	Standard engine performance; dependent on instance + EBS IOPS	5x MySQL, 3x PostgreSQL throughput; optimized I/O paths	Single-digit ms latency; microseconds with DAX; consistent at any scale
Scaling – Vertical	Instance resize (downtime); up to 128 vCPUs	Instance resize or Serverless v2 auto-scaling (0.5–256 ACUs)	N/A – fully managed, scales automatically
Scaling – Horizontal (Reads)	Up to 15 read replicas (async)	Up to 15 replicas (shared storage, near-zero lag)	Automatic partitioning; unlimited read throughput
Scaling – Horizontal (Writes)	Single writer only (manual sharding needed)	Single writer; Limitless Database for automated sharding (PG)	Automatic partitioning; unlimited write throughput
Availability SLA	99.95% (Multi-AZ)	99.99%; 99.999% (DSQL multi-region)	99.99% (standard); 99.999% (Global Tables)
Failover Time	30-60 sec (Multi-AZ); ~35 sec (DB Clusters)	<30 sec (replica promotion); instant (Serverless)	N/A – multi-AZ by default, no failover concept
Multi-Region	Cross-region read replicas (manual promotion)	Global Database (<1s lag); DSQL (active-active)	Global Tables (active-active, MRSC for strong consistency)
Serverless Option	No (always provisioned instances)	Yes – Serverless v2 (scales to zero)	Yes – fully serverless by default
Backup/Recovery	Automated backups (35 days); manual snapshots; PITR	Continuous backup to S3; PITR; Backtrack (MySQL, in-place rewind)	Continuous backup; PITR (35 days); on-demand backup
Encryption	At-rest (KMS) + in-transit (SSL/TLS)	At-rest (KMS) + in-transit (SSL/TLS)	At-rest (KMS, default) + in-transit (TLS)
Authentication	DB native + IAM DB Auth + Kerberos/AD	DB native + IAM DB Auth + Kerberos/AD	IAM policies + fine-grained access control
Pricing Model	Instance hours + EBS storage + IOPS (if io2) + data transfer	Instance/ACU hours + storage ($0.10/GB) + I/O or I/O-Optimized	On-demand (per request) or Provisioned (WCU/RCU) + storage ($0.25/GB)
Cost Optimization	Reserved Instances (1yr/3yr)	Reserved Instances; I/O-Optimized tier; Serverless	Reserved Capacity; Standard-IA class; on-demand vs provisioned
Maintenance	Maintenance windows required for patching	Maintenance windows (less frequent); zero-downtime patching available	Zero maintenance — no windows, no patching, no downtime
Best For	Lift-and-shift; Oracle/SQL Server workloads; steady predictable loads	High-performance MySQL/PostgreSQL; variable traffic; mission-critical apps	Massive scale; gaming/IoT/mobile; simple access patterns; event-driven

Scaling Approaches Compared

RDS Scaling

Vertical: Change instance class (requires brief downtime for single-AZ; rolling for Multi-AZ DB Clusters)
Storage: Increase EBS volume size (online, but cannot decrease); up to 80,000 IOPS with gp3 (2026)
Read scale-out: Add read replicas (async replication means eventual consistency for reads)
Write scale-out: Not supported natively — requires application-level sharding
Limitation: Write throughput bound by single instance capacity

Aurora Scaling

Vertical: Instance resize or Serverless v2 auto-scaling (0.5 to 256 ACUs, increments of 0.5)
Storage: Automatic — grows in 10 GB increments, up to 128 TB, never shrinks below high-water mark
Read scale-out: Up to 15 replicas with shared storage (no replication lag penalty)
Write scale-out: Aurora Limitless Database (PostgreSQL) — automated sharding across multiple writer instances, petabyte scale
Serverless: Scales to zero when idle; responds in milliseconds; ideal for dev/test and variable traffic
Advantage: Scaling doesn’t require data copying — shared storage architecture

DynamoDB Scaling

Fully automatic: No instance sizing or storage provisioning — scales horizontally by adding partitions
On-demand mode: Instantly accommodates up to 2x previous peak; no throttling for gradual increases
Provisioned mode: Set WCU/RCU with auto-scaling policies (target utilization-based)
No practical limits: Handles millions of requests/second, unlimited storage per table
Consideration: Requires good partition key design — hot partitions can cause throttling
2025 update: More frequent mode switches between provisioned and on-demand now allowed

Pricing Comparison

Amazon RDS Pricing

Instance hours: Pay per hour for chosen instance type (e.g., db.r6g.large ~$0.26/hr in us-east-1)
Storage: gp3 at $0.115/GB/month (includes 3,000 IOPS baseline); io2 for high-performance
Additional IOPS: gp3 provisioned IOPS $0.08/IOPS/month above baseline
Backup: Free up to 100% of DB size; additional at $0.095/GB/month
Data transfer: Standard AWS rates
Savings: Reserved Instances (up to 60% discount for 3-year all-upfront)
Lowest entry cost among the three for relational workloads

Amazon Aurora Pricing

Instance hours: ~20% premium over equivalent RDS instances
Storage: $0.10/GB/month (auto-provisioned, slightly cheaper per-GB than RDS gp3)
I/O (Standard tier): $0.20 per million I/O requests — can be significant for write-heavy workloads
I/O-Optimized tier: 30-40% higher instance + storage cost, but zero I/O charges — breaks even at ~500K I/Os per instance hour
Serverless v2: $0.12 per ACU-hour (billed per second); scales to zero = $0 when idle
Savings: Reserved Instances; choose I/O-Optimized for high-throughput; Serverless for variable loads
Cost trap: I/O charges in Standard tier can double the bill for read-heavy/high-throughput workloads

Amazon DynamoDB Pricing

On-demand: $1.25 per million write request units (WRU); $0.25 per million read request units (RRU)
Provisioned: $0.00065 per WCU/hour; $0.00013 per RCU/hour (~$0.47/WCU/month)
Storage: $0.25/GB/month (Standard); $0.10/GB/month (Standard-IA for infrequent access)
Global Tables: Replicated writes cost 1.5x (rWRU/rWCU) + cross-region transfer
Transactions: 2x cost (each transactional operation counts double)
Savings: Reserved Capacity (up to 77% for 3-year); provisioned mode for steady workloads
Cost insight: On-demand is ~7x more expensive than provisioned for sustained throughput — switch to provisioned once patterns stabilize

💰 Cost Decision Matrix:

Lowest cost, steady relational workload: RDS with Reserved Instances
Variable traffic, pay-for-what-you-use: Aurora Serverless v2 or DynamoDB on-demand
High-throughput relational: Aurora I/O-Optimized with Reserved Instances
Massive scale NoSQL, steady traffic: DynamoDB Provisioned + Reserved Capacity
Unpredictable/spiky NoSQL: DynamoDB on-demand

Availability and Disaster Recovery

Feature	RDS	Aurora	DynamoDB
Data Replication	Synchronous to 1 standby (Multi-AZ)	6 copies across 3 AZs (automatic)	3 copies across 3 AZs (automatic)
RPO (Data Loss)	0 (Multi-AZ sync); seconds (read replicas)	0 (same region); <1 sec (Global Database)	0 (same region); 0 with MRSC (Global Tables)
RTO (Recovery Time)	30-60 sec (Multi-AZ); minutes (replica promotion)	<30 sec (replica); <1 min (Global failover)	Instant (multi-AZ built-in); seconds (Global Tables failover)
Cross-Region DR	Cross-region read replicas (manual failover)	Global Database (managed failover); DSQL (automatic)	Global Tables (automatic active-active)
Point-in-Time Recovery	Yes (up to 35 days)	Yes (up to 35 days) + Backtrack (MySQL, no new cluster)	Yes (up to 35 days)
Maintenance Downtime	Required (maintenance windows)	Minimal (zero-downtime patching for many updates)	Zero (no maintenance windows ever)

Security Features

Security Feature	RDS	Aurora	DynamoDB
Network Isolation	VPC, Security Groups, private subnets	VPC, Security Groups, private subnets	VPC Endpoints (Gateway); no VPC placement needed
Encryption at Rest	AES-256 via KMS (must enable at creation)	AES-256 via KMS (must enable at creation)	AES-256 via KMS (enabled by default)
Encryption in Transit	SSL/TLS (configurable, can enforce)	SSL/TLS (configurable, can enforce)	TLS (HTTPS endpoints, always encrypted)
Authentication	Database native; IAM DB Auth; Kerberos/AD; Secrets Manager rotation	Database native; IAM DB Auth; Kerberos/AD; Secrets Manager rotation	IAM policies only (no DB-level users)
Fine-Grained Access	Database GRANT/REVOKE (table/column level)	Database GRANT/REVOKE (table/column level)	IAM conditions on partition keys, attributes
Audit Logging	Engine-native audit logs + CloudTrail (API)	Engine-native audit logs + CloudTrail (API)	CloudTrail (API); no query-level audit natively
Connection Management	RDS Proxy for pooling	RDS Proxy for pooling	N/A — HTTP/HTTPS API (no persistent connections)

Decision Flowchart: Selecting Your Database

Step 1: What’s Your Data Model?

✅ Relational (tables, joins, foreign keys, complex queries) → Go to Step 2
✅ Key-value, document, or denormalized → Go to Step 5

Step 2: Which Database Engine Do You Need?

✅ Oracle, SQL Server, MariaDB, or Db2 → Choose RDS
✅ MySQL or PostgreSQL → Go to Step 3

Step 3: What Are Your Performance/Scale Requirements?

✅ Standard performance is sufficient; cost is primary concern → Choose RDS
✅ Need high throughput (5x MySQL/3x PG), auto-scaling storage, fast failover → Go to Step 4
✅ Need horizontal write scaling (millions of writes/sec) → Choose Aurora Limitless Database
✅ Need active-active multi-region SQL with 99.999% availability → Choose Aurora DSQL

Step 4: What’s Your Traffic Pattern?

✅ Steady, predictable traffic → Choose Aurora Provisioned
✅ Variable/spiky traffic or dev/test environments → Choose Aurora Serverless v2
✅ Infrequent use with cost sensitivity → Choose Aurora Serverless v2 (scales to zero)

Step 5: DynamoDB Fit Assessment

✅ Access patterns are known and can be modeled with partition/sort keys → Choose DynamoDB
✅ Need ad-hoc queries, complex joins, or flexible querying → Go back to Step 2 (use relational)
✅ Need <1ms reads with caching → Choose DynamoDB + DAX
✅ Need active-active multi-region with zero RPO → Choose DynamoDB Global Tables (MRSC)

Common Use Case Mapping

Use Case	Recommended Service	Why
E-commerce product catalog + orders	Aurora (orders) + DynamoDB (catalog/cart)	ACID for transactions; low-latency reads for catalog
Gaming leaderboard / session store	DynamoDB	Unlimited scale, single-digit ms, simple access patterns
SaaS multi-tenant application	Aurora (Limitless for large scale) or DynamoDB	SQL for complex queries; DynamoDB for per-tenant isolation
Legacy Oracle migration to AWS	RDS for Oracle or RDS Custom	Full Oracle compatibility; minimal code changes
IoT sensor data ingestion	DynamoDB	Massive write throughput; time-series via sort key; TTL for expiry
Financial transaction processing	Aurora or Aurora DSQL (global)	Strong ACID; high throughput; cross-region consistency
Content management system (WordPress-style)	RDS MySQL/PostgreSQL	Standard performance sufficient; lowest cost; proven compatibility
Real-time mobile app backend	DynamoDB + DynamoDB Streams	Serverless; event-driven; scales with users
Enterprise reporting with complex joins	Aurora PostgreSQL	Complex SQL; parallel query; high read throughput
Globally distributed app (multi-region writes)	DynamoDB Global Tables or Aurora DSQL	Active-active; conflict resolution; low-latency global access

Key Tradeoffs to Consider

Tradeoff	Winner	Explanation
Lowest cost (small relational workload)	RDS	~20% cheaper instances than Aurora; no I/O charges with gp3
Best price-performance (relational)	Aurora	5x MySQL throughput offsets 20% cost premium; I/O-Optimized eliminates surprise bills
Zero operational overhead	DynamoDB	No instances, no patching, no maintenance windows, no version upgrades
Query flexibility	RDS / Aurora	Full SQL — ad-hoc queries, joins, aggregations; DynamoDB requires pre-planned access patterns
Unlimited horizontal scale	DynamoDB	Automatic partitioning; no upper bound on throughput or storage
Multi-region active-active (NoSQL)	DynamoDB	Global Tables with MRSC for zero-RPO multi-region strong consistency
Multi-region active-active (SQL)	Aurora DSQL	Only AWS relational option with true active-active multi-region writes
Engine diversity (Oracle, SQL Server)	RDS	Aurora only supports MySQL/PostgreSQL; RDS supports 6 engines

AWS Certification Exam Tips

📝 Key Exam Concepts:

If the question mentions “millisecond latency at any scale” or “millions of requests/second” → DynamoDB
If the question mentions “complex queries,” “joins,” or “ACID transactions” with high performance → Aurora
If the question mentions “Oracle,” “SQL Server,” or “lift-and-shift” → RDS
If the question mentions “serverless” + “relational” → Aurora Serverless v2
If the question mentions “globally distributed” + “active-active” → DynamoDB Global Tables or Aurora DSQL
If the question mentions “no maintenance windows” → DynamoDB
Aurora’s 6 copies across 3 AZs and shared storage architecture are frequent exam topics
DynamoDB partition key design and hot partition issues are common scenario questions

Practice Questions

Question 1:

A company is building a new e-commerce platform that requires complex SQL queries for inventory management, order processing with ACID transactions, and must handle Black Friday traffic spikes that are 10x normal load. The team uses PostgreSQL. Which database solution provides the best combination of SQL support, performance, and automatic scaling?

Amazon RDS for PostgreSQL with Multi-AZ
Amazon Aurora PostgreSQL with Serverless v2
Amazon DynamoDB with on-demand capacity
Amazon RDS for PostgreSQL with read replicas

Show Answer

Answer: B –

Explanation: Aurora PostgreSQL Serverless v2 provides full PostgreSQL SQL compatibility (complex queries, ACID transactions), delivers 3x PostgreSQL throughput, and automatically scales compute capacity to handle traffic spikes without manual intervention. It scales to zero during low-traffic periods and scales up instantly during Black Friday peaks. RDS would require manual instance resizing or over-provisioning. DynamoDB doesn’t support complex SQL queries or joins needed for inventory management.

Question 2:

A gaming company needs a database for their global leaderboard that must handle 5 million writes per second during peak hours, provide single-digit millisecond read latency, and be available in 4 AWS regions simultaneously with active-active writes. Which solution meets these requirements?

Amazon Aurora Global Database with write forwarding
Amazon DynamoDB with Global Tables
Amazon RDS Multi-AZ with cross-region read replicas
Amazon Aurora DSQL in multi-region configuration

Show Answer

Answer: B –

Explanation: DynamoDB Global Tables provide active-active multi-region replication with single-digit millisecond latency and can handle millions of requests per second with automatic scaling. Gaming leaderboards are a classic DynamoDB use case — simple access patterns (get/put by player ID, query by score) with massive scale requirements. Aurora Global Database doesn’t support active-active writes (only forwarding). Aurora DSQL supports active-active but is optimized for OLTP transactions, not the extreme write throughput needed here. RDS doesn’t support active-active multi-region.

Question 3:

A financial services company is migrating their Oracle-based core banking application to AWS. The application uses PL/SQL stored procedures extensively, requires full ACID compliance, and the team wants minimal code changes. Which is the most appropriate migration path?

Amazon Aurora PostgreSQL with Babelfish
Amazon RDS for Oracle
Amazon DynamoDB with transactions
Amazon Aurora MySQL

Show Answer

Answer: B –

Explanation: RDS for Oracle provides full Oracle engine compatibility including PL/SQL stored procedures, requiring minimal or zero code changes. The requirement explicitly states “minimal code changes” with heavy PL/SQL usage, which rules out Aurora (only MySQL/PostgreSQL compatible). DynamoDB doesn’t support SQL or stored procedures. Babelfish is for SQL Server T-SQL compatibility, not Oracle PL/SQL. For lift-and-shift of Oracle workloads, RDS for Oracle (or RDS Custom for Oracle with OS access) is the correct choice.

Question 4:

A startup is building an IoT platform that ingests sensor data from 100,000 devices. Each device sends readings every 5 seconds. The data must be stored for 30 days, after which it should be automatically deleted. The team needs to query recent data by device ID and time range. Which database solution is most cost-effective and operationally efficient?

Amazon Aurora PostgreSQL with table partitioning
Amazon RDS for MySQL with scheduled deletion jobs
Amazon DynamoDB with TTL enabled
Amazon Aurora Serverless v2 with scheduled scaling

Show Answer

Answer: C –

Explanation: DynamoDB with TTL (Time to Live) is ideal for this scenario. The access pattern is simple (device_id as partition key, timestamp as sort key), enabling efficient range queries. TTL automatically deletes expired items at no cost — no scheduled jobs needed. The write volume (~20,000 writes/second from 100K devices at 5-sec intervals) is easily handled by DynamoDB’s automatic scaling. DynamoDB’s serverless model means zero operational overhead for the startup. Aurora or RDS would require instance management, manual partition pruning or deletion jobs, and would be more expensive for this write-heavy, simple-query workload.

Question 5:

A company runs a MySQL-based application on Amazon RDS. They are experiencing performance issues during peak hours and their DBA reports that read replica lag is causing stale data problems. They need to improve read performance with minimal replication lag while keeping MySQL compatibility. The budget allows for a 20% cost increase. What should they do?

Upgrade to a larger RDS instance class
Migrate to Amazon Aurora MySQL
Add more RDS read replicas
Migrate to Amazon DynamoDB

Show Answer

Answer: B –

Explanation: Aurora MySQL provides 5x throughput improvement over standard MySQL, and its shared storage architecture means read replicas have near-zero replication lag (typically <10ms vs seconds/minutes for RDS async replicas). This directly addresses both problems: performance issues and stale read data. Aurora instances cost ~20% more than RDS (within budget). Adding more RDS replicas doesn’t fix the lag issue. A larger instance helps writes but doesn’t fix replica lag. DynamoDB would require a complete application rewrite and doesn’t maintain MySQL compatibility.

Summary

Amazon RDS is the right choice for traditional relational workloads requiring specific engines (Oracle, SQL Server, MariaDB, Db2), lift-and-shift migrations, or when cost is the primary concern for steady, predictable workloads.
Amazon Aurora excels for MySQL/PostgreSQL workloads needing superior performance, high availability, automatic storage scaling, and modern features like Serverless v2, Limitless Database, and DSQL for distributed SQL.
Amazon DynamoDB is the answer for applications requiring unlimited scale, single-digit millisecond latency, zero operational overhead, and well-defined access patterns that fit a key-value or document model.
Many real-world architectures use multiple services together — e.g., Aurora for transactional data and DynamoDB for session stores or high-velocity data streams.
The selection decision should be driven by access patterns, scale requirements, and data model — not team familiarity or historical precedent.

Frequently Asked Questions

How do I choose between RDS, Aurora, and DynamoDB?

Start with your data model: if you need key-value/document access patterns, choose DynamoDB. For relational data, choose Aurora for high performance and scaling needs, or standard RDS for simple workloads where cost is the priority.

Is Aurora worth the extra cost over RDS?

Aurora costs ~20% more than standard RDS but delivers up to 5x MySQL / 3x PostgreSQL throughput, 6-way storage replication, faster failover (30s vs 60-120s), and auto-scaling storage up to 128TB. Worth it for production workloads needing high availability.

Can DynamoDB replace a relational database?

DynamoDB can replace relational databases for access-pattern-driven workloads, but not for complex ad-hoc queries, multi-table joins, or analytics. It excels at single-digit millisecond key-value lookups at any scale but requires careful data modeling upfront.

References

Aurora vs RDS vs DynamoDB – Database Services Compared

June 29, 2026 ~ Last updated on : July 3, 2026 ~ Kiro Agent

AWS Aurora vs RDS vs DynamoDB – Database Services Compared

AWS offers multiple database services, each designed for different workloads. Amazon Aurora, Amazon RDS, and Amazon DynamoDB are three of the most widely used options. Understanding their architecture, scaling capabilities, pricing models, and ideal use cases is critical for both real-world implementations and AWS certification exams.

Overview

Amazon RDS – Managed relational database service supporting MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server. Handles provisioning, patching, backup, and failover.
Amazon Aurora – Cloud-native relational database compatible with MySQL and PostgreSQL. Part of the RDS family but with a fundamentally redesigned storage architecture delivering up to 5x MySQL and 3x PostgreSQL throughput.
Amazon DynamoDB – Fully managed, serverless NoSQL key-value and document database delivering single-digit millisecond performance at any scale.

Architecture

Amazon RDS Architecture

Traditional database architecture with compute and storage tightly coupled on a single instance
Uses Amazon EBS (gp3 or io2) for storage, attached to a single DB instance
Multi-AZ deployment creates a synchronous standby replica in another AZ for failover
Multi-AZ DB Clusters (MySQL/PostgreSQL) provide one writer + two readable standbys across 3 AZs with faster failover (~35 seconds)
Read Replicas use asynchronous replication (up to 15 for MySQL/MariaDB, 5 for PostgreSQL/Oracle/SQL Server)
Supports ENA Express for improved Multi-AZ replication (2026)

Amazon Aurora Architecture

Separation of compute and storage – fundamentally different from RDS
Shared distributed storage volume spanning 3 AZs with 6 copies of data
Writes acknowledged when 4 of 6 copies confirm (quorum-based)
Tolerates loss of 2 copies without write impact, 3 copies without read impact
Storage auto-scales from 10 GB up to 128 TB with no provisioning
Up to 15 low-latency read replicas sharing the same storage volume (replication lag typically <10ms)
Failover to read replica in <30 seconds (shared storage means no data copy needed)
Aurora Serverless – auto-scales compute capacity in ACU (Aurora Capacity Units); scales to zero; up to 30% better performance with enhanced scaling (2026)
Aurora Global Database – cross-region replication with <1 second lag, RPO <1 second
Aurora DSQL (GA May 2025) – distributed SQL with active-active multi-region writes, 99.999% availability, PostgreSQL-compatible

Amazon DynamoDB Architecture

Fully serverless – no instances to provision or manage
Distributed across multiple AZs automatically (data replicated 3 times)
Partitioned by partition key for horizontal scalability
Supports key-value and document data models
Global Tables – active-active multi-region replication with multi-region strong consistency (MRSC, GA 2025) enabling zero RPO
Cross-account replication for Global Tables (2026)
DAX (DynamoDB Accelerator) provides microsecond read latency via in-memory caching

Data Model

Feature	RDS	Aurora	DynamoDB
Type	Relational (SQL)	Relational (SQL)	NoSQL (Key-Value / Document)
Schema	Fixed schema, structured data	Fixed schema, structured data	Flexible schema, semi-structured
Query Language	SQL	SQL (MySQL/PostgreSQL compatible)	PartiQL, API-based access
Transactions	Full ACID	Full ACID	ACID (up to 100 items, 4 MB)
Joins	Yes (complex queries)	Yes (complex queries)	No native joins (denormalized design)
Secondary Indexes	Standard SQL indexes	Standard SQL indexes	GSI and LSI (max 20 GSI, 5 LSI)

Scaling

Vertical Scaling

Aspect	RDS	Aurora	DynamoDB
Compute scaling	Instance class change (requires downtime/failover)	Instance class change OR Aurora Serverless (auto-scales ACUs in seconds)	N/A – fully serverless, automatic
Storage scaling	Manual increase (up to 64 TB); auto-scaling with threshold	Automatic (10 GB to 128 TB, no action needed)	Unlimited – fully managed
Max storage	64 TB (io2), 16 TB (gp3)	128 TB	Virtually unlimited

Horizontal Scaling

Aspect	RDS	Aurora	DynamoDB
Read scaling	Up to 5-15 Read Replicas (async)	Up to 15 Read Replicas (shared storage, <10ms lag); Auto Scaling	Automatic partitioning; DAX for caching
Write scaling	Single writer (vertical only)	Single writer (Aurora DSQL supports multi-region active-active writes)	Automatic horizontal partitioning; virtually unlimited write throughput
Global distribution	Cross-Region Read Replicas	Aurora Global Database (<1s replication); Aurora DSQL (active-active)	Global Tables (active-active, multi-region strong consistency)

Pricing

Component	RDS	Aurora	DynamoDB
Compute	Per-hour instance pricing	Per-hour (provisioned) or per-ACU-second (serverless)	No compute charges (serverless)
Storage	EBS: ~$0.115/GB/month (gp3)	$0.10/GB/month (includes replication)	$0.25/GB/month (Standard); $0.10/GB (IA)
I/O	Included in EBS pricing (gp3: 3000 IOPS free)	$0.20/million I/O requests (Standard); Aurora I/O-Optimized eliminates I/O charges for +30% compute cost	On-demand: $1.25/million WRU, $0.25/million RRU; Provisioned: ~$0.00065/WCU/hour
Savings options	Reserved Instances (1yr/3yr)	Reserved Instances (1yr/3yr)	Reserved Capacity (up to 77% savings, 3yr); On-demand 50% price reduction (2024)
Free Tier	750 hrs/month db.t2.micro or db.t3.micro (12 months)	$100 credits at sign-up for Aurora Serverless	25 GB storage + 25 WCU + 25 RCU (always free)

Cost Comparison Note: Aurora instances cost ~20% more per vCPU than equivalent RDS instances. However, Aurora’s shared storage and automatic replication often result in lower total cost for high-availability workloads. DynamoDB is most cost-effective for simple access patterns with predictable traffic (provisioned mode) or highly variable traffic (on-demand mode).

Performance

Metric	RDS	Aurora	DynamoDB
Throughput	Standard MySQL/PostgreSQL performance	Up to 5x MySQL, 3x PostgreSQL throughput	Virtually unlimited (scales with partitions)
Latency	Low milliseconds	Low milliseconds (optimized I/O path)	Single-digit milliseconds; microseconds with DAX
Read Replica lag	Seconds to minutes (async)	<10ms (shared storage)	Eventually consistent reads return latest; strongly consistent available
Connection model	Connection-based (limited by instance)	Connection-based; RDS Proxy for pooling	HTTP API (connectionless, unlimited concurrent)

Availability & Durability

Feature	RDS	Aurora	DynamoDB
SLA	99.95% (Multi-AZ)	99.99% (Multi-AZ)	99.99% (single region); 99.999% (Global Tables)
Failover time	~60 seconds (Multi-AZ instance); ~35 seconds (Multi-AZ cluster)	<30 seconds (shared storage)	Automatic, transparent (no failover concept)
Data replication	Synchronous to 1-2 standbys	6 copies across 3 AZs (quorum writes)	3 copies across multiple AZs
Cross-region DR	Cross-Region Read Replicas (manual promotion)	Aurora Global Database (managed failover, RPO <1s)	Global Tables (active-active, zero RPO with MRSC)

Backup & Disaster Recovery

Feature	RDS	Aurora	DynamoDB
Automated backups	Daily snapshots + transaction logs; retention 0-35 days	Continuous backup to S3; retention 1-35 days	Continuous backups with PITR (35-day retention)
Point-in-time recovery	Yes (to any second within retention)	Yes (to any second); Backtrack for in-place rewind (MySQL only)	Yes (to any second within 35 days)
Manual snapshots	Yes (retained until deleted)	Yes (retained until deleted)	On-demand backups (retained until deleted)
Cross-region backup	Copy snapshots cross-region; AWS Backup (Multi-AZ clusters in 17 regions, 2026)	Copy snapshots cross-region; AWS Backup	AWS Backup cross-region/cross-account
Restore method	Creates new DB instance	Creates new cluster (Backtrack restores in-place)	Creates new table

Security

Feature	RDS	Aurora	DynamoDB
Network isolation	VPC, Security Groups, Private Subnets	VPC, Security Groups, Private Subnets	VPC Endpoints (Gateway); no VPC placement needed
Encryption at rest	AES-256 with KMS (must enable at creation)	AES-256 with KMS (must enable at creation)	AES-256 with KMS (enabled by default)
Encryption in transit	SSL/TLS	SSL/TLS (enforced by default)	TLS (all API calls over HTTPS)
Authentication	DB user/password, IAM DB Auth, Kerberos/AD	DB user/password, IAM DB Auth, Kerberos/AD	IAM policies, fine-grained access control (condition keys)
Access control	IAM for management; DB-level GRANT for data	IAM for management; DB-level GRANT for data	IAM policies with item-level and attribute-level control

When to Choose Each Service

Choose Amazon RDS When:

You need a traditional relational database with minimal migration effort
You require Oracle, SQL Server, or MariaDB engine support
Budget is a primary concern and Aurora’s premium isn’t justified
Workload is moderate with predictable growth
You need OS-level access (RDS Custom for Oracle/SQL Server)
Simple Multi-AZ failover meets your HA requirements

Choose Amazon Aurora When:

You need MySQL/PostgreSQL compatibility with significantly higher performance
High availability (99.99% SLA) and fast failover (<30s) are critical
You need 15 read replicas with minimal lag
Storage auto-scaling up to 128 TB is required
You need global distribution with Aurora Global Database
Variable workloads benefit from Aurora Serverless (scale to zero)
You want Backtrack for in-place point-in-time rewind (MySQL)
You need distributed SQL with active-active writes (Aurora DSQL)

Choose Amazon DynamoDB When:

You need single-digit millisecond latency at any scale
Access patterns are well-defined (key-value lookups, simple queries)
You need virtually unlimited horizontal scaling for reads and writes
Serverless/zero-management operation is a priority
Global active-active replication is needed (Global Tables)
Event-driven architectures (DynamoDB Streams → Lambda)
Gaming leaderboards, session stores, IoT data, shopping carts
Traffic is highly variable or unpredictable (on-demand mode)

Comprehensive Comparison Table

Feature	Amazon RDS	Amazon Aurora	Amazon DynamoDB
Database Type	Relational (SQL)	Relational (SQL)	NoSQL (Key-Value/Document)
Engines	MySQL, PostgreSQL, MariaDB, Oracle, SQL Server	MySQL-compatible, PostgreSQL-compatible	Proprietary NoSQL
Management	Managed (provisioned instances)	Managed (provisioned) or Serverless	Fully serverless
Max Storage	64 TB	128 TB (auto-scaling)	Unlimited
Performance	Standard engine performance	5x MySQL / 3x PostgreSQL	Single-digit ms; microseconds with DAX
Read Replicas	Up to 5-15 (async)	Up to 15 (shared storage, <10ms lag)	N/A (automatic distribution)
Write Scaling	Vertical only	Vertical (DSQL: horizontal)	Automatic horizontal
Availability SLA	99.95%	99.99%	99.99% / 99.999% (Global)
Failover	~35-60 seconds	<30 seconds	Automatic (no downtime)
Backup	Automated + Manual snapshots	Continuous + Backtrack + Snapshots	PITR (35 days) + On-demand backups
Global Replication	Cross-Region Read Replicas	Global Database (<1s lag)	Global Tables (active-active, MRSC)
Serverless Option	No	Aurora Serverless (scale to zero)	Fully serverless (always)
Transactions	Full ACID	Full ACID	ACID (limited scope)
Best For	Traditional RDBMS workloads, multi-engine support	High-performance relational, global apps	High-scale key-value, serverless apps

AWS Certification Practice Questions

Question 1

A company is designing a new e-commerce platform that requires sub-millisecond read latency for its product catalog, which contains millions of items accessed by product ID. The application has unpredictable traffic spikes during flash sales. Which database solution is MOST appropriate?

Amazon RDS MySQL with Multi-AZ deployment
Amazon Aurora PostgreSQL with Read Replicas
Amazon DynamoDB with DAX
Amazon RDS PostgreSQL with ElastiCache

Show Answer

Answer: C –

Explanation: DynamoDB with DAX provides microsecond read latency for key-value lookups. Its on-demand mode handles unpredictable traffic spikes without pre-provisioning. The product catalog accessed by product ID is a perfect key-value pattern.

Question 2

A financial services company needs a relational database with 99.99% availability, automatic storage scaling, and the ability to perform point-in-time recovery by rewinding the database to a specific time without creating a new instance. Which service supports this requirement?

Amazon RDS Multi-AZ with automated backups
Amazon Aurora MySQL with Backtrack
Amazon DynamoDB with PITR enabled
Amazon RDS Multi-AZ DB Cluster with AWS Backup

Show Answer

Answer: B –

Explanation: Aurora Backtrack (MySQL only) allows rewinding the database in-place to a specific point in time without creating a new cluster. Combined with Aurora’s 99.99% SLA and auto-scaling storage, it meets all requirements. RDS PITR creates a new instance, not an in-place rewind.

Question 3

A global gaming company needs a database that supports active-active writes across multiple AWS Regions with strong consistency and zero RPO for disaster recovery. The data model is simple player profiles accessed by player ID. Which solution meets these requirements?

Amazon Aurora Global Database with managed failover
Amazon RDS with Cross-Region Read Replicas
Amazon DynamoDB Global Tables with multi-region strong consistency (MRSC)
Amazon Aurora DSQL with multi-Region cluster

Show Answer

Answer: C –

Explanation: DynamoDB Global Tables with MRSC (GA 2025) provides active-active multi-region writes with strong consistency and zero RPO. The simple key-value access pattern (player ID lookup) is ideal for DynamoDB. Aurora Global Database is single-writer with RPO <1s (not zero). Aurora DSQL also supports multi-region active-active writes but the simple data model makes DynamoDB the most appropriate choice.

Question 4

A startup is building a SaaS application with variable traffic. During business hours, the database handles 10,000 transactions/second, but at night traffic drops to near zero. They need MySQL compatibility and want to minimize costs. Which solution is MOST cost-effective?

Amazon RDS MySQL with Reserved Instances
Amazon Aurora MySQL Serverless
Amazon DynamoDB with on-demand capacity
Amazon Aurora MySQL provisioned with Auto Scaling replicas

Show Answer

Answer: B –

Explanation: Aurora Serverless scales compute capacity automatically based on demand and can scale to zero during idle periods. This is the most cost-effective option for variable workloads requiring MySQL compatibility. RDS Reserved Instances charge for 24/7 capacity. DynamoDB doesn’t provide SQL/MySQL compatibility.

Question 5

A company is migrating a legacy Oracle database to AWS. The application uses complex SQL queries with multiple joins, stored procedures, and requires Oracle-specific features. They need Multi-AZ high availability. Which AWS database service should they use? (Select TWO)

Amazon Aurora PostgreSQL (Oracle-compatible mode)
Amazon RDS for Oracle with Multi-AZ
Amazon DynamoDB with complex access patterns
Amazon RDS Custom for Oracle with Multi-AZ
Amazon Aurora MySQL with stored procedures

Show Answer

Answer: B, D

Explanation: Amazon RDS for Oracle provides managed Oracle database with Multi-AZ HA. RDS Custom for Oracle allows OS and database customization for Oracle-specific features requiring elevated access. Aurora does not support Oracle. DynamoDB doesn’t support SQL joins or stored procedures. Note: RDS Custom for Oracle EOL is March 31, 2027 – plan migration to RDS for Oracle or alternative.

Key Takeaways

RDS is the go-to managed relational database when you need specific engines (Oracle, SQL Server, MariaDB) or traditional RDBMS at moderate scale
Aurora is the premium relational choice for MySQL/PostgreSQL workloads needing higher performance, better availability, and advanced features like Serverless, Global Database, and Backtrack
DynamoDB is the choice for NoSQL key-value workloads requiring unlimited scale, serverless operation, single-digit millisecond latency, and global active-active replication
All three services integrate with AWS security features (KMS encryption, IAM, VPC/VPC Endpoints, CloudTrail)
Cost optimization: Use Reserved Instances (RDS/Aurora) or Provisioned + Reserved Capacity (DynamoDB) for steady workloads; use Aurora Serverless or DynamoDB On-Demand for variable workloads

Frequently Asked Questions

What is the difference between Aurora and RDS?

Aurora is AWS’s cloud-native database with up to 5x MySQL and 3x PostgreSQL performance, 6-way storage replication, and up to 128TB auto-scaling storage. Standard RDS uses traditional database engines with simpler architecture and lower cost for small workloads.

When should I use DynamoDB instead of Aurora?

Use DynamoDB for key-value/document workloads needing single-digit millisecond latency at any scale, serverless applications, or when you need global multi-region active-active replication. Use Aurora for complex SQL queries, joins, and transactions.

Is Aurora Serverless good for production?

Aurora Serverless v2 is production-ready and scales instantly in fine-grained 0.5 ACU increments. It’s ideal for variable workloads, dev/test environments, and applications with unpredictable traffic patterns.

References

Google Vertex AI – ML Platform & Model Garden

June 26, 2026 ~ Kiro Agent

Google Cloud Vertex AI – ML Platform Overview

Vertex AI is Google Cloud’s unified machine learning platform that brings together all the tools and services needed to build, deploy, and scale ML models and AI applications.
Vertex AI consolidates previously separate Google Cloud AI services (AI Platform, AutoML, Dialogflow) into a single, cohesive platform.
In April 2026, Google rebranded Vertex AI as the Gemini Enterprise Agent Platform at Google Cloud Next ’26, shifting focus toward an agent-first architecture while retaining all existing ML/AI capabilities.
Vertex AI supports the entire ML lifecycle: data preparation, model training, evaluation, deployment, monitoring, and management.
Integrates natively with BigQuery, Cloud Storage, Dataflow, and other Google Cloud services.
Supports both traditional ML (tabular, vision, NLP) and generative AI (foundation models, LLMs, multimodal).
Provides both no-code/low-code (AutoML, Studio) and code-first (custom training, SDK, notebooks) approaches.

Vertex AI Key Components

Model Garden – Access to 200+ foundation models (first-party, third-party, open)
Vertex AI Studio – Interactive prompt design, testing, and tuning interface
Model Training – AutoML and custom training with GPUs/TPUs
Model Deployment – Online and batch prediction endpoints
Vertex AI Pipelines – ML workflow orchestration (Kubeflow/TFX)
Feature Store – Centralized feature management integrated with BigQuery
Vector Search – High-performance similarity search using ScaNN
Agent Builder – Build, deploy, and govern AI agents
Grounding – Connect models to real-time data sources
Model Evaluation – Comprehensive model assessment tools
Workbench – Managed Jupyter notebook environments
MLOps – Model Registry, Experiments, Monitoring

Model Garden

Model Garden provides access to 200+ curated foundation models, organized into three categories:
- First-party models – Google’s own models: Gemini (text, code, multimodal), Imagen (image generation), Veo (video generation), Chirp (speech), Gemma (open models), Lyria (music)
- Third-party models – Partner models: Anthropic’s Claude family, Mistral AI models
- Open models – Community models: Meta’s Llama 3.2, Gemma 3, and others deployable on Google Cloud infrastructure
First-party and select third-party models are available as managed APIs (serverless, no infrastructure management required).
Open models can be deployed to dedicated endpoints with custom hardware configurations.
Model Garden supports one-click deployment, fine-tuning notebooks, and optimization features for open models.
Key Gemini models available:
- Gemini 2.5 Flash – Fast, cost-efficient model for everyday tasks
- Gemini 2.5 Pro – Advanced reasoning and complex tasks
- Gemini 3 Flash / Gemini 3 Pro – Latest generation (2026)
Supports multimodal inputs: text, images, video, audio, and code.
Enterprise features: data residency controls, VPC Service Controls, customer-managed encryption keys (CMEK), and audit logging.

Vertex AI Studio

Vertex AI Studio is a web-based interface for rapidly prototyping and testing generative AI models.
Key capabilities:
- Prompt Design – Interactive prompt editor with system instructions, few-shot examples, and parameter tuning (temperature, top-k, top-p)
- Prompt Gallery – Pre-built prompt templates for common use cases
- Prompt Optimizer – Automated prompt refinement using Zero-Shot and Data-Driven modes (GA 2025)
- Model Tuning – Fine-tune models with custom datasets without managing infrastructure
- Model Distillation – Train smaller, faster student models from larger teacher models
- Multimodal Testing – Test text, image, video, and audio inputs/outputs
Tuning methods supported:
- Supervised Fine-Tuning (SFT) – Train on labeled input-output pairs
- Reinforcement Learning from Human Feedback (RLHF) – Align model outputs with human preferences
- Distillation – Transfer knowledge from large teacher to smaller student model
- LoRA (Low-Rank Adaptation) – Parameter-efficient fine-tuning
Supports Gemini, Imagen, and Codey model families for tuning.
Provides code export to Python SDK for production integration.

Model Training

Vertex AI provides two primary training approaches:
- AutoML – No-code training for tabular, image, video, and text data with automatic feature engineering, architecture search, and hyperparameter tuning
- Custom Training – Full control using any ML framework (TensorFlow, PyTorch, XGBoost, scikit-learn, JAX) with custom containers or prebuilt containers
AutoML capabilities:
- AutoML Tabular – Classification, regression, forecasting
- AutoML Image – Classification, object detection, segmentation
- AutoML Video – Classification, object tracking, action recognition
- AutoML Text – Classification, entity extraction, sentiment analysis
Custom Training features:
- Supports NVIDIA GPUs: A100, H100, A3 (H100 80GB), and A4X (GB200 NVL72 rack-scale, 2026)
- Supports Cloud TPUs: TPU v2, v3, v5e, v5p for large-scale training
- Distributed training across multiple nodes with automatic orchestration
- Prebuilt containers for TensorFlow, PyTorch, XGBoost, scikit-learn
- Custom containers for any framework or dependency
- Hyperparameter tuning service (Vizier-based)
Vertex AI Training with Cluster Director (2025) – Fully managed, resilient Slurm environment for large-scale training workloads, simplifying multi-node GPU/TPU training.
Training is serverless – no infrastructure provisioning; billed per compute-hour in 30-second increments.
No charges if training fails (except user-initiated cancellations).

Model Deployment

Vertex AI supports multiple deployment options for serving predictions:
- Online Prediction – Low-latency, real-time predictions via HTTPS endpoints
- Batch Prediction – High-throughput predictions on large datasets (asynchronous)
- Private Endpoints – Deploy within VPC for network isolation
Online Prediction features:
- Dedicated endpoints with configurable machine types and accelerators (GPUs, TPUs)
- Automatic scaling (min/max replicas) based on traffic
- Traffic splitting for A/B testing and canary deployments
- Model versioning – deploy multiple model versions to the same endpoint
- Prebuilt containers for TensorFlow, PyTorch, XGBoost, scikit-learn serving
- Custom serving containers for any framework
- TPU VM deployment for high-throughput inference
Batch Prediction features:
- Process large datasets from Cloud Storage or BigQuery
- Serverless – no persistent infrastructure
- Cost-effective for non-real-time workloads
- Supports the same model formats as online prediction
Foundation models (Gemini, Claude, Llama) are served via managed APIs without requiring endpoint deployment.
Supports model explainability (feature attributions) on deployed endpoints.

Vertex AI Pipelines

Vertex AI Pipelines is a serverless orchestration service for automating, monitoring, and governing ML workflows.
Supports two pipeline frameworks:
- Kubeflow Pipelines (KFP) – Python-based pipeline SDK for defining ML workflows as directed acyclic graphs (DAGs)
- TensorFlow Extended (TFX) – End-to-end TensorFlow production ML pipelines
Key features:
- Serverless execution – no cluster management required
- Pipeline scheduling and triggering (event-based or cron)
- Artifact and metadata tracking for lineage
- Integration with Vertex AI services (training, endpoints, Feature Store, Model Registry)
- Reusable pipeline components and templates
- Pipeline versioning and caching for faster iteration
Common pipeline patterns:
- Data ingestion → Feature engineering → Training → Evaluation → Deployment
- Continuous training pipelines triggered by data drift or schedule
- A/B testing pipelines for comparing model versions
Pipelines integrate with Cloud Logging, Cloud Monitoring, and IAM for governance.

Vertex AI Feature Store

Vertex AI Feature Store is a fully managed service for organizing, storing, and serving ML features at scale.
Built on BigQuery – feature data is managed within BigQuery tables or views, eliminating the need for a separate offline store.
Key capabilities:
- Feature Management – Centralized registry of features with metadata, lineage, and versioning
- Online Serving – Low-latency feature retrieval for real-time predictions (single-digit millisecond latency)
- Offline Serving – Point-in-time correct feature retrieval for training data generation
- Feature Sharing – Share features across teams and projects to reduce duplication
- Embedding Support – Store and serve vector embeddings for GenAI/RAG applications
- Vector Similarity Search – Perform approximate nearest neighbor (ANN) search on stored embeddings
Integrated with Dataplex Universal Catalog for feature metadata tracking and discovery.
Supports streaming and batch feature ingestion from BigQuery, Cloud Storage, and Dataflow.
Feature Store integrates with Vertex AI RAG Engine for retrieval-augmented generation workflows.

Vertex AI Vector Search

Vertex AI Vector Search (formerly Matching Engine) is a high-performance, fully managed vector similarity search service.
Built on Google’s ScaNN (Scalable Nearest Neighbors) algorithm for efficient approximate nearest neighbor search.
Key features:
- Supports billions of vectors with low-latency retrieval
- Real-time index updates (streaming inserts)
- Filtering with boolean and numeric predicates
- Multiple distance metrics: cosine, dot product, Euclidean
- Hybrid search combining vector similarity with metadata filters
Vector Search 2.0 (2025-2026):
- Storage-optimized tier – Cost-effective solution for large-scale RAG and semantic search applications
- Auto-tuning – Eliminates manual index configuration, automatically optimizes for workload
- Integrated data retrieval – Returns full item data (not just IDs), eliminating the need for a separate key-value store
Use cases: semantic search, recommendation systems, RAG, image/video retrieval, anomaly detection.
Integrates with Vertex AI Feature Store for embedding storage and serving.

Vertex AI Agent Builder

Vertex AI Agent Builder is Google Cloud’s comprehensive platform for building, scaling, and governing AI agents.
Evolved from and consolidates: Dialogflow CX (now Conversational Agents), Vertex AI Search, and Generative AI App Builder.
Key pillars:
- Agent Development Kit (ADK) – Code-first Python framework for building multi-agent systems
- Agent Studio – Low-code visual builder with 35+ pre-built agent templates
- Agent Engine – Managed runtime for deploying and scaling agents in production
- Persistent Memory – Long-term memory and session management for agents
- Enterprise Governance – Access controls, audit logging, and compliance
Conversational Agents (formerly Dialogflow CX):
- Build hybrid agents combining deterministic flows with generative AI
- Playbooks – Define goals and tools, let the LLM determine execution paths
- Multi-channel support: web, phone, messaging platforms
- Phone Gateway for voice agents
Vertex AI Search:
- Enterprise search over structured, unstructured, and website data
- RAG-powered answers grounded in enterprise documents
- Supports PDF, HTML, Cloud Storage, BigQuery, and third-party data sources
Supports Agent-to-Agent (A2A) protocol for multi-agent collaboration.
200+ foundation models accessible within Agent Builder workflows.

Grounding

Grounding connects generative AI model outputs to verifiable sources of information, reducing hallucinations and improving accuracy.
Grounding options:
- Grounding with Google Search – Connects models to real-time, publicly available web content with cited sources
- Grounding with Vertex AI Search – Connects models to enterprise data (documents, databases, websites)
- Grounding with Parallel Web Search – Multi-hop agents that perform deeper web searches for complex questions
Key features:
- Dynamic Retrieval – Model automatically determines when grounding is needed based on the prompt
- Source Citations – Responses include links to source documents
- Grounding Scores – Confidence scores indicating how well the response is supported by sources
- Supports up to 10 Vertex AI Search data sources per request
- Can combine Google Search grounding with enterprise data grounding
- Limit of 1 million queries per day for Grounding with Google Search
Starting with Gemini 2.0, Google Search is available as a tool – the model can autonomously decide when to search.
Gemini 3 Pro includes 5,000 free Google Search grounding queries per month.

Model Evaluation

Vertex AI provides comprehensive model evaluation capabilities for both traditional ML and generative AI models.
Traditional ML Evaluation:
- Classification metrics: accuracy, precision, recall, F1, AUC-ROC, confusion matrix
- Regression metrics: MAE, RMSE, R-squared
- Feature importance and attribution analysis
- Slice-based evaluation for fairness assessment
Generative AI Evaluation:
- AutoSxS (Auto Side-by-Side) – Automated pairwise comparison between models
- Pointwise evaluation metrics: fluency, coherence, safety, groundedness
- Custom evaluation criteria using LLM-as-a-judge
- RAG evaluation: context relevance, answer faithfulness, answer relevance
- Batch evaluation across datasets
Evaluation results stored in Vertex AI Experiments for tracking and comparison.
Supports evaluation of fine-tuned models against base models to measure improvement.

Vertex AI Workbench

Vertex AI Workbench provides managed Jupyter notebook environments for ML development and experimentation.
Workbench Instances (managed notebooks):
- JupyterLab-based environment with pre-installed ML frameworks
- Configurable machine types with GPU/TPU accelerators
- Automatic idle shutdown to reduce costs
- Integration with Git for version control
- Pre-authenticated access to Google Cloud services
- Supports custom containers for specialized environments
Colab Enterprise – Collaborative notebook environment integrated with Vertex AI:
- Serverless and managed runtimes
- Enterprise security (VPC-SC, CMEK, IAM)
- Shared notebooks with team collaboration features
- Code completion with Gemini Code Assist
Notebooks can directly launch Vertex AI training jobs, access Feature Store, and deploy models.

MLOps

Vertex AI provides integrated MLOps tooling for production ML lifecycle management.
Model Registry:
- Central repository for all trained models
- Model versioning with version aliases and descriptions
- Model lineage tracking (training data, pipeline, hyperparameters)
- Model labels and metadata for organization
- Import models from any source (Vertex AI, BigQuery ML, external)
- Default version management for deployments
Vertex AI Experiments:
- Track and compare training runs with metrics, parameters, and artifacts
- Automatic logging integration with TensorFlow and PyTorch
- Visualization of experiment results
- Integration with Vertex AI Pipelines for automated experimentation
Model Monitoring:
- Detect data drift (training-serving skew) automatically
- Feature distribution monitoring
- Prediction output monitoring for quality degradation
- Configurable alerting thresholds
- Integration with Cloud Monitoring and Cloud Logging
Vertex AI Metadata:
- Track artifacts, executions, and their lineage across the ML lifecycle
- Query metadata for auditing and compliance
- Automatic capture from Vertex AI Pipelines

Pricing Overview

Vertex AI uses a pay-as-you-go pricing model with no upfront commitments.
Generative AI / Foundation Models:
- Priced per million tokens (input and output separately)
- Gemini 2.5 Flash: ~$0.15/1M input tokens, ~$0.60/1M output tokens (cost-efficient)
- Gemini 2.5 Pro: ~$1.25/1M input tokens, ~$5.00/1M output tokens
- Third-party models (Claude, Llama): pricing varies by model
- Grounding with Google Search: $2.50 per 1,000 requests (Gemini 3 Pro includes 5,000 free/month)
Custom Training:
- Charged per node-hour based on machine type and accelerators
- 30-second billing increments (no minimum duration)
- No charge for failed training jobs (except user cancellations)
- GPU pricing varies: A100 40GB ~$3.67/hr, H100 80GB ~$11.54/hr per accelerator
- TPU pricing: v5e ~$1.20/hr per chip
Predictions:
- Online prediction: charged per node-hour for deployed endpoint compute
- Batch prediction: charged per node-hour for processing time
- Automatic scaling to zero available (no charge when idle)
Other Services:
- AutoML Training: per node-hour (varies by data type)
- Feature Store: per GB stored + per million online reads
- Vector Search: per node-hour for deployed indexes + storage
- Pipelines: per pipeline run based on compute consumed
- Agent Builder: per query/session based on usage
Total monthly costs range from under $100 for prototyping to $100,000+ for enterprise production workloads.
Free tier: new customers receive $300 Google Cloud credits applicable to Vertex AI services.

Vertex AI vs AWS SageMaker vs Azure ML

Feature	Google Cloud Vertex AI	AWS SageMaker	Azure ML
Platform Philosophy	Unified, integrated with BigQuery and Google AI research	Broadest feature set, deep AWS integration	Enterprise governance, Microsoft ecosystem integration
Foundation Models	200+ models (Gemini, Claude, Llama) via Model Garden	100+ models via Amazon Bedrock (Titan, Claude, Llama, Mistral)	OpenAI models (GPT-4, o1) + open models via Azure AI Foundry
AutoML	AutoML for tabular, image, video, text	SageMaker Autopilot (tabular focus)	Automated ML for tabular, vision, NLP
Custom Training	GPUs + TPUs, serverless, Cluster Director	GPUs + Trainium/Inferentia, SageMaker Training	GPUs, Azure ML Compute clusters
Notebooks	Workbench + Colab Enterprise	SageMaker Studio Notebooks	Azure ML Notebooks
Pipelines	Kubeflow/TFX-based, serverless	SageMaker Pipelines (proprietary SDK)	Azure ML Pipelines (designer + SDK)
Feature Store	BigQuery-integrated, GenAI-ready	SageMaker Feature Store	Azure ML Feature Store (managed)
Vector Search	Vertex AI Vector Search (ScaNN)	OpenSearch Serverless / Bedrock Knowledge Bases	Azure AI Search (vector + hybrid)
AI Agents	Agent Builder (ADK, Agent Engine, Studio)	Amazon Bedrock Agents	Azure AI Agent Service
Grounding/RAG	Google Search + Vertex AI Search grounding	Bedrock Knowledge Bases + Guardrails	Azure AI Search + On Your Data
MLOps	Model Registry, Experiments, Monitoring, Metadata	Model Registry, Experiments, Model Monitor, Clarify	Model Registry, Managed Endpoints, Responsible AI
Pricing Model	Pay-per-use, 30s increments, TPU cost advantage	Pay-per-use, instance markups 15-40% over EC2	Pay-per-use, lower platform surcharges
Key Differentiator	BigQuery integration, TPU access, Google AI research	Broadest service catalog, largest ecosystem	OpenAI partnership, Power BI/Office integration

Choose Vertex AI when: using BigQuery for data, need TPU training, prefer Google’s Gemini models, want tight integration with Google Cloud services.
Choose SageMaker when: already on AWS, need the broadest feature set, require deep integration with AWS services, want Trainium/Inferentia for cost-effective inference.
Choose Azure ML when: using Microsoft ecosystem (Office, Teams, Power BI), need OpenAI GPT-4/o1 models, have enterprise governance requirements via Azure AD.

GCP Certification Exam Practice Questions

Questions are collected from Internet and the answers are marked as per my knowledge and understanding (which might differ with yours).

GCP services are updated everyday and both the answers and questions might be outdated soon, so research accordingly.

GCP exam questions are not updated to keep up the pace with GCP updates, so even if the underlying feature has changed the question might not be updated

Open to further feedback, discussion and correction.

Your data science team needs to quickly build a classification model on structured data without writing custom code. They want Google Cloud to handle feature engineering and architecture selection. Which Vertex AI service should they use?
1. Vertex AI Custom Training with prebuilt containers
2. Vertex AI AutoML Tabular
3. Vertex AI Workbench with scikit-learn
4. Vertex AI Pipelines with Kubeflow
A company wants to deploy a Gemini model that provides answers grounded in their internal product documentation stored in Cloud Storage. Which Vertex AI feature should they use?
1. Grounding with Google Search
2. Grounding with Vertex AI Search
3. Model distillation
4. Vertex AI Feature Store
Your organization needs to orchestrate an ML workflow that includes data preprocessing, model training, evaluation, and conditional deployment. The solution should be serverless and track artifacts automatically. What should you use?
1. Cloud Composer (Apache Airflow)
2. Cloud Run Jobs
3. Vertex AI Pipelines with Kubeflow Pipelines SDK
4. Cloud Scheduler with Cloud Functions
A team needs to serve features with single-digit millisecond latency for real-time fraud detection. They already have feature data in BigQuery. Which service should they use?
1. BigQuery real-time queries
2. Memorystore for Redis
3. Vertex AI Feature Store with optimized online serving
4. Vertex AI Vector Search
Your company wants to build a semantic search application that finds similar products from a catalog of 10 million items using embedding vectors. Which Vertex AI service is best suited?
1. Vertex AI Feature Store
2. BigQuery vector functions
3. Vertex AI Search
4. Vertex AI Vector Search
An ML engineer wants to fine-tune a Gemini model to follow their company’s specific writing style using a small dataset of 500 examples. Which tuning approach provides the most parameter-efficient method?
1. Full supervised fine-tuning
2. RLHF
3. LoRA (Low-Rank Adaptation)
4. Model distillation
A team has deployed a model on a Vertex AI endpoint and notices prediction quality degrading over time. Which Vertex AI capability should they enable to detect this automatically?
1. Vertex AI Experiments
2. Vertex AI Pipelines
3. Vertex AI Metadata
4. Vertex AI Model Monitoring (training-serving skew detection)
Your team needs to deploy multiple versions of a model and gradually shift traffic from the old version to the new version. Which Vertex AI deployment feature enables this?
1. Batch prediction
2. Private endpoints
3. Traffic splitting on online prediction endpoints
4. Model Registry versioning
A startup wants to use Claude (Anthropic) models on Google Cloud with enterprise security features like VPC Service Controls and data residency. Where should they access Claude?
1. Directly from Anthropic’s API
2. Vertex AI Model Garden (managed API)
3. Deploy Claude on GKE
4. Cloud Run with Claude container
Your organization is training a large language model and needs to use TPU v5e pods for distributed training on Vertex AI. Which training method should they use?
1. AutoML
2. Custom Training with TPU VM configuration
3. Vertex AI Studio fine-tuning
4. BigQuery ML
A company wants to build a customer service chatbot that combines deterministic conversation flows for order tracking with generative AI for open-ended questions. Which service should they use?
1. Vertex AI Studio
2. Vertex AI Search
3. Conversational Agents (formerly Dialogflow CX) with Generative Playbooks
4. Cloud Functions with Gemini API
Which of the following is NOT a component of Vertex AI Agent Builder? (Select one)
1. Agent Development Kit (ADK)
2. Agent Engine
3. Agent Studio
4. Agent Trainer

References

Google Cloud AI Services Cheat Sheet

June 26, 2026 ~ Kiro Agent

Google Cloud AI Services Cheat Sheet

Google Cloud provides a comprehensive suite of AI and Machine Learning services spanning the full ML lifecycle — from data preparation and model training to deployment, inference, and responsible AI governance.
In April 2026, Google rebranded Vertex AI as the Gemini Enterprise Agent Platform at Cloud Next ’26, consolidating all AI/ML services under an agent-first architecture.
Google Cloud AI services are broadly categorized into: AI Platform (Vertex AI / Gemini Enterprise Agent Platform), Foundation Models (Gemini), Pre-trained APIs, Conversational AI, AI Infrastructure (TPUs, GPUs), and Responsible AI tools.

Vertex AI / Gemini Enterprise Agent Platform

Vertex AI (now Gemini Enterprise Agent Platform since April 2026) is Google Cloud’s unified, fully managed ML platform for building, training, deploying, and scaling ML models and generative AI applications.
Provides a single environment combining AutoML and custom training with no-code, low-code, and code-first approaches.
Key components include:
- Vertex AI Workbench — managed Jupyter notebook environment for data exploration and ML development.
- Vertex AI Training — custom model training with distributed training support on GPUs and TPUs.
- Vertex AI Predictions — online and batch prediction endpoints with autoscaling.
- Vertex AI Pipelines — serverless ML workflow orchestration based on Kubeflow Pipelines or TFX.
- Vertex AI Model Registry — central repository to manage, version, and deploy models.
- Vertex AI Feature Store — managed feature storage for serving and sharing ML features at scale.
- Vertex AI Model Garden — catalog of 200+ foundation models including Gemini, Claude, Llama, and open-source models.
- Vertex AI Studio — UI for prompt engineering, model tuning, and testing generative AI models.
- Vertex AI Experiments — track, compare, and analyze ML experiments.
- Vertex AI Model Monitoring — detect data drift and model quality degradation in production.
Supports custom containers (Docker) for training and serving with any ML framework (TensorFlow, PyTorch, JAX, XGBoost, scikit-learn).
Provides pre-built containers for popular frameworks optimized for Google Cloud hardware.
Integrates with BigQuery, Cloud Storage, Dataflow, and other Google Cloud data services.
As of May 2026, Vertex AI has been fully migrated to Gemini Enterprise Agent Platform in the Google Cloud Console. All future updates are delivered through the Agent Platform.

Gemini (Foundation Model)

Gemini is Google’s family of multimodal foundation models from Google DeepMind, capable of understanding and generating text, images, audio, video, and code.
Gemini model family includes:
- Gemini 3 Pro — most capable model for complex reasoning, coding, and multimodal tasks.
- Gemini 3 Flash — optimized for speed and efficiency with near-Pro intelligence at lower cost.
- Gemini 3.5 Flash — latest model with Pro-level coding proficiency and parallel agentic execution at Flash-tier pricing.
- Gemini Nano — on-device model for mobile and edge deployments.
Supports multimodal inputs — can process text, images, audio, video, and code in a single prompt.
Offers a 1M+ token context window for processing large documents, codebases, and long videos.
Supports function calling, grounding with Google Search, and tool use for agentic applications.
Available through Vertex AI Studio, Vertex AI API, and Google AI Studio.
Supports fine-tuning and distillation to customize models for specific use cases.
Provides built-in safety filters with configurable thresholds for responsible deployment.
Gemini for Google Cloud (formerly Duet AI) provides AI-powered assistance across Google Cloud Console, Cloud Code, BigQuery, and other services.

Vertex AI Agent Builder

Vertex AI Agent Builder is Google Cloud’s comprehensive platform to build, scale, and govern reliable AI agents.
Key components include:
- Agent Development Kit (ADK) — open-source, code-first framework for building multi-agent systems.
- Agent Studio — low-code visual builder for designing agent workflows.
- Agent Engine — managed runtime for deploying and scaling agents in production.
- Agent Garden — collection of ready-to-use agent samples and tools.
Supports multi-agent orchestration where multiple agents collaborate on complex workflows.
Provider-agnostic — supports Gemini, Claude, Llama, and hundreds of third-party models from Model Garden.
Includes persistent memory, session management, and enterprise governance features.
Integrates with Google Workspace, third-party APIs, and enterprise data sources.
Supports the Agent-to-Agent (A2A) protocol for inter-agent communication across platforms.

Vertex AI Search

Vertex AI Search (part of AI Applications) brings together deep information retrieval, NLP, and LLM processing to understand user intent and return highly relevant results.
Goes beyond basic keyword matching using AI to deliver relevant results grounded in enterprise data.
Supports multiple data sources — websites, unstructured documents, structured data, and Cloud Storage.
Provides generative AI answers grounded in enterprise data with citations.
Includes Vertex AI Search for Commerce (formerly Recommendations AI) for e-commerce with:
- AI-driven product rankings and catalog enhancements.
- Conversational Commerce agent for guiding users from intent to purchase.
- Personalized search results and recommendations optimized for revenue.
Supports RAG (Retrieval Augmented Generation) patterns for grounding LLM responses in enterprise data.
Provides out-of-the-box search widgets and APIs for quick integration.

Document AI

Document AI is a fully managed platform for document understanding that uses ML and generative AI to extract, classify, and enrich data from documents.
Supports structured, semi-structured, and unstructured documents (invoices, receipts, contracts, forms, IDs).
Key capabilities:
- Document OCR — extract printed and handwritten text from documents and images.
- Form Parser — extract key-value pairs, tables, and checkboxes from forms.
- Specialized Processors — pre-trained models for invoices, receipts, bank statements, pay slips, W-2s, and procurement documents.
- Custom Document Extractor — train custom models for domain-specific documents.
- Document Splitter — classify and split multi-page documents.
- Document AI Warehouse — search, store, and govern documents at scale with AI-powered classification.
Integrates with BigQuery, Cloud Storage, and Vertex AI Pipelines for end-to-end document processing workflows.
Supports human-in-the-loop review for critical document processing.
Processes documents asynchronously in batch or synchronously in real-time.

Vision AI

Vision AI provides pre-trained models for image analysis and computer vision tasks via the Cloud Vision API.
Key features:
- Label Detection — identify objects, locations, activities, animal species, and products in images.
- OCR (Text Detection) — extract printed and handwritten text from images.
- Face Detection — detect faces along with associated attributes (joy, sorrow, anger, surprise).
- Landmark Detection — identify popular natural and man-made landmarks.
- Logo Detection — detect popular product and brand logos.
- SafeSearch Detection — detect explicit content (adult, violence, medical, racy).
- Image Properties — detect dominant colors and crop hints.
- Object Localization — detect and locate multiple objects in an image with bounding polygons.
Supports batch image annotation for processing large volumes of images.
Provides a Product Search feature to find similar products in a product catalog.
Imagen on Vertex AI — Google’s text-to-image generation model for creating and editing images from text prompts.
Veo on Vertex AI — video generation model for creating videos from text and image prompts.
Vision AI pre-trained API provides basic capabilities; for custom image classification or object detection, use AutoML on Vertex AI.

Cloud Speech-to-Text

Speech-to-Text converts audio to text using Google’s deep learning neural network algorithms.
Supports 125+ languages and variants with automatic language detection.
Key features:
- Real-time Streaming — transcribe audio from a microphone or streaming source in real-time.
- Batch Recognition — transcribe pre-recorded audio files up to 480 minutes.
- Multi-channel Recognition — transcribe separate channels (e.g., caller and agent in a call center).
- Speaker Diarization — identify who said what in multi-speaker audio.
- Automatic Punctuation — automatically add punctuation to transcripts.
- Word-level Confidence — confidence scores for individual words.
- Speech Adaptation — boost recognition of domain-specific terms and phrases.
- Chirp — universal speech model with state-of-the-art accuracy across languages.
Provides V2 API with improved accuracy using latest foundation models.
Supports noise robustness for transcribing audio in noisy environments.

Cloud Text-to-Speech

Text-to-Speech converts text into natural-sounding speech using Google’s AI.
Offers 700+ voices across 50+ languages and variants, including Neural2, Studio, and WaveNet voices.
Key features:
- WaveNet Voices — high-fidelity voices generated by DeepMind’s WaveNet model.
- Neural2 Voices — next-generation voices combining Tensor2Tensor with WaveNet for improved quality.
- Studio Voices — premium, human-like voices for professional applications.
- Custom Voice — create a unique voice using your own recordings.
- SSML Support — control pronunciation, speaking rate, pitch, and volume with Speech Synthesis Markup Language.
- Multi-speaker — generate audio with multiple distinct speakers in a single request.
Supports audio output in MP3, OGG Opus, LINEAR16, and MULAW formats.
Integrates with Dialogflow for voice-enabled conversational agents.

Cloud Natural Language AI

Natural Language AI uses ML to extract insights from unstructured text.
Key capabilities:
- Sentiment Analysis — understand the overall sentiment (positive/negative) of text at document and sentence level.
- Entity Analysis — identify entities (people, organizations, locations, events, products) and their types.
- Entity Sentiment Analysis — combine entity and sentiment analysis to understand sentiment about specific entities.
- Syntax Analysis — extract tokens and sentences, identify parts of speech, and create dependency parse trees.
- Content Classification — classify documents into 1,000+ predefined categories.
- Text Moderation — classify text into safety categories (toxic, insult, profanity, etc.).
Supports multiple languages for all analysis features.
Provides the Healthcare Natural Language API for extracting medical entities from clinical text.
For custom text classification or entity extraction, use AutoML Natural Language on Vertex AI.

Cloud Translation AI

Cloud Translation provides real-time language translation using neural machine translation (NMT).
Two editions available:
- Translation API Basic (v2) — simple, quick translations for 100+ languages.
- Translation API Advanced (v3) — enterprise features including glossaries, custom models, and batch translation.
Key features:
- AutoML Translation — train custom translation models with domain-specific terminology.
- Adaptive Translation — real-time customization using few-shot examples without training a full model.
- Glossaries — ensure consistent translation of domain-specific terms (brand names, product names).
- Batch Translation — translate large volumes of documents asynchronously.
- Language Detection — automatically detect the source language.
- Document Translation — translate documents while preserving formatting (PDF, DOCX).
Supports 130+ languages for text translation.
Integrates with Cloud Storage for batch processing and BigQuery for analytics.

Video Intelligence AI

Video Intelligence API enables understanding of video content by analyzing stored and streaming video.
Key features:
- Label Detection — recognize 20,000+ objects, places, and actions in video at shot, frame, or segment level.
- Shot Change Detection — detect scene transitions in video.
- Explicit Content Detection — identify inappropriate content in video.
- Speech Transcription — transcribe speech within video content.
- Text Detection (OCR) — detect and extract text appearing in video frames.
- Object Tracking — track objects across video frames with bounding boxes.
- Person Detection — detect people and track their poses in video.
- Face Detection — detect faces in video (without identification).
- Logo Detection — detect and track brand logos in video.
Supports both stored video (Cloud Storage, URIs) and streaming video analysis.
Provides rich metadata at video, shot, and frame levels for building searchable video archives.
Integrates with Cloud Storage, BigQuery, and Pub/Sub for automated video processing pipelines.

Contact Center AI (CCAI)

Contact Center AI Platform is a full-stack contact center solution for managing customer interactions across voice and digital channels.
Key components:
- CCAI Platform — full CCaaS (Contact Center as a Service) with routing, queuing, and workforce management.
- Dialogflow CX Virtual Agents — AI-powered virtual agents that handle customer interactions before routing to human agents.
- Agent Assist — provides real-time suggestions, knowledge articles, and smart replies to human agents during conversations.
- CCAI Insights — analyzes call transcripts to identify call drivers, sentiment, and conversation topics at scale.
- Conversational Agents — new name for Dialogflow CX in the CCAI context (renamed 2025).
Supports IVA-only deployments to add Google’s generative AI virtual agents without replacing existing contact center infrastructure.
Integrates with third-party CRM and telephony systems (Genesys, Avaya, NICE, Cisco).
Provides sentiment analysis, entity extraction, and intent detection for every conversation.
Supports both voice and digital channels (chat, email, SMS, social media).

Dialogflow (CX and ES)

Dialogflow is a natural language understanding platform for building conversational interfaces (chatbots, voice bots, IVR systems).
Two editions available:
- Dialogflow CX (Conversational Agents) — enterprise-grade edition for complex, multi-turn conversations.
  - Uses visual flow builder for designing conversation paths.
  - Supports state-based conversation management with pages, flows, and transition routes.
  - Provides built-in generative AI capabilities using Gemini for dynamic responses.
  - Supports data store agents for grounding responses in enterprise data.
  - Multi-language support with separate flows per language.
  - Advanced analytics and debugging tools.
- Dialogflow ES (Essentials) — standard edition for simpler, single-turn or basic multi-turn conversations.
  - Intent-based conversation model with contexts for state management.
  - Suitable for small to medium chatbots and simple IVR systems.
  - Simpler setup but less control over complex conversation flows.
Dialogflow CX is recommended for new projects. ES is maintained but CX provides superior capabilities for enterprise use cases.
Integrates with telephony partners, Google Chat, Slack, Facebook Messenger, Twilio, and custom channels.
Supports webhook fulfillment for dynamic responses and backend integration.

Recommendations AI

Recommendations AI (now part of Vertex AI Search for Commerce) delivers personalized product recommendations at scale using Google’s ML expertise.
Key recommendation types:
- Recommended for You — personalized suggestions based on user browsing and purchase history.
- Others You May Like — similar product recommendations based on collective user behavior.
- Frequently Bought Together — complementary product suggestions for cross-selling.
- Similar Items — visually or categorically similar products.
- Recently Viewed — personalized recall of previously viewed items.
Supports real-time user events for immediate personalization updates.
Provides A/B testing capabilities to measure recommendation quality impact on revenue.
Requires catalog data (products) and user events (views, add-to-cart, purchases) for model training.
Models improve automatically as more user interaction data is collected.
Recommendations AI has been consolidated into Vertex AI Search for Commerce / AI Commerce Search as of 2025.

Gemini for Google Cloud (formerly Duet AI)

Gemini for Google Cloud is an AI-powered collaborator embedded across Google Cloud services to boost developer and operator productivity.
Previously known as Duet AI for Google Cloud (rebranded to Gemini in February 2024).
Key capabilities across services:
- Gemini Code Assist — AI-powered code completion, generation, and explanation in Cloud Shell Editor, VS Code, JetBrains IDEs, and Cloud Workstations.
- Gemini in BigQuery — generate SQL queries, explain results, suggest optimizations using natural language.
- Gemini in Cloud Console — natural language assistance for cloud operations, troubleshooting, and configuration.
- Gemini in Looker — generate visualizations and formulas from natural language.
- Gemini Cloud Assist — AI-driven recommendations for design, operations, and troubleshooting.
- Gemini in Security — summarize security findings, explain threats, and suggest remediation in Security Command Center.
- Gemini in Databases — generate schemas, optimize queries, and explain database operations (Cloud SQL, Spanner, AlloyDB).
Gemini Code Assist supports 20+ programming languages with full codebase context awareness.
Available in two tiers: Gemini Code Assist Standard and Gemini Code Assist Enterprise with codebase customization.

AutoML

AutoML enables training custom, high-quality ML models with minimal ML expertise using transfer learning and neural architecture search.
AutoML is now integrated into Vertex AI and supports:
- AutoML Image Classification — classify images into custom categories.
- AutoML Object Detection — detect and locate custom objects in images.
- AutoML Text Classification — classify text documents into custom categories.
- AutoML Entity Extraction — extract custom entities from text.
- AutoML Sentiment Analysis — analyze sentiment with custom models.
- AutoML Translation — train custom neural machine translation models.
- AutoML Video Classification — classify video segments.
- AutoML Video Object Tracking — track custom objects in video.
- AutoML Tabular — train models on structured/tabular data for classification, regression, and forecasting.
Uses Google’s state-of-the-art transfer learning and neural architecture search technology.
Requires labeled training data — supports human labeling through Vertex AI Data Labeling service.
Provides model evaluation metrics (precision, recall, F1, confusion matrix) before deployment.
Trained models can be exported for edge deployment (TensorFlow Lite, TF.js, Core ML) or served via Vertex AI Endpoints.
Standalone AutoML products (automl.googleapis.com) have been migrated to Vertex AI. Use Vertex AI for all new AutoML workloads.

Cloud TPUs (Tensor Processing Units)

Cloud TPUs are Google’s custom-designed AI accelerators (ASICs) optimized for training and inference of large ML models using TensorFlow, PyTorch, and JAX.
TPU generations available on Google Cloud:
- TPU v5e — cost-efficient accelerator optimized for training and serving transformer models, text-to-image, and CNNs. 256 chips per Pod.
- TPU v6e (Trillium) — 6th generation with 4.7x peak compute improvement over v5e, doubled HBM capacity/bandwidth, and doubled ICI bandwidth. 256 chips per Pod.
- TPU v5p — high-performance variant optimized for large-scale training workloads.
- TPU7x (Ironwood) — 7th generation, Google’s most powerful TPU:
  - 4.6 petaFLOPS of peak FP8 compute per chip.
  - 192 GiB HBM3e memory per chip with 7.4 TB/s bandwidth.
  - 10x peak performance improvement over v5p.
  - 4x better performance per chip vs. v6e for training and inference.
  - 9,216-chip superpods delivering 42.5 exaFLOPS of FP8 compute.
  - 1.77 PB of directly accessible HBM capacity per superpod.
  - Each chip contains two TensorCores and four SparseCores.
TPUs are connected via high-speed Inter-Chip Interconnect (ICI) for efficient distributed training.
Support multislice training to scale beyond a single TPU Pod for training frontier models.
Available in Google Kubernetes Engine (GKE), Vertex AI, and Cloud TPU VMs.
Optimized for ML frameworks: JAX (best performance), TensorFlow, and PyTorch/XLA.
Support Queued Resources for managing TPU allocation in high-demand scenarios.
Only TPU v5e, v6e, and TPU7x are supported for Vertex AI model deployment. Earlier generations are deprecated for new workloads.

AI Infrastructure (GPUs and VMs)

Google Cloud provides GPU-accelerated VMs optimized for AI/ML workloads as part of AI Hypercomputer — a unified architecture integrating hardware, software, and flexible consumption models.
Key GPU VM families:
- A3 Mega VMs — powered by 8x NVIDIA H100 80GB GPUs with 3.2 Tbps GPU-to-GPU networking. Optimized for large-scale training.
- A3 Ultra VMs — powered by 8x NVIDIA H200 141GB GPUs (GA since late 2024). Superior memory bandwidth for large model training and inference.
- A2 Ultra VMs — powered by NVIDIA A100 80GB GPUs.
- G2 VMs — powered by NVIDIA L4 GPUs, optimized for inference and smaller training workloads.
Hypercompute Cluster — highly scalable clustering system for multi-node GPU workloads (GA 2024).
Key features:
- Dynamic Workload Scheduler — efficiently schedule and manage GPU/TPU workloads.
- Multislice/Multihost Training — scale training across multiple VMs/TPU slices.
- NVIDIA NVLink and NVSwitch — high-bandwidth GPU-to-GPU interconnect within nodes.
- GPUDirect-TCPXO — optimized networking stack for distributed GPU training.
Supports JetStream and vLLM for optimized LLM serving on both TPUs and GPUs.
Available with committed use discounts (CUDs) and on-demand pricing.
Integrates with GKE for container-orchestrated AI workloads and Vertex AI for managed training/serving.

Responsible AI

Google Cloud provides tools and frameworks for developing AI responsibly, aligned with Google’s AI Principles.
Key Responsible AI capabilities:
- Vertex Explainable AI (XAI) — understand model predictions through:
  - Feature-based Explanations — feature attributions showing how each input feature contributed to a prediction (Shapley values, Integrated Gradients, XRAI).
  - Example-based Explanations — identify training examples most similar to the input being explained.
- Model Cards — structured documentation describing model performance, intended use, limitations, and ethical considerations. Supports generating Model Cards automatically via Vertex AI Pipelines.
- Fairness Indicators — evaluate model performance across different demographic groups to identify potential bias.
- Data Cards — document dataset characteristics, collection methodology, and known biases.
- Safety Filters — configurable content filtering for generative AI models across categories (hate speech, harassment, sexually explicit, dangerous content).
- Guardrails — set boundaries on model behavior with system instructions and safety settings.
- Model Evaluation — evaluate generative models on safety, quality, and groundedness metrics.
Safety attribute scoring available in all Vertex AI generative AI APIs with configurable confidence thresholds.
Vertex AI provides built-in content filtering that can be tuned per use case.
Supports Responsible AI practices throughout the ML lifecycle: data collection, training, evaluation, deployment, and monitoring.
Google publishes annual Responsible AI Progress Reports detailing governance, safety testing, and red-teaming practices.

Google Cloud vs AWS AI Services Comparison

Category	Google Cloud Service	AWS Equivalent
ML Platform	Vertex AI / Gemini Enterprise Agent Platform	Amazon SageMaker
Foundation Model	Gemini (3 Pro, 3 Flash, 3.5 Flash, Nano)	Amazon Nova, Claude (via Bedrock)
Model Hub / API	Vertex AI Model Garden	Amazon Bedrock
AI Agent Builder	Vertex AI Agent Builder (ADK, Agent Engine)	Amazon Bedrock Agents
Enterprise Search	Vertex AI Search	Amazon Kendra / Amazon Q Business
AI Code Assistant	Gemini Code Assist	Amazon Q Developer (formerly CodeWhisperer)
Document Processing	Document AI	Amazon Textract
Image Analysis	Cloud Vision AI	Amazon Rekognition (Images)
Image Generation	Imagen on Vertex AI	Amazon Titan Image Generator / Amazon Nova Canvas
Video Generation	Veo on Vertex AI	Amazon Nova Reel
Video Analysis	Video Intelligence AI	Amazon Rekognition Video
Speech-to-Text	Cloud Speech-to-Text	Amazon Transcribe
Text-to-Speech	Cloud Text-to-Speech	Amazon Polly
NLP / Text Analysis	Cloud Natural Language AI	Amazon Comprehend
Translation	Cloud Translation AI	Amazon Translate
Conversational AI	Dialogflow CX (Conversational Agents)	Amazon Lex
Contact Center AI	CCAI Platform	Amazon Connect
Recommendations	Vertex AI Search for Commerce / Recommendations AI	Amazon Personalize
AutoML	Vertex AI AutoML	Amazon SageMaker Autopilot
Custom AI Chips	Cloud TPUs (v5e, v6e Trillium, TPU7x Ironwood)	AWS Trainium / Inferentia
GPU VMs (Training)	A3 Ultra (H200), A3 Mega (H100)	P5 (H100), P5e (H200) instances
GPU VMs (Inference)	G2 (L4 GPUs)	G5 (A10G), Inf2 (Inferentia2)
AI Infrastructure Platform	AI Hypercomputer	AWS AI Infrastructure (UltraClusters)
Explainability	Vertex Explainable AI	SageMaker Clarify
Model Documentation	Model Cards	SageMaker Model Cards
Bias Detection	Fairness Indicators	SageMaker Clarify (Bias Detection)
Forecasting	Vertex AI Forecasting (AutoML Tabular)	Amazon Forecast
Data Labeling	Vertex AI Data Labeling	SageMaker Ground Truth
Feature Store	Vertex AI Feature Store	SageMaker Feature Store
ML Pipelines	Vertex AI Pipelines	SageMaker Pipelines
Notebook Environment	Vertex AI Workbench	SageMaker Studio

AWS VPC Lattice – Application-Layer Service Networking

Key Benefits

VPC Lattice Architecture

Service Network

Services

Target Groups

Listeners

Rules

Resource Configurations and Resource Gateways (GA at re:Invent 2024)

Key Features

Cross-VPC and Cross-Account Communication Without Peering

Weighted Routing (Traffic Splitting)

Auth Policies with IAM (Authorization)

Mutual TLS (mTLS) and Encryption

Observability (CloudWatch, Access Logs, X-Ray)

Service Discovery (DNS-Based)

Availability Zone Affinity

On-Premises Access

Integration with Compute Services

Amazon ECS

Amazon EKS (Kubernetes)

AWS Lambda

Amazon EC2

Mixed Compute Environments

VPC Lattice vs App Mesh vs API Gateway vs PrivateLink vs ALB

When to Choose Which

Use Cases

Microservices Communication

Multi-Account Architectures

Service Mesh Replacement (App Mesh Migration)

Zero Trust Networking

Multi-Tenant SaaS Applications

Hybrid and On-Premises Connectivity

Pricing

Service Pricing (US East – N. Virginia)

Resource Access Pricing

Key Pricing Notes

AWS Certification Exam Practice Questions

Important Points for Certification Exams

Frequently Asked Questions

What is AWS VPC Lattice?

How is VPC Lattice different from PrivateLink?

Does VPC Lattice replace App Mesh?

Related Posts

References

AWS Kinesis Data Streams vs Firehose vs Managed Apache Flink

Service Overview

Amazon Kinesis Data Streams

Amazon Data Firehose

Amazon Managed Service for Apache Flink

Architecture Comparison

Kinesis Data Streams Architecture

Amazon Data Firehose Architecture

Managed Apache Flink Architecture

Data Retention

Scaling Model

Kinesis Data Streams — Shards & On-Demand

Amazon Data Firehose — Fully Automatic

Managed Apache Flink — KPU-based Auto-scaling

Consumers & Processing

Kinesis Data Streams Consumers

Amazon Data Firehose Consumers

Managed Apache Flink Processing

Latency Comparison

Pricing Models (US East – N. Virginia)

Kinesis Data Streams Pricing

Amazon Data Firehose Pricing

Managed Apache Flink Pricing

Data Transformation Capabilities

Ordering & Delivery Guarantees

Comprehensive Comparison Table

When to Choose Each Service

Choose Kinesis Data Streams When:

Choose Amazon Data Firehose When:

Choose Managed Apache Flink When:

Common Architecture Pattern: Using All Three Together

AWS Certification Exam Practice Questions

Related Posts

References

Related Posts