AWS SQS vs SNS vs EventBridge

  • AWS provides multiple messaging and event-driven services for decoupling application components.
  • SQS is a message queue for point-to-point communication, SNS is a pub/sub notification service, and EventBridge is a serverless event bus for event-driven architectures.
  • These services are often used together but serve different purposes.

SQS vs SNS vs EventBridge Comparison

Feature | SQS | SNS | EventBridge
Pattern | Queue (point-to-point) | Pub/Sub (fan-out) | Event Bus (event-driven)
Delivery | Pull-based (consumers poll) | Push-based (pushes to subscribers) | Push-based (routes to targets)
Consumers | Single consumer per message | Multiple subscribers | Multiple targets per rule
Filtering | No native filtering | Message attribute filtering | Content-based filtering (event patterns)
Retention | 1 min to 14 days (default 4 days) | No retention (immediate delivery) | No retention (replay via archive, which can keep events indefinitely)
Ordering | FIFO queue guarantees order | FIFO topic with SQS FIFO | No ordering guarantee
Throughput | Standard: unlimited; FIFO: 3,000 msg/sec (batching) | Standard: unlimited; FIFO: 300 msg/sec | Default: varies by region, scalable
Dead Letter Queue | Yes | Yes (for failed deliveries) | Yes (DLQ on target failures)
Targets/Subscribers | Consumer applications | SQS, Lambda, HTTP/S, Email, SMS, Kinesis Firehose | 200+ AWS services, APIs, SaaS apps
Event Sources | Producers send messages | Producers publish messages | 90+ AWS services, custom apps, SaaS partners
Schema | No schema enforcement | No schema enforcement | Schema Registry with discovery
Replay | No (message deleted after processing) | No | Yes (Event Archive and Replay)
Cross-account | Yes (resource policy) | Yes (resource policy) | Yes (cross-account event bus)
Scheduling | Delay queues (up to 15 min) | No | Yes (EventBridge Scheduler)

Amazon SQS – Simple Queue Service

  • Fully managed message queue for decoupling producers from consumers.
  • Standard Queue – at-least-once delivery, best-effort ordering, unlimited throughput.
  • FIFO Queue – exactly-once processing, strict ordering, up to 3,000 msg/sec with batching.
  • Messages are retained up to 14 days – acts as a buffer for traffic spikes.
  • Supports a visibility timeout, which hides a received message from other consumers while it is being processed.
  • Supports long polling to reduce empty receives and costs.
  • Dead Letter Queue (DLQ) for messages that fail processing after max retries.
  • Integrates natively with Lambda (event source mapping) for serverless processing.
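
The long-polling and visibility-timeout bullets above can be sketched as a single consumer pass. This is a minimal sketch, assuming `sqs` is a boto3 SQS client passed in by the caller; the `handle` callback, the queue URL, and the 20-second and 60-second values are illustrative assumptions, not recommendations from the source.

```python
def poll_once(sqs, queue_url, handle):
    """Receive one batch with long polling and delete the messages that were handled.

    `sqs` is assumed to be a boto3 SQS client, e.g. boto3.client("sqs").
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # largest batch a single receive can return
        WaitTimeSeconds=20,      # long polling: wait up to 20 s before returning empty
        VisibilityTimeout=60,    # hide received messages from other consumers for 60 s
    )
    processed = 0
    for message in response.get("Messages", []):
        try:
            handle(message["Body"])
        except Exception:
            # Not deleted: the message becomes visible again after the
            # visibility timeout and is retried (or ends up in the DLQ).
            continue
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        processed += 1
    return processed
```

Deleting only after the handler succeeds is what turns the visibility timeout into a retry mechanism: a crash mid-processing simply lets the message reappear.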

Amazon SNS – Simple Notification Service

  • Fully managed pub/sub service for fan-out messaging to multiple subscribers.
  • A single message published to a topic is delivered to all subscribers simultaneously.
  • Supports multiple protocols – SQS, Lambda, HTTP/S, Email, SMS, Kinesis Data Firehose, mobile push.
  • Message filtering – subscribers can set filter policies on message attributes to receive only relevant messages.
  • FIFO topics – strict ordering and deduplication when paired with SQS FIFO queues.
  • Fan-out pattern – SNS + multiple SQS queues for parallel processing of the same event.
  • Supports message encryption (SSE-KMS) and cross-account subscriptions.
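
To make the message-filtering bullet concrete, here is a simplified model of how a filter policy on message attributes selects messages. It is a sketch only: it covers exact string matching (every policy key must match, any listed value is accepted) and ignores the other operators SNS supports. The attribute names and values are hypothetical.

```python
import json


def matches_filter_policy(policy, attributes):
    """Return True if every policy key is present with one of its allowed values."""
    return all(attributes.get(name) in allowed for name, allowed in policy.items())


# Hypothetical policy: this subscriber only wants order events from the EU store.
policy = {"event_type": ["order_created", "order_cancelled"], "store": ["eu"]}

# The JSON form is what would be stored as the subscription's filter policy.
print(json.dumps(policy))
print(matches_filter_policy(policy, {"event_type": "order_created", "store": "eu"}))  # True
print(matches_filter_policy(policy, {"event_type": "order_created", "store": "us"}))  # False
```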

Amazon EventBridge

  • Serverless event bus for building event-driven architectures at scale.
  • Receives events from 90+ AWS services on the default event bus automatically (most need no source-side setup; rules decide what gets routed).
  • Content-based filtering with event patterns – filter on any field in the event JSON body.
  • Routes events to 200+ AWS service targets including Lambda, Step Functions, API Gateway, SQS, SNS.
  • Schema Registry – automatically discovers and stores event schemas for code generation.
  • Event Archive and Replay – store events indefinitely and replay them for debugging or reprocessing.
  • EventBridge Scheduler – create one-time or recurring schedules (replaces CloudWatch Events cron).
  • EventBridge Pipes – point-to-point integration between sources and targets with filtering, enrichment, and transformation.
  • SaaS partner integrations – receive events from Zendesk, Datadog, Auth0, Shopify, etc.
  • Global endpoints – automatic failover to a secondary region for high availability.
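
Content-based filtering with event patterns can be illustrated with a small matcher. This is a simplified sketch, not the EventBridge matching engine: it handles only exact-value matching on nested fields and leaves out operators such as prefix or numeric matching. The sample event is a hand-written, trimmed-down EC2 state-change event.

```python
def matches_pattern(pattern, event):
    """Return True if every field named in the pattern matches the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching sub-object.
            if not isinstance(actual, dict) or not matches_pattern(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the pattern lists the accepted values.
            return False
    return True


pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
print(matches_pattern(pattern, event))  # True
```

Fields the pattern does not mention (here `instance-id`) are ignored, which is why one rule can match many events of the same type.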

When to Choose Which

  • Choose SQS – Decouple a producer from a single consumer, buffer traffic spikes, guarantee message processing with retries, maintain message ordering (FIFO).
  • Choose SNS – Fan-out a message to multiple subscribers simultaneously, send notifications (email/SMS), simple pub/sub without complex routing.
  • Choose EventBridge – React to AWS service events, route events based on content to different targets, integrate with SaaS applications, need schema discovery, event replay, or scheduling.
  • Combine SNS + SQS – Fan-out pattern where each subscriber needs independent processing with buffering and retry.
  • Combine EventBridge + SQS – Route events to SQS for buffered, reliable processing with backpressure handling.
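
For the SNS + SQS fan-out pattern, each subscribed queue needs a resource policy that lets the topic deliver to it. The sketch below builds such a policy document for three queues; the ARNs and queue names are hypothetical placeholders, and attaching the policy and creating the subscriptions is left out.

```python
import json


def sns_to_sqs_policy(queue_arn, topic_arn):
    """Build a queue policy that allows one SNS topic to send messages to one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Restrict delivery to this specific topic.
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }


# Hypothetical ARNs for illustration only.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:orders"
QUEUE_ARNS = [
    f"arn:aws:sqs:us-east-1:123456789012:{name}"
    for name in ("billing", "shipping", "analytics")
]

policies = {arn: sns_to_sqs_policy(arn, TOPIC_ARN) for arn in QUEUE_ARNS}
print(json.dumps(policies[QUEUE_ARNS[0]], indent=2))
```

Because every subscriber has its own queue, each one buffers and retries independently of the others.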

AWS Certification Exam Practice Questions

  1. A company needs to process orders where each order must be processed exactly once and in the order received. Which service and configuration is most appropriate?
    1. SNS Standard topic
    2. SQS FIFO queue
    3. EventBridge with ordering
    4. SQS Standard queue
  2. An application needs to fan out a single event to three different microservices for parallel processing, each requiring independent retry logic. Which architecture is recommended?
    1. EventBridge with three targets
    2. SQS with three consumers
    3. SNS topic with three SQS queue subscriptions
    4. Three separate SQS queues with direct publishing
  3. A team needs to automatically trigger a Lambda function whenever an S3 object is created, an EC2 instance changes state, or a CodePipeline deployment fails. Which service requires the least configuration?
    1. SNS with S3 event notifications
    2. SQS with CloudWatch Events
    3. EventBridge (receives AWS events automatically)
    4. Lambda with direct triggers
  4. A SaaS application needs to react to events from Shopify and route them to different Lambda functions based on the event type (order_created vs order_cancelled). Which service is best suited?
    1. SNS with message filtering
    2. SQS with message attributes
    3. EventBridge with event pattern rules
    4. API Gateway with Lambda
  5. After a production incident, a team needs to replay all events from the past 7 days to reprocess failed orders. Which service supports this natively?
    1. SQS (messages already consumed)
    2. SNS (no retention)
    3. EventBridge (Archive and Replay)
    4. Kinesis Data Streams

AWS SQS Standard vs FIFO Queue – Ordering & Dedup

SQS offers two types of queues – Standard & FIFO queues

SQS Standard vs FIFO Queue Features

Message Order

  • Standard queues provide best-effort ordering: messages are generally delivered in the order they are sent, but the highly distributed architecture that allows high throughput means a message may occasionally be delivered out of order or more than once.
  • FIFO queues offer first-in-first-out delivery and exactly-once processing: the order in which messages are sent and received is strictly preserved

Delivery

  • Standard queues guarantee at-least-once delivery, so a consumer may occasionally receive a duplicate of a message.
  • FIFO queues ensure a message is delivered exactly once and remains available until a consumer processes and deletes it; duplicates are not introduced into the queue.
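
Because standard queues can deliver a message more than once, consumers are usually written to be idempotent. The following is a minimal sketch of that idea, assuming each message carries a stable ID; the in-memory set stands in for what would normally be a durable store, and the class name is hypothetical.

```python
class IdempotentConsumer:
    """Run the handler at most once per message ID, even if the ID is delivered again."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # stand-in for a durable store shared by all consumers

    def process(self, message_id, body):
        if message_id in self._seen:
            return False  # duplicate delivery: skip the side effect
        self._handler(body)
        self._seen.add(message_id)  # recorded only after the handler succeeded
        return True


charged = []
consumer = IdempotentConsumer(charged.append)
consumer.process("m-1", "charge order 42")
consumer.process("m-1", "charge order 42")  # redelivered copy is ignored
print(charged)  # ['charge order 42']
```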

Transactions Per Second (TPS)

  • Standard queues allow a nearly unlimited number of transactions per second.
  • FIFO queues by default are limited to 300 transactions per second per API action (SendMessage, ReceiveMessage, DeleteMessage).
  • With High Throughput Mode enabled, FIFO queues can support up to 70,000 TPS per API action without batching in select regions (US East N. Virginia, US West Oregon, Europe Ireland), and up to 700,000 messages per second with batching.
  • High Throughput Mode can be enabled from the SQS console and uses message group-level partitioning to achieve higher throughput.
  • In other regions, high throughput quotas vary (up to 18,000 TPS in several regions).
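
The relationship between the per-action TPS quotas and the messages-per-second figures above is simple arithmetic: a batch call carries up to 10 messages, so the message rate is the TPS quota times the batch size. A small sketch of that calculation (the function name is made up for illustration):

```python
MAX_BATCH_SIZE = 10  # a batch API call carries at most 10 messages


def max_messages_per_second(tps_per_action, batch_size=MAX_BATCH_SIZE):
    """Upper bound on message rate for a given API-call quota and batch size."""
    return tps_per_action * min(batch_size, MAX_BATCH_SIZE)


print(max_messages_per_second(300))                # 3000   (default FIFO quota, full batches)
print(max_messages_per_second(70_000))             # 700000 (high throughput quota, full batches)
print(max_messages_per_second(300, batch_size=1))  # 300    (no batching)
```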

In-Flight Messages

  • Standard queues support approximately 120,000 in-flight messages.
  • FIFO queues now support up to 120,000 in-flight messages (increased from 20,000 in November 2024). In-flight messages are those received by a consumer but not yet deleted from the queue.

Regions

  • Standard & FIFO queues are available in all regions where Amazon SQS is available.

SQS Buffered Asynchronous Client

  • FIFO queues are not compatible with the SQS Buffered Asynchronous Client, where messages are buffered at the client side and sent as a single request to the SQS queue to reduce cost.

Dead-Letter Queue (DLQ) Support

  • Both Standard and FIFO queues support dead-letter queues for handling messages that cannot be processed after a configured number of retries.
  • FIFO queues now support DLQ redrive, allowing messages to be moved from a FIFO dead-letter queue back to the FIFO source queue or a custom FIFO destination queue (launched 2023, expanded to GovCloud in April 2024).

CloudWatch Metrics

  • Standard queues support all standard SQS CloudWatch metrics.
  • FIFO queues support additional metrics (added July 2024):
    • NumberOfDeduplicatedSentMessages – tracks deduplicated messages
    • ApproximateNumberOfGroupsWithInflightMessages – tracks active message groups

AWS Services Supported

  • Standard Queues are supported by all AWS services
  • FIFO Queues now have broader service integration than at launch, but some limitations remain:
    • Supported:
      • Amazon SNS FIFO Topics (can subscribe SQS FIFO queues for ordered fan-out)
      • AWS Lambda (SQS FIFO as event source mapping, with ordered processing per message group)
      • Amazon EventBridge (SQS FIFO as rule target; EventBridge Pipes supports FIFO as source)
      • Auto Scaling Lifecycle Hooks (SQS queue target)
    • Not Supported:
      • S3 Event Notifications (cannot directly target SQS FIFO; use EventBridge as intermediary)
      • Lambda Asynchronous Invocation Destinations (does not support SQS FIFO or SNS FIFO as destination)

Use Cases

  • Standard queues can be used in any scenario, as long as the application can process messages that arrive more than once and out of order
    • Decouple live user requests from intensive background work: Let users upload media while resizing or encoding it.
    • Allocate tasks to multiple worker nodes: Process a high number of credit card validation requests.
    • Batch messages for future processing: Schedule multiple entries to be added to a database.
  • FIFO queues are designed to enhance messaging between applications when the order of operations and events is critical, or where duplicates can’t be tolerated
    • Ensure that user-entered commands are executed in the right order.
    • Display the correct product price by sending price modifications in the right order.
    • Prevent a student from enrolling in a course before registering for an account.
    • E-commerce order management systems where order processing sequence is critical.
    • Online ticketing systems where tickets are distributed on a first-come-first-served basis.

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day, and both the answers and questions might be outdated soon, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. A restaurant reservation application needs the ability to maintain a waiting list. When a customer tries to reserve a table, and none are available, the customer must be put on the waiting list, and the application must notify the customer when a table becomes free. What service should the Solutions Architect recommend to ensure that the system respects the order in which the customer requests are put onto the waiting list?
    1. Amazon SNS
    2. AWS Lambda with sequential dispatch
    3. A FIFO queue in Amazon SQS
    4. A standard queue in Amazon SQS
  2. A solutions architect is designing an application for a two-step order process. The first step is synchronous and must return to the user with little latency. The second step takes longer, so it will be implemented in a separate component. Orders must be processed exactly once and in the order in which they are received. How should the solutions architect integrate these components?
    1. Use Amazon SQS FIFO queues.
    2. Use an AWS Lambda function along with Amazon SQS standard queues.
    3. Create an SNS topic and subscribe an Amazon SQS FIFO queue to that topic.
    4. Create an SNS topic and subscribe an Amazon SQS Standard queue to that topic.
  3. A company needs to process over 50,000 messages per second with strict ordering within each customer’s message stream. The system uses Amazon SQS FIFO queues. What should a solutions architect recommend to meet the throughput requirement?
    1. Use multiple standard queues with application-level ordering
    2. Use Amazon Kinesis Data Streams instead of SQS
    3. Enable High Throughput Mode on the FIFO queue and use unique message group IDs per customer
    4. Increase the visibility timeout to allow more concurrent processing
  4. An application sends S3 event notifications to an SQS FIFO queue for ordered processing of uploaded files. The team reports that messages are not being delivered. What is the most likely cause?
    1. The SQS FIFO queue has reached its throughput limit
    2. S3 Event Notifications do not support SQS FIFO queues as a direct destination
    3. The queue’s message deduplication ID is not configured
    4. The IAM role lacks permissions to publish to the queue

Kinesis Data Streams vs SQS – Streaming vs Queuing

Purpose

  • Amazon Kinesis Data Streams
    • allows real-time processing of streaming big data and the ability to read and replay records to multiple Amazon Kinesis Applications.
    • Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications that read from the same Amazon Kinesis stream (for example, to perform counting, aggregation, and filtering).
    • designed for high-volume, real-time data ingestion and processing with multiple concurrent consumers reading the same data.
  • Amazon SQS
    • offers a reliable, highly-scalable hosted queue for storing messages as they travel between applications or microservices.
    • It moves data between distributed application components and helps decouple these components.
    • provides common middleware constructs such as dead-letter queues, poison-pill management, and dead-letter queue redrive (including FIFO DLQ redrive).
    • provides a generic web services API and can be accessed by any programming language that the AWS SDK supports.
    • supports both standard and FIFO queues
    • supports Fair Queues (launched Jul 2025) for standard queues to mitigate noisy neighbor impact in multi-tenant systems.

Scaling

  • Kinesis Data Streams offers three capacity modes:
    • Provisioned mode – requires manual shard management and scaling.
    • On-demand Standard mode – automatically scales throughput (up to 200 MB/s write), eliminating the need for manual shard provisioning.
    • On-demand Advantage mode (launched Nov 2025) – provides warm throughput for instant scaling to handle traffic surges up to 10 GB/s, with 60%+ lower pricing compared to On-demand Standard for high-volume workloads.
  • SQS is fully managed, highly scalable and requires no administrative overhead and little configuration. It scales transparently to handle any volume of messages.
  • SQS FIFO queues support High Throughput mode with up to 70,000 transactions per second per API action (without batching), and up to 700,000 messages per second with batching in select regions.

Ordering

  • Kinesis provides ordering of records within a shard (by partition key), as well as the ability to read and/or replay records in the same order to multiple Kinesis Applications
  • SQS Standard Queue provides best-effort ordering but does not guarantee strict data ordering and provides at least once delivery of messages
  • SQS FIFO Queue guarantees strict data ordering within the message group
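
Per-shard ordering follows from how records are routed: the partition key is hashed with MD5 into a 128-bit hash key, and each shard owns a range of that hash space, so records with the same key always land on the same shard. The sketch below illustrates this under the assumption that the shards split the hash space evenly, which is not guaranteed after resharding; the truck IDs are invented.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * shard_count // HASH_SPACE


# Every record for a given truck goes to the same shard, so its order is preserved.
for truck in ("truck-17", "truck-42", "truck-17"):
    print(truck, "-> shard", shard_for(truck, shard_count=4))
```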

Data Retention Period

  • Kinesis Data Streams stores data for 24 hours by default; retention can be extended up to 7 days (extended retention) or from 7 to 365 days (long-term retention).
  • SQS stores messages for 4 days by default, configurable from 1 minute to 14 days, and removes a message as soon as the consumer deletes it.

Delivery Semantics

  • Kinesis and SQS Standard queues both guarantee at-least-once delivery of a message.
  • SQS FIFO queues provide exactly-once processing, using deduplication to discard duplicate sends.

Parallel Clients

  • Kinesis supports multiple consumers reading from the same stream simultaneously
    • With shared throughput (GetRecords), all consumers share the 2 MB/s per shard read capacity
    • With Enhanced Fan-Out, each consumer gets a dedicated 2 MB/s per shard throughput via SubscribeToShard (push-based)
    • On-demand Advantage mode supports up to 50 enhanced fan-out consumers per stream (vs 20 on On-demand Standard or Provisioned)
  • SQS delivers each message to only one consumer at a time; delivering the same message to multiple consumers requires multiple queues.
  • SQS Fair Queues (Jul 2025) dynamically reorder message delivery to ensure fair processing across tenants when one tenant becomes a noisy neighbor

Message/Record Size

  • Kinesis supports a maximum record size of 1 MB per data record
  • SQS supports a maximum message payload size of 1 MiB (increased from 256 KiB in August 2025). For larger payloads, the Amazon SQS Extended Client Library can be used to store the payload in S3.

Throughput

  • Kinesis Data Streams
    • Provisioned: 1 MB/s write and 2 MB/s read per shard
    • On-demand Standard: automatically scales, default 4 MB/s write, can burst to 200 MB/s
    • On-demand Advantage: supports instant scaling with warm throughput up to 10 GB/s
  • SQS
    • Standard queues: nearly unlimited throughput (no per-queue limits)
    • FIFO queues: 300 TPS default, up to 70,000 TPS with High Throughput mode enabled

Integration with AWS Lambda

  • Kinesis integrates with Lambda via event source mapping with parallelization factor, tumbling windows, and failure handling (bisect on error, max retry)
  • SQS integrates with Lambda via event source mapping with support for batch windows and Provisioned Mode (Nov 2025) that provides 3x faster scaling (up to 1,000 concurrent executions per minute) and 16x higher concurrency (up to 20,000 concurrent executions)
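
On the SQS side, the event source mapping hands the function a batch of records. A minimal handler sketch follows, assuming the mapping has partial batch responses (`ReportBatchItemFailures`) enabled so that only the failed messages return to the queue; `process_order` and the JSON message shape are hypothetical.

```python
import json


def process_order(order):
    """Hypothetical business logic; raises on input it cannot handle."""
    if "order_id" not in order:
        raise ValueError("missing order_id")


def handler(event, context):
    """Process each SQS record and report the ones that failed."""
    failures = []
    for record in event["Records"]:
        try:
            process_order(json.loads(record["body"]))
        except Exception:
            # Only this message is retried; the rest of the batch is deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without partial batch responses, a single failing record would cause the whole batch to be retried.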

Use Cases

  • Kinesis use case requirements
    • Ordering of records.
    • Ability to consume records in the same order a few hours later (data replay)
    • Ability for multiple applications to consume the same stream concurrently
    • Routing related records to the same record processor (as in streaming MapReduce)
    • Real-time analytics, log and event data aggregation, IoT telemetry ingestion
  • SQS use case requirements
    • Messaging semantics like message-level ack/fail and visibility timeout
    • Leveraging SQS’s ability to scale transparently
    • Dynamically increasing concurrency/throughput at read time
    • Per-message delay, where an individual message is held back for a set time before it becomes visible
    • Multi-tenant workloads requiring fair message processing (Fair Queues)
    • Decoupling microservices and serverless event-driven architectures

Kinesis Data Streams vs SQS Comparison Table

Feature | Kinesis Data Streams | SQS
Primary Use Case | Real-time streaming data processing | Message queuing and decoupling
Ordering | Per-shard (partition key) | Best-effort (Standard) / Strict per message group (FIFO)
Delivery | At least once | At least once (Standard) / Exactly once (FIFO)
Retention | 24 hours to 365 days | 1 minute to 14 days
Multiple Consumers | Yes (shared or enhanced fan-out) | No (single consumer per message)
Max Record/Message Size | 1 MB | 1 MiB
Scaling | Provisioned (manual) / On-demand (automatic) | Fully automatic
Data Replay | Yes | No (message deleted after processing)
Provisioning | Three modes: Provisioned, On-demand Standard, On-demand Advantage | Fully serverless, no provisioning
Fair Processing | N/A | Fair Queues for multi-tenant workloads

AWS Certification Exam Practice Questions

  1. You are deploying an application to track GPS coordinates of delivery trucks in the United States. Coordinates are transmitted from each delivery truck once every three seconds. You need to design an architecture that will enable real-time processing of these coordinates from multiple consumers. Which service should you use to implement data ingestion?
    1. Amazon Kinesis
    2. AWS Data Pipeline
    3. Amazon AppStream
    4. Amazon Simple Queue Service
  2. Your customer is willing to consolidate their log streams (access logs, application logs, security logs etc.) in one single system. Once consolidated, the customer wants to analyze these logs in real time based on heuristics. From time to time, the customer needs to validate heuristics, which requires going back to data samples extracted from the last 12 hours. What is the best approach to meet your customer’s requirements?
    1. Send all the log events to Amazon SQS. Set up an Auto Scaling group of EC2 servers to consume the logs and apply the heuristics.
    2. Send all the log events to Amazon Kinesis and develop a client process to apply heuristics on the logs (Can perform real time analysis and stores data for 24 hours which can be extended to 365 days)
    3. Configure Amazon CloudTrail to receive custom logs, and use EMR to apply heuristics on the logs (CloudTrail is only for auditing)
    4. Set up an Auto Scaling group of EC2 syslogd servers, store the logs on S3, and use EMR to apply heuristics on the logs (EMR is for batch analysis)
  3. A company runs a multi-tenant SaaS application where different customers submit varying volumes of jobs to an SQS queue. During peak hours, one large customer floods the queue with messages, causing increased dwell time for all other customers. Which SQS feature should the team enable to address this noisy neighbor problem?
    1. SQS FIFO High Throughput mode
    2. SQS Long Polling
    3. SQS Fair Queues (Fair Queues dynamically reorder message delivery to mitigate noisy neighbor impact in multi-tenant standard queues)
    4. SQS Dead-Letter Queue Redrive
  4. A streaming analytics application needs to process real-time clickstream data with five independent consumer applications reading from the same stream simultaneously, each requiring dedicated throughput. The team wants to minimize operational overhead. Which configuration is most appropriate?
    1. Amazon SQS with five separate queues using SNS fan-out
    2. Kinesis Data Streams in Provisioned mode with Enhanced Fan-Out
    3. Kinesis Data Streams in On-demand Advantage mode with Enhanced Fan-Out (On-demand Advantage provides automatic scaling with no shard management and supports up to 50 enhanced fan-out consumers with dedicated 2 MB/s per shard throughput)
    4. Kinesis Data Streams in On-demand Standard mode with shared GetRecords
  5. An application processes order events that are each approximately 500 KB in size. The events need to be placed in a queue for asynchronous processing by a Lambda function. Which approach meets the requirements with the LEAST operational overhead?
    1. Use SQS with the Extended Client Library to store messages in S3
    2. Use SQS directly, as it now supports message payloads up to 1 MiB (Since August 2025, SQS supports up to 1 MiB message payload natively, eliminating the need for S3 offloading for messages under 1 MiB)
    3. Use Kinesis Data Streams with 1 MB record limit
    4. Use Amazon SNS to fan out to multiple SQS queues

AWS SQS FIFO Queue – Ordering & Deduplication

  • SQS FIFO queues enhance messaging between applications with the following additional features
    • FIFO (First-In-First-Out) delivery
      • the order in which messages are sent and received is strictly preserved
      • key when the order of operations & events is critical
    • Exactly-once processing
      • a message is delivered once and remains available until a consumer processes and deletes it
      • key when duplicates can’t be tolerated.
    • Throughput
      • By default, limited to 300 transactions per second (TPS) per API action (SendMessage, ReceiveMessage, DeleteMessage)
      • With batching (up to 10 messages per API call), effective throughput can reach 3,000 messages per second
      • With High Throughput Mode enabled, supports up to 70,000 TPS per API action (700,000 messages/sec with batching) in select regions
  • FIFO queues provide all the capabilities of Standard queues and complement them with ordering and deduplication guarantees.
  • FIFO queues support message groups that allow multiple ordered message groups within a single queue. There is no quota to the number of message groups within a FIFO queue.
  • FIFO Queue name should end with .fifo
  • SQS FIFO supports one or more producers and messages are stored in the order that they were successfully received by SQS.
  • SQS FIFO queues don’t serve messages from the same message group to more than one consumer at a time.
  • FIFO queues support a maximum of 120,000 in-flight messages (increased from 20,000 in Nov 2024). Messages are considered in-flight after being received by a consumer but not yet deleted.
  • Maximum message payload size is 1 MiB (increased from 256 KiB in Aug 2025), applicable to both standard and FIFO queues. For payloads up to 2 GB, use the Extended Client Library with Amazon S3.
  • AWS Lambda supports SQS FIFO as an event source for building event-driven applications with ordered processing.
  • Not all AWS services support FIFO queues as a direct event destination. For example:
    • Amazon S3 Event Notifications (use Amazon EventBridge as an intermediary to route to FIFO queues)
    • Amazon EventBridge Scheduler Dead-Letter Queues

High Throughput Mode for FIFO Queues

  • High throughput mode increases the transaction limit significantly beyond the default 300 TPS.
  • Supports up to 70,000 transactions per second per API action in select regions (US East N. Virginia, US West Oregon, Europe Ireland).
  • With batching, this translates to up to 700,000 messages per second.
  • Enabling high throughput mode requires two configuration changes:
    • Deduplication scope – Set to Message group (deduplication occurs at the message group level instead of queue level)
    • FIFO throughput limit – Set to Per message group ID (throughput quota applies per message group rather than per queue)
  • If either setting is changed from the required configuration, normal throughput (300 TPS) is in effect.
  • Available in all regions where Amazon SQS is available, though maximum throughput quotas vary by region.
  • To achieve maximum throughput, distribute messages across multiple message groups.
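
The two settings listed above correspond to the queue attributes `DeduplicationScope` and `FifoThroughputLimit`. A minimal sketch of creating such a queue, assuming `sqs` is a boto3 SQS client supplied by the caller; the helper names and the queue name in the usage note are hypothetical.

```python
def high_throughput_fifo_attributes(content_based_dedup=False):
    """Queue attributes that put a FIFO queue into high throughput mode."""
    return {
        "FifoQueue": "true",
        "DeduplicationScope": "messageGroup",        # deduplicate per message group
        "FifoThroughputLimit": "perMessageGroupId",  # quota applies per message group
        "ContentBasedDeduplication": "true" if content_based_dedup else "false",
    }


def create_high_throughput_queue(sqs, name):
    """Create the queue; `sqs` is assumed to be a boto3 SQS client."""
    if not name.endswith(".fifo"):
        raise ValueError("FIFO queue names must end with .fifo")
    return sqs.create_queue(QueueName=name, Attributes=high_throughput_fifo_attributes())
```

A call such as `create_high_throughput_queue(boto3.client("sqs"), "orders.fifo")` would then create the queue; changing either of the first two scope attributes back to its queue-level value drops the queue to normal throughput.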

Message Deduplication

  • SQS APIs provide deduplication functionality that prevents message producers from sending duplicates.
  • Message deduplication ID is the token used for the deduplication of sent messages.
  • If a message with a particular message deduplication ID is sent successfully, any messages sent with the same message deduplication ID are accepted successfully but aren’t delivered during the 5-minute deduplication interval.
  • So basically, any duplicates introduced by the message producer are removed within a 5-minute deduplication interval.
  • Message deduplication applies to an entire queue (default), not to individual message groups.
    • With High Throughput Mode enabled, deduplication scope is set to message group level.
  • Content-based deduplication can be enabled on the queue, which uses a SHA-256 hash of the message body to generate the deduplication ID automatically.
  • New FIFO-specific CloudWatch metric NumberOfDeduplicatedSentMessages (added July 2024) tracks the number of messages that were deduplicated.
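
The deduplication rules above can be modelled in a few lines. This is a toy simulation, not SQS itself: it tracks deduplication IDs at queue scope, derives the ID from a SHA-256 hash of the body when none is given (content-based deduplication), and takes the current time as a parameter so the 5-minute window is easy to follow.

```python
import hashlib

DEDUP_INTERVAL_SECONDS = 5 * 60


class DedupWindow:
    """Toy model of the 5-minute deduplication interval."""

    def __init__(self):
        self._first_seen = {}  # deduplication ID -> time the first copy was accepted

    def accept(self, body, now, dedup_id=None):
        """Return True if the message would be delivered, False if it is a duplicate."""
        if dedup_id is None:
            # Content-based deduplication: hash the message body.
            dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
        first = self._first_seen.get(dedup_id)
        if first is not None and now - first < DEDUP_INTERVAL_SECONDS:
            return False
        self._first_seen[dedup_id] = now
        return True


window = DedupWindow()
print(window.accept("order 42 paid", now=0))    # True: first copy
print(window.accept("order 42 paid", now=100))  # False: duplicate inside the interval
print(window.accept("order 42 paid", now=301))  # True: the interval has expired
```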

Message Groups

  • Messages are grouped into distinct, ordered “bundles” within a FIFO queue.
  • Message group ID is the tag that specifies that a message belongs to a specific message group.
  • For each message group ID, all messages are sent and received in strict order.
  • However, messages with different message group ID values might be sent and received out of order.
  • Every message must be associated with a message group ID, without which the action fails.
  • SQS delivers the messages in the order in which they arrive for processing if multiple hosts (or different threads on the same host) send messages with the same message group ID.
  • There is no quota to the number of message groups within a FIFO queue.
  • New FIFO-specific CloudWatch metric ApproximateNumberOfGroupsWithInflightMessages (added July 2024) tracks the approximate number of message groups with in-flight messages.
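
The behaviour of message groups, strict order inside a group and no second consumer on a group while a message is in flight, can be shown with a small in-memory model. It is a simplified sketch with hypothetical names and returns one message per receive, which is enough to show the locking behaviour.

```python
from collections import OrderedDict, deque


class FifoGroups:
    """Toy model of per-group ordering in a FIFO queue."""

    def __init__(self):
        self._groups = OrderedDict()  # message group ID -> pending bodies, oldest first
        self._locked = set()          # groups that currently have a message in flight

    def send(self, group_id, body):
        self._groups.setdefault(group_id, deque()).append(body)

    def receive(self):
        """Return (group_id, body) from the first unlocked group, or None."""
        for group_id, pending in self._groups.items():
            if pending and group_id not in self._locked:
                self._locked.add(group_id)
                return group_id, pending[0]
        return None

    def delete(self, group_id):
        """Acknowledge the in-flight message and unlock its group."""
        self._groups[group_id].popleft()
        self._locked.discard(group_id)


queue = FifoGroups()
queue.send("customer-a", "a1")
queue.send("customer-a", "a2")
queue.send("customer-b", "b1")
print(queue.receive())  # ('customer-a', 'a1')
print(queue.receive())  # ('customer-b', 'b1'), because customer-a is locked
print(queue.receive())  # None: both groups have a message in flight
queue.delete("customer-a")
print(queue.receive())  # ('customer-a', 'a2')
```

This is also why spreading messages across many group IDs raises throughput: each additional group can be processed in parallel.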

Dead-Letter Queue (DLQ) Support for FIFO Queues

  • FIFO queues support dead-letter queues. A DLQ for a FIFO queue must also be a FIFO queue.
  • DLQ Redrive for FIFO Queues (launched Nov 2023) allows messages to be moved from a FIFO dead-letter queue back to the source queue or a custom FIFO destination queue.
    • Previously, DLQ redrive was only available for standard queues.
    • Supported via the AWS Console, AWS SDK, and CLI.
    • Available in all commercial regions and AWS GovCloud (US) Regions (April 2024).
  • Configure a redrive policy to specify the maximum number of receives before a message is moved to the DLQ.
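
A redrive policy is stored on the source queue as a JSON string in the `RedrivePolicy` attribute. The sketch below builds that attribute; the ARN is a placeholder, the default of 5 receives is an arbitrary illustrative value, and the `.fifo` suffix check simply encodes the rule above that a FIFO queue needs a FIFO dead-letter queue.

```python
import json


def fifo_redrive_attributes(dlq_arn, max_receive_count=5):
    """Build the RedrivePolicy attribute for a FIFO source queue."""
    if not dlq_arn.endswith(".fifo"):
        raise ValueError("a FIFO source queue needs a FIFO dead-letter queue")
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": max_receive_count,  # receives allowed before the move to the DLQ
        })
    }


# Hypothetical ARN for illustration only.
attributes = fifo_redrive_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq.fifo")
print(attributes["RedrivePolicy"])
```

With boto3, a dictionary like this would typically be passed as the `Attributes` argument of `set_queue_attributes` on the source queue.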

AWS Certification Exam Practice Questions

  1. A restaurant reservation application needs the ability to maintain a waiting list. When a customer tries to reserve a table, and none are available, the customer must be put on the waiting list, and the application must notify the customer when a table becomes free. What service should the Solutions Architect recommend to ensure that the system respects the order in which the customer requests are put onto the waiting list?
    1. Amazon SNS
    2. AWS Lambda with sequential dispatch
    3. A FIFO queue in Amazon SQS
    4. A standard queue in Amazon SQS
  2. In relation to Amazon SQS, how can you ensure that messages are delivered in order? Select 2 answers
    1. Increase the size of your queue
    2. Send them with a timestamp
    3. Using FIFO queues
    4. Give each message a unique id
    5. Use sequence number within the messages with Standard queues
  3. A company runs a major auction platform where people buy and sell a wide range of products. The platform requires that transactions from buyers and sellers get processed in exactly the order received. At the moment, the platform is implemented using RabbitMQ, which is a lightweight queue system. The company has consulted you to migrate the on-premises platform to AWS. How should you design the migration plan? (Select TWO)
    1. When the bids are received, send the bids to an SQS FIFO queue before they are processed.
    2. When the users have submitted the bids from frontend, the backend service delivers the messages to an SQS standard queue.
    3. Add a message group ID to the messages before they are sent to the SQS queue so that the message processing is in a strict order.
    4. Use an EC2 or Lambda to add a deduplication ID to the messages before the messages are sent to the SQS queue to ensure that bids are processed in the right order.
  4. A company needs to process financial transactions with exactly-once semantics and strict ordering. The system currently handles 500 transactions per second and is expected to grow to 5,000 TPS. Which SQS FIFO configuration should the solutions architect recommend?
    1. Use a standard SQS queue with application-level deduplication
    2. Enable high throughput mode on the FIFO queue with deduplication scope set to message group and distribute transactions across multiple message groups
    3. Use multiple standard queues with sequence numbers
    4. Use a single FIFO queue with default settings and request a quota increase
  5. An application uses an SQS FIFO queue and frequently encounters messages that cannot be processed successfully. The development team needs a mechanism to isolate failed messages for analysis and then reprocess them after fixing the underlying issue. What is the most operationally efficient approach?
    1. Implement application logic to move failed messages to a separate standard queue
    2. Delete failed messages and log them to CloudWatch for later replay
    3. Configure a FIFO dead-letter queue with a redrive policy, then use DLQ redrive to move messages back to the source queue after fixing the issue
    4. Use a Lambda function to periodically check and reprocess failed messages
  6. A solutions architect is designing a system that processes messages from an SQS FIFO queue using AWS Lambda. The system needs to handle partial failures within a batch without blocking the entire message group. Which approach should the architect implement?
    1. Configure the Lambda function with a batch size of 1 to process messages individually
    2. Enable ReportBatchItemFailures in the Lambda event source mapping and implement partial batch response handling in the function code
    3. Use a standard queue instead of FIFO to avoid message group blocking
    4. Set a very short visibility timeout to quickly retry failed messages

Amazon SQS Features – Visibility, DLQ & Batching

Amazon SQS Features

  • Visibility timeout defines the period where SQS blocks the visibility of the message and prevents other consuming components from receiving and processing that message.
  • Dead-letter queues (DLQs) are queues that source queues (Standard and FIFO) can target for messages that can’t be processed (consumed) successfully.
  • DLQ Redrive policy specifies the source queue, the dead-letter queue, and the conditions under which messages are moved from the former to the latter if the consumer of the source queue fails to process a message a specified number of times.
  • DLQ Redrive APIs (StartMessageMoveTask, CancelMessageMoveTask, ListMessageMoveTasks) allow programmatic management of dead-letter queue redrive, enabling messages to be moved from DLQ back to the original source queue or to a custom destination queue.
  • Short and long polling control how queues are polled; long polling helps reduce empty responses.
  • Fair Queues automatically mitigate noisy-neighbor impact in multi-tenant standard queues by prioritizing message delivery for quieter tenants when one tenant creates a backlog.

Queue and Message Identifiers

Queue URLs

  • A queue is identified by a queue name that is unique within the same AWS account and Region.
  • Each queue is assigned a queue URL as its identifier, e.g. http://sqs.us-east-1.amazonaws.com/123456789012/queue2
  • Queue URL is needed to perform any operation on the Queue.

Message ID

  • Message IDs are useful for identifying messages
  • Each message receives a system-assigned message ID that is returned with the SendMessage response.
  • To delete a message, the message’s receipt handle instead of the message ID is needed
  • A message ID can be at most 100 characters long.

Receipt Handle

  • When a message is received from a queue, a receipt handle is returned with the message which is associated with the act of receiving the message rather than the message itself.
  • Receipt handle is required, not the message id, to delete a message or to change the message visibility.
  • If a message is received more than once, a different receipt handle is assigned each time, and the most recent one should always be used.

Message Deduplication ID

  • Message Deduplication ID is used for the deduplication of sent messages.
  • Message Deduplication ID is applicable for FIFO queues.
  • If a message with a particular message deduplication ID is sent successfully, any messages sent with the same message deduplication ID are accepted successfully but aren’t delivered during the 5-minute deduplication interval.
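
The deduplication interval can be pictured with a toy in-memory model. This is only a sketch of the behaviour described above, not how SQS is implemented; the class and function names are made up for illustration. The content-based helper mirrors the SQS option of deriving the deduplication ID from a SHA-256 hash of the message body.

```python
import hashlib

DEDUP_INTERVAL_SECONDS = 5 * 60  # the 5-minute deduplication interval


class DedupWindow:
    """Toy model: remembers when each deduplication ID was first accepted."""

    def __init__(self):
        self._first_seen = {}  # deduplication ID -> time of first acceptance

    def should_deliver(self, dedup_id: str, now: float) -> bool:
        first_seen = self._first_seen.get(dedup_id)
        if first_seen is not None and now - first_seen < DEDUP_INTERVAL_SECONDS:
            # The send is accepted, but the duplicate is not delivered.
            return False
        self._first_seen[dedup_id] = now
        return True


def content_dedup_id(body: str) -> str:
    """Derive a deduplication ID from the message body (SHA-256 hex digest)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

For example, two sends of the same body 100 seconds apart produce the same ID, so only the first would be delivered; a send more than 5 minutes after the first is treated as a new message.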

Message Group ID

  • Message Group ID specifies that a message belongs to a specific message group.
  • Message Group ID is applicable for FIFO queues.
  • Messages that belong to the same message group are always processed one by one, in a strict order relative to the message group.
  • However, messages that belong to different message groups might be processed out of order.
  • For Standard queues with Fair Queues enabled, MessageGroupId is used only as a tenant identifier for fair queuing and does not enforce message ordering.

Visibility timeout

  • SQS does not delete a message once a consumer receives it. Because the system is distributed, there is no guarantee that the consumer actually received the message (the connection could break, or the component could fail before receiving it).
  • The consumer should explicitly delete the message from the queue once it has been received and successfully processed.
  • Because the message is still in the queue, other consumers could receive and process it again, and this needs to be prevented.
  • SQS handles the above behavior using Visibility timeout.
  • SQS blocks the visibility of the message for the Visibility timeout period, which is the time during which SQS prevents other consuming components from receiving and processing that message.
  • Consumer should delete the message within the Visibility timeout. If the consumer fails to delete the message before the visibility timeout expires, the message is visible again to other consumers.
  • Once the message is visible again, other consumers can receive it, which can lead to duplicate processing.
  • Visibility timeout considerations
    • Clock starts ticking once SQS returns the message
    • should be large enough to take into account the processing time for each message
    • default Visibility timeout for each Queue is 30 seconds and can be changed at the Queue level
    • when receiving messages, a special visibility timeout for the returned messages can be set with the VisibilityTimeout request parameter, without changing the overall queue timeout
    • can be extended by the consumer, using ChangeMessageVisibility , if the consumer thinks it won’t be able to process the message within the current visibility timeout period. SQS restarts the timeout period using the new value.
    • a message’s Visibility timeout extension applies only to that particular receipt of the message and does not affect the timeout for the queue or later receipts of the message
    • Maximum visibility timeout is 12 hours from the time SQS receives the ReceiveMessage request.
  • SQS limits the number of in-flight messages (messages received but not yet deleted) to 120,000 per queue for both Standard and FIFO queues; once the limit is reached, further receive requests can return an error.
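
The interplay between receiving, the visibility timeout, receipt handles, and deletion can be sketched with a small in-memory model. This is a toy simulation under simplifying assumptions (a single process, explicit timestamps instead of a real clock, and only the latest receipt handle being accepted); the class and method names are illustrative and are not part of any SQS SDK.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoredMessage:
    body: str
    visible_at: float = 0.0               # time at which the message becomes visible
    receipt_handle: Optional[str] = None  # handle from the most recent receive


class ToyQueue:
    def __init__(self, visibility_timeout: float = 30.0):
        self.visibility_timeout = visibility_timeout  # queue-level default
        self._messages: List[StoredMessage] = []

    def send(self, body: str) -> None:
        self._messages.append(StoredMessage(body))

    def receive(self, now: float,
                visibility_timeout: Optional[float] = None) -> Optional[Tuple[str, str]]:
        """Return (body, receipt_handle) for the first visible message, or None."""
        timeout = self.visibility_timeout if visibility_timeout is None else visibility_timeout
        for msg in self._messages:
            if msg.visible_at <= now:
                msg.visible_at = now + timeout         # hide it from other consumers
                msg.receipt_handle = uuid.uuid4().hex  # new handle on every receive
                return msg.body, msg.receipt_handle
        return None

    def change_visibility(self, receipt_handle: str, now: float, timeout: float) -> None:
        """Restart the timeout for this receipt only, like ChangeMessageVisibility."""
        self._find(receipt_handle).visible_at = now + timeout

    def delete(self, receipt_handle: str) -> None:
        self._messages.remove(self._find(receipt_handle))

    def _find(self, receipt_handle: str) -> StoredMessage:
        for msg in self._messages:
            if msg.receipt_handle == receipt_handle:
                return msg
        raise KeyError("unknown or stale receipt handle")
```

In this model, a message received at time 0 with a 30-second timeout is hidden until time 30; if the consumer has not deleted it by then, the next receive returns the same body with a different receipt handle.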

Message Lifecycle

  1. Component 1 sends Message A to a queue, and the message is redundantly distributed across the SQS servers.
  2. When Component 2 is ready to process a message, it retrieves messages from the queue, and Message A is returned. While Message A is being processed, it remains in the queue but is not returned to subsequent receive requests for the duration of the visibility timeout.
  3. Component 2 deletes Message A from the queue to avoid the message being received and processed again once the visibility timeout expires.

SQS Dead Letter Queues – DLQ

  • SQS supports dead-letter queues (DLQ), which other queues (source queues – Standard and FIFO) can target for messages that can’t be processed (consumed) successfully.
  • Dead-letter queues are useful for debugging the application or messaging system because they isolate unconsumed messages so you can determine why their processing doesn’t succeed.
  • DLQ redrive policy
    • specifies the source queue, the dead-letter queue, and the conditions under which SQS moves messages from the former to the latter if the consumer of the source queue fails to process a message a specified number of times.
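    • a separate redrive allow policy (RedriveAllowPolicy) on the dead-letter queue specifies which source queues can use it.
    • DLQ redrive (see the DLQ Redrive APIs) moves messages from the dead-letter queue back to the source queue.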
  • DLQ Redrive APIs
    • StartMessageMoveTask – starts an asynchronous task to move messages from the DLQ to the original source queue or a custom destination queue.
    • CancelMessageMoveTask – cancels a message move task in progress.
    • ListMessageMoveTasks – lists the most recent message move tasks (up to 10) for a specific source queue.
    • Enables programmatic DLQ management via AWS SDK or CLI at scale.
    • FIFO queues also support DLQ redrive.
  • SQS does not create the dead-letter queue automatically. DLQ must first be created before being used.
  • DLQ for the source queue should be of the same type i.e. Dead-letter queue of a FIFO queue must also be a FIFO queue. Similarly, the dead-letter queue of a standard queue must also be a standard queue.
  • The DLQ must be in the same account and Region as the source queue.
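
A redrive is started by giving StartMessageMoveTask the ARN of the dead-letter queue and, optionally, a destination and a rate limit. The sketch below only assembles the request parameters (SourceArn, DestinationArn, MaxNumberOfMessagesPerSecond); the helper name and ARNs are hypothetical, and no AWS call is made.

```python
from typing import Dict, Optional, Union


def build_move_task_request(
    dlq_arn: str,
    destination_arn: Optional[str] = None,
    max_per_second: Optional[int] = None,
) -> Dict[str, Union[str, int]]:
    """Assemble keyword arguments for a StartMessageMoveTask request."""
    request: Dict[str, Union[str, int]] = {"SourceArn": dlq_arn}
    if destination_arn is not None:
        # Omitting the destination sends messages back to their original source queue.
        request["DestinationArn"] = destination_arn
    if max_per_second is not None:
        if max_per_second < 1:
            raise ValueError("max_per_second must be a positive integer")
        request["MaxNumberOfMessagesPerSecond"] = max_per_second
    return request


# Hypothetical ARN: redrive everything in the DLQ back to where it came from.
redrive_request = build_move_task_request(
    "arn:aws:sqs:us-east-1:123456789012:orders-dlq"
)
```

With an SDK client, a dictionary like this would typically be unpacked into the start-message-move-task call; throttling the move with a rate limit can help avoid overwhelming a consumer that has only just been fixed.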

SQS Dead Letter Queue - Redrive Policy

SQS Delay Queues

  • Delay queues help postpone the delivery of new messages to consumers for a number of seconds
  • Messages sent to the delay queue remain invisible to consumers for the duration of the delay period.
  • Minimum delay is 0 seconds (default) and the Maximum is 15 minutes.
  • Delay queues are similar to visibility timeouts as both features make messages unavailable to consumers for a specific period of time.
  • The difference between the two is that, for delay queues, a message is hidden when it is first added to the queue, whereas for visibility timeouts a message is hidden only after it is consumed from the queue.

SQS Fair Queues

  • Fair Queues is a feature of Amazon SQS standard queues that automatically mitigates noisy-neighbor impact in multi-tenant queues.
  • In multi-tenant systems, one tenant can become a “noisy neighbor” by sending a larger volume of messages or requiring longer processing time, creating a backlog that increases message dwell time for all other tenants.
  • Fair Queues detects noisy neighbors by monitoring message distribution among tenants during the in-flight state (messages received by consumers but not yet deleted).
  • When a tenant has a disproportionately large number of in-flight messages, SQS prioritizes message delivery for other (quieter) tenants, reducing dwell time impact.
  • To enable Fair Queues, message producers set a MessageGroupId on outgoing messages as a tenant identifier.
  • MessageGroupId on standard queues with Fair Queues does NOT enforce message ordering (unlike FIFO queues) — it is used only as a tenant identifier.
  • Fair Queues does not limit the consumption rate per tenant — it allows consumers to receive messages from noisy tenants when there is spare consumer capacity.
  • No changes required in consumer code, no impact on API latency, and no throughput limitations.
  • Supports virtually unlimited throughput and unlimited number of tenants.
  • Provides additional CloudWatch metrics:
    • ApproximateNumberOfMessagesVisibleInQuietGroups – backlog for non-noisy tenants
    • ApproximateAgeOfOldestMessageInQuietGroups – oldest message age for quiet groups
  • Best suited for high-throughput multi-tenant queues where dwell time is a quality-of-service metric.
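
On the producer side, opting in to Fair Queues amounts to attaching a tenant identifier as the MessageGroupId of each message sent to the standard queue. A minimal sketch of building such a send request follows; the helper name, queue URL, and tenant ID are illustrative assumptions, and no AWS call is made.

```python
from typing import Dict


def build_tenant_message(queue_url: str, tenant_id: str, body: str) -> Dict[str, str]:
    """Assemble SendMessage parameters that tag a message with its tenant."""
    if not tenant_id:
        raise ValueError("tenant_id is required so SQS can tell tenants apart")
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # On a standard queue this identifies the tenant; it does not enforce ordering.
        "MessageGroupId": tenant_id,
    }


# Hypothetical queue URL and tenant, used only for illustration.
tenant_request = build_tenant_message(
    "https://sqs.us-east-1.amazonaws.com/123456789012/shared-jobs",
    tenant_id="tenant-42",
    body='{"job": "resize-image"}',
)
```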

Short and Long polling

SQS provides short polling and long polling to receive messages from a queue.

Short Polling

  • ReceiveMessage request queries only a subset of the servers (based on a weighted random distribution) to find messages that are available to include in the response.
  • SQS sends the response right away, even if the query found no messages.
  • By default, queues use short polling.

Long Polling

  • ReceiveMessage request queries all of the servers for messages.
  • SQS sends a response after it collects at least one available message, up to the maximum number of messages specified in the request.
  • SQS sends an empty response only if the polling wait time expires.
  • Wait time greater than 0 triggers long polling with a max of 20 secs.
  • Long polling helps
    • reduce the cost of using SQS by reducing the number of empty responses (when there are no messages available for a ReceiveMessage request)
    • reduce false empty responses (when messages are available but aren’t included in a response)
    • return messages as soon as they become available
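
The difference between the two polling modes comes down to the wait time on the receive request. The sketch below builds ReceiveMessage parameters and validates them against the limits mentioned above (a wait time of 0 to 20 seconds) plus the limit of 10 messages per request; the helper names and queue URL are illustrative, and no AWS call is made.

```python
from typing import Dict, Union

MAX_WAIT_SECONDS = 20   # longest long-polling wait
MAX_BATCH_MESSAGES = 10  # most messages a single ReceiveMessage can return


def build_receive_request(
    queue_url: str, wait_seconds: int = MAX_WAIT_SECONDS, max_messages: int = MAX_BATCH_MESSAGES
) -> Dict[str, Union[str, int]]:
    """Assemble ReceiveMessage parameters; wait_seconds > 0 means long polling."""
    if not 0 <= wait_seconds <= MAX_WAIT_SECONDS:
        raise ValueError("wait_seconds must be between 0 and 20")
    if not 1 <= max_messages <= MAX_BATCH_MESSAGES:
        raise ValueError("max_messages must be between 1 and 10")
    return {
        "QueueUrl": queue_url,
        "WaitTimeSeconds": wait_seconds,
        "MaxNumberOfMessages": max_messages,
    }


def is_long_polling(request: Dict[str, Union[str, int]]) -> bool:
    return int(request["WaitTimeSeconds"]) > 0
```

The same effect can be configured once at the queue level through the ReceiveMessageWaitTimeSeconds attribute, so that consumers use long polling without setting a wait time on every request.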

SQS Message Size

  • Maximum message payload size is 1 MiB (1,048,576 bytes) for both Standard and FIFO queues. (Increased from 256 KB in August 2025)
  • For messages larger than 1 MiB, use the Amazon SQS Extended Client Library to store the message payload in Amazon S3 and send a reference pointer through SQS.
  • Each message can have up to 10 message attributes (metadata).
  • Message retention period: minimum 60 seconds, default 4 days, maximum 14 days.
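
Because the limit is expressed in bytes rather than characters, a producer may want to measure the encoded payload before deciding whether to send it directly or offload it to S3. A minimal sketch, assuming UTF-8 encoding and measuring only the body (message attributes also count toward the limit, which this simplification ignores); the function name is made up for illustration.

```python
MAX_PAYLOAD_BYTES = 1_048_576  # 1 MiB


def needs_s3_offload(body: str) -> bool:
    """Return True if the UTF-8 encoded body exceeds the SQS payload limit."""
    return len(body.encode("utf-8")) > MAX_PAYLOAD_BYTES
```

Note that multi-byte characters make the byte count larger than the character count, so a body of 600,000 two-byte characters already exceeds the limit.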

SQS Server-Side Encryption (SSE)

  • All SQS queues are encrypted by default using SQS-owned encryption keys (SSE-SQS).
  • SSE-SQS requires no configuration and encrypts all messages at rest at no additional cost.
  • Optionally, queues can be configured with AWS KMS-managed keys (SSE-KMS) for customer-managed encryption keys with more granular access control.
  • With SSE-KMS, only kms:GenerateDataKey permission is needed for SendMessage (kms:Decrypt is no longer required for sending). kms:Decrypt is still required for ReceiveMessage.

SQS FIFO High Throughput

  • FIFO queues by default support 300 transactions per second (TPS) per API action (SendMessage, ReceiveMessage, DeleteMessage).
  • High throughput mode for FIFO queues supports up to 70,000 TPS per API action without batching, and up to 700,000 messages per second with batching in select regions (US East N. Virginia, US West Oregon, Europe Ireland).
  • High throughput mode can be enabled via the Amazon SQS console by setting FifoThroughputLimit to perMessageGroupId and DeduplicationScope to messageGroup.
  • Messages should be distributed across multiple message groups to take advantage of high throughput.
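
The two settings named above are ordinary queue attributes, so they can also be supplied when the queue is created or updated. The sketch below returns such an attribute map and derives a message group ID from a natural business key so that traffic spreads across many groups; the helper names and the "account-" prefix are illustrative assumptions.

```python
from typing import Dict


def high_throughput_fifo_attributes() -> Dict[str, str]:
    """Queue attributes for a FIFO queue with high throughput mode enabled."""
    return {
        "FifoQueue": "true",
        "FifoThroughputLimit": "perMessageGroupId",
        "DeduplicationScope": "messageGroup",
    }


def message_group_for(account_id: str) -> str:
    """Use a natural key (here, an account ID) as the message group.

    Ordering is then preserved per account, while different accounts
    land in different groups and can be processed in parallel.
    """
    if not account_id:
        raise ValueError("account_id must not be empty")
    return f"account-{account_id}"
```

Using a single message group for all traffic would keep strict global ordering but would not benefit from high throughput mode, since the higher limit is reached by spreading messages across groups.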

SQS Integration with AWS Lambda

  • SQS can trigger AWS Lambda functions via event source mappings (ESM).
  • Lambda supports both Standard and FIFO queue triggers.
  • Provisioned Mode for SQS ESM (November 2025): Allocates dedicated event polling resources with configurable minimum and maximum limits.
    • Provides 3x faster scaling compared to standard mode.
    • Supports up to 20,000 concurrency (16x higher capacity).
    • Ideal for handling sudden traffic spikes with lower latency processing.
  • The Lambda function and the SQS queue must be in the same AWS Region (but can be in different AWS accounts).

AWS Certification Exam Practice Questions

  1. How does Amazon SQS allow multiple readers to access the same message queue without losing messages or processing them many times?
    1. By identifying a user by his unique id
    2. By using unique cryptography
    3. Amazon SQS queue has a configurable visibility timeout
    4. Multiple readers can’t access the same message queue
  2. If a message is retrieved from a queue in Amazon SQS, how long is the message inaccessible to other users by default?
    1. 0 seconds
    2. 1 hour
    3. 1 day
    4. forever
    5. 30 seconds
  3. When a Simple Queue Service message triggers a task that takes 5 minutes to complete, which process below will result in successful processing of the message and remove it from the queue while minimizing the chances of duplicate processing?
    1. Retrieve the message with an increased visibility timeout, process the message, delete the message from the queue
    2. Retrieve the message with an increased visibility timeout, delete the message from the queue, process the message
    3. Retrieve the message with increased DelaySeconds, process the message, delete the message from the queue
    4. Retrieve the message with increased DelaySeconds, delete the message from the queue, process the message
  4. You need to process long-running jobs once and only once. How might you do this?
    1. Use an SNS queue and set the visibility timeout to long enough for jobs to process.
    2. Use an SQS queue and set the reprocessing timeout to long enough for jobs to process.
    3. Use an SQS queue and set the visibility timeout to long enough for jobs to process.
    4. Use an SNS queue and set the reprocessing timeout to long enough for jobs to process.
  5. You are getting a lot of empty receive requests when using Amazon SQS. This is making a lot of unnecessary network load on your instances. What can you do to reduce this load?
    1. Subscribe your queue to an SNS topic instead.
    2. Use as long of a poll as possible, instead of short polls.
    3. Alter your visibility timeout to be shorter.
    4. Use sqsd on your EC2 instances.
  6. Company B provides an online image recognition service and utilizes SQS to decouple system components for scalability. The SQS consumers poll the imaging queue as often as possible to keep end-to-end throughput as high as possible. However, Company B is realizing that polling in tight loops is burning CPU cycles and increasing costs with empty responses. How can Company B reduce the number of empty responses?
    1. Set the imaging queue visibility Timeout attribute to 20 seconds
    2. Set the imaging queue ReceiveMessageWaitTimeSeconds attribute to 20 seconds (long polling)
    3. Set the imaging queue MessageRetentionPeriod attribute to 20 seconds
    4. Set the DelaySeconds parameter of a message to 20 seconds
  7. A multi-tenant SaaS application uses a single SQS standard queue shared across all customers. During peak hours, one large customer floods the queue with messages, causing increased dwell time for all other customers. Which SQS feature helps mitigate this noisy neighbor problem?
    1. Enable FIFO queue with message group IDs
    2. Configure visibility timeout to a lower value
    3. Enable Fair Queues by setting MessageGroupId as tenant identifier on standard queue
    4. Create separate DLQs for each customer
  8. A development team needs to programmatically move messages from a dead-letter queue back to the original source queue for reprocessing. Which API action should they use?
    1. SendMessage with the source queue URL
    2. ChangeMessageVisibility on DLQ messages
    3. StartMessageMoveTask
    4. PurgeQueue followed by republishing messages
  9. An application using Amazon SQS FIFO queues needs to process a high volume of ordered messages. What is the maximum throughput achievable with FIFO high throughput mode without batching?
    1. 300 TPS per API action
    2. 3,000 TPS per API action
    3. 18,000 TPS per API action
    4. 70,000 TPS per API action
  10. What is the maximum message payload size supported by Amazon SQS?
    1. 64 KB
    2. 256 KB
    3. 1 MiB (1,048,576 bytes)
    4. 2 MiB

AWS SQS – Simple Queue Service Overview

AWS Simple Queue Service – SQS

  • Simple Queue Service – SQS is a highly available distributed queue system
  • A queue is a temporary repository for messages awaiting processing and acts as a buffer between the component producer and the consumer
  • is a message queue service used by distributed applications to exchange messages through a polling model, and can be used to decouple sending and receiving components.
  • is fully managed and requires no administrative overhead and little configuration
  • offers a reliable, highly-scalable, hosted queue for storing messages in transit between applications.
  • provides fault tolerance and loose coupling, letting distributed application components send and receive messages without requiring each component to be concurrently available
  • helps build distributed applications with decoupled components
  • supports encryption at rest (SSE-SQS enabled by default since Oct 2022) and encryption in transit using the HTTP over SSL (HTTPS) and Transport Layer Security (TLS) protocols for security.
  • supports a maximum message payload size of 1 MiB (increased from 256 KB in August 2025). For payloads up to 2 GB, use the Extended Client Library with Amazon S3.
  • provides two types of Queues

SQS Standard Queue

  • Standard queues are the default queue type.
  • Standard queues support at-least-once message delivery. However, occasionally (because of the highly distributed architecture that allows nearly unlimited throughput), more than one copy of a message might be delivered, and messages might arrive out of order.
  • Standard queues support a nearly unlimited number of API calls per second, per API action (SendMessage, ReceiveMessage, or DeleteMessage).
  • Standard queues provide best-effort ordering which ensures that messages are generally delivered in the same order as they’re sent.

Refer SQS Standard Queue for detailed information

SQS FIFO Queue

  • FIFO (First-In-First-Out) queues deliver messages in order and provide exactly-once processing.
  • FIFO queues have all the capabilities of the standard queues but are designed to enhance messaging between applications when the order of operations and events is critical, or where duplicates can’t be tolerated.
  • FIFO queues support High Throughput mode with up to 70,000 transactions per second (TPS) per API action without batching (and up to 700,000 messages per second with batching) in select regions.

Refer SQS FIFO Queue for detailed information

SQS Standard Queues vs SQS FIFO Queues

SQS Standard vs FIFO Queues

SQS Fair Queues (New – July 2025)

  • SQS Fair Queues is a feature for standard queues that mitigates the noisy neighbor impact in multi-tenant systems.
  • Fair queues automatically reorder messages when a single tenant causes a backlog, prioritizing message delivery for other tenants.
  • Helps maintain consistent dwell time (time a message spends in queue between being sent and received) across all tenants.
  • Works transparently without requiring changes to existing message processing logic.
  • Supported by Amazon SNS standard topics and Amazon EventBridge as targets.
  • Ideal for SaaS and multi-tenant architectures where tenant isolation at the messaging layer is important.

SQS Use Cases

  • Work Queues
    • Decouple components of a distributed application that may not all process the same amount of work simultaneously.
  • Buffer and Batch Operations
    • Add scalability and reliability to the architecture and smooth out temporary volume spikes without losing messages or increasing latency
  • Request Offloading
    • Move slow operations off of interactive request paths by enqueueing the request.
  • Fan-out
    • Combine SQS with SNS to send identical copies of a message to multiple queues in parallel for simultaneous processing.
  • Auto Scaling
    • SQS queues can be used to determine the load on an application, and combined with Auto Scaling, the EC2 instances can be scaled in or out, depending on the volume of traffic
  • Event-Driven Architectures
    • Use SQS with EventBridge Pipes, Lambda event source mappings, or Step Functions for serverless event-driven processing pipelines.

How SQS Queues Work

  • SQS allows queues to be created and deleted, and messages to be sent to and received from them
  • SQS queue retains messages for four days, by default.
  • Queues can be configured to retain messages for 1 minute to 14 days after the message has been sent.
  • SQS can delete a queue without notification if no action has been performed on it for 30 consecutive days.
  • SQS allows the deletion of the queue with messages in it

SQS Features & Capabilities

  • Visibility timeout defines the period where SQS blocks the visibility of the message and prevents other consuming components from receiving and processing that message.
  • SQS Dead-letter queues (DLQs) are queues that source queues (Standard and FIFO) can target for messages that can’t be processed (consumed) successfully.
  • DLQ Redrive policy specifies the source queue, the dead-letter queue, and the conditions under which SQS moves messages from the former to the latter if the consumer of the source queue fails to process a message a specified number of times.
  • DLQ Redrive to Source – SQS supports programmatic dead-letter queue redrive via APIs (StartMessageMoveTask, ListMessageMoveTasks, CancelMessageMoveTask) allowing you to move messages from DLQ back to the original source queue or a custom destination queue.
  • SQS short and long polling control how queues are polled; long polling helps reduce empty responses.

SQS Integration with AWS Lambda

  • SQS can trigger AWS Lambda functions using event source mappings (ESM).
  • Lambda automatically polls the SQS queue, retrieves messages in batches, and invokes the Lambda function.
  • Provisioned Mode for SQS ESM (November 2025) – Allows dedicated polling resources for the SQS event source mapping:
    • Provides 3x faster scaling and up to 16x higher capacity (up to 20,000 concurrency).
    • You define minimum and maximum limits for provisioned event pollers.
    • Ideal for handling sudden traffic spikes and high-throughput workloads.
  • Lambda function and SQS queue must be in the same AWS Region (can be in different accounts).
  • Supports both Standard and FIFO queues as triggers.
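
Because Lambda hands the function a whole batch, a single bad message can otherwise cause the entire batch to be retried. The sketch below shows one way to report only the failed messages back to Lambda, assuming ReportBatchItemFailures is enabled on the event source mapping. The process_order function and the "fail" field are hypothetical stand-ins for real business logic. For FIFO queues the sketch stops at the first failure and reports that message and everything after it, which keeps ordering intact.

```python
import json
from typing import Any, Callable, Dict, List


def handle_batch(
    records: List[Dict[str, Any]],
    process: Callable[[Dict[str, Any]], None],
    fifo: bool = False,
) -> Dict[str, List[Dict[str, str]]]:
    """Process SQS records and build a partial batch response."""
    failed_ids: List[str] = []
    for index, record in enumerate(records):
        try:
            process(json.loads(record["body"]))
        except Exception:
            if fifo:
                # Stop here: report this message and all unprocessed ones after it.
                failed_ids.extend(r["messageId"] for r in records[index:])
                break
            failed_ids.append(record["messageId"])
    return {"batchItemFailures": [{"itemIdentifier": mid} for mid in failed_ids]}


def process_order(payload: Dict[str, Any]) -> None:
    """Hypothetical business logic; raises to signal a processing failure."""
    if payload.get("fail"):
        raise ValueError("simulated processing failure")


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # Set fifo=True when the trigger is a FIFO queue.
    return handle_batch(event["Records"], process_order, fifo=True)
```

Messages that are not listed in batchItemFailures are treated as successfully processed and are deleted from the queue; an empty list means the whole batch succeeded.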

SQS Integration with EventBridge Pipes

  • Amazon EventBridge Pipes supports SQS (Standard and FIFO) as a source for point-to-point integrations.
  • Pipes poll the SQS queue and deliver messages to configured targets with optional filtering, enrichment, and transformation.
  • Can be configured directly from the SQS console via “Connect SQS queue to pipe” button.
  • Eliminates the need for custom polling code or Lambda functions for simple integrations.

SQS Buffered Asynchronous Client

  • Amazon SQS Buffered Async Client for Java provides an implementation of the AmazonSQSAsyncClient interface and adds several important features:
    • Automatic batching of multiple SendMessage, DeleteMessage, or ChangeMessageVisibility requests without any required changes to the application
    • Prefetching of messages into a local buffer that allows the application to immediately process messages from SQS without waiting for the messages to be retrieved
  • Working together, automatic batching and prefetching increase the throughput and reduce the latency of the application while reducing the costs by making fewer SQS requests.
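
The buffered client itself is a Java library, but the batching idea is language-independent. As a rough illustration in Python, the sketch below groups message bodies into SendMessageBatch-style entry lists of at most 10 entries, which is the per-request batch limit. The function name and the Id scheme are illustrative assumptions, and this is not how the Java client is implemented.

```python
from typing import Dict, List

MAX_BATCH_ENTRIES = 10  # a batch request can carry at most 10 messages


def chunk_into_batches(
    bodies: List[str], batch_size: int = MAX_BATCH_ENTRIES
) -> List[List[Dict[str, str]]]:
    """Split message bodies into lists of batch entries (Id + MessageBody)."""
    if not 1 <= batch_size <= MAX_BATCH_ENTRIES:
        raise ValueError("batch_size must be between 1 and 10")
    batches: List[List[Dict[str, str]]] = []
    for start in range(0, len(bodies), batch_size):
        chunk = bodies[start:start + batch_size]
        # Ids only need to be unique within a batch; the global index is used here.
        batches.append(
            [{"Id": str(start + offset), "MessageBody": body}
             for offset, body in enumerate(chunk)]
        )
    return batches
```

Sending 25 messages this way takes 3 requests instead of 25, which is where the cost reduction comes from.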

SQS Security and Reliability

  • SQS stores all message queues and messages within a single, highly-available AWS region with multiple redundant Availability Zones (AZs)
  • SQS supports HTTP over SSL (HTTPS) and Transport Layer Security (TLS) protocols.
  • SQS supports Encryption at Rest with two options:
    • SSE-SQS (SQS-managed encryption keys) – Enabled by default for all new queues created via HTTPS/TLS endpoints since October 2022. No additional cost.
    • SSE-KMS (AWS KMS customer-managed keys) – For customers needing to manage their own encryption keys with fine-grained control.
  • SQS supports dual-stack (IPv4 and IPv6) endpoints (April 2025), allowing queues to be accessed via both IP protocols.
  • SQS supports resource-based permissions and Attribute-Based Access Control (ABAC) using queue tags for flexible and scalable access permissions.
  • SQSUnlockQueuePolicy – AWS-managed policy to unlock a queue and remove a misconfigured queue policy that denies all principals access (November 2024).
  • SQS supports CloudTrail integration for all APIs including data plane events (SendMessage, ReceiveMessage, DeleteMessage) for comprehensive audit logging (January 2025).

SQS Design Patterns

Priority Queue Pattern

SQS Priority Queue Pattern

  1. Use SQS to create a separate queue for each priority level.
  2. Place job requests that must be executed immediately in the high-priority queue.
  3. Provision batch servers for each queue, sized according to its priority level.
  4. Queues support delayed delivery (“Delayed Send”), which can be used to postpone the start of a process.
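
One common variation of this pattern has a single worker poll the queues in priority order, taking from a lower-priority queue only when every higher-priority queue is empty. The sketch below models that with in-memory deques standing in for SQS queues; the function name and job labels are made up for illustration.

```python
from collections import deque
from typing import Deque, List, Optional


def next_job(queues_by_priority: List[Deque[str]]) -> Optional[str]:
    """Return the next job, checking queues from highest to lowest priority."""
    for queue in queues_by_priority:
        if queue:
            return queue.popleft()
    return None  # every queue is empty


# Stand-ins for a high-priority and a low-priority SQS queue.
high: Deque[str] = deque(["premium-1"])
low: Deque[str] = deque(["standard-1", "standard-2"])
```

With these queues, repeated calls to next_job([high, low]) drain the premium job first and then the standard jobs in their original order. A real worker would replace the deque checks with receive calls against each queue.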

SQS Job Observer Pattern

Job Observer Pattern - SQS + CloudWatch + Auto Scaling

  1. Enqueue job requests as SQS messages.
  2. Have the batch server dequeue and process messages from SQS.
  3. Set up Auto Scaling to automatically increase or decrease the number of batch servers, using the number of SQS messages, with CloudWatch, as the trigger to do so.
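
The scaling decision in step 3 can be reduced to simple arithmetic on the queue depth (for example, the ApproximateNumberOfMessagesVisible metric). A minimal sketch, assuming a target backlog per server that the operator picks from how many messages one server can work through in an acceptable time; the function and parameter names are illustrative.

```python
import math


def desired_capacity(
    visible_messages: int,
    target_backlog_per_server: int,
    min_servers: int,
    max_servers: int,
) -> int:
    """Number of batch servers needed to keep the per-server backlog on target."""
    if target_backlog_per_server <= 0:
        raise ValueError("target_backlog_per_server must be positive")
    if min_servers > max_servers:
        raise ValueError("min_servers must not exceed max_servers")
    needed = math.ceil(visible_messages / target_backlog_per_server)
    # Clamp to the Auto Scaling group's configured bounds.
    return max(min_servers, min(max_servers, needed))
```

For instance, 950 visible messages with a target of 100 messages per server calls for 10 servers, while an empty queue falls back to the configured minimum.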

SQS vs Kinesis Data Streams

Kinesis Data Streams vs SQS

SQS Recent Updates (2024-2026)

  • November 2025 – Lambda Provisioned Mode for SQS ESM with 3x faster scaling and 16x higher concurrency.
  • August 2025 – Maximum message payload size increased from 256 KB to 1 MiB for all SQS queues (Standard and FIFO). The same 1 MB limit also applies to Lambda async invocations and EventBridge.
  • July 2025 – Fair Queues for multi-tenant standard queues to mitigate noisy neighbor issues.
  • April 2025 – Dual-stack (IPv4/IPv6) endpoint support.
  • January 2025 – CloudTrail integration for all SQS APIs (including data plane events).
  • November 2024 – SQSUnlockQueuePolicy managed policy for recovering locked queues.
  • July 2024 – kms:Decrypt permission no longer required for SendMessage API; only kms:GenerateDataKey needed.
  • July 2024 – New FIFO metrics: NumberOfDeduplicatedSentMessages and ApproximateNumberOfGroupsWithInflightMessages.
  • November 2023 – FIFO High Throughput increased to 70,000 TPS per API action in select regions.
  • June 2023 – DLQ Redrive APIs (StartMessageMoveTask, ListMessageMoveTasks, CancelMessageMoveTask).
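
As a rough sketch of the DLQ redrive API listed above, the helper below starts a message move task. The function name `redrive_dlq` and the ARNs are hypothetical; `sqs` is assumed to behave like a boto3 SQS client, whose `start_message_move_task` call takes `SourceArn` plus optional `DestinationArn` and `MaxNumberOfMessagesPerSecond`.

```python
from typing import Any, Optional


def redrive_dlq(sqs: Any, dlq_arn: str,
                destination_arn: Optional[str] = None,
                max_per_second: Optional[int] = None) -> str:
    """Start moving messages out of a dead-letter queue and return the task handle.

    When `destination_arn` is omitted, SQS moves messages back to the
    queue(s) they originally came from.
    """
    params = {"SourceArn": dlq_arn}
    if destination_arn is not None:
        params["DestinationArn"] = destination_arn
    if max_per_second is not None:
        # Optional throttle so the consumers are not flooded during redrive.
        params["MaxNumberOfMessagesPerSecond"] = max_per_second
    response = sqs.start_message_move_task(**params)
    return response["TaskHandle"]
```

The returned handle can later be passed to CancelMessageMoveTask, and progress can be checked with ListMessageMoveTasks.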

AWS Certification Exam Practice Questions

  • Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).
  • AWS services are updated every day, and both the questions and answers might soon be outdated, so research accordingly.
  • AWS exam questions are not updated to keep pace with AWS updates, so even if the underlying feature has changed, the question might not be updated.
  • Open to further feedback, discussion, and correction.
  1. Which AWS service can help design architecture to persist in-flight transactions?
    1. Elastic IP Address
    2. SQS
    3. Amazon CloudWatch
    4. Amazon ElastiCache
  2. A company has a workflow that sends video files from their on-premises system to AWS for transcoding. They use EC2 worker instances that pull transcoding jobs from SQS. Why is SQS an appropriate service for this scenario?
    1. SQS guarantees the order of the messages.
    2. SQS synchronously provides transcoding output.
    3. SQS checks the health of the worker instances.
    4. SQS helps to facilitate horizontal scaling of encoding tasks
  3. Which statement best describes an Amazon SQS use case?
    1. Automate the process of sending an email notification to administrators when the CPU utilization reaches 70% on production servers (Amazon EC2 instances) (CloudWatch + SNS + SES)
    2. Create a video transcoding website where multiple components need to communicate with each other, but can’t all process the same amount of work simultaneously (SQS provides loose coupling)
    3. Coordinate work across distributed web services to process employee’s expense reports (SWF or Step Functions – Steps in order and might need manual steps)
    4. Distribute static web content to end users with low latency across multiple countries (CloudFront + S3)
  4. Your application provides data transformation services. Files containing data to be transformed are first uploaded to Amazon S3 and then transformed by a fleet of spot EC2 instances. Files submitted by your premium customers must be transformed with the highest priority. How should you implement such a system?
    1. Use a DynamoDB table with an attribute defining the priority level. Transformation instances will scan the table for tasks, sorting the results by priority level.
    2. Use Route 53 latency-based routing to send high priority tasks to the closest transformation instances.
    3. Use two SQS queues, one for high priority messages, and the other for default priority. Transformation instances first poll the high priority queue; if there is no message, they poll the default priority queue
    4. Use a single SQS queue. Each message contains the priority level. Transformation instances poll high-priority messages first.
  5. Your company plans to host a large donation website on Amazon Web Services (AWS). You anticipate a large and undetermined amount of traffic that will create many database writes. To be certain that you do not drop any writes to a database hosted on AWS, which service should you use?
    1. Amazon RDS with provisioned IOPS up to the anticipated peak write throughput.
    2. Amazon Simple Queue Service (SQS) for capturing the writes and draining the queue to write to the database
    3. Amazon ElastiCache to store the writes until the writes are committed to the database.
    4. Amazon DynamoDB with provisioned write throughput up to the anticipated peak write throughput.
  6. A customer has a 10 GB AWS Direct Connect connection to an AWS region where they have a web application hosted on Amazon Elastic Compute Cloud (EC2). The application has dependencies on an on-premises mainframe database that uses a BASE (Basically Available, Soft state, Eventual consistency) rather than an ACID (Atomicity, Consistency, Isolation, Durability) consistency model. The application is exhibiting undesirable behavior because the database is not able to handle the volume of writes. How can you reduce the load on your on-premises database resources in the most cost-effective way?
    1. Use an Amazon Elastic Map Reduce (EMR) S3DistCp as a synchronization mechanism between the on-premises database and a Hadoop cluster on AWS.
    2. Modify the application to write to an Amazon SQS queue and develop a worker process to flush the queue to the on-premises database
    3. Modify the application to use DynamoDB to feed an EMR cluster which uses a map function to write to the on-premises database.
    4. Provision an RDS read-replica database on AWS to handle the writes and synchronize the two databases using Data Pipeline.
  7. An organization has created a Queue named “modularqueue” with SQS. The organization is not performing any operations such as SendMessage, ReceiveMessage, DeleteMessage, GetQueueAttributes, SetQueueAttributes, AddPermission, and RemovePermission on the queue. What can happen in this scenario?
    1. AWS SQS sends notification after 15 days for inactivity on queue
    2. AWS SQS can delete queue after 30 days without notification
    3. AWS SQS marks queue inactive after 30 days
    4. AWS SQS notifies the user after 2 weeks and deletes the queue after 3 weeks.
  8. A user is using the AWS SQS to decouple the services. Which of the below mentioned operations is not supported by SQS?
    1. SendMessageBatch
    2. DeleteMessageBatch
    3. CreateQueue
    4. DeleteMessageQueue
  9. A user has created a queue named “awsmodule” with SQS. One of the consumers of queue is down for 3 days and then becomes available. Will that component receive message from queue?
    1. Yes, since SQS by default stores message for 4 days
    2. No, since SQS by default stores message for 1 day only
    3. No, since SQS sends message to consumers who are available that time
    4. Yes, since SQS will not delete message until it is delivered to all consumers
  10. A user has created a queue named “queue2” in US-East region with AWS SQS. The user’s AWS account ID is 123456789012. If the user wants to perform some action on this queue, which of the below Queue URL should he use?
    1. http://sqs.us-east-1.amazonaws.com/123456789012/queue2
    2. http://sqs.amazonaws.com/123456789012/queue2
    3. http://sqs.123456789012.us-east-1.amazonaws.com/queue2
    4. http://123456789012.sqs.us-east-1.amazonaws.com/queue2
  11. A user has created a queue named “myqueue” with SQS. There are four messages published to queue, which are not received by the consumer yet. If the user tries to delete the queue, what will happen?
    1. A user can never delete a queue manually. AWS deletes it after 30 days of inactivity on queue
    2. It will delete the queue
    3. It will initiate the delete but wait for four days before deleting until all messages are deleted automatically.
    4. It will ask user to delete the messages first
  12. A user has developed an application, which is required to send the data to a NoSQL database. The user wants to decouple the data sending such that the application keeps processing and sending data but does not wait for an acknowledgement of DB. Which of the below mentioned applications helps in this scenario?
    1. AWS Simple Notification Service
    2. AWS Simple Workflow
    3. AWS Simple Queue Service
    4. AWS Simple Query Service
  13. You are building an online store on AWS that uses SQS to process your customer orders. Your backend system needs those messages in the same sequence the customer orders have been put in. How can you achieve that?
    1. It is not possible to do this with SQS
    2. You can use sequencing information on each message (Note: With FIFO queues now available, using a FIFO queue is the recommended approach for strict ordering)
    3. You can do this with SQS but you also need to use SWF
    4. Messages will arrive in the same order by default
  14. A user has created a photo editing software and hosted it on EC2. The software accepts requests from the user about the photo format and resolution and sends a message to S3 to enhance the picture accordingly. Which of the below mentioned AWS services will help make a scalable software with the AWS infrastructure in this scenario?
    1. AWS Glacier
    2. AWS Elastic Transcoder
    3. AWS Simple Notification Service
    4. AWS Simple Queue Service
  15. Refer to the architecture diagram of a batch processing solution using Simple Queue Service (SQS) to set up a message queue between EC2 instances, which are used as batch processors. CloudWatch monitors the number of job requests (queued messages) and an Auto Scaling group adds or deletes batch servers automatically based on parameters set in CloudWatch alarms. You can use this architecture to implement which of the following features in a cost-effective and efficient manner?
    1. Reduce the overall time for executing jobs through parallel processing by allowing a busy EC2 instance that receives a message to pass it to the next instance in a daisy-chain setup.
    2. Implement fault tolerance against EC2 instance failure, since messages would remain in SQS and work can continue with recovery of EC2 instances; implement fault tolerance against SQS failure by backing up messages to S3.
    3. Implement message passing between EC2 instances within a batch by exchanging messages through SQS.
    4. Coordinate number of EC2 instances with number of job requests automatically, thus improving cost effectiveness
    5. Handle high priority jobs before lower priority jobs by assigning a priority metadata field to SQS messages.
  16. How does Amazon SQS allow multiple readers to access the same message queue without losing messages or processing them many times?
    1. By identifying a user by his unique id
    2. By using unique cryptography
    3. Amazon SQS queue has a configurable visibility timeout
    4. Multiple readers can’t access the same message queue
  17. A user has created photo editing software and hosted it on EC2. The software accepts requests from the user about the photo format and resolution and sends a message to S3 to enhance the picture accordingly. Which of the below mentioned AWS services will help make a scalable software with the AWS infrastructure in this scenario?
    1. AWS Elastic Transcoder
    2. AWS Simple Notification Service
    3. AWS Simple Queue Service
    4. AWS Glacier
  18. How do you configure SQS to support longer message retention?
    1. Set the MessageRetentionPeriod attribute using the SetQueueAttributes method
    2. Using a Lambda function
    3. You can’t. It is set to 14 days and cannot be changed
    4. You need to request it from AWS
  19. A user has developed an application, which is required to send the data to a NoSQL database. The user wants to decouple the data sending such that the application keeps processing and sending data but does not wait for an acknowledgement of DB. Which of the below mentioned applications helps in this scenario?
    1. AWS Simple Notification Service
    2. AWS Simple Workflow
    3. AWS Simple Query Service
    4. AWS Simple Queue Service
  20. If a message is retrieved from a queue in Amazon SQS, how long is the message inaccessible to other users by default?
    1. 0 seconds
    2. 1 hour
    3. 1 day
    4. forever
    5. 30 seconds
  21. Which of the following statements about SQS is true?
    1. Messages will be delivered exactly once and messages will be delivered in First in, First out order
    2. Messages will be delivered exactly once and message delivery order is indeterminate
    3. Messages will be delivered one or more times and messages will be delivered in First in, First out order
    4. Messages will be delivered one or more times and message delivery order is indeterminate (This applies to Standard queues. FIFO queues provide exactly-once processing and strict ordering.)
  22. How long can you keep your Amazon SQS messages in Amazon SQS queues?
    1. From 120 secs up to 4 weeks
    2. From 10 secs up to 7 days
    3. From 60 secs up to 2 weeks
    4. From 30 secs up to 1 week
  23. When a Simple Queue Service message triggers a task that takes 5 minutes to complete, which process below will result in successful processing of the message and remove it from the queue while minimizing the chances of duplicate processing?
    1. Retrieve the message with an increased visibility timeout, process the message, delete the message from the queue
    2. Retrieve the message with an increased visibility timeout, delete the message from the queue, process the message
    3. Retrieve the message with increased DelaySeconds, process the message, delete the message from the queue
    4. Retrieve the message with increased DelaySeconds, delete the message from the queue, process the message
  24. You need to process long-running jobs once and only once. How might you do this?
    1. Use an SNS queue and set the visibility timeout to long enough for jobs to process.
    2. Use an SQS queue and set the reprocessing timeout to long enough for jobs to process.
    3. Use an SQS queue and set the visibility timeout to long enough for jobs to process.
    4. Use an SNS queue and set the reprocessing timeout to long enough for jobs to process.
  25. You are getting a lot of empty receive requests when using Amazon SQS. This is making a lot of unnecessary network load on your instances. What can you do to reduce this load?
    1. Subscribe your queue to an SNS topic instead.
    2. Use as long of a poll as possible, instead of short polls.
    3. Alter your visibility timeout to be shorter.
    4. Use sqsd on your EC2 instances.
  26. You have an asynchronous processing application using an Auto Scaling Group and an SQS Queue. The Auto Scaling Group scales according to the depth of the job queue. The completion velocity of the jobs has gone down, the Auto Scaling Group size has maxed out, but the inbound job velocity did not increase. What is a possible issue?
    1. Some of the new jobs coming in are malformed and unprocessable. (As other options would cause the job to stop processing completely, the only reasonable option seems that some of the recent messages must be malformed and unprocessable)
    2. The routing tables changed and none of the workers can process events anymore. (If changed, none of the jobs would be processed)
    3. Someone changed the IAM Role Policy on the instances in the worker group and broke permissions to access the queue. (If IAM role changed no jobs would be processed)
    4. The scaling metric is not functioning correctly. (scaling metric did work fine as the autoscaling caused the instances to increase)
  27. Company B provides an online image recognition service and utilizes SQS to decouple system components for scalability. The SQS consumers poll the imaging queue as often as possible to keep end-to-end throughput as high as possible. However, Company B is realizing that polling in tight loops is burning CPU cycles and increasing costs with empty responses. How can Company B reduce the number of empty responses?
    1. Set the imaging queue VisibilityTimeout attribute to 20 seconds
    2. Set the imaging queue ReceiveMessageWaitTimeSeconds attribute to 20 seconds (Long polling)
    3. Set the imaging queue MessageRetentionPeriod attribute to 20 seconds
    4. Set the DelaySeconds parameter of a message to 20 seconds
  28. A multi-tenant SaaS application uses a single SQS standard queue. During peak load from one large tenant, other tenants experience increased message processing latency. What SQS feature can help resolve this?
    1. Enable FIFO mode on the queue
    2. Increase the visibility timeout
    3. Enable SQS Fair Queues to mitigate noisy neighbor impact
    4. Create separate queues for each tenant
  29. An application needs to process messages larger than 256 KB but smaller than 1 MB from an SQS queue. What is the simplest approach as of 2026?
    1. Use the Extended Client Library to store messages in S3
    2. Send the message directly to SQS since the maximum message size is now 1 MB
    3. Compress the message before sending
    4. Split the message into multiple smaller messages
  30. A company wants to programmatically move failed messages from a dead-letter queue back to the original source queue for reprocessing. Which API should they use?
    1. RedriveMessage
    2. MoveMessage
    3. StartMessageMoveTask
    4. RetryMessage

AWS Storage Options – SQS & Redshift

SQS

  • is a fully managed message queuing service that provides a reliable, highly scalable, hosted queue for temporary storage and delivery of messages up to 1 MiB in size (increased from 256 KB in August 2025).
  • supports a virtually unlimited number of queues and supports two queue types:
    • Standard queues – unordered, at-least-once delivery with nearly unlimited throughput.
    • FIFO queues – exactly-once processing with strict message ordering, supporting up to 70,000 messages per second with high throughput mode.

Ideal Usage Patterns

  • is ideally suited to any scenario where multiple application components must communicate and coordinate their work in a loosely coupled manner, particularly producer-consumer scenarios.
  • can be used to coordinate a multi-step processing pipeline, where each message is associated with a task that must be processed.
  • enables the number of worker instances, and the processing power of each worker instance, to scale up or down to suit the total workload without any application changes.
  • ideal for multi-tenant workloads using fair queues (launched July 2025) to mitigate noisy neighbor impact and ensure consistent processing across tenants.
  • supports event-driven architectures with AWS Lambda event source mapping, including provisioned mode for 3x faster scaling and 16x higher concurrency.
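
A Lambda function fed by an SQS event source mapping could be sketched as below. The `process` function and the `order_id` field are hypothetical business logic. The sketch also assumes that partial batch responses (ReportBatchItemFailures) are enabled on the event source mapping, so that only the failed messages are retried rather than the whole batch.

```python
import json


def process(order: dict) -> None:
    """Hypothetical business logic; raises when the input is unusable."""
    if "order_id" not in order:
        raise ValueError("missing order_id")


def handler(event: dict, context: object) -> dict:
    """Process each SQS record and report only the failures back to Lambda."""
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Returning the messageId asks Lambda to retry just this message.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Messages that keep failing would eventually move to the dead-letter queue once the queue's maxReceiveCount is exceeded, if a DLQ is configured.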

Anti-Patterns

  • Binary or Large Messages
    • SQS supports messages up to 1 MiB. If the application needs to send binary payloads or messages exceeding this limit, it is best to use the Amazon SQS Extended Client Library, with Amazon S3 storing the payload and SQS storing a pointer to it.
  • Long Term storage
    • SQS stores messages for a maximum of 14 days. If the application requires a storage period longer than 14 days, Amazon S3 or other storage options should be preferred.
  • High-speed message queuing or very short tasks
    • If the application requires very high-speed message send and receive responses from a single producer or consumer, Amazon DynamoDB or a message-queuing system hosted on Amazon EC2 may be more appropriate.
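
The payload-in-S3, pointer-in-SQS idea from the first anti-pattern can be illustrated with a hand-rolled sketch. This is not the Extended Client Library and does not use its message format: the function name, the `sqs-payloads/` key prefix, and the pointer fields are assumptions. `sqs` and `s3` are assumed to behave like boto3 clients, and the size check ignores message attributes for simplicity.

```python
import json
import uuid
from typing import Any

MAX_SQS_BYTES = 1024 * 1024  # 1 MiB limit described above


def send_text(sqs: Any, s3: Any, queue_url: str, bucket: str, text: str) -> str:
    """Send `text` inline when it fits, otherwise store it in S3 and send a pointer."""
    body = text.encode("utf-8")
    if len(body) <= MAX_SQS_BYTES:
        sqs.send_message(QueueUrl=queue_url, MessageBody=text)
        return "inline"
    # Too large for SQS: park the payload in S3 under a unique key.
    key = f"sqs-payloads/{uuid.uuid4()}"
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    # Hypothetical pointer format; the consumer must know how to resolve it.
    pointer = json.dumps({"s3_bucket": bucket, "s3_key": key})
    sqs.send_message(QueueUrl=queue_url, MessageBody=pointer)
    return "pointer"
```

The consumer side would read the pointer, fetch the object from S3, and delete both the message and (if appropriate) the object once processing succeeds.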

Performance

  • is a distributed queuing system that is optimized for horizontal scalability, not for single-threaded sending or receiving speeds.
  • Standard queues support nearly unlimited throughput (thousands of transactions per second per API action).
  • FIFO queues support up to 3,000 messages per second with batching by default, or up to 70,000 messages per second (700,000 with batching) in high throughput mode in select regions.
  • FIFO queues support up to 120,000 in-flight messages (increased from 20,000 in November 2024).
  • Higher receive performance can be achieved by requesting multiple messages (up to 10) in a single call.
  • Fair queues (July 2025) automatically reorder messages to maintain consistent dwell time across tenants, preventing noisy neighbors from impacting processing latency.
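
Requesting up to 10 messages per call, as noted above, might be combined with a batched delete along these lines. The function name `process_batch` and the `handle` callback are hypothetical, and `sqs` is assumed to behave like a boto3 SQS client.

```python
from typing import Any, Callable


def process_batch(sqs: Any, queue_url: str, handle: Callable[[str], None]) -> int:
    """Receive up to 10 messages, process them, and delete the successes in one call.

    Returns the number of messages deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # maximum allowed per ReceiveMessage call
        WaitTimeSeconds=20,      # long polling reduces empty responses
    )
    done = []
    for index, message in enumerate(response.get("Messages", [])):
        try:
            handle(message["Body"])
        except Exception:
            # Leave the message in the queue; it becomes visible again
            # after the visibility timeout and can be retried.
            continue
        done.append({"Id": str(index), "ReceiptHandle": message["ReceiptHandle"]})
    if done:
        sqs.delete_message_batch(QueueUrl=queue_url, Entries=done)
    return len(done)
```

Batching both the receive and the delete cuts the number of API requests per message, which helps throughput and request-based cost alike.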

Durability & Availability

  • SQS message queues are highly durable but temporary.
  • stores all messages redundantly across multiple servers and data centers.
  • Message retention time is configurable on a per-queue basis, from a minimum of one minute to a maximum of 14 days.
  • Messages are retained in a queue until they are explicitly deleted, or until they are automatically deleted upon expiration of the retention time.
  • supports dead-letter queues (DLQ) for isolating messages that fail processing, with DLQ redrive capability to move messages back to the source queue or a custom destination for reprocessing.
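
The retention and dead-letter settings above map onto queue attributes. The helper below is a sketch that builds the attribute dictionary; the function name, the default `max_receive_count`, and the example ARN are assumptions, while `MessageRetentionPeriod` and `RedrivePolicy` are the SQS attribute names.

```python
import json
from typing import Optional

MIN_RETENTION_SECONDS = 60                 # one minute
MAX_RETENTION_SECONDS = 14 * 24 * 60 * 60  # 14 days


def queue_attributes(retention_seconds: int,
                     dlq_arn: Optional[str] = None,
                     max_receive_count: int = 5) -> dict:
    """Build an Attributes dict for CreateQueue or SetQueueAttributes."""
    if not MIN_RETENTION_SECONDS <= retention_seconds <= MAX_RETENTION_SECONDS:
        raise ValueError("retention must be between 60 seconds and 14 days")
    # SQS attribute values are passed as strings.
    attributes = {"MessageRetentionPeriod": str(retention_seconds)}
    if dlq_arn is not None:
        # After max_receive_count failed receives, SQS moves the message to the DLQ.
        attributes["RedrivePolicy"] = json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": max_receive_count,
        })
    return attributes
```

With a boto3 client, the result could be passed as `sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=queue_attributes(345600))` to keep the default four-day retention explicit.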

Cost Model

  • pricing is based on
    • number of requests (per million requests)
    • the amount of data transferred out (priced per GB per month)
    • First 1 million requests per month are free (Free Tier)

Scalability & Elasticity

  • is both highly elastic and massively scalable.
  • is designed to enable a virtually unlimited number of computers to read and write a virtually unlimited number of messages at any time.
  • supports virtually unlimited numbers of queues and messages per queue for any user.
  • supports dual-stack (IPv4 and IPv6) endpoints for flexible network access.

Key Features (Recent Updates)

  • Message payload size increased to 1 MiB (August 2025) – supports larger messages for both standard and FIFO queues without needing the Extended Client Library.
  • Fair queues (July 2025) – automatically mitigates noisy neighbor impact in multi-tenant standard queues by reordering messages to maintain consistent dwell time across tenants.
  • FIFO high throughput mode – up to 70,000 TPS per API action (November 2023), enabling 700,000 messages per second with batching.
  • FIFO in-flight limit increase (November 2024) – increased from 20,000 to 120,000 in-flight messages per FIFO queue.
  • Lambda provisioned mode for SQS (November 2025) – dedicated polling resources providing 3x faster scaling and 16x higher concurrency for event source mapping.
  • Dead-letter queue redrive – move failed messages from DLQ back to source queue or a custom destination for both standard and FIFO queues.
  • Simplified KMS permissions – SendMessage no longer requires kms:Decrypt permission; only kms:GenerateDataKey is needed.
  • Temporary queues – application-managed virtual queues for request-response patterns that reduce cost and development time.

Amazon Redshift

  • is a fast, fully-managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all data using existing business intelligence tools.
  • is optimized for datasets that range from a few hundred gigabytes to a petabyte or more.
  • manages the work needed to set up, operate, and scale a data warehouse, from provisioning the infrastructure capacity to automating ongoing administrative tasks such as backups and patching.
  • offers two deployment models: Provisioned clusters (RA3 and new RG instances) and Redshift Serverless (pay-per-use with automatic scaling).
⚠️ Important: Amazon Redshift DC2 instances reached End of Life on April 24, 2026. New DC2 clusters cannot be created since May 15, 2025. Migrate to RA3 instances, RG instances (Graviton-powered, GA May 2026), or Redshift Serverless. DS2 instances were previously deprecated in favor of RA3.

Ideal Usage Pattern

  • is ideal for analyzing large datasets using existing business intelligence tools.
  • Common use cases include
    • Analyze global sales data for multiple products
    • Store historical stock trade data
    • Analyze ad impressions and clicks
    • Aggregate gaming data
    • Analyze social trends
    • Measure clinical quality, operation efficiency, and financial performance in the health care space
    • Near real-time analytics using zero-ETL integrations from Aurora, DynamoDB, RDS, and SaaS applications
    • Data lakehouse analytics querying data in S3 data lakes using Redshift Spectrum
    • Generative AI applications using Amazon Bedrock integration for sentiment analysis, text generation, and summarization directly on warehouse data

Anti-Pattern

  • OLTP workloads
    • Redshift is a column-oriented database and is better suited to data warehousing and analytics. If the application involves online transaction processing, Amazon RDS or Aurora would be a better choice.
  • Blob data
    • For Blob storage, Amazon S3 would be a better choice, with metadata kept in another store such as RDS or DynamoDB.

Performance

  • Amazon Redshift allows very high query performance on datasets ranging in size from hundreds of gigabytes to a petabyte or more.
  • It uses columnar storage, data compression, and zone maps to reduce the amount of I/O needed to perform queries.
  • It has a massively parallel processing (MPP) architecture that parallelizes and distributes SQL operations to take advantage of all available resources.
  • The underlying hardware is designed for high-performance data processing and uses locally attached storage to maximize throughput.
  • New RG instances (GA May 2026) powered by AWS Graviton deliver up to 2.4x faster performance than RA3 at 30% lower price per vCPU.
  • AI-driven scaling and optimization in Redshift Serverless automatically provisions and scales capacity for demanding workloads.
  • Query performance improvements (March 2026) speed up new queries in BI dashboards and ETL workloads by up to 7x.
  • Concurrency scaling automatically adds additional cluster capacity to handle burst read and write workloads, with support for data ingestion (COPY queries in Parquet/ORC from S3).
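
To make the zone map idea above concrete, here is a toy model in Python. It is only an illustration of the general technique (per-block min/max metadata used to skip blocks), not Redshift's actual implementation; the `Block` class and `scan_equal` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A chunk of one column's values plus its zone map (min and max)."""
    values: List[int]

    @property
    def low(self) -> int:
        return min(self.values)

    @property
    def high(self) -> int:
        return max(self.values)


def scan_equal(blocks: List[Block], target: int) -> Tuple[List[int], int]:
    """Find values equal to `target`, reading only blocks whose range can contain it.

    Returns the matches and the number of blocks actually read.
    """
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        if target < block.low or target > block.high:
            continue  # zone map rules this block out; no I/O needed
        blocks_read += 1
        matches.extend(v for v in block.values if v == target)
    return matches, blocks_read
```

When data is stored in sorted order, most blocks have narrow, non-overlapping ranges, so a selective filter touches only a small fraction of them.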

Durability & Availability

  • Amazon Redshift stores three copies of your data: all data written to a node in your cluster is automatically replicated to other nodes within the cluster, and all data is continuously backed up to Amazon S3.
  • Snapshots are automated, incremental, and continuous and stored for a user-defined period (1-35 days).
  • Manual snapshots can be created and are retained until explicitly deleted.
  • Amazon Redshift continuously monitors the health of the cluster and automatically re-replicates data from failed drives and replaces nodes as necessary.
  • Multi-AZ deployments (GA for RA3 clusters) run your data warehouse across two Availability Zones simultaneously, providing continued operation during AZ failure scenarios.

Cost Model

  • Provisioned clusters pricing:
    • Compute node hours – total hours run across all compute nodes (RA3 or RG instances)
    • Redshift Managed Storage (RMS) – billed per GB/month, separate from compute (RA3/RG only)
    • Backup storage – for automated and manual snapshots beyond the free tier
    • Data transfer – standard AWS data transfer charges apply
    • Concurrency scaling – free for 1 hour per day per cluster, then per-second billing
    • Spectrum – per TB of data scanned in S3
  • Redshift Serverless pricing:
    • Compute – per RPU-hour (Redshift Processing Unit), billed per second with no charge when idle
    • Storage – per GB/month for managed storage
  • Reserved Instance pricing available for provisioned clusters (1-year or 3-year terms) for significant discounts.

Scalability & Elasticity

  • Provisioned clusters – Elastic resize allows adding or removing nodes within minutes. Classic resize available for node type changes.
  • Redshift Serverless – automatically scales compute capacity up and down based on workload demands with no cluster management required.
  • Data sharing allows securely sharing live, transactionally consistent data across Redshift clusters (cross-account, cross-region) without copying data.
  • Multi-warehouse writes through data sharing (GA November 2024) enable using different warehouses of different types and sizes for ETL workloads.

Key Features (Recent Updates)

  • RG Instances (GA May 2026) – New Graviton-powered instance family delivering 2.4x faster performance than RA3 at 30% lower price per vCPU.
  • DC2 End of Life (April 24, 2026) – Migrate to RA3, RG, or Serverless. New DC2 cluster creation blocked since May 15, 2025.
  • Redshift Serverless – Pay-per-use model with automatic scaling, AI-driven optimization, and per-second billing with no charge when idle.
  • Zero-ETL integrations – Near real-time data replication from Aurora, DynamoDB, RDS, and self-managed databases to Redshift without building ETL pipelines. Also supports SaaS sources (Salesforce, SAP, Zendesk).
  • Multi-AZ deployments – Run RA3 provisioned clusters across two Availability Zones for high availability.
  • Amazon Bedrock integration (October 2024) – Run generative AI tasks (text generation, sentiment analysis, summarization, classification) directly on Redshift data using foundation models via SQL.
  • Amazon Q generative SQL – Generate SQL from natural language prompts in the Redshift Query Editor.
  • Data sharing – Share live data across clusters, accounts, and regions without data movement. Supports multi-warehouse writes for ETL.
  • Redshift Spectrum – Query exabytes of data in S3 without loading it into Redshift, enabling data lakehouse architectures.
  • Concurrency scaling for ingestion (2026) – Automatically scales for COPY queries in Parquet/ORC formats from S3 during traffic spikes.
  • 7x query performance improvement (March 2026) – Faster response for BI dashboards, ETL pipelines, and near real-time analytics.