Kinesis Data Streams vs SQS

Purpose
- Amazon Kinesis Data Streams
  - allows real-time processing of streaming big data and lets multiple Amazon Kinesis applications read and replay the same records.
  - The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, which makes it easier to build multiple applications that read from the same stream (for example, to perform counting, aggregation, and filtering).
- Amazon SQS
  - offers a reliable, highly scalable hosted queue for storing messages as they travel between applications or microservices.
  - moves data between distributed application components and helps decouple them.
  - provides common middleware constructs such as dead-letter queues and poison-pill management.
  - provides a generic web services API and can be accessed from any programming language that the AWS SDK supports.
  - supports both standard and FIFO queues.
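To make the partition-key behaviour concrete, here is a minimal sketch of how a key can be mapped to a shard. Kinesis hashes the partition key with MD5 into a 128-bit hash key and routes the record to the shard whose hash range contains it. The function name `shard_for_key` and the assumption that shards split the hash space evenly are illustrative only; real streams can have uneven ranges after resharding.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash range contains the key.

    Assumption: shard_count shards divide the hash space evenly.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # The last shard absorbs any remainder of the hash space.
    return min(hash_key // range_size, shard_count - 1)


if __name__ == "__main__":
    # The same key always lands on the same shard, which is what lets
    # one record processor see every record for that key.
    for key in ["truck-17", "truck-42", "truck-17"]:
        print(key, "-> shard", shard_for_key(key, 4))
```

Because the mapping depends only on the key, records for one key stay on one shard, at the cost of possible hot shards when a few keys dominate the traffic.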
Scaling
- Kinesis Data Streams offers two capacity modes:
  - Provisioned mode: requires you to provision capacity and scale manually by adding or removing shards.
  - On-Demand mode (launched November 2021): fully managed with automatic scaling, so no manual shard management is required. New streams start with 4 MB/s of write capacity and scale up to 200 MB/s (or 1 GB/s with a limit increase).
- SQS is fully managed and highly scalable, and requires no administrative overhead and little configuration.
  - Standard queue: nearly unlimited transactions per second.
  - FIFO queue: 300 TPS per API action by default (3,000 messages per second with batching); high throughput mode raises this further, up to 70,000 TPS in select Regions.
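For provisioned mode, the shard count is usually sized from the expected traffic. The sketch below assumes the commonly documented per-shard limits of 1 MB/s or 1,000 records/s for writes and 2 MB/s for shared reads; these constants and the helper name `required_shards` are assumptions for illustration and should be checked against the current service quotas.

```python
import math

# Assumed per-shard limits in provisioned mode (verify against current quotas).
WRITE_MB_PER_SHARD = 1.0         # MB/s of ingest per shard
WRITE_RECORDS_PER_SHARD = 1000   # records/s of ingest per shard
READ_MB_PER_SHARD = 2.0          # MB/s of shared read throughput per shard


def required_shards(write_mb_s: float, write_records_s: float, read_mb_s: float) -> int:
    """Smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )


if __name__ == "__main__":
    # 5 MB/s in, 2,000 records/s in, 12 MB/s out: reads are the bottleneck.
    print(required_shards(5, 2000, 12))
```

Whichever limit demands the most shards decides the result, which is why a read-heavy workload can need more shards than its write volume alone suggests.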
Ordering
- Kinesis provides ordering of records within a shard, as well as the ability to read and/or replay records in the same order to multiple Kinesis applications
- SQS standard queues do not guarantee ordering and provide at-least-once delivery of messages
- SQS FIFO queues guarantee ordering within a message group and provide exactly-once processing
Data Retention Period
- Kinesis Data Streams retains data for 24 hours by default; retention can be extended up to 365 days (8,760 hours)
- SQS retains messages for 4 days by default; retention can be configured from 1 minute to 14 days. A message is removed as soon as the consumer deletes it
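The retention limits above can be expressed as a small validation sketch. The helpers below only build request parameters and make no AWS call; the dictionaries follow the parameter shapes of boto3's Kinesis `increase_stream_retention_period` (`StreamName`, `RetentionPeriodHours`) and the SQS `MessageRetentionPeriod` queue attribute, while the helper names and the stream name `gps-stream` are hypothetical.

```python
KINESIS_MIN_HOURS, KINESIS_MAX_HOURS = 24, 8760      # 24 hours to 365 days
SQS_MIN_SECONDS, SQS_MAX_SECONDS = 60, 14 * 86400    # 1 minute to 14 days


def kinesis_retention_params(stream_name: str, hours: int) -> dict:
    """Parameters for extending a stream's retention period (hypothetical helper)."""
    if not KINESIS_MIN_HOURS <= hours <= KINESIS_MAX_HOURS:
        raise ValueError("Kinesis retention must be between 24 and 8760 hours")
    return {"StreamName": stream_name, "RetentionPeriodHours": hours}


def sqs_retention_attributes(days: float) -> dict:
    """Queue attributes for a retention period given in days (hypothetical helper)."""
    seconds = int(round(days * 86400))
    if not SQS_MIN_SECONDS <= seconds <= SQS_MAX_SECONDS:
        raise ValueError("SQS retention must be between 60 and 1209600 seconds")
    # SQS attribute values are passed as strings.
    return {"MessageRetentionPeriod": str(seconds)}


if __name__ == "__main__":
    print(kinesis_retention_params("gps-stream", 168))  # 7 days
    print(sqs_retention_attributes(4))                  # the 4-day default
```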
Delivery Semantics
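- Kinesis and SQS standard queues both provide at-least-once delivery, so consumers must tolerate duplicate messages.
- SQS FIFO queues provide exactly-once processing: a duplicate sent within the 5-minute deduplication interval is not delivered again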
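With at-least-once delivery, the consumer is usually made idempotent so that a redelivered message has no extra effect. The following is a minimal sketch of that idea, not an AWS API: the class name `IdempotentProcessor` is hypothetical, and the in-memory set stands in for what would normally be a durable store.

```python
from typing import Callable


class IdempotentProcessor:
    """Runs a handler at most once per message ID (illustrative sketch)."""

    def __init__(self, handler: Callable[[str], None]):
        self._handler = handler
        self._seen = set()  # assumption: in production this would be a durable store

    def process(self, message_id: str, body: str) -> bool:
        """Return True if the message was handled, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(body)
        # Record the ID only after the handler succeeds, so a failed
        # message can still be retried on redelivery.
        self._seen.add(message_id)
        return True


if __name__ == "__main__":
    handled = []
    processor = IdempotentProcessor(handled.append)
    for message_id, body in [("m1", "a"), ("m1", "a"), ("m2", "b")]:
        processor.process(message_id, body)
    print(handled)
```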
Parallel Clients
- Kinesis supports multiple consumers reading from the same stream concurrently
  - Standard (shared throughput): 2 MB/sec per shard shared across all consumers
  - Enhanced fan-out: 2 MB/sec per shard per consumer (dedicated throughput)
- SQS delivers each message to only one consumer at a time; delivering the same message to multiple consumers requires multiple queues
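The difference between shared throughput and enhanced fan-out is simple arithmetic, sketched below using the 2 MB/sec per shard figure quoted above. The function name `per_consumer_read_mb_s` is hypothetical, and the even split between shared consumers is a simplifying assumption.

```python
READ_MB_PER_SHARD = 2.0  # MB/s per shard, as quoted above


def per_consumer_read_mb_s(shards: int, consumers: int, enhanced_fan_out: bool) -> float:
    """Approximate read throughput available to each consumer."""
    if shards < 1 or consumers < 1:
        raise ValueError("shards and consumers must be at least 1")
    total = shards * READ_MB_PER_SHARD
    # Shared mode divides the per-shard budget among all consumers;
    # enhanced fan-out gives each consumer its own dedicated budget.
    return total if enhanced_fan_out else total / consumers


if __name__ == "__main__":
    print(per_consumer_read_mb_s(4, 4, enhanced_fan_out=False))
    print(per_consumer_read_mb_s(4, 4, enhanced_fan_out=True))
```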
Use Cases
- Kinesis use case requirements
  - Ordering of records
  - Ability to consume records in the same order a few hours later
  - Ability for multiple applications to consume the same stream concurrently
  - Routing related records to the same record processor (as in streaming MapReduce)
  - Real-time analytics and processing
  - Data replay capability for reprocessing
- SQS use case requirements
  - Messaging semantics such as message-level ack/fail and visibility timeout
  - Leveraging SQS’s ability to scale transparently
  - Dynamically increasing concurrency/throughput at read time
  - Individual message delay, where delivery of a single message is postponed (by up to 15 minutes)
  - Decoupling application components
  - Simple message queuing without the need for replay
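The message-level semantics in the SQS list (ack by delete, fail by letting the visibility timeout expire, and per-message delay) can be illustrated with a toy in-memory model. This is a sketch of the behaviour only, not the SQS API: the class `InMemoryQueue` and its methods are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
import itertools


class InMemoryQueue:
    """Toy model of visibility timeout and message delay (not the SQS API)."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self._messages = {}    # message id -> body
        self._visible_at = {}  # message id -> time at which it can be received
        self._ids = itertools.count(1)

    def send(self, body: str, now: float, delay: float = 0.0) -> int:
        msg_id = next(self._ids)
        self._messages[msg_id] = body
        self._visible_at[msg_id] = now + delay  # per-message delay
        return msg_id

    def receive(self, now: float):
        """Return (id, body) of one visible message, hiding it for the timeout."""
        for msg_id, visible_at in self._visible_at.items():
            if visible_at <= now:
                self._visible_at[msg_id] = now + self.visibility_timeout
                return msg_id, self._messages[msg_id]
        return None

    def delete(self, msg_id: int) -> None:
        """Acknowledge a message so it is never delivered again."""
        self._messages.pop(msg_id, None)
        self._visible_at.pop(msg_id, None)


if __name__ == "__main__":
    queue = InMemoryQueue(visibility_timeout=30)
    msg_id = queue.send("order-1", now=0)
    print(queue.receive(now=0))    # delivered and hidden for 30 seconds
    print(queue.receive(now=10))   # still hidden from other consumers
    print(queue.receive(now=30))   # not deleted in time, so redelivered
    queue.delete(msg_id)           # ack: the message is gone for good
```

A consumer that crashes before calling `delete` simply lets the timeout lapse, and the message becomes available to another consumer; this is the ack/fail behaviour that Kinesis does not offer at the individual record level.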
Key Differences Summary
| Feature | Kinesis Data Streams | SQS |
|---|---|---|
| Purpose | Real-time streaming data processing | Message queuing and decoupling |
| Scaling | Provisioned or On-Demand (auto-scaling) | Fully managed (auto-scaling) |
| Ordering | Guaranteed per shard | Standard: No, FIFO: Yes (per message group) |
| Retention | 24 hours to 365 days | 1 minute to 14 days |
| Replay | ✅ Supported | ❌ Not supported |
| Multiple Consumers | ✅ Yes (concurrent) | ❌ No (one at a time) |
| Delivery Semantics | At least once | Standard: At least once, FIFO: Exactly once |
| Latency | ~200 ms (shared throughput), ~70 ms (enhanced fan-out) | Single-digit milliseconds |
AWS Certification Exam Practice Questions
- Questions are collected from the Internet, and the answers are marked as per my knowledge and understanding (which might differ from yours).
- AWS services are updated every day, so both the questions and the answers might soon be outdated; research accordingly.
- AWS exam questions are not updated to keep pace with AWS updates, so a question might not reflect a feature that has since changed.
- Open to further feedback, discussion and correction.
- You are deploying an application to track GPS coordinates of delivery trucks in the United States. Coordinates are transmitted from each delivery truck once every three seconds. You need to design an architecture that will enable real-time processing of these coordinates from multiple consumers. Which service should you use to implement data ingestion?
  - Amazon Kinesis
  - AWS Data Pipeline
  - Amazon AppStream
  - Amazon Simple Queue Service
- Your customer is willing to consolidate their log streams (access logs, application logs, security logs etc.) in one single system. Once consolidated, the customer wants to analyze these logs in real time based on heuristics. From time to time, the customer needs to validate heuristics, which requires going back to data samples extracted from the last 12 hours. What is the best approach to meet your customer’s requirements?
  - Send all the log events to Amazon SQS. Set up an Auto Scaling group of EC2 servers to consume the logs and apply the heuristics.
  - Send all the log events to Amazon Kinesis and develop a client process to apply heuristics on the logs (Can perform real-time analysis and stores data for 24 hours, which can be extended to 365 days)
  - Configure Amazon CloudTrail to receive custom logs, and use EMR to apply heuristics on the logs (CloudTrail is only for auditing)
  - Set up an Auto Scaling group of EC2 syslogd servers, store the logs on S3, and use EMR to apply heuristics on the logs (EMR is for batch analysis)
- A company needs to process streaming data with multiple independent consumers that need to read the same data concurrently. Which service should they use?
  - SQS Standard Queue
  - SQS FIFO Queue
  - Kinesis Data Streams
  - Amazon SNS
- A company wants to decouple microservices and needs exactly-once message processing with ordering guarantees. Which service should they use?
  - Kinesis Data Streams
  - SQS Standard Queue
  - SQS FIFO Queue
  - Amazon SNS
- A company wants to avoid manual shard management for their streaming data workload. Which Kinesis capacity mode should they use? (Assume November 2021 or later)
  - Provisioned mode with Auto Scaling
  - On-Demand mode
  - Enhanced fan-out mode
  - Standard mode