AWS Storage Options – SQS & Redshift


SQS

  • is a temporary data repository for messages and provides a reliable, highly scalable, hosted message queuing service for temporary storage and delivery of short (up to 256 KB) text-based data messages.
  • supports a virtually unlimited number of queues and supports unordered, at-least-once delivery of messages.
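
A minimal producer-side sketch of this model, using Python and boto3; the queue name and message body are illustrative assumptions, not part of the whitepaper:

```python
import boto3

# Create (or look up) a queue and send one short text message.
# The queue name "orders-queue" is hypothetical.
sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.create_queue(QueueName="orders-queue")["QueueUrl"]

# SQS messages are text-based and limited to 256 KB.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"order_id": 42, "action": "process"}',
)
```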

Ideal Usage Patterns

  • is ideally suited to any scenario where multiple application components must communicate and coordinate their work in a loosely coupled manner, particularly producer-consumer scenarios
  • can be used to coordinate a multi-step processing pipeline, where each message is associated with a task that must be processed.
  • enables the number of worker instances to scale up or down, and also enables the processing power of each single worker instance to scale up or down, to suit the total workload, without any application changes.
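
A loosely coupled consumer for such a pipeline could look like the following sketch (Python/boto3); the queue URL and the processing logic are hypothetical placeholders. Each worker long-polls the queue, processes a batch of tasks, and deletes only the messages it has finished, so workers can be added or removed without application changes:

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue"  # hypothetical

def process(body):
    # Placeholder for the actual task associated with the message.
    print("processing:", body)

while True:
    # Long-poll and fetch up to 10 messages per call for better receive throughput.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    for msg in resp.get("Messages", []):
        process(msg["Body"])
        # Delete only after successful processing; otherwise the message becomes
        # visible again after the visibility timeout (at-least-once delivery).
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```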

Anti-Patterns

  • Binary or Large Messages
    • SQS is suited for text messages with a maximum size of 256 KB. If the application requires binary messages or messages exceeding this limit, it is best to store the payload in Amazon S3 or RDS and use SQS to pass a pointer to it (as sketched after this list)
  • Long Term storage
    • SQS stores messages for a maximum of 14 days; if the application requires a storage period longer than 14 days, Amazon S3 or other storage options should be preferred
  • High-speed message queuing or very short tasks
    • If the application requires a very high-speed message send and receive response from a single producer or consumer, use of Amazon DynamoDB or a message-queuing system hosted on Amazon EC2 may be more appropriate.
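
The pointer pattern from the first anti-pattern above might look like this sketch (Python/boto3); the bucket name and queue URL are hypothetical:

```python
import json
import uuid
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

BUCKET = "my-large-payloads"  # hypothetical bucket for large/binary payloads
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # hypothetical queue

def send_large_payload(payload_bytes):
    # Store the large or binary payload in S3 and send only a small pointer through SQS.
    key = f"payloads/{uuid.uuid4()}"
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload_bytes)
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"bucket": BUCKET, "key": key}),
    )

def receive_large_payload():
    # The consumer reads the pointer from the queue and fetches the payload from S3.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
    for msg in resp.get("Messages", []):
        pointer = json.loads(msg["Body"])
        obj = s3.get_object(Bucket=pointer["bucket"], Key=pointer["key"])
        data = obj["Body"].read()
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
        return data
```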

Performance

  • is a distributed queuing system that is optimized for horizontal scalability, not for single-threaded sending or receiving speeds.
  • A single client can send or receive Amazon SQS messages at a rate of about 5 to 50 messages per second. Higher receive performance can be achieved by requesting multiple messages (up to 10) in a single call.

Durability & Availability

  • Messages are highly durable but temporary.
  • stores all messages redundantly across multiple servers and data centers.
  • Message retention time is configurable on a per-queue basis, from a minimum of one minute to a maximum of 14 days.
  • Messages are retained in a queue until they are explicitly deleted, or until they are automatically deleted upon expiration of the retention time.
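
A small sketch of configuring retention per queue (Python/boto3); the queue name is hypothetical, and the retention attribute is expressed in seconds:

```python
import boto3

sqs = boto3.client("sqs")

# Create a queue that keeps messages for the 14-day maximum
# (1209600 seconds); the queue name is hypothetical.
queue_url = sqs.create_queue(
    QueueName="audit-events",
    Attributes={"MessageRetentionPeriod": "1209600"},
)["QueueUrl"]

# The same attribute can be changed later on an existing queue,
# here down to the 1-minute minimum (60 seconds).
sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={"MessageRetentionPeriod": "60"},
)
```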

Cost Model

  • pricing is based on
    • number of requests and
    • the amount of data transferred in and out (priced per GB per month).

Scalability & Elasticity

  • is both highly elastic and massively scalable.
  • is designed to enable a virtually unlimited number of computers to read and write a virtually unlimited number of messages at any time.
  • supports virtually unlimited numbers of queues and messages per queue for any user.

Amazon Redshift

  • is a fast, fully-managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools.
  • is optimized for datasets that range from a few hundred gigabytes to a petabyte or more.
  • manages the work needed to set up, operate, and scale a data warehouse, from provisioning the infrastructure capacity to automating ongoing administrative tasks such as backups and patching.

Ideal Usage Pattern

  • is ideal for analyzing large datasets using the existing business intelligence tools
  • Common use cases include
    • Analyze global sales data for multiple products
    • Store historical stock trade data
    • Analyze ad impressions and clicks
    • Aggregate gaming data
    • Analyze social trends
    • Measure clinical quality, operational efficiency, and financial performance in the health care space
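
As an illustration of the sales-analysis use case, a query could be issued through the Amazon Redshift Data API, as in the sketch below (Python/boto3); the cluster, database, user, and sales table are all hypothetical:

```python
import time
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

# Submit an aggregate query against a hypothetical sales table.
resp = rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="analyst",
    Sql="""
        SELECT product_id, region, SUM(amount) AS total_sales
        FROM   sales
        WHERE  sale_date >= '2023-01-01'
        GROUP  BY product_id, region
        ORDER  BY total_sales DESC
        LIMIT  20;
    """,
)

# Statements run asynchronously; poll for completion, then fetch rows.
while True:
    status = rsd.describe_statement(Id=resp["Id"])["Status"]
    if status in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)

if status == "FINISHED":
    rows = rsd.get_statement_result(Id=resp["Id"])["Records"]
```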

Anti-Patterns

  • OLTP workloads
    • Redshift is a column-oriented database and is better suited for data warehousing and analytics. If the application involves online transaction processing, Amazon RDS would be a better choice.
  • Blob data
    • For BLOB storage, Amazon S3 would be a better choice, with the metadata stored in another service such as RDS or DynamoDB

Performance

  • Amazon Redshift delivers very high query performance on datasets ranging in size from hundreds of gigabytes to a petabyte or more.
  • It uses columnar storage, data compression, and zone maps to reduce the amount of I/O needed to perform queries.
  • It has a massively parallel processing (MPP) architecture that parallelizes and distributes SQL operations to take advantage of all available resources.
  • The underlying hardware is designed for high-performance data processing, using locally attached storage to maximize throughput.
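
Table design is where these features surface to the user. The sketch below (Python/boto3, again assuming the Redshift Data API, a hypothetical cluster, and a hypothetical ad_impressions table) shows a distribution key for the MPP engine, a sort key that zone maps can exploit, and per-column compression encodings:

```python
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")

# Hypothetical fact table: DISTKEY spreads rows across compute nodes for the MPP
# engine, SORTKEY lets zone maps skip blocks during range scans on timestamps,
# and ENCODE applies per-column compression to reduce I/O.
ddl = """
CREATE TABLE IF NOT EXISTS ad_impressions (
    impression_id BIGINT        ENCODE az64,
    campaign_id   INTEGER       ENCODE az64,
    impression_ts TIMESTAMP     ENCODE az64,
    cost_micros   BIGINT        ENCODE az64,
    user_agent    VARCHAR(512)  ENCODE lzo
)
DISTSTYLE KEY
DISTKEY (campaign_id)
SORTKEY (impression_ts);
"""

rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="dev",
    DbUser="analyst",
    Sql=ddl,
)
```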

Durability & Availability

  • Amazon Redshift stores three copies of your data—all data written to a node in your cluster is automatically replicated to other nodes within the cluster, and all data is continuously backed up to Amazon S3.
  • Snapshots are automated, incremental, and continuous and stored for a user-defined period (1-35 days)
  • Manual snapshots can be created and are retained until explicitly deleted.
  • Amazon Redshift also continuously monitors the health of the cluster and automatically re-replicates data from failed drives and replaces nodes as necessary.
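
A sketch of managing snapshots with boto3; the cluster and snapshot identifiers are hypothetical:

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Keep automated snapshots for 14 days (within the 1-35 day configurable window).
redshift.modify_cluster(
    ClusterIdentifier="analytics-cluster",        # hypothetical cluster
    AutomatedSnapshotRetentionPeriod=14,
)

# Manual snapshots are retained until explicitly deleted.
redshift.create_cluster_snapshot(
    SnapshotIdentifier="analytics-pre-migration",  # hypothetical snapshot name
    ClusterIdentifier="analytics-cluster",
)
```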

Cost Model

  • has three pricing components:
    • data warehouse node hours – the total number of hours run across all the compute nodes
    • backup storage – storage cost for automated and manual snapshots
    • data transfer
      • There is no charge for data transferred between Amazon Redshift and Amazon S3 within the same AWS Region for backup, restore, load, and unload operations.
      • All other data transfers into and out of Amazon Redshift (for example, JDBC/ODBC traffic to a cluster endpoint in Amazon VPC) are billed at standard AWS data transfer rates.

Scalability & Elasticity

  • provides push-button scaling, and the number of nodes in the data warehouse cluster can be easily scaled as demand changes.
  • Redshift places the existing cluster in read-only mode, so existing queries can continue to run, while it provisions a new cluster of the chosen size and copies the data to it. Once the data is copied, queries are automatically redirected to the new cluster.
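
A resize might be triggered as in the sketch below (Python/boto3); the cluster identifier and target node count are hypothetical:

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Classic resize: the existing cluster serves reads while a new cluster of the
# chosen size is provisioned and the data is copied over.
redshift.resize_cluster(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    NumberOfNodes=8,
    Classic=True,
)

# Progress can be tracked while the resize runs.
print(redshift.describe_resize(ClusterIdentifier="analytics-cluster")["Status"])
```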

13 thoughts on “AWS Storage Options – SQS & Redshift”

  1. Thanks for the blog, it’s really cool. BTW if I remember it correctly SQS messages have a maximum size of 256 KB, not 64.

    1. Thanks Javi, yup it’s increased to 256 KB. Updated the blog to reflect the same. The AWS Storage Options whitepaper still refers to 64 KB; will keep watching for any updates.

        1. Anything specific that is missing? This is the storage options post and is more generic. Individual pages have the updated details.

  2. Durability & Availability

    “Message retention time is configurable on a per-queue basis, from a minimum of one hour to a maximum of 14 days.” But when I did the lab I noticed that you can have a minimum of 1 minute.

    1. Thanks Ganesh, the retention period has been updated by AWS since the whitepaper was published. Corrected the same.

  3. Hi Jayendra,

    Redshift – Cost Model
    data transfer
    There is no data transfer charge for data transferred to or from Amazon Redshift *** outside of Amazon VPC***
    Data transfer to or from Amazon Redshift ***in Amazon VPC*** accrues standard AWS data transfer charges.

    I expected to incur charges for data transferred outside VPC and no charges for data transfers within VPC. The documentation may have been revised since you published the blog. I think the following statements may be better:

    From https://aws.amazon.com/redshift/pricing/ :
    There is no charge for data transferred between Amazon Redshift and Amazon S3 within the same AWS Region for backup, restore, load, and unload operations. For all other data transfers into and out of Amazon Redshift, you will be billed at standard AWS data transfer rates. In particular, if you run your Amazon Redshift cluster in Amazon VPC, you will see standard AWS data transfer charges for data transfers over JDBC/ODBC to your Amazon Redshift cluster endpoint

    Cheers,
    Satish

  4. Hi Jayendra:

    I know this sounds abnormal, but how do we read this blog? The categories do not have all the articles in them. Is it more relevant to just click previous/next?

    1. The articles are indeed spread across. If you are preparing for a certification, go to the preparation guide and then you can navigate to the relevant topics.

  5. A couple of corrections, which may be a result of AWS changes

    1) Durability & Availability, copies stored.
    Your information is out of date. See here: https://aws.amazon.com/redshift/faqs/

    Q: How does Amazon Redshift back up my data?

    Amazon Redshift replicates all your data within your data warehouse cluster when it is loaded and also continuously backs up your data to S3. Amazon Redshift always attempts to maintain at least three copies of your data (the original and replica on the compute nodes and a backup in Amazon S3). Redshift can also asynchronously replicate your snapshots to S3 in another region for disaster recovery.

    2) Cost model, data transfer
    This doesn’t look correct. See information from here https://aws.amazon.com/redshift/pricing/

    Data Transfer
    There is no charge for data transferred between Amazon Redshift and Amazon S3 within the same AWS Region for backup, restore, load, and unload operations. For all other data transfers into and out of Amazon Redshift, you will be billed at standard AWS data transfer rates. In particular, if you run your Amazon Redshift cluster in Amazon VPC, you will see standard AWS data transfer charges for data transfers over JDBC/ODBC to your Amazon Redshift cluster endpoint. In addition, when you use Enhanced VPC Routing and unload data to Amazon S3 in a different region, you will incur standard AWS data transfer charges. For more information about AWS data transfer rates, see the Amazon EC2 pricing page.

    1. Thanks Tim, let me recheck the current documentation and correct the entries.

  6. Thanks for the wonderful blog Jayendra. SQS also has FIFO queues, which I would suggest are worth mentioning.
