enables real-time processing of streaming data at massive scale
provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Kinesis applications
data is replicated across three data centers within a region and preserved for 24 hours by default; retention can be extended to 7 days
streams can be scaled using multiple shards, with records assigned to shards based on the partition key; each shard provides a capacity of 1MB/sec data input (up to 1,000 records per second) and 2MB/sec data output
data encryption is supported either through client-side encryption before the data is pushed to the data stream, or through server-side encryption
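As a rough illustration of the per-shard limits listed above, the following sketch estimates how many shards a workload would need. The function name and the sample workload are hypothetical; the limits are the ones quoted in these notes (1 MB/sec and 1,000 records/sec in, 2 MB/sec out).

```python
import math

# Per-shard limits quoted in the notes above.
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_READ_MB_PER_SEC = 2.0
SHARD_WRITE_RECORDS_PER_SEC = 1000


def estimate_shards(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    by_write_bytes = math.ceil(write_mb_per_sec / SHARD_WRITE_MB_PER_SEC)
    by_write_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    by_read_bytes = math.ceil(read_mb_per_sec / SHARD_READ_MB_PER_SEC)
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


# Hypothetical workload: 4.5 MB/s in, 3,000 records/s,
# and two consumers that each read the full stream (2 * 4.5 MB/s out).
shards = estimate_shards(4.5, 3000, 2 * 4.5)
print(shards)  # 5
```

The estimate assumes the partition keys spread records evenly across shards; a skewed key can still overload a single shard even when the total count looks sufficient.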
Kinesis vs SQS
real-time processing of streaming big data vs reliable, highly scalable hosted queue for storing messages
ordered records, as well as the ability to read and/or replay records in the same order vs no ordering guarantee with standard queues (FIFO queues, a later addition, do preserve order)
data retention of 24 hours by default, extendable to 7 days vs message retention configurable from 1 minute to 14 days (default 4 days), with messages removed once the consumer deletes them
supports multiple consumers vs a single consumer per message at a time, requiring multiple queues to deliver a message to multiple consumers
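The replay and multiple-consumer differences above can be pictured with a small in-memory model. This is only a toy sketch: `ToyStream` and `ToyQueue` are invented names, nothing here calls AWS, and the toy queue is first-in-first-out purely for simplicity.

```python
from collections import deque


class ToyStream:
    """Append-only log: records stay put, each consumer tracks its own position."""

    def __init__(self):
        self._records = []
        self._positions = {}

    def put(self, record):
        self._records.append(record)

    def read(self, consumer, limit=10):
        start = self._positions.get(consumer, 0)
        batch = self._records[start:start + limit]
        self._positions[consumer] = start + len(batch)
        return batch

    def rewind(self, consumer):
        # Replay from the beginning of the retained data.
        self._positions[consumer] = 0


class ToyQueue:
    """Queue: once a message is received and deleted, no other consumer sees it."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive_and_delete(self):
        return self._messages.popleft() if self._messages else None


stream = ToyStream()
queue = ToyQueue()
for event in ["a", "b", "c"]:
    stream.put(event)
    queue.send(event)

print(stream.read("analytics"))    # ['a', 'b', 'c']
print(stream.read("archiver"))     # ['a', 'b', 'c'] (a second consumer sees the same records)
stream.rewind("analytics")
print(stream.read("analytics"))    # ['a', 'b', 'c'] (replay in the same order)

print(queue.receive_and_delete())  # a (now gone for every other consumer)
```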
Producers & Consumers
the PutRecord and PutRecords API calls are synchronous, while the KPL supports both synchronous and asynchronous use cases
KCL uses a unique DynamoDB table per application to keep track of the application’s state, so if a Kinesis Data Streams application receives provisioned-throughput exceptions, increase the provisioned throughput of that DynamoDB table
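A minimal sketch of preparing data for the synchronous PutRecords call follows. The helper `build_batches`, the `device_id` field, and the stream name in the comment are hypothetical. The 500-records-per-call cap is a service limit not stated in these notes, so treat it as an assumption to verify against the current API reference.

```python
import json

# Assumed service limit: PutRecords accepts at most 500 records per call.
MAX_RECORDS_PER_CALL = 500


def build_batches(events, key_field, max_batch=MAX_RECORDS_PER_CALL):
    """Yield lists of entries shaped like the Records parameter of PutRecords."""
    batch = []
    for event in events:
        batch.append({
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event[key_field]),
        })
        if len(batch) == max_batch:
            yield batch
            batch = []
    if batch:
        yield batch


# Hypothetical events; `device_id` is an invented field used as the partition key.
events = [{"device_id": i % 7, "reading": i} for i in range(1200)]
batches = list(build_batches(events, "device_id"))
print([len(b) for b in batches])  # [500, 500, 200]

# With boto3 installed and credentials configured, each batch could be sent as:
#   boto3.client("kinesis").put_records(StreamName="example-stream", Records=batch)
# The call blocks until the service responds; inspect FailedRecordCount in the
# response and retry the entries that failed.
```

Because the call is synchronous, throughput from a single thread is bounded by round-trip latency, which is one reason the KPL's asynchronous, batching mode exists.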
helps analyze streaming data, gain actionable insights, and respond to the business and customer needs in real time.
reduces the complexity of building, managing, and integrating streaming applications with other AWS services
Redshift is a fast, fully managed data warehouse
provides a simple and cost-effective solution to analyze all the data using standard SQL and the existing Business Intelligence (BI) tools.
manages the work needed to set up, operate, and scale a data warehouse, from provisioning the infrastructure capacity to automating ongoing administrative tasks such as backups and patching.
automatically monitors your nodes and drives to help you recover from failures.
only supports Single-AZ deployments.
replicates all the data within the data warehouse cluster when it is loaded and also continuously backs up your data to S3.
attempts to maintain at least three copies of your data (the original and replica on the compute nodes and a backup in S3).
supports cross-region snapshot replication to another region for disaster recovery
Redshift supports four distribution styles: AUTO, EVEN, KEY, and ALL.
KEY distribution uses a single column as the distribution key (DISTKEY) and places matching values on the same node slice
EVEN distribution distributes the rows across the slices in a round-robin fashion, regardless of the values in any particular column
ALL distribution replicates the whole table to every compute node.
AUTO distribution lets Redshift assign an optimal distribution style based on the size of the table data
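The difference between EVEN, KEY, and ALL can be seen in a toy simulation. This is a sketch under stated assumptions: the MD5-based hash is a stand-in (Redshift's internal hash function is not reproduced here), and a "bucket" stands for a slice in the EVEN and KEY cases and for a compute node in the ALL case. The table and column names are invented.

```python
import hashlib
from collections import defaultdict


def bucket_for_key(value, num_buckets):
    # Stand-in hash for illustration only; not Redshift's actual hash function.
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def distribute(rows, style, num_buckets, dist_key=None):
    """Return {bucket_index: [rows]} for a toy EVEN, KEY, or ALL distribution."""
    buckets = defaultdict(list)
    for i, row in enumerate(rows):
        if style == "EVEN":
            buckets[i % num_buckets].append(row)  # round-robin, ignores values
        elif style == "KEY":
            buckets[bucket_for_key(row[dist_key], num_buckets)].append(row)
        elif style == "ALL":
            for b in range(num_buckets):          # full copy everywhere
                buckets[b].append(row)
        else:
            raise ValueError(f"unsupported style: {style}")
    return dict(buckets)


# Hypothetical table of 12 orders belonging to 3 customers.
orders = [{"order_id": n, "customer_id": n % 3} for n in range(12)]
even = distribute(orders, "EVEN", 4)
by_key = distribute(orders, "KEY", 4, dist_key="customer_id")
everywhere = distribute(orders, "ALL", 4)

print({b: len(rows) for b, rows in sorted(even.items())})  # {0: 3, 1: 3, 2: 3, 3: 3}
# KEY: all rows sharing a customer_id land in the same bucket.
for bucket, rows in sorted(by_key.items()):
    print(bucket, sorted({r["customer_id"] for r in rows}))
```

Co-locating rows with the same key is what lets a join on that key run without moving rows between nodes, while EVEN gives balanced storage at the cost of that locality.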
Redshift supports Compound and Interleaved sort keys
A compound sort key is made up of all of the columns listed in the sort key definition, in the order they are listed. It is most efficient when query predicates, such as filters and joins, use a prefix of the sort key, that is, a subset of the sort key columns taken in order starting from the first.
An interleaved sort key gives equal weight to each column in the sort key, so query predicates can use any subset of the columns that make up the sort key, in any order.
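Why a compound key favors prefix predicates can be sketched with plain tuple sorting. This is a toy model, not Redshift's storage engine: the number of separate contiguous stretches of matching rows is used as a rough proxy for how many blocks a scan would have to touch. The `(region, day)` columns are invented, and interleaved keys are not modelled here.

```python
from itertools import groupby


def count_runs(sorted_rows, predicate):
    """Number of separate contiguous stretches of rows matching the predicate."""
    return sum(1 for matched, _ in groupby(sorted_rows, key=predicate) if matched)


# Hypothetical rows of (region, day); tuple order mirrors a compound key (region, day).
rows = [(region, day) for region in ("ap", "eu", "us") for day in range(1, 6)]
compound = sorted(rows)

# Filter on the leading column: matches sit in one contiguous stretch.
print(count_runs(compound, lambda r: r[0] == "eu"))  # 1

# Filter on the second column only: matches are scattered, one stretch per region.
print(count_runs(compound, lambda r: r[1] == 3))     # 3
```

An interleaved key trades some of the leading-column advantage for more even behavior across all of its columns, which is why it suits queries that filter on different columns at different times.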
Column encodings CANNOT be changed once created.
Redshift provides query queues for Workload Management (WLM) to manage concurrency and resource planning. It is a best practice to have separate queues for long-running, resource-intensive queries and for fast queries that don’t require large amounts of memory and CPU.