data transfer solution for delivering real time streaming data to destinations such as S3, Redshift, Elasticsearch service, and Splunk.
is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration
is Near Real Time (min. 60 secs) as it buffers incoming streaming data to a certain size or for a certain period of time before delivering it
supports batching, compression, and encryption of the data before loading it, minimizing the amount of storage used at the destination and increasing security
supports data compression, minimizing the amount of storage used at the destination. It currently supports GZIP, ZIP, and SNAPPY compression formats. Only GZIP is supported if the data is further loaded to Redshift.
supports out-of-the-box data transformation as well as custom transformation using a Lambda function to transform incoming source data and deliver the transformed data to destinations
uses at least once semantics for data delivery.
supports multiple producers as data sources, including Kinesis Data Streams, the KPL, Kinesis Agent, the Kinesis Data Firehose API via the AWS SDK, CloudWatch Logs, CloudWatch Events, and AWS IoT
does NOT support consumers like Spark and KCL
supports interface VPC endpoint to keep traffic between the VPC and Kinesis Data Firehose from leaving the Amazon network.
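The size-or-time buffering behaviour described above can be sketched in Python. This is an illustrative toy, not the Firehose implementation; the class name and thresholds are made up for the example (Firehose's actual buffer hints for S3 are configured in MB and seconds).

```python
import time

class BufferingSketch:
    """Toy sketch of Firehose-style buffering: flush the accumulated
    records when EITHER a size threshold OR an age threshold is hit."""

    def __init__(self, max_bytes=5 * 1024 * 1024, max_seconds=60):
        self.max_bytes = max_bytes        # buffer size hint
        self.max_seconds = max_seconds    # buffer interval hint
        self.buffer = []
        self.buffered_bytes = 0
        self.oldest = None                # arrival time of oldest buffered record
        self.flushed_batches = []         # stands in for "delivered to destination"

    def put_record(self, data: bytes, now=None):
        now = time.time() if now is None else now
        if self.oldest is None:
            self.oldest = now
        self.buffer.append(data)
        self.buffered_bytes += len(data)
        # Flush on whichever condition is satisfied first
        if (self.buffered_bytes >= self.max_bytes
                or now - self.oldest >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.buffer:
            self.flushed_batches.append(b"".join(self.buffer))
        self.buffer, self.buffered_bytes, self.oldest = [], 0, None
```

Because delivery waits for one of the two thresholds, the service is near real time rather than real time: a record can sit in the buffer for up to the configured interval before it is delivered.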
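A custom-transformation Lambda follows the Firehose data-transformation record contract: each input record arrives base64-encoded, and each output record must echo the `recordId`, report a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and return the transformed payload base64-encoded. The minimal sketch below upper-cases a hypothetical `message` field; the payload shape is an assumption for the example.

```python
import base64
import json

def lambda_handler(event, context):
    """Sketch of a Firehose transformation Lambda: decode each record,
    upper-case its (assumed) 'message' field, and re-encode it."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["message"] = payload.get("message", "").upper()
        output.append({
            "recordId": record["recordId"],       # must match the input record
            "result": "Ok",                       # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode()).decode(),
        })
    return {"records": output}
```

Records marked `ProcessingFailed` are treated as unsuccessfully processed and can end up in the configured error/backup destination rather than being lost.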
Kinesis Data Streams vs Kinesis Data Firehose
Kinesis Data Analytics
helps analyze streaming data, gain actionable insights, and respond to business and customer needs in real time.
reduces the complexity of building, managing, and integrating streaming applications with other AWS services
Compound sort key
is made up of all of the columns listed in the sort key definition, in the order they are listed, and is most efficient when query predicates use a prefix of the sort key, i.e., when the query's filters and joins apply conditions on a leading subset of the sort key columns in order.
Interleaved sort key
gives equal weight to each column in the sort key, so query predicates can use any subset of the columns that make up the sort key, in any order.
Not ideal for monotonically increasing attributes
Column encodings CANNOT be changed once created.
supports query queues for Workload Management, in order to manage concurrency and resource planning. It is a best practice to have separate queues for long-running, resource-intensive queries and for fast queries that don't require large amounts of memory and CPU
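Why a compound sort key favors prefix predicates can be sketched with a toy model of Redshift zone maps: rows sorted by the compound key are stored in blocks, each block records min/max of the leading sort column, and a filter on that leading column lets whole blocks be skipped. The function names and block layout here are invented for illustration.

```python
def build_zone_maps(rows, block_size):
    """Split rows (already sorted by the compound sort key) into blocks and
    record min/max of the leading sort column per block -- a toy stand-in
    for Redshift zone maps."""
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    return [(min(r[0] for r in b), max(r[0] for r in b), b) for b in blocks]

def scan_with_pruning(zone_maps, leading_value):
    """Read only the blocks whose [min, max] range can contain the
    predicate value on the leading sort column."""
    scanned, hits = 0, []
    for lo, hi, block in zone_maps:
        if lo <= leading_value <= hi:
            scanned += 1
            hits.extend(r for r in block if r[0] == leading_value)
    return scanned, hits
```

With rows sorted by, say, `(event_date, user_id)`, a filter on `event_date` alone touches few blocks; a filter only on `user_id` (a non-prefix column) could not prune this way, which is the case an interleaved sort key addresses.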
is a very fast, easy-to-use, cloud-powered business analytics service that makes it easy to build visualizations, perform ad hoc analysis, and quickly get business insights from your data, anytime, on any device.
delivers fast and responsive query performance by using a robust in-memory engine (SPICE).
“SPICE” stands for Super-fast, Parallel, In-memory Calculation Engine
can also be configured to keep the data in SPICE up-to-date as the data in the underlying sources change.
automatically replicates data for high availability and enables QuickSight to scale to support many users performing fast, simultaneous, interactive analysis across a wide variety of AWS data sources.
Excel files and flat files like CSV, TSV, CLF, ELF
on-premises databases like PostgreSQL, SQL Server and MySQL
SaaS applications like Salesforce
and AWS data sources such as Redshift, RDS, Aurora, Athena, and S3
supports various functions to format and transform the data.
supports assorted visualizations that facilitate different analytical approaches:
Comparison and distribution – Bar charts (several assorted variants)
Elasticsearch Service is a managed service that makes it easy to deploy, operate, and scale Elasticsearch clusters in the AWS Cloud.
real-time, distributed search and analytics engine
ability to provision all the resources for an Elasticsearch cluster and launch the cluster
easy-to-use cluster scaling options. Scaling an Elasticsearch Service domain by adding or modifying instances and storage volumes is an online operation that does not require any downtime.
provides self-healing clusters that automatically detect and replace failed Elasticsearch nodes, reducing the overhead associated with self-managed infrastructure
domain snapshots to back up and restore ES domains and replicate domains across AZs
enhanced security with IAM, Network, Domain access policies, and fine-grained access control
storage volumes for the data using EBS volumes
ability to span cluster nodes across multiple AZs in the same region, known as zone awareness, for high availability and redundancy. Elasticsearch Service automatically distributes the primary and replica shards across instances in different AZs.
dedicated master nodes to improve cluster stability
data visualization using the Kibana tool
integration with CloudWatch for monitoring ES domain metrics
integration with CloudTrail for auditing configuration API calls to ES domains
integration with S3, Kinesis, and DynamoDB for loading streaming data
ability to handle structured and unstructured data
supports encryption at rest through KMS, node-to-node encryption over TLS, and the ability to require clients to communicate over HTTPS
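The domain access policies mentioned above are resource-based IAM-style policies attached to the domain. A minimal sketch of one that allows HTTP access only from a given source IP range follows; the account ID, domain name, region, and CIDR are all hypothetical placeholders.

```python
import json

# Illustrative resource-based access policy for an Elasticsearch Service
# domain: allow es:ESHttp* actions only from a specific source IP range.
# Account ID, region, domain name, and CIDR below are made-up examples.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "es:ESHttp*",
            "Resource": "arn:aws:es:us-east-1:123456789012:domain/my-domain/*",
            "Condition": {"IpAddress": {"aws:SourceIp": ["203.0.113.0/24"]}},
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
```

IP-based conditions like this only make sense for domains with public endpoints; domains placed in a VPC rely on security groups and IAM instead.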