AWS Kinesis Data Firehose
- Kinesis Data Firehose is a fully managed service; there is no need to write applications or manage resources
- is a data transfer solution for delivering real-time streaming data to destinations such as S3, Redshift, Elasticsearch Service, and Splunk.
- is NOT real time as it buffers incoming streaming data to a certain size or for a certain period of time before delivering it to destinations. Buffer Size is in MBs and Buffer Interval is in seconds.
- supports multiple producers as data sources, including a Kinesis data stream, the Kinesis Agent, the Kinesis Data Firehose API using the AWS SDK, CloudWatch Logs, CloudWatch Events, and AWS IoT
- supports out-of-the-box data transformation as well as custom transformation using a Lambda function, which transforms the incoming source data before the transformed data is delivered to destinations
- supports interface VPC endpoint to keep traffic between the Amazon VPC and Kinesis Data Firehose from leaving the Amazon network. Interface VPC endpoints don’t require an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection
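The buffering behaviour noted above (deliver when either the Buffer Size or the Buffer Interval is reached, whichever comes first) can be illustrated with a toy model. This is only a sketch for intuition, not how Firehose is implemented: the class name `BufferSketch`, its fields, and the choice to start the interval at the first buffered record are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BufferSketch:
    """Toy model of size-or-interval buffering (hypothetical, not Firehose code)."""
    size_limit_bytes: int      # stands in for Buffer Size (configured in MBs)
    interval_seconds: float    # stands in for Buffer Interval (configured in seconds)
    delivered: List[List[bytes]] = field(default_factory=list)
    _records: List[bytes] = field(default_factory=list)
    _bytes: int = 0
    _opened_at: float = 0.0

    def put(self, record: bytes, now: float) -> None:
        if not self._records:
            # Assumption: the interval clock starts with the first buffered record.
            self._opened_at = now
        self._records.append(record)
        self._bytes += len(record)
        self._flush_if_due(now)

    def tick(self, now: float) -> None:
        """Advance time without adding a record."""
        self._flush_if_due(now)

    def _flush_if_due(self, now: float) -> None:
        if not self._records:
            return
        size_hit = self._bytes >= self.size_limit_bytes
        time_hit = now - self._opened_at >= self.interval_seconds
        if size_hit or time_hit:
            # "Deliver" the whole buffer as one batch and start over.
            self.delivered.append(self._records)
            self._records = []
            self._bytes = 0
```

A record that arrives just after a flush can therefore wait up to the full interval before it is delivered, which is why delivery is not real time.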
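The custom Lambda transformation mentioned above can be sketched as follows. Firehose invokes the function with a batch of base64-encoded records, and each returned record carries the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and base64-encoded `data`. The JSON payload shape and the `message` field are assumptions chosen purely for illustration.

```python
import base64
import json


def lambda_handler(event, context):
    """Uppercase a hypothetical 'message' field in each JSON record."""
    output = []
    for record in event["records"]:
        try:
            # Incoming data is base64-encoded; the JSON payload is an assumption.
            payload = json.loads(base64.b64decode(record["data"]))
            payload["message"] = str(payload.get("message", "")).upper()
            # Append a newline so delivered records are line-delimited.
            transformed = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(transformed).decode("utf-8"),
            })
        except (ValueError, AttributeError, TypeError):
            # Bad base64, invalid JSON, or a non-object payload:
            # hand the original data back and mark the record as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every incoming `recordId` must appear in the response, so the sketch returns failed records rather than silently skipping them.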
Kinesis Data Streams vs Kinesis Firehose
Refer to the Kinesis Data Streams vs Kinesis Firehose blog post.
AWS Certification Exam Practice Questions
- A user is designing a new service that receives location updates from 3600 rental cars every hour. The cars' locations need to be uploaded to an Amazon S3 bucket. Each location must also be checked for distance from the original rental location. Which services will process the updates and automatically scale?
- Amazon EC2 and Amazon EBS
- Amazon Kinesis Firehose and Amazon S3
- Amazon ECS and Amazon RDS
- Amazon S3 events and AWS Lambda
- You need to perform ad-hoc SQL queries on massive amounts of well-structured data. Additional data comes in constantly at a high velocity, and you don’t want to have to manage the infrastructure processing it if possible. Which solution should you use?
- Kinesis Firehose and RDS
- EMR running Apache Spark
- Kinesis Firehose and Redshift
- EMR using Hive
- Your organization needs to ingest a big data stream into their data lake on Amazon S3. The data may stream in at a rate of hundreds of megabytes per second. What AWS service will accomplish the goal with the least amount of management?
- Amazon Kinesis Firehose
- Amazon Kinesis Streams
- Amazon CloudFront
- Amazon SQS
- A startup company is building an application to track the high scores for a popular video game. Their Solution Architect is tasked with designing a solution to allow real-time processing of scores from millions of players worldwide. Which AWS service should the Architect use to provide reliable data ingestion from the video game into the datastore?
- AWS Data Pipeline
- Amazon Kinesis Firehose
- Amazon DynamoDB Streams
- Amazon Elasticsearch Service
- A company has an infrastructure that consists of machines which keep sending log information every 5 minutes. The number of these machines can run into thousands and it is required to ensure that the data can be analyzed at a later stage. Which of the following would help in fulfilling this requirement?
- Use Kinesis Firehose with S3 to take the logs and store them in S3 for further processing.
- Launch an Elastic Beanstalk application to take the processing job of the logs.
- Launch an EC2 instance with enough EBS volumes to consume the logs which can be used for further processing.
- Use CloudTrail to store all the logs which can be analyzed at a later stage.