Kinesis Streams vs Kinesis Firehose

Last Updated : 04-Oct-2020

Amazon Kinesis Data Streams

Amazon Kinesis Data Streams (KDS) is a massively scalable and durable real-time data streaming service. KDS can continuously capture gigabytes of data per second from hundreds of thousands of sources such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events. The data collected is available in milliseconds to enable real-time analytics use cases such as real-time dashboards, real-time anomaly detection, dynamic pricing, and more.
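
As a rough illustration of how a producer might write to a stream, the sketch below wraps the `put_record` call that the boto3 Kinesis client exposes. The stream name, the event shape, and the choice of `user_id` as partition key are assumptions made for the example, and `FakeKinesisClient` is a hypothetical stand-in so the snippet runs without AWS credentials.

```python
import json


def send_click_event(kinesis_client, stream_name, event):
    """Write one JSON-encoded event to a Kinesis data stream.

    `kinesis_client` is expected to behave like boto3.client("kinesis").
    The partition key decides which shard receives the record; using the
    user ID (an assumption for this example) keeps one user's events in order.
    """
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event["user_id"]),
    )


class FakeKinesisClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_record(self, **kwargs):
        self.calls.append(kwargs)
        # Mimics the two fields a real put_record response carries.
        return {
            "ShardId": "shardId-000000000000",
            "SequenceNumber": str(len(self.calls)),
        }


client = FakeKinesisClient()
response = send_click_event(
    client, "clickstream", {"user_id": 42, "page": "/home"}
)
```

Against a real account, the same function would be called with `boto3.client("kinesis")` in place of the fake client.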

[Figure: How Amazon Kinesis Data Streams works]

Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics services. It can capture, transform, and deliver streaming data to Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, generic HTTP endpoints, and service providers like Datadog, New Relic, MongoDB, and Splunk. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, transform, and encrypt your data streams before loading, minimizing the amount of storage used and increasing security.
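
The batching and compression described above can be pictured with a small local simulation. This is only a sketch of the idea (buffer records until a size or time threshold is reached, then compress and deliver the batch), not how Firehose is implemented; the class name `BufferedBatcher`, its parameters, and the newline-delimited batch layout are all assumptions made for illustration.

```python
import gzip
import time


class BufferedBatcher:
    """Collects records and delivers them as one gzip-compressed batch.

    A batch is flushed when it reaches `max_bytes` or when the oldest
    buffered record is at least `max_seconds` old. For simplicity the age
    check only runs when a new record arrives.
    """

    def __init__(self, max_bytes, max_seconds, deliver, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.deliver = deliver      # callable that receives the compressed batch
        self.clock = clock          # injectable so the timing can be tested
        self._records = []
        self._size = 0
        self._opened_at = None

    def add(self, record):
        if not self._records:
            self._opened_at = self.clock()
        self._records.append(record)
        self._size += len(record)
        age = self.clock() - self._opened_at
        if self._size >= self.max_bytes or age >= self.max_seconds:
            self.flush()

    def flush(self):
        if not self._records:
            return
        # Newline-delimited records, compressed before delivery.
        self.deliver(gzip.compress(b"\n".join(self._records)))
        self._records = []
        self._size = 0
        self._opened_at = None


delivered = []
batcher = BufferedBatcher(max_bytes=10, max_seconds=60, deliver=delivered.append)
batcher.add(b"abcd")     # 4 bytes buffered, below the threshold
batcher.add(b"efghij")   # 10 bytes buffered, the batch is flushed
```

In this toy version `delivered` stands in for the destination; a real delivery stream would write the batch to a target such as Amazon S3.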

[Figure: How Amazon Kinesis Data Firehose works]

Kinesis Usage Patterns

| Feature | Data Streams | Firehose |
|---|---|---|
| Purpose | Low-latency streaming service | Store streaming data in S3, Redshift, Elasticsearch, or Splunk |
| Provisioning | Managed service with manual shard provisioning | Fully managed with autoscaling |
| Processing latency | About 200 ms for classic consumers; about 70 ms for fan-out consumers | Near real time; data is buffered for a minimum of 60 seconds |
| Scaling | Manual shard configuration | Automatic |
| Data storage | 1 day by default, extendable up to 7 days | None |
| Replay | Yes | No |
| Producers | Kinesis Producer Library (KPL), Kinesis Agent, CloudWatch, IoT | KPL, Kinesis Agent, CloudWatch, IoT, plus Data Streams |
| Consumers | Open ended: multiple consumers, with support for the AWS SDK, KCL, and Spark | Limited to the delivery targets listed under Purpose |
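
Because Data Streams relies on manual shard provisioning, it can help to estimate a shard count before creating a stream. The sketch below assumes the commonly published per-shard limits for provisioned streams (1 MB/s or 1,000 records/s for writes, 2 MB/s for shared reads); these figures should be checked against the current AWS quotas, and the function name `estimate_shards` is hypothetical.

```python
import math

# Assumed per-shard limits for a provisioned stream; verify against AWS quotas.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def estimate_shards(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    needed = max(
        write_mb_per_sec / WRITE_MB_PER_SHARD,
        records_per_sec / WRITE_RECORDS_PER_SHARD,
        read_mb_per_sec / READ_MB_PER_SHARD,
    )
    # A stream always has at least one shard.
    return max(1, math.ceil(needed))


# 5 MB/s in, 2,000 records/s, 15 MB/s read by shared consumers:
# the read side dominates (15 / 2 = 7.5), so 8 shards are needed.
shards = estimate_shards(5, 2000, 15)
```

Firehose needs no equivalent calculation, since it scales automatically with the incoming throughput.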

When should I use AWS Glue Streaming and when should I use Amazon Kinesis Data Analytics?

Both AWS Glue and Amazon Kinesis Data Analytics can be used to process streaming data. AWS Glue is recommended when your use cases are primarily ETL and when you want to run jobs on a serverless Apache Spark-based platform. Amazon Kinesis Data Analytics is recommended when your use cases are primarily analytics and when you want to run jobs on a serverless Apache Flink-based platform.

Streaming ETL in AWS Glue enables advanced ETL on streaming data using the same serverless, pay-as-you-go platform that you currently use for your batch jobs. AWS Glue generates customizable ETL code to prepare your data while in flight and has built-in functionality to process streaming data that is semi-structured or has an evolving schema. Use Glue to apply both its built-in and Spark-native transforms to data streams and load them into your data lake or data warehouse.
