Kinesis Streams vs Kinesis Firehose

Updated: 04-Oct-2020

Amazon Kinesis Data Streams

Amazon Kinesis Data Streams (KDS) is a massively scalable and durable real-time data streaming service. KDS can continuously capture gigabytes of data per second from hundreds of thousands of sources such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events. The data collected is available in milliseconds to enable real-time analytics use cases such as real-time dashboards, real-time anomaly detection, dynamic pricing, and more.
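
As a rough illustration of the producer side, the sketch below builds one record and sends it with the `PutRecord` call. It assumes a boto3 Kinesis client (`boto3.client("kinesis")`). The stream name, the event fields, and the function name `send_click_event` are hypothetical. The client is passed in as an argument, so the function can be tried without AWS credentials.

```python
import json


def send_click_event(kinesis_client, stream_name, event):
    """Send one event to a Kinesis data stream.

    kinesis_client is assumed to expose boto3's
    put_record(StreamName=, Data=, PartitionKey=) call. The event layout
    (a dict with a "user_id" field) is illustrative only.
    """
    return kinesis_client.put_record(
        StreamName=stream_name,
        # Kinesis treats the payload as opaque bytes.
        Data=json.dumps(event).encode("utf-8"),
        # Records sharing a partition key are routed to the same shard,
        # which keeps them in order relative to each other.
        PartitionKey=str(event["user_id"]),
    )
```

With boto3 installed and credentials configured, a call such as `send_click_event(boto3.client("kinesis"), "clickstream", {"user_id": 42, "page": "/home"})` would write a single record to a stream named `clickstream`.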

[Figure: How Amazon Kinesis Data Streams works]

Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics services. It can capture, transform, and deliver streaming data to Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, generic HTTP endpoints, and service providers like Datadog, New Relic, MongoDB, and Splunk. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, transform, and encrypt your data streams before loading, minimizing the amount of storage used and increasing security.
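
To show how little a Firehose producer has to do, here is a minimal sketch that sends a batch of events with the `PutRecordBatch` call. It assumes a boto3 Firehose client (`boto3.client("firehose")`). The delivery stream name and the function name `deliver_events` are hypothetical. Buffering, compression, and delivery to the destination are configured on the delivery stream itself, not in this code.

```python
import json


def deliver_events(firehose_client, delivery_stream_name, events):
    """Send events to a Firehose delivery stream in one batch call.

    firehose_client is assumed to expose boto3's
    put_record_batch(DeliveryStreamName=, Records=) call.
    Returns the number of records that Firehose reported as failed.
    """
    records = [
        # Firehose concatenates record payloads as they arrive, so a
        # trailing newline keeps one JSON object per line at the destination.
        {"Data": (json.dumps(event) + "\n").encode("utf-8")}
        for event in events
    ]
    response = firehose_client.put_record_batch(
        DeliveryStreamName=delivery_stream_name,
        Records=records,
    )
    # A batch can partially fail; the caller should retry failed records.
    return response.get("FailedPutCount", 0)
```

A non-zero return value means some records were rejected. The per-record results in the response's `RequestResponses` list show which ones to resend.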

[Figure: How Amazon Kinesis Data Firehose works]

Kinesis Usage Patterns

Feature	Data Streams	Firehose
Purpose	Low-latency streaming service	Load streaming data into S3, Redshift, Elasticsearch, or Splunk
Provisioning	Managed service with manual shard provisioning	Fully managed with automatic scaling
Processing	Latency of about 200 ms for classic consumers; about 70 ms with enhanced fan-out	Near real time; data is buffered for at least 60 seconds
Scaling	Manual shard configuration	Automatic
Data Storage	1 day by default, extendable up to 7 days	None
Replay	Yes	No
Producers	Kinesis Producer Library (KPL), Kinesis Agent, CloudWatch, IoT	KPL, Kinesis Agent, CloudWatch, IoT, plus Data Streams
Consumers	Open-ended: multiple consumers, with support for the AWS SDK, KCL, and Spark	Limited to the delivery targets listed above
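
Manual shard provisioning in Data Streams means estimating the shard count yourself. The sketch below does that arithmetic under stated assumptions: each shard accepts roughly 1 MB per second or 1,000 records per second of writes, and serves roughly 2 MB per second of reads shared across classic (non-fan-out) consumers. The function name and the example workloads are hypothetical, and 1 MB is taken as 1,000,000 bytes to keep the estimate conservative.

```python
import math

# Assumed per-shard limits for a provisioned Kinesis data stream.
WRITE_BYTES_PER_SEC = 1_000_000    # about 1 MB/s of writes
WRITE_RECORDS_PER_SEC = 1_000      # 1,000 records/s of writes
READ_BYTES_PER_SEC = 2_000_000     # about 2 MB/s, shared by classic consumers


def estimate_shards(records_per_sec, avg_record_bytes, classic_consumers=1):
    """Return the smallest shard count that satisfies all three limits."""
    write_bytes = records_per_sec * avg_record_bytes
    by_write_bytes = math.ceil(write_bytes / WRITE_BYTES_PER_SEC)
    by_write_count = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC)
    # Every classic consumer reads the full stream from the shared read budget.
    by_read_bytes = math.ceil(write_bytes * classic_consumers / READ_BYTES_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_count, by_read_bytes)


print(estimate_shards(5000, 500))                         # record count is the bottleneck
print(estimate_shards(100, 20_000, classic_consumers=3))  # read throughput is the bottleneck
```

Firehose needs no equivalent calculation, since it scales automatically with throughput.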

When should I use AWS Glue Streaming and when should I use Amazon Kinesis Data Analytics?

Both AWS Glue and Amazon Kinesis Data Analytics can be used to process streaming data. AWS Glue is recommended when your use cases are primarily ETL and when you want to run jobs on a serverless Apache Spark-based platform. Amazon Kinesis Data Analytics is recommended when your use cases are primarily analytics and when you want to run jobs on a serverless Apache Flink-based platform.

Streaming ETL in AWS Glue enables advanced ETL on streaming data using the same serverless, pay-as-you-go platform that you currently use for your batch jobs. AWS Glue generates customizable ETL code to prepare your data while in flight and has built-in functionality to process streaming data that is semi-structured or has an evolving schema. Use Glue to apply both its built-in and Spark-native transforms to data streams and load them into your data lake or data warehouse.

Russell Jamieson
