What is AWS Kinesis?

Amazon Kinesis is a fully managed, cloud-based service from Amazon Web Services (AWS) for handling large distributed data streams in real time. This serverless data service captures, processes, and stores large amounts of data. It runs on AWS, a secure global cloud platform with millions of customers from nearly every industry. Companies from Comcast to the Hearst Corporation use Kinesis.

What are the 4 Types of AWS Kinesis?

AWS Kinesis comprises four services: Video Streams, Data Firehose, Data Streams, and Data Analytics. While each of these is a different method of processing and storing data, they share overlapping similarities. The following sections describe each service in detail.

Video Streams

What it is:

Amazon Kinesis Video Streams offers users an easy method to stream video from connected devices to AWS. Whether the goal is machine learning, playback, or analytics, Video Streams automatically scales the infrastructure needed to ingest streaming data, then encrypts, stores, and indexes the video data. This enables live and on-demand viewing. The service integrates with libraries such as OpenCV, TensorFlow, and Apache MXNet.

How it works:

Getting started with Kinesis Video Streams begins in the AWS Management Console. After installing the Kinesis Video Streams SDK on a device, users can stream media to AWS for analytics, playback, and storage. Video Streams is purpose-built for streaming video from camera-equipped devices to AWS, whether for internet video streaming or for storing security footage. The platform also supports WebRTC and lets devices connect through its application programming interfaces (APIs).

Data consumers: 

Apache MXNet, HLS-based media playback, Amazon SageMaker, and Amazon Rekognition
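As a rough illustration of HLS-based playback, the sketch below requests a live HLS URL for a stream. It assumes boto3-style clients for the `kinesisvideo` and `kinesis-video-archived-media` services; the function name `get_live_hls_url` and the `make_media_client` parameter are hypothetical helpers introduced here so the logic can be exercised without AWS credentials.

```python
def get_live_hls_url(stream_name, kvs_client, make_media_client):
    """Return an HLS playback URL for a live Kinesis video stream.

    kvs_client: a boto3 "kinesisvideo" client (or a test double).
    make_media_client: callable that takes an endpoint URL and returns a
        "kinesis-video-archived-media" client bound to that endpoint.
    """
    # Step 1: ask the service which endpoint serves HLS session requests.
    endpoint = kvs_client.get_data_endpoint(
        StreamName=stream_name,
        APIName="GET_HLS_STREAMING_SESSION_URL",
    )["DataEndpoint"]

    # Step 2: request a playback session URL from that endpoint.
    media_client = make_media_client(endpoint)
    response = media_client.get_hls_streaming_session_url(
        StreamName=stream_name,
        PlaybackMode="LIVE",
    )
    return response["HLSStreamingSessionURL"]
```

With real credentials, `kvs_client` would typically be `boto3.client("kinesisvideo")` and `make_media_client` would be `lambda ep: boto3.client("kinesis-video-archived-media", endpoint_url=ep)`. The returned URL can then be handed to any HLS-capable player.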

Benefits:

  • There are no minimum fees or upfront commitments.
  • Users only pay for what they use.
  • Users can stream video from millions of different devices.
  • Users can build video-enabled apps with real-time computer-assisted vision capabilities.
  • Users can play back recorded and live video streams.
  • Users can extract images for machine learning applications.
  • Users can enjoy searchable and durable storage.
  • There is no infrastructure to manage.

Use Cases:

  • Users can engage in peer-to-peer media streaming.
  • Users can engage in video chat, video processing, and video-related AI/ML.
  • Smart homes can use Video Streams to stream live audio and video from devices such as baby monitors, doorbells, and various home surveillance systems.
  • Users can enjoy real-time interaction when talking with a person at the door.
  • Users can control a robot vacuum from their mobile phone.
  • Video Streams secures access to streams using AWS Identity and Access Management (IAM).
  • City governments can use Video Streams to securely store and analyze large amounts of video data from cameras at traffic lights and other public venues.
  • An Amber Alert system is a specific example of using Video Streams.
  • Industrial uses include using Video Streams to collect time-coded data such as LIDAR and RADAR signals.
  • Video Streams are also helpful for extracting and analyzing data from various industrial equipment and using it for predictive maintenance and even predicting the lifetime of a particular part.

Data Firehose

What it is:

Data Firehose is a service that can extract, capture, transform, and deliver streaming data to analytic services and data lakes. Data Firehose can take raw streaming data and convert it into various formats, including Apache Parquet. Users can select a destination, create a delivery stream, and start streaming in real-time in only a few steps. 
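The "select a destination, create a delivery stream" step can be sketched as a request for the Firehose `create_delivery_stream` API with an S3 destination. This is a minimal sketch, not a complete configuration: the helper name `build_delivery_stream_request`, the default buffer values, and any ARNs passed in are illustrative assumptions.

```python
def build_delivery_stream_request(name, role_arn, bucket_arn,
                                  buffer_mb=5, buffer_seconds=300,
                                  compression="GZIP"):
    """Build keyword arguments for Firehose's create_delivery_stream call."""
    allowed = {"UNCOMPRESSED", "GZIP", "ZIP", "Snappy", "HADOOP_SNAPPY"}
    if compression not in allowed:
        raise ValueError(f"unsupported compression format: {compression}")
    return {
        "DeliveryStreamName": name,
        # "DirectPut" means producers write to the delivery stream directly.
        "DeliveryStreamType": "DirectPut",
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,      # IAM role Firehose assumes to write to S3
            "BucketARN": bucket_arn,  # destination bucket
            # Firehose flushes a batch when either threshold is reached.
            "BufferingHints": {
                "SizeInMBs": buffer_mb,
                "IntervalInSeconds": buffer_seconds,
            },
            "CompressionFormat": compression,
        },
    }
```

The resulting dictionary could be passed to a boto3 Firehose client as `client.create_delivery_stream(**request)`.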

How it works:

Data Firehose allows users to connect with dozens of fully integrated AWS services and streaming destinations. A delivery stream is essentially a steady flow of a user's available data, delivered continuously as new data arrives. The volume coming through may surge or slow to a trickle, and Firehose keeps moving all of it along, processing it until it is ready for visualizing, graphing, or publishing. Data Firehose loads the data into AWS, transforming it along the way for the cloud services used for analytics.
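To make the producer side concrete, here is a minimal sketch of pushing events into a delivery stream with the Firehose `put_record_batch` call. The function name `send_events` and the newline-delimited JSON encoding are assumptions chosen for illustration. The client is passed in so the function works with a boto3 Firehose client or a stand-in.

```python
import json

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def send_events(firehose_client, delivery_stream, events):
    """Send events as newline-delimited JSON; return the number that failed."""
    records = [
        {"Data": (json.dumps(event) + "\n").encode("utf-8")}
        for event in events
    ]
    failed = 0
    for start in range(0, len(records), MAX_BATCH):
        response = firehose_client.put_record_batch(
            DeliveryStreamName=delivery_stream,
            Records=records[start:start + MAX_BATCH],
        )
        # A batch can partially fail, so the count must be checked per call.
        failed += response.get("FailedPutCount", 0)
    return failed
```

A production producer would also retry the individual records that failed. This sketch only counts them.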

Data consumers: 

Consumers include Splunk, MongoDB, Amazon Redshift, Amazon OpenSearch Service (formerly Amazon Elasticsearch Service), Amazon S3, and generic HTTP endpoints.

Benefits:

  • Users can pay as they go and only pay for the data they transmit.
  • Data Firehose offers easy launch and configurations.
  • Users can convert data into specific formats for analysis without processing pipelines.
  • The user can specify the size of a batch and control the speed for uploading data.
  • After launching, the delivery streams provide elastic scaling.
  • Firehose can support data formats like Apache ORC and Apache Parquet.
  • Before storing, Firehose can convert data from JSON to Parquet or ORC format. This saves on analytics and storage costs.
  • Users can deliver their partitioned data to S3 using dynamically defined or static keys. Data Firehose will group data by different keys.
  • Data Firehose automatically applies various functions to all input data records and loads transformed data to each destination.
  • Data Firehose gives users the option to encrypt data automatically after uploading. Users can specify an AWS Key Management Service (KMS) encryption key.
  • Data Firehose exposes a variety of metrics through the console and Amazon CloudWatch. Users can use these metrics to monitor their delivery streams and modify destinations.
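The function-based transformation mentioned in the list above is commonly implemented as an AWS Lambda function that receives base64-encoded records and returns them with a per-record status. The sketch below follows that record format. The handler name `transform_handler` and the `level` field it normalizes are hypothetical examples.

```python
import base64
import json


def transform_handler(event, context):
    """Firehose transformation sketch: uppercase a hypothetical 'level' field."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["level"] = str(payload.get("level", "info")).upper()
            encoded = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("ascii")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": encoded}
            )
        except (ValueError, TypeError, AttributeError):
            # Return unparseable records unchanged and flag them as failed.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Every incoming `recordId` must appear in the response so Firehose can match transformed records to the originals.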

Use Cases: 

  • Users can build machine learning streaming applications. This can help users predict inference endpoints and analyze data.
  • Data Firehose provides support for a variety of data destinations. A few it currently supports include Amazon Redshift, Amazon S3, MongoDB, Splunk, Amazon OpenSearch Service, and HTTP endpoints.
  • Users can monitor network security with supported Security Information and Event Management (SIEM) tools.
  • Firehose supports compression algorithms such as Zip, Snappy, GZip, and Hadoop-Compatible Snappy.
  • Users can monitor IoT analytics in real time.
  • Users can create clickstream sessions and log analytics solutions.
  • Firehose provides several security features.

Data Streams

What it is:

Data Streams is a durable, scalable real-time streaming service that can continuously capture gigabytes of data per second from hundreds of thousands of sources. Users can collect log events from their servers and various mobile deployments. The platform puts a strong emphasis on security: users can encrypt sensitive data with server-side encryption and AWS KMS master keys. With the Kinesis Producer Library (KPL), users can easily write data to their streams.

How it works:

Users can build Kinesis Data Streams applications and other data processing applications that read from a stream. These applications can send processed records to dashboards and use them to generate alerts or to adjust advertising strategies and pricing.
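A minimal sketch of both sides of a stream follows: a producer that writes one record and a consumer that reads one batch from a shard. It assumes a boto3-style Kinesis client passed in by the caller. The function names and the JSON encoding are illustrative choices, and a real consumer would keep polling with the returned `NextShardIterator` (or use the Kinesis Client Library) rather than read once.

```python
import json


def put_event(kinesis_client, stream_name, event, partition_key):
    """Write one JSON event; records with the same key land on the same shard."""
    response = kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=partition_key,
    )
    return response["ShardId"], response["SequenceNumber"]


def read_shard_once(kinesis_client, stream_name, shard_id, limit=100):
    """Read one batch from the oldest available record in a shard."""
    iterator = kinesis_client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start at the oldest record
    )["ShardIterator"]
    response = kinesis_client.get_records(ShardIterator=iterator, Limit=limit)
    return [json.loads(record["Data"]) for record in response["Records"]]
```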

Data consumers:

Amazon EC2, Amazon EMR, AWS Lambda, and Kinesis Data Analytics
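When AWS Lambda is the consumer, the function receives batches of records whose payloads are base64-encoded. The following sketch decodes such a batch. The handler name `kinesis_handler` and the assumption that each payload is JSON are illustrative.

```python
import base64
import json


def kinesis_handler(event, context):
    """Decode a batch of Kinesis records delivered to a Lambda function."""
    items = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        # Lambda delivers the record payload base64-encoded.
        payload = base64.b64decode(kinesis["data"])
        items.append(
            {
                "partition_key": kinesis["partitionKey"],
                "sequence_number": kinesis["sequenceNumber"],
                "body": json.loads(payload),  # assumes JSON payloads
            }
        )
    return {"processed": len(items), "items": items}
```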

Benefits:

  • Data Streams supports real-time data aggregation, followed by loading the aggregated data into a data warehouse or map-reduce cluster.
  • The delay between when a record is put into the stream and when users can retrieve it is typically less than a second.
  • Data Streams applications can consume data from the stream almost instantly after adding the data.
  • Data Streams allow users to scale up or down, so users never lose any data before expiration.
  • The Kinesis Client Library supports fault-tolerant data consumption and provides scaling support for Data Streams applications.
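Scaling a stream up or down comes down to choosing a shard count. A rough sizing sketch follows, assuming a provisioned stream with the standard per-shard limits of 1 MB/s or 1,000 records/s for writes and 2 MB/s for shared reads. The function name `estimate_shards` is hypothetical, and actual quotas should be confirmed against current AWS documentation.

```python
import math

WRITE_MB_PER_SHARD = 1.0        # ingest limit per shard, MB/s
WRITE_RECORDS_PER_SHARD = 1000  # ingest limit per shard, records/s
READ_MB_PER_SHARD = 2.0         # shared read limit per shard, MB/s


def estimate_shards(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_sec / READ_MB_PER_SHARD),
    )


print(estimate_shards(3.5, 2500, 9))  # 5: read throughput is the bottleneck
```

The result could then be applied with the Kinesis `update_shard_count` call, using `ScalingType="UNIFORM_SCALING"`.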

Use Cases:

  • Data Streams can work with IT infrastructure log data, market data feeds, web clickstream data, application logs, and social media.
  • Producers can push system and application logs directly into a stream, where they are available for processing within seconds. This also prevents losing log data even if the application or front-end server fails.
  • Users don’t batch data on servers before submitting it for intake. This accelerates the data intake.
  • Users don’t have to wait to receive batches of data but can work on metrics and application logs as the data is streaming in.
  • Users can analyze site usability and engagement while multiple Data Streams applications run in parallel.
  • Gaming companies can feed data into their gaming platform.

Data Analytics

What it is:

Data Analytics is a service for transforming and analyzing streaming data in real time. It is built on open-source libraries and frameworks such as Apache Flink, Apache Beam, and Apache Zeppelin, and it works with the AWS SDK and AWS service integrations.

How it works:

Data Analytics continuously reads records from a streaming source, such as a Kinesis data stream or a Firehose delivery stream, and runs the user's application code against them as they arrive. Applications can be written in SQL or built with Apache Flink, and they can filter, aggregate, and enrich records as the data flows through. The results are then written to the configured destinations.

Data consumers:

Results are sent to a Lambda function, Kinesis Data Firehose delivery stream, or another Kinesis stream.

Benefits:

  • Users can deliver their streaming data in a matter of seconds. They can develop applications that deliver the data to a variety of services.
  • Users can enjoy advanced integration capabilities that include over 10 Apache Flink connectors and even the ability to put together custom integrations.
  • With just a few lines of code, users can modify integration abilities and provide advanced functionality.
  • With Apache Flink primitives, users can build integrations that enable reading and writing from sockets, directories, files, or various other sources from the internet.

Use Cases: 

  • Data Analytics is compatible with AWS Glue Schema Registry. It's serverless and lets users control and validate streaming data using Apache Avro schemas. This is at no additional charge.
  • Data Analytics features APIs in Python, SQL, Scala, and Java. These offer specialization for various use cases such as streaming ETL, stateful event processing, and real-time analytics.
  • Users can use Data Analytics libraries to deliver data to Amazon Simple Storage Service, Amazon OpenSearch Service, Amazon DynamoDB, AWS Glue Schema Registry, Amazon CloudWatch, and Amazon Managed Streaming for Apache Kafka.
  • Users can enjoy “Exactly Once Processing.” This involves using Apache Flink to build applications in which each processed record affects the results exactly once. Even if there are disruptions, such as internal service maintenance, the data will still process without any duplicate data.
  • Users can also integrate with the AWS Glue Data Catalog store. This allows users to search multiple AWS datasets.
  • Data Analytics provides the schema editor to find and edit input data structure. The system will recognize standard data formats like CSV and JSON automatically. The editor is easy to use, infers the data structure, and aids users in further refinement.
  • Data Analytics can integrate with both Amazon Kinesis Data Firehose and Data Streams. Pointing Data Analytics at the input stream will cause it to automatically read, parse, and make the data available for processing.
  • Data Analytics allows for advanced processing functions that include top-K analysis and anomaly detection on the streaming data. 
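To show what a top-K analysis over a stream computes, here is a plain-Python sketch that counts keys in fixed (tumbling) time windows and keeps the K most frequent per window. It is only a conceptual illustration over an in-memory list, not the Data Analytics or Apache Flink API. The function name and the `(timestamp, key)` event shape are assumptions.

```python
from collections import Counter, defaultdict


def top_k_per_window(events, window_seconds, k):
    """Group (timestamp_seconds, key) events into tumbling windows.

    Returns {window_start: [(key, count), ...]} with the k most frequent
    keys for each window, ordered by window start time.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Align each event to the start of the window that contains it.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {
        start: counts.most_common(k)
        for start, counts in sorted(windows.items())
    }


clicks = [(0, "/home"), (1, "/cart"), (2, "/home"),
          (61, "/pay"), (62, "/pay"), (63, "/home")]
print(top_k_per_window(clicks, window_seconds=60, k=1))
# {0: [('/home', 2)], 60: [('/pay', 2)]}
```

A streaming engine performs the same grouping incrementally and emits each window's result as the window closes, rather than waiting for the whole input.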

What are the Main Differences Between Data Firehose and Data Streams?

One of the primary differences is architectural. Data enters Kinesis Data Streams through a stream, which is, at the most basic level, a group of shards. Each shard holds its own sequence of data records. A Firehose delivery stream, by contrast, automatically sends data to specific destinations such as S3, Redshift, or Splunk.
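The shard model can be illustrated with a short sketch of how a record's partition key selects a shard: Kinesis Data Streams hashes the key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The helper names below, the evenly split ranges, and the big-endian reading of the digest are assumptions made for illustration.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces 128-bit values


def even_hash_ranges(shard_count):
    """Split the hash key space into contiguous, equally sized ranges."""
    step = HASH_SPACE // shard_count
    ranges = []
    for index in range(shard_count):
        start = index * step
        # The last shard absorbs any remainder so the space is fully covered.
        end = HASH_SPACE - 1 if index == shard_count - 1 else start + step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains the key's hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_value <= end:
            return index
    raise ValueError("hash ranges do not cover the key space")
```

Because the mapping is deterministic, all records with the same partition key go to the same shard, which is what preserves their relative order.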

The primary objectives of the two also differ. Data Streams is essentially a low-latency service for ingesting data at scale. Firehose is a data transfer and loading service. Data Firehose continuously loads data into the destinations users choose, while Data Streams ingests and stores the data for processing. Firehose delivers data for analytics, while Data Streams supports building customized, real-time applications.

Why Choose LogicMonitor?

LogicMonitor is the leading SaaS-based IT data collaboration and observability platform. Companies that need a seamless infrastructure monitoring platform can count on LogicMonitor to provide a single source of observability. It provides cutting-edge cloud resources and visibility into servers, networks, applications, and log data in one platform. LogicMonitor can analyze both Kinesis and Firehose data by automatically analyzing a wide range of metrics. Contact LogicMonitor for a demo.