Data streaming

Data streaming is the continuous transfer of data at a steady, high-speed rate.

Although the concept of data streaming is not new, its practical applications are a relatively recent development. In the early years of the World Wide Web, internet connectivity was not always reliable, and bandwidth limitations often prevented streaming data from arriving at its destination in an unbroken sequence. Developers created buffers to allow data streams to catch up, but the resulting jitter made the user experience so poor that most consumers preferred to download content rather than stream it.

Today, with the advent of broadband internet, cloud computing and the internet of things (IoT), there is increased interest in analyzing data from streaming sources to make data-driven decisions in real time. To meet the need for real-time information from disparate data sources, many companies have replaced traditional batch processing with streaming data architectures that can also accommodate batch processing.

In batch processing, newly arriving data elements are collected in a group and the entire group is processed at some future time. In contrast, a streaming data architecture processes data in motion and an ETL batch is treated as just one more event in a continuous stream of events.
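The difference can be illustrated with a minimal Python sketch. It is not tied to any particular streaming product; the function names and the sample readings are hypothetical and chosen only for illustration. The batch function waits for the whole group before computing an average, while the streaming function updates its result as each element arrives.

```python
from typing import Iterable, Iterator, List


def process_batch(events: List[float]) -> float:
    """Batch style: the whole group is collected first, then processed once."""
    return sum(events) / len(events)


def process_stream(events: Iterable[float]) -> Iterator[float]:
    """Streaming style: the result is updated as each event arrives."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count  # running average after every event


# Hypothetical sensor readings, used only for illustration.
readings = [10.0, 20.0, 30.0, 40.0]

batch_result = process_batch(readings)
stream_results = list(process_stream(iter(readings)))

print(batch_result)    # one answer, available only after all data is in
print(stream_results)  # an answer after every event
```

In the streaming version, a usable result exists after the first reading, which is what makes real-time decisions possible.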

To benefit from data streaming, businesses supported by streaming architectures require powerful analytics tools for ingesting and processing information. Popular tools for working with data streams include:

Amazon Kinesis Firehose - part of the Amazon Kinesis family of Amazon Web Services (AWS) services for processing big data in real time. Kinesis is capable of processing hundreds of terabytes per hour from high volumes of streaming data from sources such as operating logs, financial transactions and social media feeds.

Apache Flink - a distributed data processing platform for use in big data applications, primarily involving analysis of data stored in Hadoop clusters. Flink handles both batch and stream processing jobs, with streaming as the default execution model and batch jobs running as special cases of streaming applications.
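The idea of a batch job as a special case of a streaming application can be sketched in plain Python. This is not Flink's API; every name below is hypothetical. The same streaming operator is applied to an endless source and to a bounded input, and the "batch job" is simply the operator run to the end of the bounded input with only the final result kept.

```python
import itertools
from collections import Counter
from typing import Dict, Iterable, Iterator


def count_words(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Streaming operator: emit updated word counts after every line."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
        yield dict(counts)


def run_as_batch(lines: Iterable[str]) -> Dict[str, int]:
    """Batch job: run the same operator over a bounded input, keep the last result."""
    result: Dict[str, int] = {}
    for result in count_words(lines):
        pass  # intermediate results are discarded
    return result


def endless_source() -> Iterator[str]:
    """Hypothetical unbounded source that never finishes on its own."""
    for i in itertools.count():
        yield "tick" if i % 2 == 0 else "tock"


# Bounded input: the stream ends, so a final answer exists.
bounded_input = ["to be or", "not to be"]
batch_counts = run_as_batch(bounded_input)

# Unbounded input: only the first three intermediate results are taken.
first_three = list(itertools.islice(count_words(endless_source()), 3))

print(batch_counts)
print(first_three)
```

Because the operator is written once against a stream, no separate batch code path is needed; the bounded input just happens to end.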

Yonas Tibebu
