Machine learning over real-time streaming data with TensorFlow
In many applications where data is generated continuously, combining machine learning with streaming data is imperative to discover useful information in real time. Yong Tang explores TensorFlow I/O, which can be used to easily build a data pipeline with TensorFlow and streaming frameworks such as Apache Kafka, AWS Kinesis, or Google Cloud PubSub.
| Talk Title | Machine learning over real-time streaming data with TensorFlow |
| --- | --- |
| Speakers | Yong Tang (MobileIron) |
| Conference | O’Reilly TensorFlow World |
| Location | Santa Clara, California |
| Date | October 28–31, 2019 |
Applying machine learning over streaming data to discover useful information has been a topic of interest for some time. In many real-world applications such as IoT sensors, web transactions, GPS positions, or social media updates, large volumes of data are generated continuously. It’s critical to have a data pipeline that can reliably and conveniently receive, preprocess, and provide data for model inference and training.

Yong Tang explores the TensorFlow I/O package for streaming data processing with TensorFlow. Developed by SIG IO of the TensorFlow project, TensorFlow I/O is a software package focused on data I/O, streaming, and file formats for TensorFlow. It supports a wide variety of open source software and frameworks beyond machine learning itself. For streaming data, TensorFlow I/O provides support for Apache Kafka, AWS Kinesis, and Google Cloud PubSub, which are among the most widely used streaming frameworks at the moment.

TensorFlow I/O is built on top of tf.data and is fully compatible with the succinct tf.keras API. That means model inference over streaming data from Kafka, Kinesis, or PubSub can be as easy as a one-liner. Coupled with the data transformation functions in tf.data, model training over batches of streaming data is also straightforward.

In addition to streaming input, TensorFlow I/O provides streaming output support, so data generated by machine learning algorithms in real time can be delivered back to Kafka, allowing continuous ingestion by another application. With both input and output support, it’s possible to build a TensorFlow-centric streaming pipeline with minimal components, which greatly reduces infrastructure maintenance over the long run. You’ll see a demo showcasing the convenience of TensorFlow I/O and the ability to build a complete streaming data pipeline for machine learning with ease.
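To illustrate the "one-liner inference" idea, here is a minimal sketch. In a real deployment the dataset would come from TensorFlow I/O (e.g. a Kafka-backed dataset such as `tfio.IODataset.from_kafka`, whose exact signature varies across tensorflow-io releases and which needs a running broker), so this sketch substitutes an in-memory `tf.data.Dataset` with the same element structure; the feature shape and toy model are illustrative assumptions, not from the talk.

```python
import numpy as np
import tensorflow as tf

# In a real pipeline the dataset would come from a broker, e.g.:
#   import tensorflow_io as tfio
#   stream = tfio.IODataset.from_kafka("sensor-topic")  # assumed API, needs Kafka
# Here an in-memory dataset stands in for the stream so the sketch
# runs without any streaming infrastructure.
stream = tf.data.Dataset.from_tensor_slices(
    np.random.rand(100, 4).astype("float32"))

# A toy tf.keras model; any compiled/trained model works the same way.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Because the streaming source is an ordinary tf.data.Dataset,
# inference over it really is a one-liner:
predictions = model.predict(stream.batch(16), verbose=0)
print(predictions.shape)  # one prediction per streamed record: (100, 1)
```

The key point is that TensorFlow I/O datasets plug into `model.predict` exactly like any other `tf.data.Dataset`, so swapping the in-memory stand-in for a Kafka topic does not change the inference code.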
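The same compatibility applies to training over batches of streaming data via tf.data transformations. The sketch below again uses an in-memory dataset as a stand-in for a TensorFlow I/O streaming source (the labels, shapes, and model are illustrative assumptions):

```python
import numpy as np
import tensorflow as tf

# Stand-in for a streaming source such as a Kafka-backed
# TensorFlow I/O dataset; features plus labels so the stream
# can drive supervised training.
features = np.random.rand(200, 4).astype("float32")
labels = (features.sum(axis=1) > 2.0).astype("float32")
stream = tf.data.Dataset.from_tensor_slices((features, labels))

# tf.data transformations turn the raw stream into training batches.
train_batches = stream.shuffle(64).batch(32).prefetch(tf.data.AUTOTUNE)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Training consumes the batched stream like any other dataset.
history = model.fit(train_batches, epochs=1, verbose=0)
print(len(history.history["loss"]))  # one loss value per epoch
```

Because batching, shuffling, and prefetching are ordinary tf.data operations, the preprocessing step between the streaming source and `model.fit` stays the same whether the data arrives from memory, files, or a broker.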