February 23, 2020


Machine learning over real-time streaming data with TensorFlow


In many applications where data is generated continuously, combining machine learning with streaming data is essential for discovering useful information in real time. Yong Tang explores TensorFlow I/O, which can be used to build a data pipeline with TensorFlow and streaming frameworks such as Apache Kafka, AWS Kinesis, or Google Cloud Pub/Sub.

Talk Title: Machine learning over real-time streaming data with TensorFlow
Speakers: Yong Tang (MobileIron)
Conference: O’Reilly TensorFlow World
Location: Santa Clara, California
Date: October 28-31, 2019

Applying machine learning to streaming data to discover useful information has been a topic of interest for some time. In many real-world applications, such as IoT sensors, web transactions, GPS positions, or social media updates, large volumes of data are generated continuously. It’s critical to have a data pipeline that can reliably and conveniently receive, preprocess, and provide data for model inference and training.

Yong Tang explores the TensorFlow I/O package for streaming data processing with TensorFlow. Developed by SIG IO of the TensorFlow project, TensorFlow I/O is a software package focused on data I/O, streaming, and file formats for TensorFlow. It supports a wide variety of open source software and frameworks beyond machine learning itself. For streaming data, TensorFlow I/O provides support for Apache Kafka, AWS Kinesis, and Google Cloud Pub/Sub, which are among the most widely used streaming frameworks at the moment.

TensorFlow I/O is built on top of tf.data and is fully compatible with the succinct tf.keras API. That means model inference on streaming data from Kafka, Kinesis, or Pub/Sub can be as easy as a one-liner. Coupled with the data transformation functions in tf.data, model training over batches of streaming data can also be done in a straightforward way.

In addition to streaming input, TensorFlow I/O provides streaming output support, so that data generated by machine learning algorithms in real time can be delivered back to Kafka for continuous ingestion by another application. With both input and output support, it’s possible to build a TensorFlow-centric streaming pipeline with minimal components, which greatly reduces infrastructure maintenance over the long run. You’ll see a demo showcasing the convenience of TensorFlow I/O and the ability to build a complete streaming data pipeline for machine learning with ease.
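The pipeline shape described above (receive, preprocess, batch, infer, deliver back) can be sketched in plain Python. This is a library-agnostic illustration under stated assumptions, not TensorFlow I/O's API: the names `sensor_stream`, `preprocess`, `batched`, `toy_model`, and `run_pipeline` are hypothetical, a generator stands in for a Kafka topic, a `deque` stands in for the output topic, and the threshold "model" is only a placeholder for a trained tf.keras model.

```python
from collections import deque
from itertools import islice
from typing import Deque, Iterable, Iterator, List


def sensor_stream(readings: Iterable[float]) -> Iterator[float]:
    """Stand-in for a streaming source such as a Kafka topic (hypothetical)."""
    for value in readings:
        yield value


def preprocess(stream: Iterable[float], scale: float) -> Iterator[float]:
    """Normalize each record as it arrives, like a map step in a data pipeline."""
    for value in stream:
        yield value / scale


def batched(stream: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Group records into fixed-size batches; the final batch may be shorter."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def toy_model(batch: List[float]) -> List[int]:
    """Placeholder model (assumption): flag readings above 0.8 as anomalies."""
    return [1 if x > 0.8 else 0 for x in batch]


def run_pipeline(
    readings: Iterable[float],
    sink: Deque[int],
    batch_size: int = 3,
    scale: float = 100.0,
) -> Deque[int]:
    """Pull from the source, infer per batch, and push predictions to the sink."""
    batches = batched(preprocess(sensor_stream(readings), scale), batch_size)
    for batch in batches:
        sink.extend(toy_model(batch))  # streaming output: deliver results onward
    return sink


# The deque plays the role of an output topic read by another application.
output_topic: Deque[int] = deque()
run_pipeline([10.0, 95.0, 50.0, 81.0, 79.0], output_topic)
print(list(output_topic))
```

Because every stage is a lazy iterator, records flow through one batch at a time rather than being loaded up front, which is the same property a tf.data pipeline relies on. In a real deployment, the generator would presumably be replaced by a TensorFlow I/O streaming dataset and the hand-written batching by tf.data's own `batch` transformation.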
