October 20, 2019

TensorFlow: Large-scale analytics and distributed machine learning with TensorFlow, BigQuery, and Dataflow (Apache Beam)

Kazunori Sato and Amy Unruh explore how you can use TensorFlow to drive large-scale distributed machine learning against your analytic data sitting in Google BigQuery, with data preprocessing driven by Dataflow (now Apache Beam). Kazunori and Amy dive into practical examples of how these technologies can work together to enable a powerful workflow for distributed machine learning.


Talk Title	TensorFlow: Large-scale analytics and distributed machine learning with TensorFlow, BigQuery, and Dataflow (Apache Beam)
Speakers	Kaz Sato (Google), Amy Unruh (Google)
Conference	Strata + Hadoop World
Conf Tag	Big Data Expo
Location	San Jose, California
Date	March 29-31, 2016

TensorFlow is an open source software library for machine learning, based on previous generations of software within Google for training and deploying neural networks. BigQuery is Google’s fully managed, low-cost analytics data warehouse, which lets you do interactive queries on petabyte-sized datasets. Google Cloud Dataflow (now Beam, an Apache incubator project) is a unified programming model and service for developing and executing a wide range of data processing and analytics patterns. Together, they enable a powerful workflow for distributed machine learning. Kazunori Sato and Amy Unruh describe these technologies and explain how they work together. Kazunori and Amy offer practical examples of how you can use them to empower large-scale distributed training of neural networks and how you can use the trained models for prediction. They’ll also demonstrate how to use Google machine-learning APIs to make ML accessible to everybody. This session is sponsored by Google.

TensorFlow: Machine learning for everyone

October 20, 2019

TensorFlow is an open source software library for numerical computation with a focus on machine learning. Rajat Monga offers an introduction to TensorFlow and explains how to use it to train and deploy machine-learning models to make your next application smarter.

Visualizing millions of datapoints with GPUs in the client and server

October 13, 2019

The ability to visualize millions of data points opens up a world of applications. But it's hard to quickly render that much data, let alone compute anything with it. Thibaud Hottelier shares the GPU technology (a hybrid client/server GPU engine using Node-OpenCL and a new library, CL.js) that Graphistry used to break the million-datapoint barrier for fast, interactive visualizations.

Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks

October 20, 2019

Celtra provides a platform for customers like Porsche and Fox to create, track, and analyze digital display advertising. Celtra's platform processes billions of ad events daily to give analysts fast and easy access to reports and ad hoc analytics. Grega Kešpret outlines Celtra's data-pipeline challenges and explains how it solved them by combining Snowflake's cloud data warehouse with Spark.

What's next for BDAS (the Berkeley Data Analytics Stack)?

October 18, 2019

Michael Franklin offers an overview of the Berkeley Data Analytics Stack, outlines the current directions it's taking, and settles once and for all how BDAS should be pronounced.

Offline-first apps with PouchDB

October 15, 2019

Web and mobile apps shouldn't stop working when there's no network connection. PouchDB is an open source syncing JavaScript database that runs within a web browser. Offline-first apps built using PouchDB can provide a better, faster user experience, both online and offline. Bradley Holt demonstrates how to use PouchDB and CouchDB to build offline-enabled web and mobile apps.

Self-service, interactive analytics at multipetabyte scale in capital markets regulation on the cloud

October 20, 2019

Scott Donaldson and Matt Cardillo detail the security measures and system architecture needed to bring a multipetabyte data warehouse to life through interactive analytics and directed graphs built from trillions of market events, using HBase, EMR, Hive, Redshift, and S3 in a cost-efficient manner.