February 24, 2020

Effective sampling methods within TensorFlow input functions

Many real-world machine learning applications require generative or reductive sampling of data. Laxmi Prajapat and William Fletcher demonstrate sampling techniques applied to training and testing data directly inside the input function using the tf.data API.


Talk Title	Effective sampling methods within TensorFlow input functions
Speakers	Laxmi Prajapat (Datatonic), William Fletcher (Datatonic)
Conference	O’Reilly TensorFlow World
Location	Santa Clara, California
Date	October 28-31, 2019

Many real-world machine learning applications require generative or reductive sampling of data. At training time, this may be to deal with class imbalance (e.g., the rarity of positives in a binary classification problem or a sparse user-item interaction matrix) or to augment the data stored on file; it may also simply be a matter of efficiency. Laxmi Prajapat and William Fletcher explore some sampling techniques in the context of recommender systems, using tools available in the tf.data API, and detail which methods are beneficial for given data and hardware demands. They present quantitative results, along with a closer examination of the pros and cons of each approach. Naively, a precomputed subsample of data makes for a fast input function, but it fixes the sample in advance; to take advantage of fresh random samples, more must be done. Laxmi and William consider how to select from a large dataset containing all possible inputs, and they look at generating inputs in memory using tf.random, exploiting hash tables where appropriate. These methods grant additional flexibility and reduce data preparation workloads.
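
To make the two kinds of sampling concrete, here is a minimal sketch, not the speakers' code, assuming TensorFlow 2.x and a recommender dataset of integer (user, item) positive interactions with item IDs in [0, NUM_ITEMS). All names (NUM_ITEMS, NUM_NEGATIVES, downsample_negatives, build_positive_table, add_sampled_negatives) are hypothetical. `downsample_negatives` is reductive: it keeps every positive and keeps each negative with probability `keep_prob`, so a new subsample is drawn on every pass instead of being precomputed. `add_sampled_negatives` is generative: it draws random item IDs in memory with `tf.random.uniform` and uses a `tf.lookup.StaticHashTable` of known interactions to discard draws that are actually positives.

```python
import tensorflow as tf

NUM_ITEMS = 1000    # hypothetical catalogue size; item IDs lie in [0, NUM_ITEMS)
NUM_NEGATIVES = 4   # hypothetical number of candidate negatives drawn per positive


def downsample_negatives(dataset, keep_prob, seed=None):
    """Reductive sampling: keep all positives, keep each negative with prob keep_prob."""
    def keep(features, label):
        is_positive = tf.equal(label, 1)
        # A fresh uniform draw per element, so each epoch sees a different subsample.
        coin = tf.random.uniform([], seed=seed) < keep_prob
        return tf.logical_or(is_positive, coin)
    return dataset.filter(keep)


def build_positive_table(users, items):
    """Hash table keyed by user * NUM_ITEMS + item; value 1 marks a known interaction."""
    keys = tf.cast(users, tf.int64) * NUM_ITEMS + tf.cast(items, tf.int64)
    return tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(keys, tf.ones_like(keys)),
        default_value=tf.constant(0, tf.int64))


def add_sampled_negatives(positives, table, num_negatives):
    """Generative sampling: emit each positive pair followed by random unseen items."""
    def expand(user, item):
        # Draw candidate item IDs uniformly, with replacement, entirely in memory.
        sampled = tf.random.uniform([num_negatives], maxval=NUM_ITEMS, dtype=tf.int64)
        # Drop candidates the user actually interacted with (table returns 1 for those).
        is_unseen = tf.equal(table.lookup(user * NUM_ITEMS + sampled), 0)
        negatives = tf.boolean_mask(sampled, is_unseen)
        items = tf.concat([tf.expand_dims(item, 0), negatives], axis=0)
        labels = tf.concat([tf.ones([1], tf.int32),
                            tf.zeros_like(negatives, dtype=tf.int32)], axis=0)
        users = tf.fill(tf.shape(items), user)
        return tf.data.Dataset.from_tensor_slices(({"user": users, "item": items}, labels))
    return positives.flat_map(expand)


# Usage: three stored positives are expanded with sampled negatives, then thinned.
users = tf.constant([0, 0, 1], tf.int64)
items = tf.constant([5, 7, 5], tf.int64)
table = build_positive_table(users, items)
train_ds = add_sampled_negatives(
    tf.data.Dataset.from_tensor_slices((users, items)), table, NUM_NEGATIVES)
train_ds = downsample_negatives(train_ds, keep_prob=0.5).batch(8)
```

In this sketch, colliding draws are dropped rather than redrawn, so a positive can receive fewer than NUM_NEGATIVES negatives, and items are drawn uniformly; a popularity-weighted sampler would need a different distribution.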

A novel solution for a data augmentation and bias problem in NLP using TensorFlow

February 24, 2020

Join KC Tung to discover how to use TensorFlow and data augmentation to solve a natural language processing (NLP) model bias problem for an enterprise customer, one of the largest airlines in the world. KC leveraged hidden gems in tf.data and the new API to find a novel use for text generation, which, surprisingly, improved his NLP model.

Node-RED and TensorFlow.js: Developing deep learning IoT apps in the browser

February 23, 2020

Va Barbosa and Paul Van Ec highlight the benefits of using TensorFlow.js and Node-RED together as an educational tool that engages developers and gives you a powerful, creativity-inspiring platform for interacting with and developing machine learning models.

Accelerating training, inference, and ML applications on NVIDIA GPUs

February 24, 2020

Maggie Zhang, Nathan Luehr, Josh Romero, Pooya Davoodi, and Davide Onofrio give you a sneak peek at software components from NVIDIA's software stack so you can get the best out of your end-to-end AI applications on modern NVIDIA GPUs. They also examine features, tips, and tricks for optimizing your workloads at every stage, from data loading and processing through training, inference, and deployment.

Don't beat the market; beat the bots: Adversarial networks in finance

February 24, 2020

Automated investing has brought an immense amount of stability to the market, but it has also brought predictability. Garrett Lander and Al Kari examine whether an adversarial network can game the behavior of automated investors by learning the patterns in market activity to which they are most vulnerable.

How Criteo optimized and sped up its TensorFlow models by 10x and served them under 5 ms

February 23, 2020

Criteo's real-time bidding of ad spaces requires its TensorFlow (TF) models to make online predictions in less than 5 ms. Nicolas Kowalski and Axel Antoniotti explain why Criteo moved away from high-level APIs and rewrote its models from scratch, reimplementing cross-features and hashing functions using low-level TF operations in order to factorize as many of the TF nodes in its models as possible.

MLIR: Accelerating AI

February 23, 2020

MLIR is TensorFlow's open source machine learning compiler infrastructure that addresses the complexity caused by growing software and hardware fragmentation and makes it easier to build AI applications. Chris Lattner and Tatiana Shpeisman explain how MLIR addresses this growing hardware and software divide and how it will affect you in the future.