February 24, 2020

Effective sampling methods within TensorFlow input functions

Many real-world machine learning applications require generative or reductive sampling of data. Laxmi Prajapat and William Fletcher demonstrate sampling techniques applied to training and testing data directly inside the input function using the tf.data API.

Talk Title Effective sampling methods within TensorFlow input functions
Speakers Laxmi Prajapat (Datatonic), William Fletcher (Datatonic)
Conference O’Reilly TensorFlow World
Location Santa Clara, California
Date October 28-31, 2019
URL Talk Page
Slides Talk Slides

Many real-world machine learning applications require generative or reductive sampling of data. At training time this may be to deal with class imbalance (e.g., the rarity of positives in a binary classification problem or a sparse user-item interaction matrix) or to augment the data stored on file; it may also simply be a matter of efficiency. Laxmi Prajapat and William Fletcher explore sampling techniques in the context of recommender systems, using tools available in the tf.data API, and detail which methods are beneficial for particular data and hardware demands. They present quantitative results, along with a closer examination of the potential pros and cons.

A precomputed subsample of data makes for a fast input function, but taking advantage of truly random samples requires more work. Laxmi and William consider how to select from a large dataset containing all possible inputs, generating samples in memory with tf.random and exploiting hash tables where appropriate. These methods grant additional flexibility and reduce data-preparation workloads.
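As a rough illustration of these ideas, here is a minimal sketch of an input function that draws random negatives in memory with tf.random and uses a tf.lookup.StaticHashTable to reject sampled "negatives" that are really known positives. The toy data, item count, and 50/50 mixing ratio are assumptions for illustration, not the speakers' actual pipeline.

```python
import tensorflow as tf

# Toy interaction data standing in for a real user-item log; names and
# sizes here are illustrative assumptions only.
NUM_ITEMS = 10_000
users = tf.constant([0, 1, 2, 3], dtype=tf.int64)
pos_items = tf.constant([42, 7, 901, 42], dtype=tf.int64)

# Hash table of observed (user, item) pairs, keyed as a single integer,
# used to filter out sampled negatives that are actually positives.
keys = users * NUM_ITEMS + pos_items
positive_table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(keys, tf.ones_like(keys)),
    default_value=0)

def input_fn(batch_size=2):
    pairs = tf.data.Dataset.from_tensor_slices((users, pos_items))

    def make_positive(user, item):
        return {"user": user, "item": item}, tf.constant(1.0)

    def make_negative(user, item):
        # Replace the true item with one sampled uniformly in memory,
        # rather than reading precomputed negatives from file.
        neg_item = tf.random.uniform([], maxval=NUM_ITEMS, dtype=tf.int64)
        return {"user": user, "item": neg_item}, tf.constant(0.0)

    pos_ds = pairs.repeat().map(make_positive,
                                num_parallel_calls=tf.data.AUTOTUNE)
    neg_ds = (pairs.repeat()
              .map(make_negative, num_parallel_calls=tf.data.AUTOTUNE)
              .filter(lambda feats, label: tf.equal(
                  positive_table.lookup(
                      feats["user"] * NUM_ITEMS + feats["item"]), 0)))

    # Mix positives and negatives 50/50 to counter class imbalance
    # (tf.data.experimental.sample_from_datasets on older TF versions).
    balanced = tf.data.Dataset.sample_from_datasets(
        [pos_ds, neg_ds], weights=[0.5, 0.5])
    return balanced.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

A quick smoke test of the sketch:

```python
for features, labels in input_fn().take(2):
    print(features["user"].numpy(), features["item"].numpy(), labels.numpy())
```

For rebalancing an existing stream rather than constructing one from scratch, tf.data also offers rejection resampling (tf.data.experimental.rejection_resample), which drops or keeps elements to approach a target class distribution.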
