February 12, 2020

Learning with limited labeled data

Supervised machine learning requires large labeled datasets, a prohibitive limitation in many real-world applications. But this could be avoided if machines could learn with a few labeled examples. Shioulin Sam explores and demonstrates an algorithmic solution that relies on collaboration between human and machine to label smartly, and she outlines product possibilities.


Talk Title	Learning with limited labeled data
Speakers	Shioulin Sam (Cloudera Fast Forward Labs)
Conference	Strata Data Conference
Conf Tag	Make Data Work
Location	New York, New York
Date	September 24-26, 2019

Being able to teach machines with examples is a powerful capability, but it hinges on the availability of vast amounts of data. The data not only needs to exist but has to be in a form that allows relationships between input features and output to be uncovered. Labeling every input example fulfills this requirement, but it is an expensive undertaking. Classical approaches to this problem rely on collaboration between human and machine: engineered heuristics select the most informative instances to label, which reduces cost. A human steps in to provide the labels, and the model then learns from this smaller labeled dataset. Recent advancements have made these approaches amenable to deep learning, enabling models to be built with limited labeled data. Shioulin Sam explores algorithmic approaches that drive this capability and provides practical guidance for translating this capability into production. You’ll view a live demonstration to understand how and why these algorithms work.
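
The selection step described above can be illustrated with a small sketch. The abstract does not say which heuristic the talk uses, so this example assumes uncertainty sampling, one common choice: rank unlabeled examples by the entropy of the model's predicted class probabilities and send the most uncertain ones to a human. The function names and the sample probabilities are hypothetical and chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(pool_probs, budget):
    """Return indices of the `budget` pool items the model is least sure about.

    Assumption: higher predictive entropy means a more informative example.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled examples (two classes each).
pool_probs = [
    [0.98, 0.02],  # confident prediction, little to gain from a label
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],  # moderately confident
    [0.50, 0.50],  # maximally uncertain
]

to_label = select_for_labeling(pool_probs, budget=2)
print(to_label)  # prints [3, 1]
```

In a full loop, a human would label the selected examples, the model would be retrained on the enlarged labeled set, and the selection would repeat until the labeling budget runs out.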

algorithm dataset deep learning

Spark NLP in action: How Indeed applies NLP to standardize résumé content at scale

January 6, 2020

Alexander Thomas and Alexis Yelton demonstrate how to use Spark NLP and Apache Spark to standardize semistructured text, illustrated by Indeed's standardization process for résumé content.

Leveraging AI for social good

December 30, 2019

The hardware, software, and algorithms that automatically tag our images or recommend the next book to read can also improve medical diagnosis and protect our natural resources. Jack Dashwood and Anna Bethke discuss a variety of technical projects at Intel that have enabled social good organizations and provide guidance on creating or engaging in these types of projects.

User-based real-time product recommendations leveraging deep learning using Analytics Zoo on Apache Spark and BigDL

December 20, 2019

User-based real-time recommendation systems have become an important topic in ecommerce. Lu Wang, Nicole Kong, Guoqiong Song, and Maneesha Bhalla demonstrate how to build deep learning algorithms using Analytics Zoo with BigDL on Apache Spark and create an end-to-end system to serve real-time product recommendations.

Open-endedness: A new grand challenge for AI

February 4, 2020

We think a lot in machine learning about encouraging computers to solve problems, but there's another kind of learning, called open-endedness, that's just beginning to attract attention in the field. Kenneth Stanley walks you through how open-ended algorithms keep on inventing new and ever-more complex tasks and solving them continually, even endlessly.

Scaling AI at Cerebras

February 3, 2020

Long training times are the single biggest factor slowing down innovation in deep learning. Today's common approach of scaling large workloads out over many small processors is inefficient and requires extensive model tuning. Urs Köster explains why, with increasing model and dataset sizes, new ideas are needed to reduce training times.

Building machine learning inference pipelines at scale

January 31, 2020

Real-life ML workloads require more than training and predicting: data often needs to be preprocessed and postprocessed. Developers and data scientists have to train and deploy a sequence of algorithms that collaborate in delivering predictions from raw data. Julien Simon outlines how to build machine learning inference pipelines using open source libraries and how to scale them on AWS.