Learning with limited labeled data
Supervised machine learning requires large labeled datasets, a prohibitive limitation in many real-world applications. But this could be avoided if machines could learn with a few labeled examples. Shioulin Sam explores and demonstrates an algorithmic solution that relies on collaboration between human and machine to label smartly, and she outlines product possibilities.
Talk Title | Learning with limited labeled data |
Speakers | Shioulin Sam (Cloudera Fast Forward Labs) |
Conference | Strata Data Conference |
Conf Tag | Make Data Work |
Location | New York, New York |
Date | September 24-26, 2019 |
URL | Talk Page |
Slides | Talk Slides |
Being able to teach machines with examples is a powerful capability, but it hinges on the availability of vast amounts of data. The data not only needs to exist but also has to be in a form that allows relationships between input features and output to be uncovered. Labeling every input instance fulfills this requirement but is an expensive undertaking. Classical approaches to this problem rely on human and machine collaboration. In these approaches, engineered heuristics select the “best” instances of data to label in order to reduce cost. A human steps in to provide the labels; the model then learns from this smaller labeled dataset. Recent advancements have made these approaches amenable to deep learning, enabling models to be built with limited labeled data. Shioulin Sam explores the algorithmic approaches that drive this capability and provides practical guidance for translating it into production. You’ll view a live demonstration to understand how and why these algorithms work.
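The selection heuristic described above can be illustrated with a minimal sketch. The abstract does not say which heuristic the talk uses, so this example assumes one common choice, entropy-based uncertainty sampling: the model scores each unlabeled instance, and the instances whose predicted class distribution is closest to uniform are sent to a human for labeling. The function names and the `pool_probs` values are hypothetical and exist only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predicted_probs, k):
    """Return indices of the k instances the model is least certain about.

    Assumption: uncertainty is measured by prediction entropy; other
    heuristics (margin, least confidence) would slot in the same way.
    """
    ranked = sorted(
        range(len(predicted_probs)),
        key=lambda i: entropy(predicted_probs[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:k]


# Hypothetical model outputs for four unlabeled instances, two classes each.
pool_probs = [
    [0.98, 0.02],  # confident prediction, little to gain from a label
    [0.55, 0.45],  # near the decision boundary
    [0.80, 0.20],
    [0.50, 0.50],  # maximally uncertain
]

# Ask the human annotator to label only the two most uncertain instances.
to_label = select_for_labeling(pool_probs, k=2)
print(to_label)  # [3, 1]
```

In a full loop, the newly labeled instances would be added to the training set, the model retrained, and the selection repeated until the labeling budget runs out.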