February 17, 2020


The quest for high-quality data


Ihab Ilyas highlights the data-quality problem and describes the HoloClean framework, a state-of-the-art prediction engine for structured data with direct applications in detecting and repairing data errors, as well as imputing missing labels and values.

Talk Title: The quest for high-quality data
Speakers: Ihab Ilyas (University of Waterloo)
Conference: O’Reilly Artificial Intelligence Conference
Conf Tag: Put AI to Work
Location: London, United Kingdom
Date: October 15-17, 2019

“AI starts with good data” is a statement that draws wide agreement from data scientists, analysts, and business owners. Our ability to build complex AI models for prediction, classification, and various analytics tasks has increased significantly, and an abundance of fairly easy-to-use tools lets data scientists and analysts provision complex models within days. However, a lack of data and data-quality issues remain the main bottleneck holding back further adoption of AI technologies. Even with advances in building robust models, noisy and incomplete data remain the biggest hurdles to effective end-to-end solutions. Multiple studies indicate that cleaning data is a much more effective investment than enhancing learning robustness.

Ihab Ilyas highlights this data-quality problem and describes the HoloClean framework, a state-of-the-art prediction engine for structured data with direct applications in detecting and repairing data errors, as well as imputing missing labels and values. The framework uses techniques such as data augmentation and self-supervised learning to build models that describe how data is generated and how errors and anomalies are introduced.
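To make the idea of learning from the data itself more concrete, here is a toy sketch in Python. It is not HoloClean's API or model: the function names, the `threshold` parameter, and the sample rows are all hypothetical, and simple co-occurrence counts stand in for the learned models the talk describes. It only illustrates the general pattern of predicting each cell from the rest of its row, then using confident predictions to repair suspicious values and fill in missing ones.

```python
from collections import Counter, defaultdict


def fit_cooccurrence(rows, target, context):
    """Count how often each target value appears with each context value."""
    counts = defaultdict(Counter)
    for row in rows:
        if row[target] is not None and row[context] is not None:
            counts[row[context]][row[target]] += 1
    return counts


def predict(counts, context_value):
    """Return (most frequent target value, its relative frequency)."""
    seen = counts.get(context_value)
    if not seen:
        return None, 0.0
    value, n = seen.most_common(1)[0]
    return value, n / sum(seen.values())


def repair(rows, target, context, threshold=0.6):
    """Return copies of the rows with confident predictions applied.

    A cell is overwritten only when the prediction disagrees with it
    (or the cell is missing) and the confidence reaches the threshold.
    """
    counts = fit_cooccurrence(rows, target, context)
    repaired = []
    for row in rows:
        guess, confidence = predict(counts, row[context])
        fixed = dict(row)  # leave the input rows untouched
        if guess is not None and confidence >= threshold and row[target] != guess:
            fixed[target] = guess
        repaired.append(fixed)
    return repaired


# Hypothetical sample data: one typo and one missing value.
rows = [
    {"zip": "60601", "city": "Chicago"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "60601", "city": "Cicago"},   # likely typo
    {"zip": "60601", "city": None},       # missing value
    {"zip": "10001", "city": "New York"},
]

cleaned = repair(rows, target="city", context="zip")
for before, after in zip(rows, cleaned):
    if before != after:
        print(before, "->", after)
```

The threshold is the main design choice in this sketch: a higher value leaves more cells untouched, while a lower value risks overwriting rare but correct entries. A real system would combine many signals rather than a single context column.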
