December 22, 2019

NLP from scratch: Solving the cold start problem for natural language processing

How do you train a machine learning model with no training data? Michael Johnson and Norris Heintzelman share their journey implementing multiple solutions to bootstrapping training data in the NLP domain, covering topics including weak supervision, building an active learning framework, and annotation adjudication for named-entity recognition.

Talk Title: NLP from scratch: Solving the cold start problem for natural language processing
Speakers: Michael Johnson (Lockheed Martin), Norris Heintzelman (Lockheed Martin)
Conference: Strata Data Conference
Conf Tag: Big Data Expo
Location: San Francisco, California
Date: March 26-28, 2019

Unstructured data in the form of documents, web pages, and social media interactions is an ever-growing and increasingly valuable source for addressing current business problems, from exploring brand sentiment to identifying sensitive information in internal documents. Unfortunately, the classification and annotation algorithms behind these solutions often require large amounts of labeled training data to reach the desired accuracy. Michael Johnson and Norris Heintzelman share several techniques they’ve implemented to build classification and NER models from scratch. They lead a tour through this space as it applies to NLP and demonstrate their approach and architecture for techniques including weak supervision, active learning, and annotation adjudication for named-entity recognition. For each of these topics, Michael and Norris outline the theoretical foundation, the implementation architecture, and the tools used, and discuss the problems they encountered so you can avoid making the same mistakes.
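
As a rough illustration of weak supervision, one of the topics named in the talk summary, the sketch below labels documents with a handful of keyword heuristics and a majority vote. This is not the speakers' implementation: the labeling functions, the SENSITIVE/PUBLIC label names, and the tie-breaking rule are all hypothetical, chosen only to echo the "sensitive information in internal documents" example above.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion


# Hypothetical labeling functions: cheap heuristics standing in for hand labels.
def lf_mentions_confidential(text):
    return "SENSITIVE" if "confidential" in text.lower() else ABSTAIN


def lf_mentions_ssn(text):
    return "SENSITIVE" if "ssn" in text.lower() else ABSTAIN


def lf_press_release(text):
    return "PUBLIC" if "press release" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_confidential, lf_mentions_ssn, lf_press_release]


def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Majority vote over non-abstaining functions; None on no votes or a tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    # Assumption: a tie between the top two labels is treated as "no label".
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    for doc in [
        "This memo is confidential and lists an SSN.",
        "Press release: new product launch",
        "Lunch at noon",
    ]:
        print(repr(doc), "->", weak_label(doc))
```

Documents that receive a label this way could seed an initial training set, while the unlabeled or tied ones are natural candidates for human annotation. Production weak-supervision tools typically replace the plain majority vote with a model that weighs each heuristic by its estimated accuracy.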
