December 22, 2019

NLP from scratch: Solving the cold start problem for natural language processing

How do you train a machine learning model with no training data? Michael Johnson and Norris Heintzelman share their journey implementing multiple solutions to bootstrapping training data in the NLP domain, covering topics including weak supervision, building an active learning framework, and annotation adjudication for named-entity recognition.


Talk Title	NLP from scratch: Solving the cold start problem for natural language processing
Speakers	Michael Johnson (Lockheed Martin), Norris Heintzelman (Lockheed Martin)
Conference	Strata Data Conference
Conf Tag	Big Data Expo
Location	San Francisco, California
Date	March 26-28, 2019

Unstructured data in the form of documents, web pages, and social media interactions is an ever-growing, ever more valuable data source for addressing present business problems, from exploring brand sentiment to identifying sensitive information in internal documents. Unfortunately, the classification and annotation algorithms used to solve these problems often require significant amounts of labeled training data to reach the desired accuracy.

Michael Johnson and Norris Heintzelman share several techniques they've implemented to build classification and NER models from scratch. They lead a tour through this space as it applies to NLP and demonstrate their approach and architecture for techniques including weak supervision, active learning, and annotation adjudication. For each of these topics, Michael and Norris outline the theoretical foundation, the implementation architecture, and the tools used, and they discuss the problems they encountered so you can avoid making the same mistakes.
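To make the idea of weak supervision concrete, here is a minimal sketch of how noisy heuristic rules can produce training labels when no hand-labeled data exists. It is not the speakers' implementation: the labeling functions, the keywords, and the `SENSITIVE`/`PUBLIC` label names are all hypothetical, and a simple majority vote stands in for the more sophisticated label models that weak supervision frameworks typically use.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its rule does not apply

# Hypothetical labeling functions: each encodes one cheap, noisy heuristic.
def lf_confidential(text):
    return "SENSITIVE" if "confidential" in text.lower() else ABSTAIN

def lf_ssn(text):
    return "SENSITIVE" if "ssn" in text.lower() else ABSTAIN

def lf_press_release(text):
    return "PUBLIC" if "press release" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_confidential, lf_ssn, lf_press_release]

def weak_label(text):
    """Combine labeling-function votes by majority; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    # If the top two labels have the same count, the heuristics disagree evenly.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

if __name__ == "__main__":
    docs = [
        "This memo is confidential.",
        "Press release: new product launch.",
        "Lunch menu for Friday.",
    ]
    for doc in docs:
        print(repr(doc), "->", weak_label(doc))
```

Documents that receive a label could seed a first classifier, while the abstentions and ties are natural candidates for human annotation, which is where an active learning loop would typically pick up.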

New directions in record linkage

December 22, 2019

The US Census Bureau has been involved in record linkage projects for over 40 years. In that time, computing capabilities have changed considerably and new techniques have emerged, and the Census Bureau is reviewing an inventory of linkage methodologies. Yves Thibaudeau describes the progress made so far in identifying specific record linkage techniques for specific applications.

The magic behind your Lyft ride prices: A case study on machine learning and streaming

December 20, 2019

Rakesh Kumar and Thomas Weise explore how Lyft dynamically prices its rides with a combination of various data sources, ML models, and streaming infrastructure for low latency, reliability, and scalability, allowing the pricing system to be more adaptable to real-world changes.

User-based real-time product recommendations leveraging deep learning using Analytics Zoo on Apache Spark and BigDL

December 20, 2019

User-based real-time recommendation systems have become an important topic in ecommerce. Lu Wang, Nicole Kong, Guoqiong Song, and Maneesha Bhalla demonstrate how to build deep learning algorithms using Analytics Zoo with BigDL on Apache Spark and create an end-to-end system to serve real-time product recommendations.

Building a robust content recommendation platform for 60 million news readers

December 18, 2019

Matt Chapman leads a walkthrough of the architecture and open source components that serve Tribune Publishing's content recommendation system, powered by online machine learning at scale. Find out how multiple publications, multiple recommendation algorithms, and one scalable architecture regularly achieve double the performance of the legacy solution.

Introducing KFServing: Serverless Model Serving on Kubernetes

December 17, 2019

Production-grade serving of ML models is a challenging task for data scientists. In this talk, we'll discuss how KFServing powers some real-world examples of inference in production at Bloomberg, whic …

Leveling Up Your CD: Unlocking Progressive Delivery on Kubernetes

December 8, 2019

Kubernetes Continuous Delivery methods have continued to evolve to more advanced strategies such as canary, A/B testing, and blue-green. Progressive delivery is the next step of CD, enabling service p …