Document vectors in the wild: Building a content recommendation system for Reuters.com

James Dreiss discusses the challenges in building a content recommendation system for one of the largest news sites in the world, Reuters.com. Distinctive features of the system include a scrolling newsfeed and the use of document vectors for the semantic representation of content.


Talk Title	Document vectors in the wild: Building a content recommendation system for Reuters.com
Speakers	James Dreiss (Reuters)
Conference	Strata Data Conference
Conf Tag	Make Data Work
Location	New York, New York
Date	September 11-13, 2018
URL	Talk Page
Slides	Talk Slides

In the summer of 2017, Reuters.com embarked on an ambitious redesign of its article pages: a scroll design in which each article a user requests to read is immediately followed by related (or possibly unrelated) articles. At launch, the scroll model made recommendations based on content alone, independent of user behavior. Given the advantages of word and document embedding models and the particularities of Reuters.com content, the system was designed to use document vectors to determine article similarity. Because document vectors are learned without supervision, they need some supervised learning assistance when used in a production system. James Dreiss discusses the development of the supervised topic-filtering model that sits on top of the document vector model, as well as additional filtering strategies.

Measuring the performance of word and document vectors is notoriously difficult, but some heuristics have been developed. James offers a brief overview of these approaches and explains how he ultimately tackled the problem. He also details how he tested a pet theory that users would want diversity in content, especially given the wall-to-wall coverage of certain subjects such as Donald Trump, and shares the results of serving both similar and dissimilar content to users. James concludes by covering the cookie-based personalization system that was later implemented for content recommendation on article scrolls, including test results comparing the two systems.
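To make the content-only approach concrete, here is a minimal sketch of ranking scroll candidates by document-vector similarity with a topic filter on top. It assumes the document vectors are already computed (the toy vectors below are made up) and that the output of a supervised topic classifier is available as a hypothetical `topics` mapping. Cosine similarity is a common choice for comparing document vectors, but the talk does not say which measure or filter Reuters actually used; `recommend`, `same_topic_only`, and `most_similar` are illustrative names, not part of any Reuters system.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(
    current_id: str,
    vectors: Dict[str, Sequence[float]],  # assumption: precomputed document vectors
    topics: Dict[str, str],               # assumption: labels from a supervised topic classifier
    k: int = 3,
    same_topic_only: bool = True,
    most_similar: bool = True,
) -> List[str]:
    """Return up to k article ids to show after current_id in the scroll (hypothetical helper)."""
    query = vectors[current_id]
    scored = []
    for art_id, vec in vectors.items():
        if art_id == current_id:
            continue
        # Stand-in for supervised topic filtering: drop candidates whose label differs.
        if same_topic_only and topics.get(art_id) != topics.get(current_id):
            continue
        scored.append((cosine(query, vec), art_id))
    # most_similar=False reverses the ranking, e.g. to test serving dissimilar content.
    scored.sort(key=lambda pair: pair[0], reverse=most_similar)
    return [art_id for _, art_id in scored[:k]]


if __name__ == "__main__":
    vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.7, 0.7]}
    topics = {"a": "markets", "b": "markets", "c": "markets", "d": "politics"}
    print(recommend("a", vectors, topics, k=2))                         # ['b', 'c']
    print(recommend("a", vectors, topics, k=3, same_topic_only=False))  # ['b', 'd', 'c']
```

Keeping the topic filter separate from the similarity score mirrors the layering the abstract describes: the unsupervised vectors propose candidates, and a supervised component vetoes ones that are semantically close but editorially off-topic.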


Machine learning for nonstationary streaming data using Structured Streaming and StreamDM

Building deep reinforcement learning applications on BigDL and Spark

Deep learning-based search and recommendation systems using TensorFlow

How Komatsu is improving mining efficiencies using the IoT and machine learning

Improving patient screening by applying predictive analytics to electronic medical records

Lightning Talk: Artificial Intelligence the Next Digital Wave for Telcos