January 24, 2020

Document vectors in the wild: Building a content recommendation system for Reuters.com

James Dreiss discusses the challenges in building a content recommendation system for one of the largest news sites in the world, Reuters.com. The particularities of the system include developing a scrolling newsfeed and the use of document vectors for semantic representation of content.

Talk Title Document vectors in the wild: Building a content recommendation system for Reuters.com
Speakers James Dreiss (Reuters)
Conference Strata Data Conference
Conf Tag Make Data Work
Location New York, New York
Date September 11-13, 2018
URL Talk Page
Slides Talk Slides

In the summer of 2017, Reuters.com embarked on an ambitious redesign of its article pages: a scroll design in which the article a user requests is immediately followed by related (or possibly unrelated) articles. The initial launch of the scroll model made recommendations based on content alone, independent of user behavior. Given the advantages of word and document embedding models and the particularities of Reuters.com content, the system was designed to use document vectors to determine article similarity.

Because document vectors are learned without supervision, they need some supervised learning assistance before they can be used in a production system. James Dreiss discusses the development of the supervised topic filtering model that sits on top of the document vector model, as well as additional filtering strategies. Measuring the performance of word and document vectors is notoriously difficult, but some heuristics have been developed. James offers a brief overview of measuring word and document vector performance and explains how he ultimately tackled the problem.

James also details how he tested a pet theory that users would want diversity in content, especially given the wall-to-wall coverage of certain subjects, such as Donald Trump, and shares the results of serving both similar and dissimilar content to users. He concludes by covering the cookie-based personalization system that was later implemented for content recommendation on article scrolls, including test results comparing the two systems.
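The content-based step described above, ranking articles by document vector similarity and then applying a topic filter, can be illustrated with a minimal sketch. This is not the Reuters.com implementation: the abstract does not say which embedding model, similarity measure, or filter rules were used. Cosine similarity, the toy two-dimensional vectors, the topic labels (standing in for the output of a supervised topic model), and the names `cosine_similarity` and `recommend` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(current_id, vectors, topics, allowed_topics, k=2):
    """Return up to k article ids ranked by similarity to the current article.

    Articles whose topic label is not in allowed_topics are dropped first,
    mimicking a supervised topic filter layered on unsupervised vectors.
    """
    current = vectors[current_id]
    scored = []
    for article_id, vec in vectors.items():
        if article_id == current_id:
            continue  # never recommend the article being read
        if topics[article_id] not in allowed_topics:
            continue  # topic filter (hypothetical rule)
        scored.append((cosine_similarity(current, vec), article_id))
    scored.sort(reverse=True)  # most similar first
    return [article_id for _, article_id in scored[:k]]


# Toy data: hypothetical document vectors and topic labels.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.8, 0.2],
}
topics = {"a": "politics", "b": "politics", "c": "sports", "d": "lifestyle"}

print(recommend("a", vectors, topics, allowed_topics={"politics", "sports"}))
```

In this toy setup, article "d" is close to "a" in vector space but is excluded by the topic filter, which shows how a supervised layer can override raw similarity. Reversing the sort order would serve the least similar articles first, one simple way to experiment with the content diversity idea mentioned in the abstract.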
