January 3, 2020


AI within O'Reilly Media

Talk Title: AI within O'Reilly Media
Speakers: Paco Nathan (derwen.ai)
Conference: Strata + Hadoop World
Conf Tag: Make Data Work
Location: Singapore
Date: December 6-8, 2016
URL: Talk Page
Slides: Talk Slides

Paco Nathan explains how O’Reilly Media employs AI, from the obvious (chatbots, case studies about other firms) to the less so (using AI to show the structure of content in detail, enhance search and recommendations, and guide editors in gap analysis, assessment, pathing, etc.). Approaches include vector embedding search, summarization, topological data analysis (TDA) for content gap analysis, and speech-to-text to index video.

Paco offers an overview of AI resources available through O’Reilly Media before taking a detailed look at how O’Reilly itself has transformed to leverage AI and deep learning, both for customer needs and to augment editors’ curation work. The foundation of this work is O’Reilly’s ontology, also known as its knowledge graph, which complements what deep learning can provide. That graph describes the semantics of O’Reilly’s content areas, its audience interactions, its vendor and sponsor relations, and more. One lesson learned quickly was the importance of maintaining integrity between the human-scale ontology graph and the large-scale data products produced by ML automation.

Two open source projects support this work: PyTextRank, which builds atop spaCy, NetworkX, and datasketch for graph-based NLP, and nbtransom, which uses Project Jupyter to let people and machines collaborate on ML pipelines, with “human-in-the-loop” as a design pattern for managing them.

Some of O’Reilly’s experiences here are unusual, since its content comes from many different publishers (all on Safari), spans a broad range of disciplines and content types, and is served to thousands of enterprise organizations. Overall, this work reflects a major recent shift in the industry away from “reference” content toward much more emphasis on learning: less about topics and keywords, more about job roles and skills.
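PyTextRank is a Python implementation of TextRank, a graph-based NLP technique: words become nodes, words that co-occur within a small window are linked, and nodes are ranked with PageRank so that well-connected terms surface as keywords. The following is a minimal, stdlib-only sketch of that idea, not PyTextRank's API or its actual implementation. It assumes simplifications that are not from the talk: regex tokenization, a tiny hypothetical stopword list, unweighted edges, and single words rather than the phrases a spaCy-based pipeline would extract.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for this illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "with", "is"}


def build_cooccurrence_graph(tokens, window=2):
    """Link each token to every distinct token up to `window` positions after it (undirected, unweighted)."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)


def textrank(graph, damping=0.85, iterations=50):
    """Score nodes with PageRank-style power iteration; every node in the graph has at least one neighbor."""
    nodes = list(graph)
    if not nodes:
        return {}
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Each neighbor shares its current score equally among its own neighbors.
        scores = {
            node: (1 - damping) / n
            + damping * sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            for node in nodes
        }
    return scores


def keywords(text, top_k=3, window=2):
    """Return the top_k highest-ranked single-word keywords in `text`."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    scores = textrank(build_cooccurrence_graph(tokens, window))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    sample = "graph search, graph editors, graph recommendations, graph video"
    print(keywords(sample, top_k=1))  # → ['graph']
```

Once NetworkX is available, `networkx.pagerank` would be the more idiomatic way to rank the graph; the loop is written out here only to keep the sketch dependency-free and to show what the ranking step does.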

comments powered by Disqus