Unlocking unstructured text data with summarization
Our ability to extract meaning from unstructured text data has not kept pace with our ability to produce and store it, but recent breakthroughs in recurrent neural networks are allowing us to make exciting progress in computer understanding of language. Building on these new ideas, Michael Williams explores three ways to summarize text and presents prototype products for each approach.
Talk Title: Unlocking unstructured text data with summarization
Conference: Strata + Hadoop World
Conf Tag: Make Data Work
Location: New York, New York
Date: September 27-29, 2016
We’ve seen significant progress in infrastructure for using data effectively over the last half-decade, but that progress has not reached all types of data equally. Unstructured text, in particular, has been slower to yield to the kinds of analysis many businesses now take for granted. Rather than being limited by what we can collect, we are constrained by the tools, time, and techniques needed to make good use of it. Yet we are beginning to gain the ability to do remarkable things with unstructured text data.

Michael Williams explores text summarization: taking a document in and returning a shorter one that contains the same information, covering both single-document and multidocument summarization. Michael demonstrates ways to solve the summarization problem that range from extremely simple algorithms dating back to the 1950s to the latest recurrent neural networks, explains how to choose between these approaches, and shows a working prototype product for each.

Summarizing tens or hundreds of thousands of articles at once is an entirely new capability, but it is also more than a solution to one problem: it is a gateway to quantified representations of text. The breakthrough capabilities realized by applying sentence embeddings and recurrent neural networks to the semantic meaning of text are poised to transform all the ways in which computers process language.
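To give a flavor of the "extremely simple algorithms that date back to the 1950s," here is a minimal sketch of frequency-based extractive summarization in the spirit of Luhn's approach: score each sentence by the frequency of its content words in the whole document and keep the top-scoring sentences. The stopword list, regex-based sentence splitter, and `summarize` function are illustrative assumptions, not code from the talk.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real systems use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "was",
             "it", "that", "this", "for", "on", "with", "as", "we"}

def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    # Naive sentence splitting on terminal punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Document-wide frequencies of content words.
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOPWORDS]
        # Average content-word frequency; guard against empty sentences.
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return " ".join(s for s in sentences if s in top)
```

Despite its simplicity, this kind of scoring is a surprisingly strong baseline for single-document summarization, which is why it remains a useful point of comparison for neural approaches.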