Chargrid: Understanding 2D documents
Anoop Katti explores the shortcomings of the existing techniques for understanding 2D documents and offers an overview of the Character Grid (Chargrid), a new processing pipeline pioneered by data scientists at SAP.
| Field | Details |
|---|---|
| Talk Title | Chargrid: Understanding 2D documents |
| Speakers | Anoop Katti (SAP) |
| Conference | O’Reilly Artificial Intelligence Conference |
| Conf Tag | Put AI to Work |
| Location | New York, New York |
| Date | April 16-18, 2019 |
Textual information is often represented through structured documents, which have an inherent 2D structure, particularly with the advent of new types of media and communications such as presentations, websites, blogs, and formatted notebooks. In such documents, the layout, positioning, and sizing of the text can be crucial to understanding the semantic content and provide strong guidance for human perception.

Natural language processing (NLP) addresses the task of processing and understanding plain text. However, it processes text by serializing it, which discards any 2D structure in the document. Computer vision (CV), on the other hand, can process document images and retain the structure, but it must then learn the document semantics from raw image pixels rather than from the characters themselves.

Anoop Katti explores the shortcomings of the existing techniques for understanding 2D documents and offers an overview of the Character Grid (Chargrid), a new processing pipeline pioneered by data scientists at SAP that retains the original 2D structure while directly encoding the characters in the text. The Character Grid representation can readily be used as input to deep neural networks, for example. Anoop applies Chargrid to the task of information extraction from invoices to show how it captures the best of both NLP and CV.

Chargrid was accepted for presentation at EMNLP 2018 and is deployed in the production system of SAP Concur, where it currently processes tens of thousands of invoices every month.
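To make the idea of a grid that keeps the 2D layout while encoding characters directly more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not SAP's implementation: it assumes an upstream OCR step has already produced one bounding box per character, and the names `CharBox`, `build_vocab`, and `build_chargrid` are hypothetical. Each character is mapped to an integer index, and every grid cell covered by that character's box receives the index; 0 is reserved for background.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (character, x0, y0, x1, y1), with x1/y1 exclusive.
# Assumes an OCR step has already produced one box per character.
CharBox = Tuple[str, int, int, int, int]


def build_vocab(boxes: List[CharBox]) -> Dict[str, int]:
    """Map each distinct character to an index starting at 1 (0 = background)."""
    chars = sorted({ch for ch, *_ in boxes})
    return {ch: i for i, ch in enumerate(chars, start=1)}


def build_chargrid(
    boxes: List[CharBox], width: int, height: int, vocab: Dict[str, int]
) -> List[List[int]]:
    """Fill a height x width grid with the index of the character covering each cell."""
    grid = [[0] * width for _ in range(height)]
    for ch, x0, y0, x1, y1 in boxes:
        idx = vocab[ch]
        # Clip the box to the grid so out-of-bounds coordinates are ignored.
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                grid[y][x] = idx
    return grid


# Two characters side by side on a tiny 6 x 3 "page".
boxes = [("A", 0, 0, 2, 2), ("B", 3, 0, 5, 2)]
vocab = build_vocab(boxes)
grid = build_chargrid(boxes, width=6, height=3, vocab=vocab)
for row in grid:
    print(row)
```

The resulting integer grid could then be one-hot encoded along a channel axis and passed to a convolutional network in the same way as an image, which is the sense in which such a representation combines character-level input with a preserved layout.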