November 27, 2019

249 words 2 mins read

Deep learning for domain-specific entity extraction from unstructured text

Mohamed AbdelHady and Zoran Dzunic demonstrate how to build a domain-specific entity extraction system from unstructured text using deep learning. In the model, domain-specific word embedding vectors are trained on a Spark cluster using millions of PubMed abstracts and then used as features to train an LSTM recurrent neural network for entity extraction.

Talk Title: Deep learning for domain-specific entity extraction from unstructured text
Speakers: Mohamed AbdelHady (Microsoft), Zoran Dzunic (Microsoft)
Conference: Strata Data Conference
Conference Tag: Big Data Expo
Location: San Jose, California
Date: March 6-8, 2018
URL: Talk Page
Slides: Talk Slides

Biomedical named entity recognition is a critical step in complex biomedical NLP tasks such as understanding the interactions between entity types, for example drug-disease or gene-protein relationships. Feature engineering for such tasks is often complex and time-consuming, but neural networks can obviate the need for manual feature engineering by learning directly from raw text. Mohamed AbdelHady and Zoran Dzunic demonstrate how to build a domain-specific entity extraction system from unstructured text using deep learning. In the model, domain-specific word embedding vectors are trained with the word2vec algorithm on a Spark cluster using millions of Medline PubMed abstracts and then used as features to train an LSTM recurrent neural network for entity extraction, using Keras with TensorFlow or CNTK on a GPU-enabled Azure Data Science Virtual Machine (DSVM). Results show that a domain-specific word embedding model boosts performance compared to embeddings trained on generic corpora such as Google News.
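The pipeline described above has two stages: learn domain-specific embeddings from PubMed text, then use them to initialize an LSTM sequence tagger. The sketch below is a minimal illustration of that flow using gensim and tensorflow.keras rather than the Spark cluster and DSVM setup used in the talk; the toy corpus, vector size, tag set, and layer sizes are placeholders, not the speakers' actual configuration.

import numpy as np
from gensim.models import Word2Vec
from tensorflow import keras
from tensorflow.keras import layers

# Stage 1: domain-specific word2vec embeddings. The talk trains this step on a
# Spark cluster over millions of PubMed abstracts; a toy corpus stands in here.
corpus = [
    ["aspirin", "reduces", "fever", "in", "patients"],
    ["ibuprofen", "treats", "inflammation", "and", "pain"],
]
w2v = Word2Vec(corpus, vector_size=50, window=5, min_count=1, workers=1)

# Build an embedding matrix indexed by our vocabulary (index 0 reserved for padding).
vocab = {word: i + 1 for i, word in enumerate(w2v.wv.index_to_key)}
embedding_matrix = np.zeros((len(vocab) + 1, w2v.wv.vector_size))
for word, idx in vocab.items():
    embedding_matrix[idx] = w2v.wv[word]

# Stage 2: bidirectional LSTM sequence tagger seeded with the domain embeddings.
max_len = 10   # padded sentence length (placeholder)
num_tags = 3   # e.g., B-DRUG / I-DRUG / O (placeholder tag set)
model = keras.Sequential([
    keras.Input(shape=(max_len,)),
    layers.Embedding(len(vocab) + 1, w2v.wv.vector_size,
                     embeddings_initializer=keras.initializers.Constant(embedding_matrix),
                     mask_zero=True),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.TimeDistributed(layers.Dense(num_tags, activation="softmax")),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()

Training would then proceed with model.fit on padded token-index sequences and per-token entity tag labels.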
