Accepted for/Published in: JMIR Medical Informatics

Date Submitted: May 28, 2019
Open Peer Review Period: May 28, 2019 - Jun 4, 2019
Date Accepted: Oct 19, 2019

The final, peer-reviewed published version of this preprint can be found here:

Combining Contextualized Embeddings and Prior Knowledge for Clinical Named Entity Recognition: Evaluation Study

Jiang M, Sanger T, Liu X

JMIR Med Inform 2019;7(4):e14850

DOI: 10.2196/14850

PMID: 31719024

PMCID: 6913757

Combining contextualized embeddings and prior knowledge for clinical named entity recognition

  • Min Jiang
  • Todd Sanger
  • Xiong Liu

ABSTRACT

Background:

Named entity recognition (NER) is a key step in clinical natural language processing (NLP). Traditionally, rule-based systems leverage prior knowledge to define rules for identifying named entities. Recently, deep learning-based NER systems have become increasingly popular. Contextualized word embeddings, a newer type of word representation, have been proposed to capture word sense dynamically from context, and they have proved successful in many deep learning-based systems in both the general and the medical domain. However, very few studies have investigated the effects of combining multiple contextualized embeddings with prior knowledge on the clinical NER task.

Objective:

The aim of this study is to improve the performance of named entity recognition in clinical text by combining contextualized embeddings with prior knowledge.

Methods:

In this study, we investigate the effects of combining multiple contextualized word embeddings with classic word embeddings in deep neural networks to predict named entities in clinical text. We also study whether using a semantic lexicon could further improve the performance of the clinical NER system.
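The combination described in the Methods can be pictured as per-token feature stacking: each token's classic (static) vector is concatenated with one or more contextualized vectors and, optionally, a lexicon-derived feature before being passed to the sequence tagger. The sketch below is a toy illustration under stated assumptions, not the authors' implementation. The embedding table `STATIC`, the lexicon `LEXICON`, and the helpers `toy_forward`/`toy_backward` are hypothetical; the two "contextual" functions are simple running averages standing in for forward and backward language models such as ELMo or Flair, and encoding lexicon membership as a single binary flag is an assumed scheme, since the abstract does not say how the lexicon was incorporated.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical toy static (classic) embeddings; a real system would load
# pretrained word vectors instead.
STATIC: Dict[str, List[float]] = {
    "patient": [0.1, 0.2],
    "denies": [0.3, -0.1],
    "chest": [0.0, 0.5],
    "pain": [0.4, 0.4],
}
UNK: List[float] = [0.0, 0.0]  # fallback vector for out-of-vocabulary tokens

# Hypothetical medical lexicon (single-word entries only, for simplicity).
LEXICON: Set[str] = {"chest", "pain"}


def static_vec(token: str) -> List[float]:
    """Look up the context-independent vector for a token."""
    return STATIC.get(token.lower(), UNK)


def mean(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def toy_forward(tokens: Sequence[str]) -> List[List[float]]:
    """Stand-in for a left-to-right contextual model: average of tokens[0..i]."""
    return [mean([static_vec(t) for t in tokens[: i + 1]]) for i in range(len(tokens))]


def toy_backward(tokens: Sequence[str]) -> List[List[float]]:
    """Stand-in for a right-to-left contextual model: average of tokens[i..n-1]."""
    return [mean([static_vec(t) for t in tokens[i:]]) for i in range(len(tokens))]


def stacked_features(tokens: Sequence[str]) -> List[List[float]]:
    """Concatenate static, forward, backward vectors and a lexicon flag per token."""
    fwd, bwd = toy_forward(tokens), toy_backward(tokens)
    features = []
    for i, tok in enumerate(tokens):
        lex_flag = [1.0 if tok.lower() in LEXICON else 0.0]
        features.append(static_vec(tok) + fwd[i] + bwd[i] + lex_flag)
    return features


if __name__ == "__main__":
    sentence = ["Patient", "denies", "chest", "pain"]
    for tok, vec in zip(sentence, stacked_features(sentence)):
        print(tok, [round(x, 3) for x in vec])
```

The static part of a token's vector is identical wherever the word appears, while the contextual parts change with the surrounding words, which is the property that motivates contextualized embeddings. Libraries such as Flair offer a `StackedEmbeddings` class for this kind of concatenation with real pretrained models.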

Results:

By combining contextualized embeddings such as ELMo and Flair, our model achieves an F1 score of 87.30% using only a portion of the 2010 i2b2 NER task dataset. After the medical lexicon is incorporated into the word embeddings, the F1 score further increases to 87.44%. We also found that our model still achieves an F1 score of 85.36% when the size of the data is reduced to 40%.
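The F1 scores above combine precision and recall over predicted entities. A common convention for clinical NER is exact-match, entity-level, micro-averaged F1 computed from BIO tags; the abstract does not state the exact scoring scheme used, so the sketch below is an assumption for illustration only. The helper names `bio_spans` and `micro_prf` are hypothetical, and the labels `problem`, `test`, and `treatment` are the concept types of the 2010 i2b2 concept extraction task, used here only as example tags.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_spans(tags: Sequence[str]) -> Set[Span]:
    """Extract entity spans from a BIO tag sequence.

    An I- tag that does not continue an entity of the same type starts a new
    entity (a lenient convention similar to the CoNLL evaluation script).
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if continues:
            continue
        if etype is not None:  # close the currently open entity
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            start, etype = i, tag[2:]
    return spans


def micro_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 summed over all sentences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_spans(g_tags), bio_spans(p_tags)
        tp += len(g & p)  # a prediction counts only if boundaries and type match
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-problem", "I-problem", "O", "B-test", "O", "B-treatment"]]
    pred = [["B-problem", "I-problem", "O", "B-test", "O", "O"]]
    print(micro_prf(gold, pred))
```

In this toy case both predicted entities are correct (precision 1.0) but one of three gold entities is missed (recall 2/3), so F1 = 2 × 1.0 × (2/3) / (1.0 + 2/3) = 0.8. Inexact (overlap-based) matching, which the i2b2 evaluation also reports, would require a different span comparison.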

Conclusions:

In conclusion, combining contextualized embeddings could be beneficial for the clinical NER task. Moreover, a semantic lexicon could be used to further improve the performance of the clinical NER system.

Clinical Trial: NA


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.