Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jul 28, 2020
Date Accepted: Nov 20, 2020

The final, peer-reviewed published version of this preprint can be found here:

Extracting Family History of Patients From Clinical Narratives: Exploring an End-to-End Solution With Deep Learning Models

Yang X, Zhang H, He X, Bian J, Wu Y

JMIR Med Inform 2020;8(12):e22982

DOI: 10.2196/22982

PMID: 33320104

PMCID: 7772072

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Extracting Family History of Patients from Clinical Narratives: Using Deep Learning Models

  • Xi Yang
  • Hansi Zhang
  • Xing He
  • Jiang Bian
  • Yonghui Wu

ABSTRACT

Background:

Patients’ family history (FH) is a critical risk factor associated with numerous diseases. However, FH information is not well captured in structured databases; it is more often documented in clinical narratives. Natural language processing (NLP) is the key technology for extracting patients’ FH from clinical narratives. In 2019, the National NLP Clinical Challenges (N2C2) organized shared tasks to solicit NLP methods for FH information extraction.

Objective:

This study presents our end-to-end FH extraction system developed during the 2019 N2C2 challenge as well as the new transformer-based models that we developed after the challenge.

Methods:

We developed deep learning-based systems for FH concept extraction and relation identification. We explored deep learning models, including long short-term memory-conditional random fields (LSTM-CRFs) and Bidirectional Encoder Representations from Transformers (BERT), and developed ensemble models using a majority voting strategy. To further optimize performance, we systematically compared three different strategies for using BERT output representations for relation identification.
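
As an illustration of the majority voting strategy mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes each model emits one label per token over the same tokenization; the function name `majority_vote`, the label names, and the tie-breaking rule (prefer the earliest-listed model) are all assumptions introduced here.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token label sequences from several models.

    predictions: list of label sequences, one per model, all the same length.
    Returns one label per token, chosen by majority vote.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all models must label the same number of tokens")

    voted = []
    for labels in zip(*predictions):  # labels for one token, in model order
        counts = Counter(labels)
        top = max(counts.values())
        # Assumed tie-break: keep the label from the earliest-listed model.
        voted.append(next(label for label in labels if counts[label] == top))
    return voted


# Hypothetical labels from three models for a three-token span.
model_a = ["B-FamilyMember", "O", "O"]
model_b = ["B-FamilyMember", "B-Observation", "O"]
model_c = ["O", "B-Observation", "O"]
ensemble = majority_vote([model_a, model_b, model_c])
```

Token-level voting can yield label sequences that are inconsistent at span boundaries, so a real system might vote over whole extracted spans instead; the abstract does not say which granularity was used.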

Results:

Our system was among the top-ranked systems in the challenge. Our best system submitted during the challenge achieved micro-averaged F1-scores of 0.7944 and 0.6544 for concept extraction and relation identification, respectively. After the challenge, we further explored new transformer-based models and improved the performance on the two subtasks to 0.8249 and 0.6775, respectively. For relation identification, our system achieved performance comparable to that of the best system reported in the challenge (0.6810).
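
For readers unfamiliar with the metric, micro-averaged F1 pools true positives, false positives, and false negatives across all classes before computing precision and recall. The sketch below shows the arithmetic only; the function name `micro_f1`, the class names, and the counts are hypothetical and are not taken from the challenge data or its official scorer.

```python
def micro_f1(counts):
    """Micro-averaged F1 from per-class (tp, fp, fn) counts.

    counts: dict mapping a class name to a (tp, fp, fn) tuple.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two concept types.
example_counts = {
    "family_member": (8, 2, 2),
    "observation": (2, 3, 1),
}
score = micro_f1(example_counts)  # pooled: tp=10, fp=5, fn=3
```

Because the counts are pooled, frequent classes dominate the score, unlike macro-averaging, which weights every class equally.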

Conclusions:

This study demonstrated the feasibility of utilizing deep learning methods to extract family history information from clinical narratives automatically.


Per the author's request the PDF is not available.