Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 9, 2022
Date Accepted: Jul 6, 2022

The final, peer-reviewed published version of this preprint can be found here:

Shi J, Morgan KL, Bradshaw RL, Jung SH, Kohlmann WK, Kaphingst KA, Kawamoto K, Fiol GD

Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach

JMIR Med Inform 2022;10(8):e37842

DOI: 10.2196/37842

PMID: 35969459

PMCID: 9412758

Identifying patients who meet criteria for genetic testing of hereditary cancers based on structured and unstructured family health history data in the EHR: a natural language processing approach

  • Jianlin Shi
  • Keaton L. Morgan
  • Richard L. Bradshaw
  • Se-Hee Jung
  • Wendy K. Kohlmann
  • Kimberly A. Kaphingst
  • Kensaku Kawamoto
  • Guilherme Del Fiol

ABSTRACT

Background:

Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible candidates for genetic testing is a labor-intensive process.

Objective:

We aimed to develop natural language processing (NLP) methods, and assess their effectiveness, for improving the identification of patients who meet genetic testing criteria for hereditary cancers using family health history data in the electronic health record (EHR). We compared a rule-based algorithm using structured data alone with the same algorithm augmented with NLP.

Methods:

Algorithms were developed based on National Comprehensive Cancer Network (NCCN) guidelines for genetic testing for hereditary breast/ovarian and colorectal cancer. The NLP-augmented algorithm used both structured family health history data and associated free-text comments. The algorithms were compared against a reference standard of 200 patients with family health history in the EHR.
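As an illustration only, the difference between the two algorithms can be sketched as a rule that reads a structured family history entry and, in the NLP-augmented mode, falls back to the free-text comment when a structured field is empty. Everything in this sketch is an assumption made for the example: the `FamilyHistoryEntry` fields, the regular expression, and the age threshold are hypothetical, and the toy rule is not an actual NCCN criterion or the authors' implementation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FamilyHistoryEntry:
    """Hypothetical shape of one family health history row in the EHR."""
    relative: str                 # e.g. "mother"
    condition: str                # e.g. "breast cancer"
    age_of_onset: Optional[int]   # structured field, often left empty
    comment: str = ""             # associated free-text comment


FIRST_DEGREE = {"mother", "father", "sister", "brother", "daughter", "son"}

# Toy pattern for phrases such as "dx at age 42" or "diagnosed at 42".
AGE_PATTERN = re.compile(
    r"\b(?:dx|diagnosed)\s+(?:at\s+)?(?:age\s+)?(\d{1,3})\b", re.IGNORECASE
)


def age_from_comment(comment: str) -> Optional[int]:
    """Extract an age of onset from free text, or None if no match."""
    match = AGE_PATTERN.search(comment)
    return int(match.group(1)) if match else None


def meets_toy_criterion(entries: List[FamilyHistoryEntry], use_nlp: bool) -> bool:
    """Toy rule (NOT a real guideline criterion): a first-degree relative
    with breast cancer and an age of onset of 45 or younger."""
    for entry in entries:
        age = entry.age_of_onset
        if age is None and use_nlp:
            # NLP-augmented mode: recover the missing value from the comment.
            age = age_from_comment(entry.comment)
        if (
            entry.relative in FIRST_DEGREE
            and entry.condition == "breast cancer"
            and age is not None
            and age <= 45
        ):
            return True
    return False


history = [
    FamilyHistoryEntry("mother", "breast cancer", None, "dx at age 42, treated elsewhere")
]
structured_only = meets_toy_criterion(history, use_nlp=False)
nlp_augmented = meets_toy_criterion(history, use_nlp=True)
```

In this toy case the structured-only pass misses the patient because the age of onset exists only in the comment, while the augmented pass finds it. That is the kind of gap free-text extraction is meant to close.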

Results:

Against the reference standard, the NLP-augmented algorithm identified patients meeting NCCN criteria with significantly higher recall than the structured data algorithm (0.95, 95% CI 0.90-0.99 vs 0.29, 95% CI 0.19-0.40) and with higher precision (0.99, 95% CI 0.96-1.00 vs 0.81, 95% CI 0.65-0.95). On the whole dataset, the NLP-augmented algorithm extracted 33.6% more entities, resulting in 53.8% more patients being found to meet the NCCN criteria.
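For readers less familiar with these metrics, the sketch below shows one common way to compute precision and recall against a reference standard and to attach a 95% CI using a percentile bootstrap. The abstract does not state how the CIs were derived, so the bootstrap is an assumption, and the labels are made-up toy values rather than study data.

```python
import random
from typing import List, Tuple


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


def percentile_interval(values: List[float], alpha: float) -> Tuple[float, float]:
    """Return the (alpha/2, 1 - alpha/2) percentile bounds of the values."""
    ordered = sorted(values)
    lower = ordered[int((alpha / 2) * len(ordered))]
    upper = ordered[min(len(ordered) - 1, int((1 - alpha / 2) * len(ordered)))]
    return lower, upper


def bootstrap_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for precision and recall (patients resampled
    with replacement); undefined resamples are skipped."""
    rng = random.Random(seed)
    n = len(y_true)
    precisions, recalls = [], []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        p, r = precision_recall([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if p == p:  # NaN is the only value not equal to itself
            precisions.append(p)
        if r == r:
            recalls.append(r)
    return percentile_interval(precisions, alpha), percentile_interval(recalls, alpha)


# Toy labels: 1 = meets criteria, 0 = does not (illustrative only).
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [1] * 1 + [0] * 9

precision, recall = precision_recall(y_true, y_pred)
precision_ci, recall_ci = bootstrap_ci(y_true, y_pred)
```

Here the toy predictions contain 8 true positives, 1 false positive, and 2 false negatives, so precision is 8/9 and recall is 8/10.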

Conclusions:

Compared with the structured data algorithm, the NLP-augmented algorithm, which used both structured and unstructured family health history data in the EHR, improved both precision and recall when identifying patients meeting NCCN criteria for genetic testing for hereditary breast/ovarian and colorectal cancers.

