JMIR Preprints #37842: Identifying patients who meet criteria for genetic testing of hereditary cancers based on structured and unstructured family health history data in the EHR: a natural language processing approach

Identifying patients who meet criteria for genetic testing of hereditary cancers based on structured and unstructured family health history data in the EHR: a natural language processing approach

Jianlin Shi;
Keaton L. Morgan;
Richard L. Bradshaw;
Se-Hee Jung;
Wendy K. Kohlmann;
Kimberly A. Kaphingst;
Kensaku Kawamoto;
Guilherme Del Fiol

ABSTRACT

Background:

Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible candidates for genetic testing is a labor-intensive process.

Objective:

To develop and assess the effectiveness of natural language processing (NLP) for identifying patients who meet genetic testing criteria for hereditary cancers, using family health history data in the electronic health record (EHR). A rule-based algorithm using structured data alone was compared with the same algorithm augmented with NLP.

Methods:

Algorithms were developed based on National Comprehensive Cancer Network (NCCN) guidelines for genetic testing for hereditary breast/ovarian and colorectal cancer. The NLP-augmented algorithm used both structured family health history data and associated free-text comments. The algorithms were compared against a reference standard of 200 patients with family health history in the EHR.
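
As a rough illustration of how free-text comments can augment structured family health history entries, the following minimal Python sketch fills in a missing age at onset from a comment and then applies a single rule. Everything in it is an assumption made for illustration: the `FamilyHistoryEntry` fields, the regular expression, the relative list, and the age cutoff are hypothetical and are not the authors' NLP pipeline or an actual NCCN criterion.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FamilyHistoryEntry:
    """One structured family health history row plus its free-text comment (hypothetical schema)."""
    relative: str
    condition: str
    age_at_onset: Optional[int] = None
    comment: str = ""


# Hypothetical pattern: matches phrases such as "at 38", "at age 42", "aged 45".
AGE_PATTERN = re.compile(r"\b(?:at|age|aged)\s+(?:age\s+)?(\d{1,3})\b", re.IGNORECASE)

FIRST_DEGREE = {"mother", "father", "sister", "brother", "daughter", "son"}
ILLUSTRATIVE_AGE_CUTOFF = 50  # placeholder threshold, NOT taken from NCCN guidelines


def age_from_comment(comment: str) -> Optional[int]:
    """Return the first age mentioned in the comment, or None if no age is found."""
    match = AGE_PATTERN.search(comment)
    return int(match.group(1)) if match else None


def augment(entry: FamilyHistoryEntry) -> FamilyHistoryEntry:
    """Fill a missing structured age at onset from the free-text comment."""
    if entry.age_at_onset is None:
        return replace(entry, age_at_onset=age_from_comment(entry.comment))
    return entry


def meets_illustrative_rule(entry: FamilyHistoryEntry) -> bool:
    """Toy rule: first-degree relative with early-onset breast cancer."""
    return (
        entry.relative.lower() in FIRST_DEGREE
        and entry.condition.lower() == "breast cancer"
        and entry.age_at_onset is not None
        and entry.age_at_onset <= ILLUSTRATIVE_AGE_CUTOFF
    )


entry = FamilyHistoryEntry("Mother", "Breast cancer", None, "dx at age 42")
structured_only = meets_illustrative_rule(entry)
with_comment = meets_illustrative_rule(augment(entry))
```

In this toy case the structured fields alone cannot satisfy the rule because the age is missing, whereas the comment supplies it. This is the kind of gap that augmenting structured data with free text is meant to close.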

Results:

Against the reference standard, the NLP-augmented algorithm identified patients meeting NCCN criteria with significantly higher recall than the structured data algorithm, 0.95 (95% CI 0.90-0.99) vs 0.29 (95% CI 0.19-0.40), and a precision of 0.99 (95% CI 0.96-1.00) vs 0.81 (95% CI 0.65-0.95). On the whole dataset, the NLP-augmented algorithm extracted 33.6% more entities, resulting in 53.8% more patients being found to meet the NCCN criteria.
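
For readers unfamiliar with the metrics, the sketch below shows how precision and recall are computed from reference labels and algorithm output, with a percentile bootstrap for the confidence interval. The abstract does not say how the reported intervals were obtained, so the bootstrap is an assumption, and the labels are toy data rather than study data.

```python
import random


def precision_recall(y_true, y_pred):
    """Return (precision, recall); NaN when a denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


def bootstrap_ci(y_true, y_pred, metric_index, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for precision (metric_index=0) or recall (metric_index=1)."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        value = precision_recall(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )[metric_index]
        if value == value:  # skip NaN from degenerate resamples
            values.append(value)
    values.sort()
    lower = values[int((alpha / 2) * len(values))]
    upper = values[int((1 - alpha / 2) * len(values)) - 1]
    return lower, upper


# Toy labels (1 = meets criteria): 8 true positives in the reference, 12 negatives.
y_true = [1] * 8 + [0] * 12
y_pred = [1] * 6 + [0] * 2 + [1] * 1 + [0] * 11

precision, recall = precision_recall(y_true, y_pred)
recall_ci = bootstrap_ci(y_true, y_pred, metric_index=1)
```

Recall is the share of reference-standard positives that the algorithm found, and precision is the share of flagged patients who truly meet the criteria.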

Conclusions:

Compared with the structured data algorithm, the NLP-augmented algorithm, which used both structured and unstructured family health history data in the EHR, improved both precision and recall when identifying patients meeting NCCN criteria for genetic testing for hereditary breast/ovarian and colorectal cancers.

Citation

Please cite as:

Shi J, Morgan KL, Bradshaw RL, Jung SH, Kohlmann WK, Kaphingst KA, Kawamoto K, Del Fiol G

Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach

JMIR Med Inform 2022;10(8):e37842

DOI: 10.2196/37842

PMID: 35969459

PMCID: PMC9412758

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 9, 2022

Date Accepted: Jul 6, 2022
