Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Mar 9, 2022
Date Accepted: Jul 6, 2022
Identifying patients who meet criteria for genetic testing of hereditary cancers based on structured and unstructured family health history data in the EHR: a natural language processing approach
ABSTRACT
Background:
Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible candidates for genetic testing is a labor-intensive process.
Objective:
To develop and assess the effectiveness of natural language processing (NLP) in identifying patients who meet genetic testing criteria for hereditary cancers from family health history data in the electronic health record (EHR). A rule-based algorithm using structured data alone was compared with the same algorithm augmented with NLP.
Methods:
Algorithms were developed based on National Comprehensive Cancer Network (NCCN) guidelines for genetic testing for hereditary breast/ovarian and colorectal cancer. The NLP-augmented algorithm used both structured family health history data and associated free-text comments. The algorithms were compared against a reference standard of 200 patients with family health history in the EHR.
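The structured-only versus NLP-augmented comparison can be illustrated with a minimal sketch. It assumes that both structured entries and NLP output are reduced to a common (relative, condition, age at diagnosis) form. The type `FamilyHistoryEntity`, the functions `meets_illustrative_criteria` and `flag_patient`, and the rule itself are hypothetical. The rule is a simplified placeholder, not the actual NCCN criteria, and the NLP extraction step is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical set of first-degree relatives used by the placeholder rule.
FIRST_DEGREE = {"mother", "father", "sister", "brother", "daughter", "son"}


@dataclass(frozen=True)  # frozen -> hashable, so entities can be merged in a set
class FamilyHistoryEntity:
    relative: str                          # e.g. "mother"
    condition: str                         # e.g. "breast cancer"
    age_at_diagnosis: Optional[int] = None # often missing from structured fields


def meets_illustrative_criteria(entities: Iterable[FamilyHistoryEntity]) -> bool:
    """Placeholder rule (NOT the NCCN criteria): a first-degree relative with
    breast/ovarian cancer, or with colorectal cancer diagnosed before age 50."""
    for e in entities:
        if e.relative not in FIRST_DEGREE:
            continue
        if e.condition in {"breast cancer", "ovarian cancer"}:
            return True
        if (e.condition == "colorectal cancer"
                and e.age_at_diagnosis is not None
                and e.age_at_diagnosis < 50):
            return True
    return False


def flag_patient(structured, nlp_extracted, use_nlp: bool = True) -> bool:
    """Apply the same rule to structured entities alone, or to the union of
    structured entities and entities extracted from free-text comments."""
    entities = set(structured)
    if use_nlp:
        entities |= set(nlp_extracted)
    return meets_illustrative_criteria(entities)
```

For example, suppose a structured entry records only "father: colorectal cancer" and its free-text comment gives the age at diagnosis as 45. The structured-only call returns `False`. When the NLP-extracted entity carrying the age is merged in, the call returns `True`. This mirrors, in simplified form, how comments can supply details that structured fields lack.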
Results:
For identifying reference standard patients who meet NCCN criteria, the NLP-augmented algorithm yielded significantly higher recall than the structured data algorithm (0.95, 95% CI 0.90-0.99 vs 0.29, 95% CI 0.19-0.40) and higher precision (0.99, 95% CI 0.96-1.00 vs 0.81, 95% CI 0.65-0.95). On the whole dataset, the NLP-augmented algorithm extracted 33.6% more entities, resulting in 53.8% more patients being found to meet the NCCN criteria.
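The reported metrics can be reproduced from patient-level labels with a short sketch. Precision is TP / (TP + FP) and recall is TP / (TP + FN). The abstract does not state how the 95% CIs were computed. The percentile bootstrap over patients shown here is an assumption for illustration, and the toy labels in the usage example are not study data.

```python
import random
from typing import List, Tuple


def precision_recall(gold: List[bool], pred: List[bool]) -> Tuple[float, float]:
    """Patient-level precision and recall; returns 0.0 when a denominator is 0."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def bootstrap_ci(gold: List[bool], pred: List[bool], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0):
    """Percentile bootstrap CIs (assumed method) for precision and recall,
    resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(gold)
    precs, recs = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        p, r = precision_recall([gold[i] for i in idx], [pred[i] for i in idx])
        precs.append(p)
        recs.append(r)
    precs.sort()
    recs.sort()
    lo = int((alpha / 2) * n_boot)
    hi = int((1 - alpha / 2) * n_boot) - 1
    return (precs[lo], precs[hi]), (recs[lo], recs[hi])
```

With toy labels `gold = [True, True, True, False]` and `pred = [True, True, False, False]`, `precision_recall` gives precision 1.0 and recall 2/3. `bootstrap_ci(gold, pred)` then returns a (low, high) interval for each metric.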
Conclusions:
Compared with the structured data algorithm, the NLP-augmented algorithm, which used both structured and unstructured family health history data in the EHR, improved both precision and recall in identifying patients who meet NCCN criteria for genetic testing for hereditary breast/ovarian and colorectal cancers.