Identifying Asthma-related Symptoms from Electronic Health Records within a Large Integrated Healthcare System: A Hybrid Natural Language Processing Approach
ABSTRACT
Background:
Asthma-related symptoms are significant predictors of asthma exacerbation. Most of these symptoms are documented in clinical notes in free text format. Methods that can effectively capture the asthma-related symptoms from the unstructured data are lacking.
Objective:
This study aims to develop a natural language processing (NLP) algorithm and process to identify asthma-related symptoms from clinical notes within a large integrated healthcare system.
Methods:
We used unstructured data from the two years prior to asthma diagnosis visits in 2013-2018 and 2021-2022 to identify four common asthma-related symptoms. Related terms and phrases were first compiled from publicly available resources and then recursively reviewed and enriched with input from clinicians and chart review. A rule-based NLP algorithm was first iteratively developed and refined via multiple rounds of chart review followed by adjudication, and then transformer-based deep learning algorithms were developed and validated using the same manually annotated datasets. Subsequently, a hybrid algorithm was generated by combining the rule-based and the transformer-based algorithms. Finally, the developed algorithms were applied to all study notes.
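The pipeline described above can be illustrated with a minimal sketch. This is not the study's implementation: the term patterns, the crude sentence-level negation filter, the `model_predict` callable standing in for a transformer classifier, and the union rule used to combine the two outputs are all assumptions made for illustration. The abstract does not specify the lexicon or how the rule-based and transformer-based algorithms were combined.

```python
import re
from typing import Callable, Dict, Iterable, Set

# Hypothetical term patterns for the four symptoms; the study's actual
# lexicon was compiled from public resources and clinician input.
SYMPTOM_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "cough": re.compile(r"\bcough(s|ing|ed)?\b", re.I),
    "wheezing": re.compile(r"\bwheez(e|es|ing|ed|y)\b", re.I),
    "dyspnea": re.compile(r"\b(dyspnea|shortness of breath)\b", re.I),
    "chest tightness": re.compile(
        r"\bchest\s+tight(ness)?\b|\btight(ness)?\s+(in|of)\s+(the\s+)?chest\b",
        re.I,
    ),
}

# Crude assumption: any negation cue suppresses every rule-based match
# in the sentence. A real system would scope negation to each mention.
NEGATION = re.compile(r"\b(no|denies|denied|without|negative for)\b", re.I)


def rule_based(sentence: str) -> Set[str]:
    """Return the symptoms whose patterns match a non-negated sentence."""
    if NEGATION.search(sentence):
        return set()
    return {name for name, pat in SYMPTOM_PATTERNS.items() if pat.search(sentence)}


def hybrid(sentence: str, model_predict: Callable[[str], Set[str]]) -> Set[str]:
    """Combine rule-based and model output by set union (an assumed rule)."""
    return rule_based(sentence) | model_predict(sentence)


def note_level(sentences: Iterable[str],
               model_predict: Callable[[str], Set[str]]) -> Set[str]:
    """A note carries a symptom if any of its sentences does."""
    found: Set[str] = set()
    for sentence in sentences:
        found |= hybrid(sentence, model_predict)
    return found
```

In this sketch, a note is flagged for a symptom when at least one of its sentences is, which is one plausible way to relate sentence-level and note-level results.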
Results:
A total of 11,374,552 eligible clinical notes containing 128,211,793 sentences were retrieved. At least one symptom was identified in 1,663,450 (1.30%) sentences and 858,350 (7.55%) notes. Cough had the highest frequency at both the sentence (1.07%) and note (5.81%) levels, while chest tightness had the lowest at both the sentence (0.11%) and note (0.57%) levels. The frequencies of concomitant symptoms ranged from 0.03% to 0.38% at the sentence level and from 0.10% to 1.85% at the note level. Validation of the hybrid algorithm against the annotated results of 1,600 clinical notes yielded a positive predictive value ranging from 96.53% (wheezing) to 97.42% (chest tightness) at the sentence level and from 96.76% (wheezing) to 97.42% (chest tightness) at the note level. Sensitivity ranged from 93.90% (dyspnea) to 95.95% (cough) at the sentence level and from 96.00% (chest tightness) to 99.07% (cough) at the note level. The corresponding F1 scores of all four symptoms were > 0.95 at both the sentence and note levels, regardless of the NLP algorithm used.
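For readers who want to relate the reported metrics to one another, the following sketch uses the standard definitions of positive predictive value, sensitivity, and F1 (the harmonic mean of the two). The function names are illustrative and not taken from the study. The worked value uses the note-level figures reported above for chest tightness.

```python
def positive_predictive_value(tp: int, fp: int) -> float:
    """PPV (precision): share of predicted positives that are true positives."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall): share of true positives that were detected."""
    return tp / (tp + fn)


def f1_score(ppv: float, sens: float) -> float:
    """F1 is the harmonic mean of PPV and sensitivity."""
    return 2 * ppv * sens / (ppv + sens)


# Note-level chest tightness, as reported: PPV 97.42%, sensitivity 96.00%.
f1_chest_tightness_note = f1_score(0.9742, 0.9600)
print(round(f1_chest_tightness_note, 3))  # 0.967
```

The result is consistent with the statement that all F1 scores exceeded 0.95.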
Conclusions:
The developed NLP algorithms could effectively capture asthma-related symptoms from unstructured notes. These algorithms could be used to examine asthma burden and to predict asthma exacerbation.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.