JMIR Preprints #69132: Identifying Asthma-related Symptoms from Electronic Health Records within a Large Integrated Healthcare System: A Hybrid Natural Language Processing Approach

Identifying Asthma-related Symptoms from Electronic Health Records within a Large Integrated Healthcare System: A Hybrid Natural Language Processing Approach

Fagen Xie;
Robert S Zeiger;
Mary Saparudin;
Sahar Al-Salman;
Eric J Puttock;
William Crawford;
Michael Schatz;
Stanley Xu;
William M Vollmer;
Wansu Chen

ABSTRACT

Background:

Asthma-related symptoms are significant predictors of asthma exacerbation. Most of these symptoms are documented in clinical notes in free text format. Methods that can effectively capture the asthma-related symptoms from the unstructured data are lacking.

Objective:

This study aims to develop a natural language processing (NLP) algorithm to identify asthma-related symptoms from clinical notes within a large integrated healthcare system.

Methods:

We used unstructured data from the two years before asthma diagnosis visits in 2013-2018 and 2021-2022 to identify four common asthma-related symptoms. Related terms and phrases were first compiled from publicly available resources and then iteratively reviewed and enriched with input from clinicians and chart review. A rule-based NLP algorithm was developed and refined through multiple rounds of chart review followed by adjudication. Transformer-based deep learning algorithms were then developed and validated using the same manually annotated datasets. Subsequently, a hybrid algorithm was generated by combining the rule-based and transformer-based algorithms. Finally, the developed algorithms were applied to all study notes.
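As an illustration only, the hybrid step described above might be structured along the following lines. The abstract does not give the term lists, the negation handling, or the rule used to combine the two algorithms, so the term dictionary, the three-token negation window, the stand-in `model_predict` callable, the union rule, and the "any positive sentence" note-level rule below are all assumptions rather than the authors' method.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical term lists; the study's actual lexicon is not given in the abstract.
SYMPTOM_TERMS: Dict[str, List[str]] = {
    "cough": ["cough", "coughing"],
    "wheezing": ["wheeze", "wheezing", "wheezes"],
    "dyspnea": ["dyspnea", "shortness of breath", "sob"],
    "chest tightness": ["chest tightness", "tight chest"],
}
# Hypothetical negation cues, checked in the three tokens before a matched term.
NEGATION_CUES = {"no", "denies", "denied", "without"}


def rule_based(sentence: str) -> Set[str]:
    """Return symptoms whose terms appear in the sentence without a nearby negation cue."""
    found: Set[str] = set()
    for symptom, terms in SYMPTOM_TERMS.items():
        for term in terms:
            match = re.search(r"\b" + re.escape(term) + r"\b", sentence, re.IGNORECASE)
            if match is None:
                continue
            preceding = sentence[: match.start()].split()[-3:]
            negated = any(tok.strip(".,;:").lower() in NEGATION_CUES for tok in preceding)
            if not negated:
                found.add(symptom)
                break
    return found


def hybrid(sentence: str, model_predict: Callable[[str], Set[str]]) -> Set[str]:
    """Combine rule-based and model output by union (an assumed combination rule)."""
    return rule_based(sentence) | model_predict(sentence)


def note_level(sentences: List[str], model_predict: Callable[[str], Set[str]]) -> Set[str]:
    """Assume a note is positive for a symptom if any of its sentences is."""
    symptoms: Set[str] = set()
    for sentence in sentences:
        symptoms |= hybrid(sentence, model_predict)
    return symptoms


def stub_model(sentence: str) -> Set[str]:
    """Stand-in for a transformer classifier; not a real model."""
    return {"dyspnea"} if "breathless" in sentence.lower() else set()
```

A union favors sensitivity, whereas an intersection would favor positive predictive value; which trade-off the study chose is not stated in the abstract.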

Results:

A total of 11,374,552 eligible clinical notes containing 128,211,793 sentences were retrieved. At least one symptom was identified in 1,663,450 (1.30%) sentences and 858,350 (7.55%) notes. Cough was the most frequent symptom at both the sentence (1.07%) and note (5.81%) levels, while chest tightness was the least frequent at both the sentence (0.11%) and note (0.57%) levels. The frequencies of concomitant symptoms ranged from 0.03% to 0.38% at the sentence level and from 0.10% to 1.85% at the note level. Validation of the hybrid algorithm against the annotated results of 1,600 clinical notes yielded positive predictive values ranging from 96.53% (wheezing) to 97.42% (chest tightness) at the sentence level and from 96.76% (wheezing) to 97.42% (chest tightness) at the note level. Sensitivity ranged from 93.90% (dyspnea) to 95.95% (cough) at the sentence level and from 96.00% (chest tightness) to 99.07% (cough) at the note level. The corresponding F1 scores for all four symptoms were > 0.95 at both the sentence and note levels, regardless of the NLP algorithm used.
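For readers who want to check the reported figures, the sketch below uses the standard definitions of positive predictive value, sensitivity, and F1 score. The function names are illustrative, and the underlying true-positive and false-positive counts are not given in the abstract, so only the published percentages are used as inputs.

```python
def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, expressed as a percentage."""
    return 100.0 * part / total


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision): TP / (TP + FP)."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall): TP / (TP + FN)."""
    return tp / (tp + fn)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Sentence- and note-level frequencies of any symptom, from the reported counts.
sentence_pct = percent(1_663_450, 128_211_793)
note_pct = percent(858_350, 11_374_552)

# Chest tightness at the note level: reported PPV 97.42%, sensitivity 96.00%.
chest_tightness_f1 = f1_score(0.9742, 0.9600)

print(f"{sentence_pct:.2f}% of sentences, {note_pct:.2f}% of notes")
print(f"Chest tightness note-level F1: {chest_tightness_f1:.3f}")
```

The chest tightness value works out to roughly 0.967, consistent with the statement that all F1 scores exceeded 0.95.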

Conclusions:

The developed NLP algorithms effectively captured asthma-related symptoms from unstructured clinical notes. These algorithms could be used to examine asthma burden and to predict asthma exacerbations.

Citation

Please cite as:

Xie F, Zeiger RS, Saparudin M, Al-Salman S, Puttock EJ, Crawford W, Schatz M, Xu S, Vollmer WM, Chen W

Identifying Asthma-Related Symptoms From Electronic Health Records Using a Hybrid Natural Language Processing Approach Within a Large Integrated Health Care System: Retrospective Study

JMIR AI 2025;4:e69132

DOI: 10.2196/69132

PMID: 40611521

PMCID: 12231518

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR AI

Date Submitted: Nov 22, 2024

Date Accepted: Mar 15, 2025
