Accepted for/Published in: JMIR AI

Date Submitted: Nov 22, 2024
Date Accepted: Mar 15, 2025

The final, peer-reviewed published version of this preprint can be found here:

Identifying Asthma-Related Symptoms From Electronic Health Records Using a Hybrid Natural Language Processing Approach Within a Large Integrated Health Care System: Retrospective Study

Xie F, Zeiger RS, Saparudin M, Al-Salman S, Puttock EJ, Crawford W, Schatz M, Xu S, Vollmer WM, Chen W

JMIR AI 2025;4:e69132

DOI: 10.2196/69132

PMID: 40611521

PMCID: 12231518

Identifying Asthma-related Symptoms from Electronic Health Records within a Large Integrated Healthcare System: A Hybrid Natural Language Processing Approach

  • Fagen Xie; 
  • Robert S Zeiger; 
  • Mary Saparudin; 
  • Sahar Al-Salman; 
  • Eric J Puttock; 
  • William Crawford; 
  • Michael Schatz; 
  • Stanley Xu; 
  • William M Vollmer; 
  • Wansu Chen

ABSTRACT

Background:

Asthma-related symptoms are significant predictors of asthma exacerbation. Most of these symptoms are documented as free text in clinical notes, and methods that can effectively capture them from such unstructured data are lacking.

Objective:

The study aims to develop a natural language processing (NLP) algorithm and workflow to identify asthma-related symptoms from clinical notes within a large integrated healthcare system.

Methods:

We used unstructured clinical notes recorded within the two years before asthma diagnosis visits in 2013-2018 and 2021-2022 to identify four common asthma-related symptoms. Related terms and phrases were first compiled from publicly available resources and then iteratively reviewed and enriched with input from clinicians and chart review. A rule-based NLP algorithm was first developed and refined through multiple rounds of chart review followed by adjudication; transformer-based deep learning algorithms were then developed and validated using the same manually annotated datasets. Subsequently, a hybrid algorithm was built by combining the rule-based and transformer-based algorithms. Finally, the developed algorithms were applied to all study notes.
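The rule-based step described above (a curated term list matched against note text, with results counted per sentence and per note) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption for illustration only: the `LEXICON` entries, the `NEGATION_CUES` list, the three-word negation window, the naive sentence splitter, and the rule that a note is positive when any of its sentences is positive are hypothetical and not taken from the study, whose actual term list and rules are not given in the abstract. The transformer-based and hybrid components are not shown.

```python
import re

# Hypothetical mini-lexicon: a few illustrative surface forms per symptom.
# The study's actual term list is not reproduced in the abstract.
LEXICON = {
    "cough": ["cough", "coughing"],
    "wheezing": ["wheeze", "wheezes", "wheezing"],
    "dyspnea": ["dyspnea", "shortness of breath", "sob"],
    "chest tightness": ["chest tightness", "tight chest"],
}

# Hypothetical negation cues; a production system would use a fuller negation module.
NEGATION_CUES = ["no", "denies", "without", "negative for"]
NEGATION_WINDOW = 3  # assumed: number of preceding words checked for a cue


def _alternation(terms: list[str]) -> re.Pattern:
    # Whole-word, case-insensitive match of any listed term.
    return re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)


PATTERNS = {symptom: _alternation(terms) for symptom, terms in LEXICON.items()}
NEGATION = _alternation(NEGATION_CUES)


def split_sentences(note: str) -> list[str]:
    # Naive splitter on ".", "!" or "?" followed by whitespace (assumption).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def sentence_symptoms(sentence: str) -> set[str]:
    """Symptoms with at least one non-negated mention in the sentence."""
    found = set()
    for symptom, pattern in PATTERNS.items():
        for match in pattern.finditer(sentence):
            # Look only at the few words immediately before the mention.
            preceding = sentence[: match.start()].split()[-NEGATION_WINDOW:]
            if not NEGATION.search(" ".join(preceding)):
                found.add(symptom)
                break
    return found


def note_symptoms(note: str) -> set[str]:
    """Assumed rule: a note is positive for a symptom if any of its sentences is."""
    result: set[str] = set()
    for sentence in split_sentences(note):
        result |= sentence_symptoms(sentence)
    return result


if __name__ == "__main__":
    example = "Patient reports persistent cough. Denies wheezing. Shortness of breath on exertion."
    for s in split_sentences(example):
        print(s, "->", sorted(sentence_symptoms(s)))
    print("note-level:", sorted(note_symptoms(example)))
```

Restricting the negation check to a short window keeps a cue such as "denies" from suppressing a later, affirmed symptom in the same sentence (for example, "Denies wheezing but reports cough" still yields cough).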

Results:

A total of 11,374,552 eligible clinical notes containing 128,211,793 sentences were retrieved. At least one symptom was identified in 1,663,450 (1.30%) sentences and 858,350 (7.55%) notes. Cough was the most frequent symptom at both the sentence (1.07%) and note (5.81%) levels, while chest tightness was the least frequent at both the sentence (0.11%) and note (0.57%) levels. The frequencies of concomitant symptoms ranged from 0.03% to 0.38% at the sentence level and from 0.10% to 1.85% at the note level. Validation of the hybrid algorithm against 1,600 manually annotated clinical notes yielded a positive predictive value (PPV) ranging from 96.53% (wheezing) to 97.42% (chest tightness) at the sentence level and from 96.76% (wheezing) to 97.42% (chest tightness) at the note level; sensitivity ranged from 93.90% (dyspnea) to 95.95% (cough) at the sentence level and from 96.00% (chest tightness) to 99.07% (cough) at the note level. F1 scores for all four symptoms exceeded 0.95 at both the sentence and note levels for all NLP algorithms (rule-based, transformer-based, and hybrid).
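The percentages and the hybrid algorithm's F1 claim can be cross-checked from the reported figures. The sketch below recomputes the two overall detection rates from the reported counts. Because F1 (the harmonic mean of PPV and sensitivity) increases with both inputs, plugging in the lowest reported PPV and the lowest reported sensitivity gives a floor on every symptom's F1 at each level. The function and variable names are illustrative; only the numbers come from the abstract.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as reported in the abstract."""
    return round(100 * part / whole, 2)


def f1(ppv: float, sensitivity: float) -> float:
    """F1 score: harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Counts reported in the Results.
TOTAL_NOTES = 11_374_552
TOTAL_SENTENCES = 128_211_793
POSITIVE_SENTENCES = 1_663_450
POSITIVE_NOTES = 858_350

sentence_rate = pct(POSITIVE_SENTENCES, TOTAL_SENTENCES)
note_rate = pct(POSITIVE_NOTES, TOTAL_NOTES)

# Lowest reported PPV and lowest reported sensitivity for the hybrid algorithm
# (possibly different symptoms); F1 is monotone in both, so this is a lower bound.
sentence_f1_floor = f1(0.9653, 0.9390)
note_f1_floor = f1(0.9676, 0.9600)

print(f"sentence-level positive rate: {sentence_rate:.2f}%")  # 1.30%
print(f"note-level positive rate: {note_rate:.2f}%")  # 7.55%
print(f"F1 lower bound (sentence): {sentence_f1_floor:.4f}")  # 0.9520
print(f"F1 lower bound (note): {note_f1_floor:.4f}")  # 0.9638
```

Both floors exceed 0.95, consistent with the reported F1 scores for the hybrid algorithm. The bound does not cover the rule-based or transformer-based algorithms, whose PPV and sensitivity are not listed in the abstract.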

Conclusions:

The developed NLP algorithms could effectively capture asthma-related symptoms from unstructured clinical notes. These algorithms could be used to examine asthma burden and to predict asthma exacerbations.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.