JMIR Preprints #53367: Moving Biosurveillance Beyond Coded Data Using AI for Symptom Detection from Physician Notes: Retrospective Cohort Study

Moving Biosurveillance Beyond Coded Data Using AI for Symptom Detection from Physician Notes: Retrospective Cohort Study

Andrew J McMurry;
Amy R Zipursky;
Alon Geva;
Karen L Olson;
James R Jones;
Vladimir Ignatov;
Timothy A Miller;
Kenneth D Mandl

ABSTRACT

Background:

Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiatives primarily depend on structured data extracted from electronic health records.

Objective:

To validate and test an artificial intelligence (AI)-based natural language processing (NLP) pipeline for detecting COVID-19 symptoms from physician notes.

Methods:

Subjects in this retrospective cohort study were patients aged 21 years or younger who presented to a pediatric emergency department (ED) at a large academic children’s hospital between March 1, 2020, and May 31, 2022. ED notes for all patients were processed with an NLP pipeline tuned to detect mentions of 11 COVID-19 symptoms based on Centers for Disease Control and Prevention (CDC) criteria. For a gold standard, 3 subject matter experts labeled 226 ED notes and had strong agreement (F1=98.6%; PPV=97.2%; sensitivity=100.0%). F1, positive predictive value (PPV), and sensitivity were used to compare the performance of both NLP and International Classification of Diseases, 10th Revision (ICD-10) codes against the gold standard chart review. As a formative use case, variations in symptom patterns were measured across SARS-CoV-2 variant eras.
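The comparison metrics named above can be illustrated with a minimal sketch. It assumes each encounter is reduced to one Boolean per symptom (present or absent) for both the gold standard and the method under test. The names `Confusion`, `tally`, and `metrics` are hypothetical, the labels are toy data, and this is not the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing predicted labels with gold standard labels."""
    tp: int  # symptom in gold standard and detected
    fp: int  # symptom detected but absent from gold standard
    fn: int  # symptom in gold standard but missed
    tn: int  # symptom absent from both


def tally(gold: Sequence[bool], predicted: Sequence[bool]) -> Confusion:
    """Build the 2x2 confusion counts for one symptom across encounters."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = tn = 0
    for g, p in zip(gold, predicted):
        if g and p:
            tp += 1
        elif p:
            fp += 1
        elif g:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def _ratio(numerator: float, denominator: float) -> float:
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def metrics(c: Confusion) -> Dict[str, float]:
    """PPV, sensitivity, specificity, and F1 as fractions in [0, 1]."""
    ppv = _ratio(c.tp, c.tp + c.fp)
    sensitivity = _ratio(c.tp, c.tp + c.fn)
    specificity = _ratio(c.tn, c.tn + c.fp)
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {"ppv": ppv, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


# Toy labels for eight encounters (illustrative only, not study data).
gold_labels = [True, True, True, False, False, False, True, False]
nlp_labels = [True, True, False, False, True, False, True, False]
print(metrics(tally(gold_labels, nlp_labels)))
```

Because F1 is the harmonic mean of PPV and sensitivity, it ignores true negatives, which is why specificity is computed separately here.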

Results:

There were 85,678 ED encounters during the study period, 4.0% of which involved patients with COVID-19. NLP was more accurate than ICD-10 codes at identifying encounters with patients who had any of the COVID-19 symptoms (F1=79.6% vs 45.1%). NLP was more sensitive than ICD-10 for detecting symptoms that were present (sensitivity=93% vs 30%). However, ICD-10 was more specific than NLP for symptoms that were absent (specificity=99.4% vs 91.7%). Congestion or runny nose showed the largest difference in accuracy: NLP F1=82.8%, ICD-10 F1=4.2%. For encounters with patients with COVID-19, prevalence estimates of each NLP-detected symptom differed across variant eras. Patients with COVID-19 were more likely than patients without COVID-19 to have each symptom detected by NLP. Effect sizes (odds ratios) varied across pandemic eras.
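As an illustration of the effect sizes mentioned above, an odds ratio for one symptom can be computed from a 2x2 table of symptom detection by COVID-19 status. The sketch below uses the standard log-based (Woolf) 95% confidence interval. The abstract does not state how the study's odds ratios or intervals were derived, so the method, the function name `odds_ratio`, and the counts are all assumptions for illustration.

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a log-based (Woolf) confidence interval.

    a: COVID-19 encounters with the symptom detected
    b: COVID-19 encounters without the symptom detected
    c: non-COVID-19 encounters with the symptom detected
    d: non-COVID-19 encounters without the symptom detected
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    if min(a, b, c, d) <= 0:
        # Assumption: empty cells are rejected rather than corrected.
        raise ValueError("all four cells must be positive")
    estimate = (a * d) / (b * c)
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * std_err)
    upper = math.exp(math.log(estimate) + z * std_err)
    return estimate, lower, upper


# Hypothetical counts for a single symptom in a single variant era.
estimate, lower, upper = odds_ratio(a=40, b=60, c=20, d=80)
print(f"OR={estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Running the same calculation separately for each variant era would show whether the association between a symptom and COVID-19 strengthens or weakens over time.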

Conclusions:

This study establishes the value of AI-based NLP as a highly effective tool for real-time COVID-19 symptom detection in pediatric patients, outperforming traditional ICD-10 methods. It also reveals the evolving nature of symptom prevalence across different virus variants, underscoring the need for dynamic, technology-driven approaches in infectious disease surveillance.

Citation

Please cite as:

McMurry AJ, Zipursky AR, Geva A, Olson KL, Jones JR, Ignatov V, Miller TA, Mandl KD

Moving Biosurveillance Beyond Coded Data Using AI for Symptom Detection From Physician Notes: Retrospective Cohort Study

J Med Internet Res 2024;26:e53367

DOI: 10.2196/53367

PMID: 38573752

PMCID: 11027052

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Oct 6, 2023

Date Accepted: Feb 27, 2024
