Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Feb 23, 2025
Date Accepted: Jun 18, 2025
Large Language Model Symptom Identification from Clinical Text: A Multi-Center Study
ABSTRACT
Background:
Recognizing patient symptoms is fundamental to medicine, research, and public health. However, symptoms are often underreported in coded formats despite being routinely documented in physician notes. Large language models (LLMs) could help bridge this gap by extracting symptoms through prompts based on expert annotation guidelines, mimicking human chart reviewers.
Objective:
We sought to evaluate the ability of LLMs to identify symptoms from clinical text and assess their generalizability across healthcare sites.
Methods:
Four LLMs were evaluated: GPT-4, GPT-3.5, Llama2, and Mixtral 8x7B. LLM prompts were engineered to follow chart review guidelines. We identified optimal prompting strategies for each model using a Development cohort (N=103) from Site 1. We compared model performances using a Test cohort (N=204) from Site 1. We evaluated the best model’s generalizability using a Validation cohort (N=308) from an independent Site 2.
Results:
For our Development cohort, each LLM outperformed ICD-10-based identification and the BERT-based NLP approaches from our prior study. GPT-4 had the highest tested accuracy, with an F1-score of 91.4% vs. 45.1% for ICD-10. For our Validation cohort, GPT-4 performance was even higher, with an F1-score of 94.0% vs. 26.9% for ICD-10, whose performance dropped across sites.
Conclusions:
LLMs outperformed ICD-10-based symptom identification and demonstrated superior generalizability across healthcare sites.
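For readers less familiar with the metric, the F1-scores reported in the Results combine precision and recall against chart-review labels. The following is a minimal sketch of that calculation, not the study's evaluation code; the function name `f1_score` and all label values are hypothetical and invented purely for illustration.

```python
def f1_score(reference, predicted):
    """F1 for binary labels, where 1 means the symptom is present."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # true positives
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false positives
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly found
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (assumed, NOT study data): one entry per note for one symptom.
chart_review = [1, 1, 1, 1, 0, 0, 1, 0]  # reference standard from chart review
llm_labels = [1, 1, 1, 0, 0, 0, 1, 1]    # hypothetical LLM output
icd10_labels = [1, 0, 0, 0, 0, 0, 0, 0]  # hypothetical coded diagnoses

llm_f1 = f1_score(chart_review, llm_labels)
icd10_f1 = f1_score(chart_review, icd10_labels)
print(f"LLM F1: {llm_f1:.3f}, ICD-10 F1: {icd10_f1:.3f}")
```

In this toy case the coded labels are precise but miss most documented symptoms, so low recall pulls the F1-score down, which is the pattern of underreporting in coded formats described in the Background.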
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.