JMIR Preprints #90547: Evaluation of Large Language Models for Structured Data Extraction from Interstitial Lung Disease Clinical Notes: A Comparative Study


Evaluation of Large Language Models for Structured Data Extraction from Interstitial Lung Disease Clinical Notes: A Comparative Study

Stephanie Ji Chen;
Manoj Venkat Maddali;
Curtis Langlotz;
Christian Bluethgen;
Jonathan Chen;
Rishi Raj

ABSTRACT

Background:

Most clinically relevant data is contained in unstructured text within clinical notes. Because clinical notes are prone to verbosity and imprecision, structured data extraction is a major bottleneck and a costly endeavor when screening patients for studies or when creating and maintaining healthcare registries or databases.

Objective:

We aimed to compare the performance of various large language models (LLMs) for structured data extraction from unstructured interstitial lung disease (ILD) clinic notes. Our primary analysis evaluated LLM extraction of binary structured data from clinical notes, and a secondary analysis evaluated select LLMs for extraction of multi-class data.

Methods:

We used 12 different LLMs to extract binary answers to 10 ILD clinical questions from clinic notes for 100 ILD clinic patients. We additionally used 2 LLMs to extract multi-class data regarding ILD classification. Ground truth was established by consensus among three ILD physicians. LLM performance was evaluated using accuracy, precision, recall, and F1 scores.
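As a minimal sketch of how the four metrics named above relate to one another (not the authors' evaluation code), the scores for a single binary question can be computed by comparing an LLM's answers against the consensus ground truth. The function name `binary_metrics`, the aligned lists of booleans as input, and the choice to return 0.0 when a denominator is zero are all assumptions for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(truth: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Hypothetical helper: accuracy, precision, recall, and F1 for one binary question.

    `truth` holds consensus ground-truth answers and `predicted` holds the
    LLM's extracted answers, aligned note by note (assumed data layout).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must be the same length")

    # Confusion-matrix counts; "True" is treated as the positive class.
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)

    accuracy = (tp + tn) / len(truth) if truth else 0.0
    # Zero-denominator conventions vary; this sketch assumes 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy example with made-up answers for five notes.
    print(binary_metrics([True, True, False, False, True],
                         [True, False, False, True, True]))
```

With the toy answers above, there are 2 true positives, 1 true negative, 1 false positive, and 1 false negative, so accuracy is 3/5 and precision, recall, and F1 are each 2/3. Per-question scores like these would then be aggregated across the 10 questions in whatever way the study specifies.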

Results:

LLMs processed each clinical note-prompt combination in 1-2 seconds, at an estimated cost of less than $0.02 for each note-prompt combination. Of the 12 LLMs assessed, Claude 3.5 Sonnet (Anthropic, San Francisco), GPT-4, GPT-4o-mini, GPT-4o, o1, o1-mini, o3-mini, gpt-oss-20b, and gpt-oss-120b (OpenAI, San Francisco) consistently achieved high accuracy, similar to that of the three ILD clinicians (96.2%). Multi-class data extraction demonstrated lower accuracy than binary data extraction.

Conclusions:

Multiple LLMs consistently achieved human-level accuracy in extracting structured binary data from ILD clinical notes while being orders of magnitude faster and cheaper. LLMs are promising tools for clinical data extraction that can improve clinical research efficiency.

Clinical Trial: None

Citation

Please cite as:

Chen SJ, Maddali MV, Langlotz C, Bluethgen C, Chen J, Raj R

Evaluation of Large Language Models for Structured Data Extraction From Interstitial Lung Disease Clinical Notes: Comparative Study

J Med Internet Res 2026;28:e90547

DOI: 10.2196/90547

PMID: 42361337

PMCID: 13354945


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.


Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 30, 2025

Date Accepted: Jun 9, 2026
