JMIR Preprints #68776: Enhanced medical data extraction: leveraging LLMs for accurate retrieval of patient information from medical reports

Enhanced medical data extraction: leveraging LLMs for accurate retrieval of patient information from medical reports

Angel Manuel Garcia-Carmona;
Maria-Lorena Prieto;
Enrique Puertas;
Juan-Jose Beunza

ABSTRACT

Background:

The digital transformation of healthcare has introduced both opportunities and challenges, particularly in managing and analyzing the vast amounts of unstructured medical data generated daily. There is a need to explore the feasibility of using generative AI solutions to extract data from medical reports, categorized by specific criteria.

Objective:

This study investigates the application of Large Language Models (LLMs) for the automated extraction of structured information from unstructured medical reports, employing the LangChain framework in Python.

Methods:

Through a systematic evaluation of leading LLMs (GPT-4o, LLaMA 3, LLaMA 3.1, Gemma 2, Qwen 2, and Qwen 2.5) using zero-shot prompting techniques and embedding results into a vector database, the research assesses their performance in extracting patient demographics, diagnostic details, and pharmacological data.
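As a rough illustration of what zero-shot extraction involves, the Python sketch below builds a prompt that contains only instructions (no worked examples) and parses the model's JSON answer into fixed fields. The field names (`patient_name`, `age`, `diagnosis`, `medications`), the prompt wording, and the helper names are hypothetical; the abstract does not give the study's schema or prompts, and the actual model call (for example through LangChain) is deliberately left out rather than guessed.

```python
import json

# Hypothetical schema: the study's exact field names are not given in the abstract.
FIELDS = ["patient_name", "age", "diagnosis", "medications"]

# Zero-shot: the prompt contains instructions only, no solved examples.
PROMPT_TEMPLATE = (
    "Extract the following fields from the medical report below. "
    "Answer only with a JSON object that uses exactly these keys: {keys}. "
    "Use null for any field that is not stated in the report.\n\n"
    "Report:\n{report}"
)


def build_zero_shot_prompt(report: str) -> str:
    """Fill the instruction template with the field list and the report text."""
    return PROMPT_TEMPLATE.format(keys=", ".join(FIELDS), report=report)


def parse_extraction(raw_output: str) -> dict:
    """Parse the model's answer into a dict holding exactly the expected fields."""
    # Models sometimes add text around the JSON; keep the outermost {...} span.
    start, end = raw_output.find("{"), raw_output.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw_output[start:end + 1])
    # Missing keys become None; unexpected keys are dropped.
    return {field: data.get(field) for field in FIELDS}


# Illustrative usage with an invented report and an invented model answer.
prompt = build_zero_shot_prompt("John Doe, 67 years old. Dx: type 2 diabetes. Rx: metformin 850 mg.")
sample_answer = 'Result: {"patient_name": "John Doe", "age": 67, "diagnosis": "type 2 diabetes", "medications": ["metformin 850 mg"]}'
print(parse_extraction(sample_answer))
```

Forcing a fixed set of keys, with null for absent values, is what makes the free-text output comparable field by field against a reference annotation.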

Results:

Evaluation metrics, including accuracy, precision, recall, and F1 scores, revealed high efficacy across most categories, with GPT-4o achieving the highest overall performance (91.4% accuracy).
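The abstract does not spell out how a predicted field is matched against the reference value. The sketch below assumes one common convention for extraction tasks (exact match per field, with a wrong value counted as both a false positive and a false negative, and accuracy taken as the exact-match rate over all fields) and computes accuracy, precision, recall, and F1 from those counts. `Counts`, `count_field`, and `metrics` are hypothetical names, and the sample values are made up, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int = 0     # gold value present and extracted exactly
    fp: int = 0     # extracted value that is wrong or absent from the gold standard
    fn: int = 0     # gold value that was missed or extracted wrongly
    exact: int = 0  # fields whose prediction equals the gold value (incl. both empty)
    total: int = 0  # fields compared


def count_field(c: Counts, predicted, gold) -> None:
    """Update the counts for one field under the assumed exact-match rule."""
    c.total += 1
    if predicted == gold:
        c.exact += 1
        if gold is not None:
            c.tp += 1
    elif gold is None:
        c.fp += 1  # extracted something that is not in the report
    elif predicted is None:
        c.fn += 1  # missed a value that is in the report
    else:
        # Wrong value: lowers both precision and recall.
        c.fp += 1
        c.fn += 1


def metrics(c: Counts) -> dict:
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = c.exact / c.total if c.total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented example: one report, four fields.
gold = {"patient_name": "John Doe", "age": 67, "diagnosis": "type 2 diabetes", "medications": None}
pred = {"patient_name": "John Doe", "age": 76, "diagnosis": "type 2 diabetes", "medications": "metformin"}
counts = Counts()
for field in gold:
    count_field(counts, pred[field], gold[field])
print(metrics(counts))
```

With these invented values, precision is 0.5, recall is 2/3, F1 is 4/7 (about 0.57), and accuracy is 0.5. A different matching convention (for example, allowing normalized or partial matches for names and ages) would change all four numbers, which is one reason precision and recall can diverge by field type.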

Conclusions:

The findings highlight notable differences in precision and recall between models, particularly in extracting names and age-related information. Challenges in processing unstructured medical text, including variability in model performance across data types, are discussed. The study demonstrates the feasibility of integrating LLMs into healthcare workflows, offering significant improvements in data accessibility and supporting clinical decision-making processes. Additionally, it explores the role of retrieval-augmented generation (RAG) techniques in enhancing information retrieval accuracy, addressing issues such as hallucinations and outdated data in LLM outputs. Future work emphasizes the need for optimization through larger and more diverse training datasets, advanced prompting strategies, and the integration of domain-specific knowledge to improve model generalizability and precision.
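To make the retrieval-augmented generation (RAG) idea concrete, the sketch below stores report chunks in an in-memory stand-in for a vector database, retrieves the chunks most similar to a question by cosine similarity, and places them in a prompt that tells the model to answer only from that context. It is a minimal sketch under stated assumptions: the bag-of-words `embed` function replaces a real embedding model, `ToyVectorStore` is hypothetical rather than any LangChain or vector-database API, and the report text is invented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real neural embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self._items.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_rag_prompt(question: str, store: ToyVectorStore, k: int = 2) -> str:
    """Ground the answer in retrieved passages instead of the model's memory."""
    context = "\n".join(f"- {chunk}" for chunk in store.search(question, k))
    return (
        "Answer using only the context below. Reply 'not found' if the answer is absent.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Illustrative usage with invented report chunks.
store = ToyVectorStore()
for chunk in [
    "Patient is a 67-year-old male.",
    "Diagnosis: type 2 diabetes mellitus.",
    "Current medication: metformin 850 mg twice daily.",
]:
    store.add(chunk)
print(build_rag_prompt("What is the current medication?", store, k=1))
```

Restricting the answer to retrieved context, with an explicit "not found" fallback, is one way such a pipeline can reduce hallucinated values; swapping in a real embedding model and vector database would not change the overall flow.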

Citation

Please cite as:

Garcia-Carmona AM, Prieto ML, Puertas E, Beunza JJ. Leveraging Large Language Models for Accurate Retrieval of Patient Information From Medical Reports: Systematic Evaluation Study. JMIR AI 2025;4:e68776

DOI: 10.2196/68776

PMID: 40608403

PMCID: 12271962

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR AI

Date Submitted: Nov 14, 2024

Date Accepted: Apr 27, 2025