Enhanced medical data extraction: leveraging LLMs for accurate retrieval of patient information from medical reports
ABSTRACT
Background:
The digital transformation of healthcare has introduced both opportunities and challenges, particularly in managing and analyzing the vast amounts of unstructured medical data generated daily. There is a need to assess whether generative models can reliably extract data from medical reports and organize it into predefined categories.
Objective:
This study investigates the application of Large Language Models (LLMs) for the automated extraction of structured information from unstructured medical reports, employing the LangChain framework in Python.
Methods:
Six leading LLMs (GPT-4o, LLaMA 3, LLaMA 3.1, Gemma 2, Qwen 2, and Qwen 2.5) were systematically evaluated using zero-shot prompting, with the results embedded into a vector database. Performance was assessed on the extraction of patient demographics, diagnostic details, and pharmacological data.
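As an illustration of zero-shot extraction, the sketch below builds a single instruction prompt with no worked examples and parses the model's reply into a fixed set of fields. It is a minimal sketch under stated assumptions, not the study's pipeline: the field names in `FIELDS`, the prompt wording, and the `llm` callable (any function mapping a prompt string to a reply string) are hypothetical, and the LangChain and vector-database steps are omitted.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical target schema; the study's actual field list is not given here.
FIELDS = ("name", "age", "diagnosis", "medications")


def build_prompt(report: str) -> str:
    """Build a zero-shot prompt: instructions only, no worked examples."""
    keys = ", ".join(FIELDS)
    return (
        "Extract the following fields from the medical report below and "
        f"answer with a single JSON object with the keys: {keys}. "
        "Use null for any field the report does not state.\n\n"
        f"Report:\n{report}"
    )


def extract(report: str, llm: Callable[[str], str]) -> Dict[str, Any]:
    """Query the model and keep only the expected fields.

    `llm` is an assumed stand-in for whichever model client is used.
    Unparseable or non-object replies yield None for every field.
    """
    raw = llm(build_prompt(report))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {key: data.get(key) for key in FIELDS}
```

Restricting the result to the expected keys and mapping anything missing to `None` keeps downstream scoring simple, since every report produces a record of the same shape whatever the model returned.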
Results:
Evaluation metrics, including accuracy, precision, recall, and F1 scores, revealed high efficacy across most categories, with GPT-4o achieving the highest overall performance (91.4% accuracy).
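For reference, the four reported metrics can be computed from per-field comparison counts as sketched below. The scoring convention is an assumption made for illustration and may differ from the one used in the study: a correct value counts as a true positive, a wrong or spurious value as a false positive, a missed value as a false negative, and a correctly empty field as a true negative. The helper names `tally` and `metrics` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def _normalize(value: object) -> str:
    """Compare values case-insensitively and ignore surrounding whitespace."""
    return str(value).strip().lower()


def tally(pairs: Iterable[Tuple[Optional[object], Optional[object]]]) -> Dict[str, int]:
    """Count outcomes over (predicted, reference) pairs for one field.

    Assumed convention: a wrong value is counted as a false positive only.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for predicted, reference in pairs:
        if predicted is None and reference is None:
            counts["tn"] += 1
        elif predicted is None:
            counts["fn"] += 1
        elif reference is None:
            counts["fp"] += 1
        elif _normalize(predicted) == _normalize(reference):
            counts["tp"] += 1
        else:
            counts["fp"] += 1
    return counts


def metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive accuracy, precision, recall, and F1; empty denominators give 0.0."""
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Computing the counts per field, rather than pooled over all fields, makes it possible to see differences between categories such as names and ages instead of only an overall figure.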
Conclusions:
The findings highlight notable differences in precision and recall between models, particularly in extracting names and age-related information. Challenges in processing unstructured medical text, including variability in model performance across data types, are discussed. The study demonstrates the feasibility of integrating LLMs into healthcare workflows, offering significant improvements in data accessibility and supporting clinical decision-making processes. Additionally, it explores the role of retrieval-augmented generation (RAG) techniques in enhancing information retrieval accuracy, addressing issues such as hallucinations and outdated data in LLM outputs. Future work should focus on optimization through larger and more diverse training datasets, advanced prompting strategies, and the integration of domain-specific knowledge to improve model generalizability and precision.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.