
Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 16, 2024
Date Accepted: Oct 21, 2024

The final, peer-reviewed published version of this preprint can be found here:

The Transformative Potential of Large Language Models in Mining Electronic Health Records Data: Content Analysis

Wals Zurita AJ, Miras del Rio H, Ugarte Ruiz de Aguirre N, Nebrera Navarro C, Rubio Jimenez M, Muñoz Carmona D, Miguez Sanchez C

The Transformative Potential of Large Language Models in Mining Electronic Health Records Data: Content Analysis

JMIR Med Inform 2025;13:e58457

DOI: 10.2196/58457

PMID: 39746191

PMCID: 11739723

The Transformative Potential of Large Language Models in Mining Electronic Health Records Data: A Study in Data Science and Health Informatics

  • Amadeo Jesus Wals Zurita; 
  • Hector Miras del Rio; 
  • Nerea Ugarte Ruiz de Aguirre; 
  • Cristina Nebrera Navarro; 
  • Maria Rubio Jimenez; 
  • David Muñoz Carmona; 
  • Carlos Miguez Sanchez

ABSTRACT

Background:

Clinical natural language processing (cNLP), the subfield of artificial intelligence dedicated to the analysis of clinical texts, has developed significantly over recent decades. Recent advances in computing power and algorithms have enabled its expanded application in oncology research.

Objective:

To explore the potential of large language models (LLMs) to extract and structure information from free-text clinical reports, with a specific focus on identifying and classifying patient comorbidities in oncology electronic health records. We specifically evaluate the gpt-3.5-turbo-1106 and gpt-4-1106-preview models against the capabilities of specialized human evaluators.

Methods:

We implemented a script using the OpenAI API to extract structured information in JSON format from the comorbidities reported in 250 personal history reports. These reports were manually reviewed in batches of 50 by five radiation oncology specialists. We compared the results using metrics such as sensitivity, specificity, precision, accuracy, F-measure, the kappa index, and the McNemar test, and examined the common causes of errors made by both the humans and the GPT models.
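The extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' actual script: the prompt wording, the JSON schema, and the helper names (`build_messages`, `parse_response`) are assumptions; only the model names come from the study.

```python
import json

# Hypothetical system prompt; the study's carefully designed prompt is not
# reproduced in the abstract.
SYSTEM_PROMPT = (
    "You are a clinical data extraction assistant. From the personal "
    "history report below, list every comorbidity mentioned. Respond with "
    'JSON only, in the form {"comorbidities": [{"name": "...", '
    '"explicit": true}]}.'
)

def build_messages(report_text: str) -> list[dict]:
    """Assemble the chat messages for a single personal history report."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": report_text},
    ]

def parse_response(raw: str) -> list[dict]:
    """Parse the model's JSON reply into a list of comorbidity records."""
    return json.loads(raw)["comorbidities"]

# The live call would look roughly like this (requires the `openai`
# package and an API key):
#
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-4-1106-preview",  # or "gpt-3.5-turbo-1106"
#     response_format={"type": "json_object"},
#     messages=build_messages(report_text),
# )
# records = parse_response(resp.choices[0].message.content)
```

Requesting `response_format={"type": "json_object"}` constrains both models named in the study to emit parseable JSON, which is what makes batch structuring of 250 reports practical.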

Results:

The GPT-3.5 model exhibited slightly lower performance compared to physicians across all metrics, though the differences were not statistically significant. GPT-4 demonstrated clear superiority in several key metrics. Notably, it achieved a sensitivity of 96.8%, compared to 88.2% for GPT-3.5 and 88.8% for physicians. However, physicians marginally outperformed GPT-4 in precision (97.7% vs. 96.8%). GPT-4 showed greater consistency, replicating exact results in 76% of the reports after 10 analyses, in contrast to 59% for GPT-3.5. Physicians were more likely to miss explicit comorbidities, while the GPT models more frequently inferred non-explicit comorbidities, sometimes correctly, though this also resulted in more false positives.
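The metrics reported above all derive from confusion-matrix counts. As a reference, a minimal sketch of their standard definitions (the counts in the test are illustrative, not the study's data; the McNemar statistic shown uses the common continuity correction, which may differ from the exact variant the authors applied):

```python
def sensitivity(tp: int, fn: int) -> float:
    """Recall: fraction of true comorbidities that were detected."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of true negatives correctly left unflagged."""
    return tn / (tn + fp)

def precision(tp: int, fp: int) -> float:
    """Fraction of flagged comorbidities that were correct."""
    return tp / (tp + fp)

def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Overall fraction of correct decisions."""
    return (tp + tn) / (tp + fp + fn + tn)

def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and sensitivity (F1)."""
    p, r = precision(tp, fp), sensitivity(tp, fn)
    return 2 * p * r / (p + r)

def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar chi-squared statistic with continuity correction.
    b and c are the discordant counts: cases where one rater is
    correct and the other is wrong, and vice versa."""
    return (abs(b - c) - 1) ** 2 / (b + c)
```

The McNemar test is the natural choice here because the raters (GPT model vs. physician) judged the same 250 reports, so the comparisons are paired rather than independent.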

Conclusions:

The studied LLMs, with carefully designed prompts, demonstrate competence comparable to that of medical specialists in interpreting clinical reports, even for complex and confusingly written texts. Considering also their superior time and cost efficiency, these models are a preferable option to human analysis for mining and structuring information in large collections of clinical reports.






© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.