Accepted for/Published in: JMIR Formative Research

Date Submitted: Oct 3, 2023
Date Accepted: Nov 17, 2024

The final, peer-reviewed published version of this preprint can be found here:

Guideline-Incorporated Large Language Model-Driven Evaluation of Medical Records Using MedCheckLLM

Venkataramani V, Schubert MC, Wick W

JMIR Form Res 2025;9:e53335

DOI: 10.2196/53335

PMID: 40272831

PMCID: 12045122

Guideline-incorporated Large Language Model-Driven Evaluation of Medical Records Using MedCheckLLM

  • Varun Venkataramani
  • Marc Cicero Schubert
  • Wolfgang Wick

ABSTRACT

Large language models (LLMs) have been applied across a wide range of tasks and show considerable potential for processing and interpreting complex healthcare data. One area that remains underexplored is the use of LLMs for the reliable and reproducible evaluation of medical documents. Automatic evaluation of these documents, if achieved effectively, could improve healthcare, enhance patient safety, reduce the risk of cognitive and other biases, and refine the training process of LLMs. It is essential that the system's reasoning process is (a) transparent and comprehensible to human evaluators, for example through completion of a checklist, and (b) guided by established medical guidelines, which have been proven to increase patient safety and are the gold standard for implementing clinical care, thereby improving the overall performance and applicability of AI-driven healthcare. In this study, we introduce a multi-step framework for medical record evaluation that incorporates guidelines directly into the evaluation process, a concept we term 'guideline-in-the-loop'. Our proposed algorithm, MedCheckLLM, is an LLM-driven, structured, layered reasoning mechanism designed to automate the evaluation of medical records, with particular emphasis on evaluation against evidence-based guidelines. Crucially, the guidelines are deterministically accessed by the LLM as out-of-training data. This rigorous separation of LLM and guidelines is expected to increase the validity and interpretability of the evaluations and offers flexibility for updating guidelines. The primary objective of this research is to introduce the conceptual framework and assess its feasibility. This approach is expected to have significant implications for healthcare quality and for the transparent and efficient application of LLMs in clinical settings.
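As a rough illustration of the 'guideline-in-the-loop' idea, the following minimal sketch separates deterministic guideline retrieval from the model-driven judgment of each checklist item. It is not the authors' implementation: the names `GUIDELINES`, `lookup_guideline`, `evaluate_record`, and `keyword_judge` are hypothetical, the checklist entries are placeholders rather than real clinical guidance, and a simple keyword matcher stands in for the LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical guideline store kept outside the model: diagnosis -> checklist items.
# The entries are illustrative placeholders, not real clinical guidance.
GUIDELINES: Dict[str, List[str]] = {
    "migraine": [
        "headache history documented",
        "neurological examination documented",
    ],
}


@dataclass
class ChecklistResult:
    item: str
    fulfilled: bool


def lookup_guideline(diagnosis: str) -> List[str]:
    """Deterministic retrieval: a plain dictionary lookup, no model involved."""
    key = diagnosis.strip().lower()
    if key not in GUIDELINES:
        raise KeyError(f"no guideline stored for {diagnosis!r}")
    return GUIDELINES[key]


def evaluate_record(
    record: str,
    diagnosis: str,
    judge: Callable[[str, str], bool],
) -> List[ChecklistResult]:
    """Judge the record against every checklist item of the retrieved guideline."""
    return [
        ChecklistResult(item, judge(record, item))
        for item in lookup_guideline(diagnosis)
    ]


def keyword_judge(record: str, item: str) -> bool:
    """Stand-in for an LLM call: true if the item's first word occurs in the record."""
    return item.split()[0].lower() in record.lower()


record = "Headache history taken; patient reports aura. No examination noted."
results = evaluate_record(record, "Migraine", keyword_judge)
for r in results:
    print(f"[{'x' if r.fulfilled else ' '}] {r.item}")
```

Because the guideline text lives in an ordinary data structure rather than in model weights, updating a guideline in this sketch means editing `GUIDELINES`, and the per-item results give a human reviewer a checklist to inspect.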


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.