Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 27, 2025
Open Peer Review Period: Mar 27, 2025 - Apr 11, 2025
Date Accepted: May 12, 2025

The final, peer-reviewed published version of this preprint can be found here:

Predicting 30-Day Postoperative Mortality and American Society of Anesthesiologists Physical Status Using Retrieval-Augmented Large Language Models: Development and Validation Study

Chen YH, Ruan SJ, Chen Pf

Predicting 30-Day Postoperative Mortality and American Society of Anesthesiologists Physical Status Using Retrieval-Augmented Large Language Models: Development and Validation Study

J Med Internet Res 2025;27:e75052

DOI: 10.2196/75052

PMID: 40460423

PMCID: 12174870

Predicting 30-Day Postoperative Mortality and American Society of Anesthesiologists Physical Status Using Retrieval-Augmented Large Language Models: Development and Validation Study

  • Ying-Hao Chen
  • Shanq-Jang Ruan
  • Pei-fu Chen

ABSTRACT

Background:

Accurately assessing perioperative risk is critical for informed surgical planning and patient safety. However, current prediction models often rely solely on structured data and overlook the nuanced clinical reasoning embedded in free-text preoperative notes. Recent advances in large language models (LLMs) have opened new opportunities for harnessing unstructured clinical data, yet their application in perioperative prediction remains limited by concerns about factual accuracy. Retrieval-augmented generation (RAG) offers a promising solution: by grounding LLM outputs in authoritative medical sources, it could improve both predictive accuracy and clinical interpretability.

Objective:

This study aimed to investigate whether integrating LLMs with RAG can improve the prediction of 30-day postoperative mortality and American Society of Anesthesiologists physical status classification using unstructured preoperative clinical notes.

Methods:

We conducted a retrospective cohort study using 24,491 medical records from a tertiary medical center, including preoperative anesthesia assessments, discharge summaries, and surgical information. To extract clinical insights from free-text data, we employed the LLaMA 3.1-8B language model with RAG, using MedEmbed for text embedding and Miller’s Anesthesia as the primary retrieval source. We systematically evaluated model performance under various configurations (embedding models, chunk sizes, and few-shot prompting), using weighted area under the precision-recall curve (AUPRC) for mortality prediction and micro F1 score for American Society of Anesthesiologists (ASA) classification.
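As an illustration only, the retrieval step described above can be sketched in a few lines of Python. This is not the authors' pipeline: the function names (`chunk_words`, `embed`, `retrieve`, `build_prompt`) are hypothetical, the bag-of-words `embed` is a toy stand-in for a learned embedding model such as MedEmbed, and the prompt layout is an assumption. The sketch shows how chunk size, the embedding function, and few-shot examples each enter the pipeline as a separate, swappable setting.

```python
import math
import re
from collections import Counter


def chunk_words(text, chunk_size, overlap=0):
    """Split text into chunks of `chunk_size` words, optionally overlapping."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(note, passages, examples=()):
    """Assemble retrieved passages, optional few-shot examples, and the note."""
    parts = ["Reference passages:"]
    parts += [f"- {p}" for p in passages]
    for ex_note, ex_label in examples:
        parts.append(f"Example note: {ex_note}\nExample answer: {ex_label}")
    parts.append(f"Preoperative note: {note}\nAnswer:")
    return "\n".join(parts)


# Hypothetical placeholder chunks; not quoted from any textbook.
chunks = [
    "renal function and dialysis",
    "cardiac failure and ejection fraction",
    "airway examination",
]
passages = retrieve("history of cardiac failure", chunks, k=1)
prompt = build_prompt("history of cardiac failure", passages)
```

In a real system the prompt would then be passed to the language model; varying `chunk_size`, swapping `embed`, or supplying `examples` corresponds to the configuration axes the study evaluated.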

Results:

The LLaMA-RAG model consistently outperformed traditional machine learning baselines. For 30-day postoperative mortality, it achieved the highest area under the receiver operating characteristic curve (AUROC) of 0.9570 (95% CI 0.9543–0.9597) and the highest AUPRC of 0.6536 (95% CI 0.6479–0.6593). For ASA classification, it attained the highest micro F1 score of 0.8409 (95% CI 0.8238–0.8551). Notably, the model demonstrated exceptional sensitivity in identifying rare but high-risk cases, such as ASA Class 5 patients and postoperative deaths.
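For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how micro F1, an AUPRC estimate, and a confidence interval can be computed. Several points are assumptions: the abstract does not define the weighting used in its "weighted" AUPRC or say how its intervals were derived, so this sketch computes plain (unweighted) average precision and uses a percentile bootstrap, one common choice. All function names are hypothetical.

```python
import random


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool true/false positives and false negatives over classes."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            tp += (t == c and p == c)
            fp += (t != c and p == c)
            fn += (t == c and p != c)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_precision(y_true, scores):
    """Average precision for binary labels (1 = event); ties in scores are not handled."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at each rank where a positive is found
    return total / n_pos


def bootstrap_ci(metric, y_true, y_other, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, y_other)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        stats.append(metric([y_true[i] for i in idx], [y_other[i] for i in idx]))
    stats.sort()
    lower = stats[int(round((alpha / 2) * n_boot))]
    upper = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper


# Toy labels and scores for illustration; not data from the study.
f1 = micro_f1([1, 2, 3, 3], [1, 2, 2, 3])
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
ci = bootstrap_ci(micro_f1, [1, 2, 3, 3] * 5, [1, 2, 2, 3] * 5, n_boot=200)
```

For single-label multiclass prediction, micro F1 equals overall accuracy, so it is dominated by the common classes. Average precision, by contrast, depends on how well the rare positive class is ranked, which is why it is often preferred over AUROC when events such as postoperative death are uncommon.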

Conclusions:

The LLaMA-RAG model significantly improved prediction of postoperative mortality and ASA classification, especially for rare high-risk cases. By grounding outputs in domain knowledge, retrieval-augmented generation enhanced both accuracy and interpretability.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.