Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Dec 14, 2025
Date Accepted: Mar 24, 2026
Classifying American Society of Anesthesiologists Physical Status With a Low-Rank Adapted Large Language Model: Development and Validation Study
ABSTRACT
Background:
The American Society of Anesthesiologists Physical Status (ASA-PS) classification is integral to preoperative risk assessment, yet assignment remains subjective and labor-intensive. Recent large language models (LLMs) process free-text electronic health records (EHRs), but few studies have evaluated parameter-efficient adaptations that both predict ASA-PS and provide clinician-readable rationales. Low-rank adaptation (LoRA) is a parameter-efficient technique that updates only a small set of add-on parameters rather than the entire model, enabling efficient fine-tuning on modest data and hardware. A lightweight, instruction-tuned LLM with these capabilities could streamline workflow and broaden access to explainable decision support.
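The low-rank update LoRA applies can be illustrated with a toy example: a frozen weight matrix W is augmented by a scaled product of two small trainable factors, W_eff = W + (alpha/r) * B @ A. The dimensions, rank, and scaling below are illustrative assumptions only; real LLaMA-3 layers are far larger.

```python
import numpy as np

# Illustrative sizes, not the paper's configuration
d, k, r, alpha = 512, 512, 8, 16   # weight dims, LoRA rank, LoRA scaling
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))        # frozen pretrained weight (not updated)
A = rng.standard_normal((r, k)) * 0.01 # trainable low-rank factor
B = np.zeros((d, r))                   # B initialized to zero, so training
                                       # starts from the pretrained behavior

# Effective weight after adapting: original plus scaled low-rank update
W_eff = W + (alpha / r) * (B @ A)

full_params = d * k          # parameters a full fine-tune would update
lora_params = d * r + r * k  # parameters LoRA actually trains
```

Because B starts at zero, W_eff equals W before any training step, and the trainable parameter count (d·r + r·k) is a small fraction of the full matrix (d·k), which is what makes fine-tuning feasible on modest hardware.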
Objective:
We aimed to develop and evaluate a LoRA-fine-tuned LLaMA-3 model for ASA-PS classification from preoperative clinical narratives and benchmark it against traditional machine-learning classifiers and domain-specific LLMs.
Methods:
Preoperative anesthesia notes and discharge summaries were extracted from the EHR of a tertiary center and reformatted into Alpaca-style instruction–response prompts, where the instruction requested the ASA-PS class and the response contained the ground-truth label (I–V) annotated by anesthesiologists. The LoRA-enhanced LLaMA-3 model was fine-tuned with mixed-precision training and evaluated on a held-out test set. Baselines included a random forest classifier, an XGBoost classifier, a support-vector machine (SVM), fastText, BioBERT, and ClinicalBERT (each fine-tuned on the same corpus), as well as the untuned LLaMA-3. Performance was assessed with the micro-averaged F1-score and the Matthews correlation coefficient (MCC), each with 95% bootstrap confidence intervals (CIs).
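The reformatting step described above can be sketched as a small helper that wraps a note and its label into an Alpaca-style record. The field names and instruction wording are illustrative assumptions, not the authors' exact template.

```python
def to_alpaca_example(note_text: str, asa_label: str) -> dict:
    """Wrap a preoperative note and its anesthesiologist-assigned
    ASA-PS label (I-V) into an instruction-tuning record."""
    instruction = (
        "Classify the patient's American Society of Anesthesiologists "
        "Physical Status (ASA-PS) as I, II, III, IV, or V based on the "
        "preoperative note below."
    )
    return {
        "instruction": instruction,   # task description shown to the model
        "input": note_text.strip(),   # free-text EHR narrative
        "output": asa_label,          # ground-truth ASA-PS class
    }

# Hypothetical example record (synthetic note, not patient data)
example = to_alpaca_example(
    "72-year-old with controlled hypertension and type 2 diabetes ...",
    "III",
)
```

During fine-tuning, the model is trained to generate the `output` field given the `instruction` and `input`; at inference, the same prompt is issued without the label.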
Results:
The LoRA-LLaMA-3 model achieved an F1-score of 0.780 (95% CI 0.769–0.792) and an MCC of 0.533 (95% CI 0.518–0.546), outperforming the other LLM baselines. After fine-tuning, BioBERT reached an F1-score of 0.762 (95% CI 0.750–0.774) and an MCC of 0.508 (95% CI 0.494–0.522), whereas ClinicalBERT achieved an F1-score of 0.757 (95% CI 0.745–0.769) and an MCC of 0.515 (95% CI 0.501–0.529). fastText yielded an F1-score of 0.762 (95% CI 0.750–0.774) and an MCC of 0.536 (95% CI 0.522–0.550). The untuned LLaMA-3 performed poorly (F1-score 0.073, 95% CI 0.066–0.081; MCC 0.002, 95% CI 0.001–0.002). Among all models, XGBoost obtained the highest scores (F1-score 0.815, 95% CI 0.804–0.826; MCC 0.613, 95% CI 0.599–0.626). Ablation experiments identified dropout = 0.3, learning rate = 3 × 10⁻⁵, temperature = 0.1, and top-p = 0.1 as the optimal hyperparameter settings. The LoRA model also produced rationales that highlighted medically pertinent terms. Attention visualizations showed interactions among related phrase pairs and a focus on comorbidities, patient age, and time references.
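A percentile bootstrap CI of the kind reported above can be sketched as follows. For single-label multiclass prediction, the micro-averaged F1-score reduces to accuracy, which keeps the sketch self-contained; the resample count, seed, and toy labels are illustrative choices, not taken from the study.

```python
import random

def micro_f1(y_true, y_pred):
    # Single-label multiclass: micro-averaged F1 equals accuracy
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample test cases with replacement,
    recompute the metric, and take the (alpha/2, 1-alpha/2) quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(micro_f1([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((alpha / 2) * n_boot)]
    hi = scores[int((1 - alpha / 2) * n_boot) - 1]
    return micro_f1(y_true, y_pred), (lo, hi)

# Toy predictions for eight hypothetical test cases (5 of 8 correct)
y_true = ["II", "III", "I", "III", "II", "IV", "II", "III"]
y_pred = ["II", "III", "II", "III", "II", "III", "II", "II"]
point, (lo, hi) = bootstrap_ci(y_true, y_pred)  # point estimate = 0.625
```

The same resampling loop yields a CI for any other metric, such as the MCC, by swapping out the scoring function.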
Conclusions:
LoRA turned a general-purpose LLaMA-3 into an ASA-PS classifier that outperformed other language-model baselines and came close to the top traditional machine-learning model. In addition to predictive accuracy, LoRA-LLaMA-3 delivers concise, clinician-oriented explanations that make its decisions auditable. Because the approach reformats routine EHR narratives into instruction-response pairs and relies on lightweight parameter adaptation, it offers a practical, resource-efficient blueprint for introducing explainable LLMs to specialized clinical tasks.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.