Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
ASA Physical Status Classification: A Hybrid Machine Learning-Large Language Model Ensemble Retrospective Validation Study
ABSTRACT
Background:
The American Society of Anesthesiologists Physical Status (ASA-PS) Classification System fundamentally shapes perioperative care delivery but suffers from poor inter-rater reliability (0.4-0.6). Machine learning (ML) models process structured data consistently but lack clinical reasoning, while large language models (LLMs) provide explanations but may miss subtle patterns in structured data.
Objective:
This study aimed to develop and evaluate a parallel ML-LLM ensemble that combines the complementary strengths of both approaches for automated ASA-PS classification.
Methods:
We retrospectively analyzed 2,500 adult surgical encounters from the University of Arkansas for Medical Sciences (UAMS) between August 2024 and May 2025. Cases were randomly allocated to training (n=2,000) and test sets (n=500). We developed multiple architectures, including traditional ML models (Extreme Gradient Boosting [XGBoost], Light Gradient Boosting Machine [LightGBM], and ExtraTrees), an LLM-only baseline (Generative Pre-trained Transformer-4o [GPT-4o]), and hybrid approaches. The parallel ensemble processed structured data through XGBoost and unstructured clinical notes through GPT-4o independently, with outputs combined via weighted averaging. Model performance was evaluated using macro-F1 score, exact match accuracy, and within-one-class accuracy.
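The weighted-averaging combination of the two components can be sketched as follows. This is a minimal illustration, not the study's implementation: the abstract specifies only "weighted averaging" with α=0.30, so the exact blending rule, the renormalization step, and the four-class example are assumptions.

```python
def ensemble_probs(ml_probs, llm_probs, alpha=0.30):
    """Blend per-class probabilities from the two parallel components.

    alpha weights the ML (XGBoost) output on structured data;
    (1 - alpha) weights the LLM (GPT-4o) output on clinical notes.
    alpha=0.30 matches the ensemble reported in the study; the
    specific combination rule here is an illustrative assumption.
    """
    blended = [alpha * m + (1 - alpha) * l
               for m, l in zip(ml_probs, llm_probs)]
    total = sum(blended)
    return [b / total for b in blended]  # renormalize to sum to 1


# Hypothetical encounter, four ASA-PS classes (I-IV) for brevity
ml = [0.10, 0.55, 0.30, 0.05]   # XGBoost on structured features
llm = [0.05, 0.25, 0.60, 0.10]  # GPT-4o on unstructured notes
probs = ensemble_probs(ml, llm, alpha=0.30)
pred_class = probs.index(max(probs)) + 1  # ASA-PS class as 1-based index
```

With these made-up probabilities the ML component favors class II while the LLM favors class III; at α=0.30 the LLM dominates and the blended prediction is class III.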
Results:
A theoretical performance ceiling of macro-F1=0.59 was determined through evaluation by an expert panel of three board-certified anesthesiologists who independently rated 50 patient charts for ASA-PS scores. The parallel ensemble (α=0.30) achieved the highest macro-F1 score of 0.58, with 67% exact match accuracy and 98.4% within-one-class accuracy. This outperformed traditional ML models (XGBoost: F1=0.34), the LLM-only baseline (F1=0.64 but with potential overfitting), and sequential hybrid approaches (F1=0.41-0.46). The LLM component generated explanations detailing comorbidities, severity descriptors, and functional status indicators.
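The three reported metrics can be computed as below. This is a standard-formula sketch, not the study's evaluation code; the label encoding (ASA-PS classes as integers 1-4) and the sample data are hypothetical.

```python
def asa_metrics(y_true, y_pred):
    """Macro-F1, exact match accuracy, and within-one-class accuracy.

    Within-one-class accuracy counts predictions at most one ASA-PS
    class away from the reference label, the safety-oriented metric
    reported as 98.4% for the parallel ensemble.
    """
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)  # per-class F1
    n = len(y_true)
    return {
        "macro_f1": sum(f1s) / len(f1s),  # unweighted mean over classes
        "exact_match": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        "within_one": sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical labels, ASA-PS I-IV encoded as 1-4
truth = [1, 2, 3, 2, 4, 3]
preds = [1, 2, 2, 2, 3, 3]
metrics = asa_metrics(truth, preds)
```

Macro-F1 averages per-class F1 scores without weighting by class frequency, which penalizes poor performance on rare classes (such as ASA-PS IV) that exact match accuracy alone would mask.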
Conclusions:
The parallel ML-LLM ensemble achieved performance approaching the theoretical ceiling established by human inter-rater reliability while providing interpretable clinical explanations. The 98.4% within-one-class accuracy ensures operational safety by minimizing extreme misclassifications. This approach demonstrates how complementary AI architectures can enhance perioperative risk assessment, particularly valuable given current healthcare workforce shortages. Clinical Trial: N/A
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.