Accepted for/Published in: JMIR Formative Research
Date Submitted: Feb 3, 2026
Open Peer Review Period: Feb 3, 2026 - Feb 6, 2026
Date Accepted: May 28, 2026
Date Submitted to PubMed: Jun 2, 2026
Improving models to predict care utilization using machine learning: a retrospective observational study
ABSTRACT
Background:
Artificial intelligence (AI) and machine learning (ML) tools are now ubiquitous in the advancement of healthcare services and clinical risk estimation. Legacy systems rely on highly informative feature sets, developed from years of clinical expertise and research, to estimate different outcomes, but have only recently been tested against novel statistical approaches. One such system, the Johns Hopkins Adjusted Clinical Groups (ACG) system, is a longstanding and widely used approach to categorizing clinical risk factors and is amenable to ML techniques.
Objective:
This study aims to test the ACG system using contrasting AUROC and F1 classification optimization strategies and to compare performance against traditional logistic regression methods. If selected ML algorithms can be tuned to improve overall measures of performance, this would strengthen the argument for incorporating them into ACG-related workflows.
Methods:
Using a retrospective observational design, prospective-year estimates of all-cause hospitalization and elevated total cost were modeled within a cross-validation framework. Hyperparameter settings for XGBoost, random forest, and elastic net were selected by grid search, using average cross-validated F1 and area under the receiver operating characteristic curve (AUROC) to maximize either statistic.
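The contrast between the two optimization targets can be illustrated with a minimal sketch. This is not the authors' code. The hyperparameter tuples, the toy fold data, and the 0.5 decision threshold for F1 are all assumptions made for illustration. It shows only how averaging a per-fold metric across a grid can select different settings depending on whether AUROC (rank-based) or F1 (threshold-dependent) is maximized.

```python
from statistics import mean


def f1_at_threshold(y_true, scores, threshold=0.5):
    """F1 for the positive class after thresholding scores (threshold is an assumption)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_setting(cv_results, metric):
    """Return the grid setting with the highest mean per-fold value of `metric`."""
    return max(
        cv_results,
        key=lambda setting: mean(metric(y, s) for y, s in cv_results[setting]),
    )


# Hypothetical grid: keys are (max_depth, learning_rate) tuples; values are
# per-fold (labels, predicted scores) pairs. All numbers are toy data.
CV_RESULTS = {
    (3, 0.1): [  # ranks cases perfectly, but every score falls below 0.5
        ([1, 1, 0, 0, 0, 0], [0.45, 0.40, 0.30, 0.20, 0.10, 0.05]),
        ([1, 0, 0, 1, 0, 0], [0.48, 0.20, 0.10, 0.35, 0.30, 0.05]),
    ],
    (6, 0.3): [  # slightly worse ranking, but scores straddle the threshold
        ([1, 1, 0, 0, 0, 0], [0.90, 0.30, 0.60, 0.20, 0.10, 0.05]),
        ([1, 0, 0, 1, 0, 0], [0.80, 0.70, 0.10, 0.60, 0.20, 0.05]),
    ],
}

best_by_auroc = select_setting(CV_RESULTS, auroc)
best_by_f1 = select_setting(CV_RESULTS, f1_at_threshold)
```

In this toy grid the two criteria disagree. The first setting wins on mean AUROC because it orders every positive above every negative, while the second wins on mean F1 because its scores cross the assumed threshold. With low-prevalence outcomes such as hospitalization, this kind of divergence is one reason to report both strategies.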
Results:
A total of 350,463 patients were selected in 2019 from the Johns Hopkins Healthcare System. Model features indicated by the ACG system for predicting prospective-year hospitalization and total cost in adult populations were included in these analyses. Findings suggest a small but statistically significant improvement in cross-validated AUROC and F1 over logistic regression when using XGBoost under either optimization strategy. The clinical implications of these findings and the effect of class imbalance on model calibration are explored, along with the limitations of these data and this approach.
Conclusions:
Logistic regression remains very well suited to these tasks, especially where model efficiency or interpretability is critical. Nevertheless, the findings also underscore a diversity of suitable models depending on the clinical use case, each with its own tradeoffs for evaluating performance. As such, there is no concise answer to whether these approaches improved model performance over regression-based tools.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.