JMIR Preprints #92820: Improving models to predict care utilization using machine learning: a retrospective observational study

Improving models to predict care utilization using machine learning: a retrospective observational study

Christopher Kitchen;
Talan Zhang;
Klaus Lemke;
Chintan Pandya;
Hadi Kharrazi;
Jonathan Weiner

ABSTRACT

Background:

The use of artificial intelligence (AI) and machine learning (ML) tools is now ubiquitous in the development of healthcare services and in clinical risk estimation. Legacy systems rely on highly informative feature sets, developed over years of clinical expertise and research, to estimate a range of outcomes. Only recently, however, have these systems been tested against newer statistical approaches. One such system, the Johns Hopkins Adjusted Clinical Groups (ACG) system, is a longstanding and widely used approach to categorizing clinical risk factors, and its features are amenable to ML techniques.

Objective:

This study aims to evaluate ACG-based predictive models tuned under two contrasting classification optimization strategies, one maximizing AUROC and one maximizing F1, and to compare their performance against traditional logistic regression. If selected ML algorithms can be tuned to improve overall measures of performance, this would strengthen the case for incorporating them into ACG-related workflows.

Methods:

Using a retrospective observational design, all-cause hospitalization and elevated total cost in the prospective year were modeled within a cross-validation framework. Hyperparameter settings for XGBoost, random forest, and elastic net were selected by grid search, choosing the settings that maximized either the average cross-validated F1 score or the average cross-validated area under the receiver operating characteristic curve (AUROC).
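The following is a minimal sketch of the tuning loop described in Methods. It is written in plain Python so that it runs without third-party packages. It assumes stratified 5-fold cross-validation, a 0.5 decision threshold for F1, and synthetic imbalanced data. `fit_logistic` is a hypothetical one-feature weighted logistic regression standing in for XGBoost, random forest, or elastic net, and its `l2` and `pos_weight` grid is illustrative only. The same grid is searched twice, once maximizing mean cross-validated AUROC and once maximizing mean cross-validated F1. This mirrors the two contrasting strategies but does not reproduce the authors' pipeline.

```python
import math
import random
from statistics import mean

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1(y_true, scores, threshold=0.5):
    """F1 of the positive class after thresholding scores at `threshold`."""
    tp = sum(1 for s, y in zip(scores, y_true) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, y_true) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, y_true) if s < threshold and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def stratified_folds(y, k, seed=0):
    """Deal shuffled indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, v in enumerate(y) if v == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def fit_logistic(X, y, l2=0.0, pos_weight=1.0, lr=0.1, epochs=100):
    """Hypothetical toy learner: one-feature weighted logistic regression by gradient descent."""
    w = b = 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, t in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            c = pos_weight if t == 1 else 1.0  # upweight errors on the rare class
            gw += c * (p - t) * x
            gb += c * (p - t)
        w -= lr * (gw / len(y) + l2 * w)  # L2 penalty applies to the slope only
        b -= lr * gb / len(y)
    return lambda x: 1.0 / (1.0 + math.exp(-(w * x + b)))

def grid_search(X, y, grid, metric, k=5):
    """Return (params, score) for the grid point with the highest mean cross-validated metric."""
    folds = stratified_folds(y, k)
    best_params, best_score = None, -math.inf
    for params in grid:
        fold_scores = []
        for test in folds:
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            model = fit_logistic([X[i] for i in train], [y[i] for i in train], **params)
            fold_scores.append(metric([y[i] for i in test], [model(X[i]) for i in test]))
        if mean(fold_scores) > best_score:
            best_params, best_score = params, mean(fold_scores)
    return best_params, best_score

# Hypothetical imbalanced data: 10% positives, one moderately informative feature.
rng = random.Random(42)
y = [1 if i % 10 == 0 else 0 for i in range(400)]
X = [rng.gauss(1.5 if t else 0.0, 1.0) for t in y]
grid = [{"l2": l2, "pos_weight": pw} for l2 in (0.0, 0.1) for pw in (1.0, 3.0, 9.0)]

best_auc = grid_search(X, y, grid, auroc)
best_f1 = grid_search(X, y, grid, f1)
print("AUROC-optimal:", best_auc)
print("F1-optimal:   ", best_f1)
```

AUROC depends only on how the scores rank patients, so it is largely insensitive to settings that shift predicted probabilities up or down. F1 at a fixed threshold responds directly to such shifts. This is one reason the two objectives can select different hyperparameters.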

Results:

A total of 350,463 patients from the Johns Hopkins Healthcare System were selected in 2019. The analyses included the features specified by the ACG system for adult populations to predict hospitalization and total cost in the prospective year. Findings suggest that XGBoost, under either optimization strategy, yielded small but statistically significant improvements in cross-validated AUROC and F1 over logistic regression. The clinical implications of these findings and the effect of class imbalance on model calibration are explored, along with the limitations of these data and this approach.
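The calibration concern raised above can be illustrated with three generic diagnostics: calibration-in-the-large (observed rate versus mean predicted risk), the Brier score, and an equal-count reliability table. This sketch does not train a model. It assumes that upweighting the rare class by a hypothetical `POS_WEIGHT` shifts each logit by roughly log(`POS_WEIGHT`), which is only an approximation. It then shows how that shift inflates predicted risk and how subtracting the same offset undoes it. The simulated risks, sample size, and bin count are illustrative assumptions, not the study's data or methods.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def calibration_summary(y_true, probs, n_bins=5):
    """Observed rate, mean predicted risk, Brier score, and an equal-count reliability table."""
    n = len(y_true)
    observed = sum(y_true) / n
    mean_predicted = sum(probs) / n
    brier = sum((p - t) ** 2 for p, t in zip(probs, y_true)) / n
    order = sorted(range(n), key=lambda i: probs[i])  # rank patients by predicted risk
    table = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        table.append((sum(probs[i] for i in idx) / len(idx),    # mean predicted risk in bin
                      sum(y_true[i] for i in idx) / len(idx)))  # observed event rate in bin
    return observed, mean_predicted, brier, table

def undo_pos_weight(p, pos_weight):
    """Approximate prior correction: subtract log(pos_weight) from the logit."""
    return sigmoid(logit(p) - math.log(pos_weight))

# Hypothetical rare outcome with known true risk; outcomes are drawn from that risk.
rng = random.Random(7)
true_risk = [sigmoid(-3.0 + 1.5 * rng.gauss(0.0, 1.0)) for _ in range(2000)]
y = [1 if rng.random() < p else 0 for p in true_risk]

POS_WEIGHT = 9.0  # hypothetical class weight
weighted = [sigmoid(logit(p) + math.log(POS_WEIGHT)) for p in true_risk]  # mimics weighting
corrected = [undo_pos_weight(p, POS_WEIGHT) for p in weighted]

results = {name: calibration_summary(y, probs)
           for name, probs in [("true risk", true_risk), ("weighted", weighted),
                               ("corrected", corrected)]}
for name, (obs, pred, brier, _) in results.items():
    print(f"{name:>10}: observed={obs:.3f} mean predicted={pred:.3f} Brier={brier:.4f}")
```

Weighting that improves F1 on a rare outcome can therefore leave discrimination intact while overstating absolute risk. Reporting calibration alongside AUROC and F1 makes that tradeoff visible.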

Conclusions:

Logistic regression remains very well suited to these tasks, especially where model efficiency or interpretability is critical. Nevertheless, the findings also show that several models may be suitable depending on the clinical use case, each with its own tradeoffs in how performance is evaluated. As such, there is no simple answer to whether these approaches improve model performance over regression-based tools.

Citation

Please cite as:

Kitchen C, Zhang T, Lemke K, Pandya C, Kharrazi H, Weiner J

Improving Models to Predict Care Utilization Using Machine Learning: Retrospective Observational Study

JMIR Form Res 2026;10:e92820

DOI: 10.2196/92820

PMID: 42228151

PMCID: 13308755

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Formative Research

Date Submitted: Feb 3, 2026

Open Peer Review Period: Feb 3, 2026 - Feb 6, 2026

Date Accepted: May 28, 2026

Date Submitted to PubMed: Jun 2, 2026
