JMIR Preprints #92079: Machine Learning for Predicting Patient Revisits and Future Diagnoses Using Electronic Health Claims Data: A Retrospective Cohort Study from Ghana

Nana Kofi Sarpong Morgan, Patrick Annan-Noonoo

ABSTRACT

Background:

Health facilities worldwide face increasing operational pressure from the rising burden of communicable and non-communicable diseases, and low- and middle-income countries face the greatest challenges. Improving operational efficiency requires timely identification of healthcare use patterns and recurring care needs.

Objective:

This study aimed to develop machine learning (ML) models that predict (1) whether a patient revisits the facility within 30, 90, or 180 days and (2) the most likely diagnosis at that revisit, using longitudinal National Health Insurance Scheme (NHIS) claims data from a medical facility in Ghana.
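
The revisit targets can be made concrete with a short sketch of label construction. This is a minimal illustration under stated assumptions: the input is a hypothetical list of (patient_id, visit_date) records, and the helper name `label_revisits`, the rule that same-day records do not count as revisits, and the inclusive horizon boundary are our choices, not details reported by the authors. A real pipeline would also have to handle index visits whose follow-up window runs past the end of the data (August 2025); this sketch ignores that censoring.

```python
from collections import defaultdict
from datetime import date


def label_revisits(visits, horizon_days):
    """Return 0/1 revisit labels aligned with `visits` (hypothetical helper).

    visits: list of (patient_id, visit_date) tuples.
    A visit gets label 1 when the same patient has a later visit, on a
    different day, no more than `horizon_days` days after it (assumption).
    """
    # Group visit dates by patient, remembering each visit's original position.
    by_patient = defaultdict(list)
    for idx, (patient_id, visit_date) in enumerate(visits):
        by_patient[patient_id].append((visit_date, idx))

    labels = [0] * len(visits)
    for records in by_patient.values():
        records.sort()  # chronological order within each patient
        for i, (visit_date, idx) in enumerate(records):
            for later_date, _ in records[i + 1:]:
                gap = (later_date - visit_date).days
                if gap == 0:
                    continue  # same-day records are not treated as revisits
                # The first later-dated visit is the nearest one.
                labels[idx] = int(gap <= horizon_days)
                break
    return labels


if __name__ == "__main__":
    visits = [
        ("A", date(2020, 1, 1)),
        ("A", date(2020, 1, 20)),
        ("A", date(2020, 6, 1)),
        ("B", date(2020, 3, 1)),
    ]
    for horizon in (30, 90, 180):
        print(horizon, label_revisits(visits, horizon))
```

Computing one label column per horizon from the same visit table keeps the 30-, 90-, and 180-day targets consistent with each other: a visit labeled positive at 30 days is necessarily positive at the longer horizons.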

Methods:

We conducted a retrospective cohort study using electronic health records (EHR) spanning January 2015 to August 2025. The analytical dataset comprised 111,488 visits from 34,486 unique patients. We compared five ML approaches: logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), multilayer perceptron (MLP), and TabM (a recent parameter-efficient ensemble architecture for tabular data). Data were split at the patient level, so that no patient contributed visits to both the training and evaluation sets, to prevent information leakage. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC-ROC) and accuracy, plus top-3 accuracy for multiclass disease prediction (31 to 54 diagnostic categories, depending on the prediction horizon). Feature importance was assessed using Shapley Additive exPlanations (SHAP) analysis for XGBoost and permutation importance for TabM.
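
Patient-level splitting assigns every visit from a given patient to a single partition, so a model is never evaluated on a patient whose other visits it saw during training. The sketch below is a standard-library illustration only; the function name `patient_level_split`, the 20% test fraction, and the seed are hypothetical, since the abstract does not report the split ratio. scikit-learn's `GroupShuffleSplit`, with patient IDs passed as `groups`, provides equivalent behavior.

```python
import random


def patient_level_split(patient_ids, test_fraction=0.2, seed=42):
    """Split row indices so that each patient appears in only one partition.

    patient_ids: one patient identifier per visit row.
    test_fraction and seed are illustrative defaults, not the study's values.
    """
    patients = sorted(set(patient_ids))  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(patients)
    n_test = round(len(patients) * test_fraction)
    test_patients = set(patients[:n_test])

    # Every row follows its patient into train or test, never both.
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx


if __name__ == "__main__":
    ids = ["A", "A", "B", "C", "C", "C", "D", "E"]
    train_idx, test_idx = patient_level_split(ids, test_fraction=0.4)
    train_patients = {ids[i] for i in train_idx}
    test_patients = {ids[i] for i in test_idx}
    print("overlap:", train_patients & test_patients)  # always an empty set
```

Splitting by visit instead would let a frequent attender's earlier visits sit in training while later ones are scored in evaluation, which inflates metrics for exactly the visit-history features that the study found most important.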

Results:

For revisit prediction, TabM achieved the highest AUC-ROC across all horizons (0.891 at 30 days, 0.942 at 90 days, and 0.973 at 180 days), followed closely by XGBoost (0.884, 0.927, and 0.964, respectively). Disease prediction proved more challenging given the multiclass nature of the task. TabM achieved the highest top-3 accuracy (0.420 at 30 days, 0.626 at 90 days, and 0.635 at 180 days) and the highest standard accuracy at 90 and 180 days (0.494 and 0.492, respectively), whereas XGBoost achieved the highest AUC-ROC (0.666, 0.710, and 0.690 at 30, 90, and 180 days, respectively). Feature importance analysis showed that clinical visit-pattern features (total visits and visit frequency) dominated revisit prediction, whereas demographic features (age) and the current diagnosis drove disease prediction.
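
Top-3 accuracy counts a disease prediction as correct when the true diagnosis is among the three classes with the highest predicted scores. A minimal sketch, assuming per-visit score lists and integer class labels (both hypothetical formats); scikit-learn's `sklearn.metrics.top_k_accuracy_score` computes the same metric, although its handling of tied scores may differ from this version, which breaks ties in favor of the lower class index.

```python
def top_k_accuracy(scores, true_labels, k=3):
    """Share of rows whose true class is among the k highest-scoring classes.

    scores: one list of per-class scores per visit (hypothetical format).
    true_labels: integer class index for each visit.
    Ties are broken in favor of the lower class index.
    """
    if len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must have the same length")
    if not true_labels:
        raise ValueError("need at least one example")
    hits = 0
    for row, label in zip(scores, true_labels):
        # Rank class indices by descending score; sorted() is stable, so tied
        # classes keep ascending index order.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += int(label in ranked[:k])
    return hits / len(true_labels)


if __name__ == "__main__":
    scores = [
        [0.10, 0.50, 0.20, 0.20],
        [0.60, 0.10, 0.20, 0.10],
        [0.25, 0.25, 0.40, 0.10],
    ]
    labels = [1, 3, 0]
    print("top-1:", top_k_accuracy(scores, labels, k=1))
    print("top-3:", top_k_accuracy(scores, labels, k=3))
```

With k=1 the function reduces to standard accuracy, which is why top-3 accuracy is always at least as high. For scale, a uniformly random ranking over 31 classes has an expected top-3 accuracy of 3/31 (about 0.10), and over 54 classes 3/54 (about 0.06).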

Conclusions:

Machine learning models using NHIS claims data can effectively predict hospital revisits and narrow diagnostic possibilities to clinically useful shortlists in a resource-limited hospital setting. TabM, a recent tabular deep learning architecture, performed competitively with, or better than, gradient boosting methods, challenging assumptions about the limitations of neural networks on tabular healthcare data. These findings support the feasibility of deploying predictive analytics in Sub-Saharan African health systems with modest data infrastructure.

Citation

Please cite as:

Morgan NKS, Annan-Noonoo P

Machine Learning for Predicting Patient Revisits and Future Diagnoses Using Electronic Health Claims Data: A Retrospective Cohort Study from Ghana

JMIR Preprints. 23/01/2026:92079

DOI: 10.2196/preprints.92079

URL: https://preprints.jmir.org/preprint/92079

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Previously submitted to: JMIR Medical Informatics (no longer under consideration since Feb 25, 2026)

Date Submitted: Jan 23, 2026

Open Peer Review Period: Feb 3, 2026 - Feb 25, 2026

NOTE: This is an unreviewed preprint.
