Previously submitted to: JMIR Medical Informatics (no longer under consideration since Feb 25, 2026)

Date Submitted: Jan 23, 2026
Open Peer Review Period: Feb 3, 2026 - Feb 25, 2026
NOTE: This is an unreviewed Preprint

Machine Learning for Predicting Patient Revisits and Future Diagnoses Using Electronic Health Claims Data: A Retrospective Cohort Study from Ghana

  • Nana Kofi Sarpong Morgan
  • Patrick Annan-Noonoo

ABSTRACT

Background:

Health facilities globally face increasing operational pressure from the rising burden of communicable and noncommunicable diseases, with low- and middle-income countries experiencing the greatest challenges. Timely identification of health care use patterns and recurring care needs is essential for improving operational efficiency.

Objective:

This study aimed to develop machine learning (ML) models that predict (1) patient revisits within 30, 90, and 180 days and (2) the most likely diagnosis at revisit, using longitudinal National Health Insurance Scheme (NHIS) claims data from a medical facility in Ghana.

Methods:

We conducted a retrospective cohort study using electronic health records (EHR) spanning January 2015 to August 2025. The analytical dataset comprised 111,488 visits from 34,486 unique patients. We compared five ML approaches: logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), multilayer perceptron (MLP), and TabM (a recent parameter-efficient ensemble architecture for tabular data). Data were split at the patient level to prevent information leakage between the training and evaluation sets. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC-ROC), accuracy, and top-3 accuracy for multiclass disease prediction (31 to 54 categories, depending on the horizon). Feature importance was assessed using SHapley Additive exPlanations (SHAP) analysis for XGBoost and permutation importance for TabM.
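
To illustrate what a patient-level split means in practice, the following is a minimal sketch and not the authors' code. The record layout, the `patient_id` field, the `patient_level_split` helper, and the 20% held-out fraction are all assumptions made for illustration. The idea is to sample patients rather than visits, so that every visit from a given patient falls in exactly one set.

```python
import random


def patient_level_split(visits, test_fraction=0.2, seed=0):
    """Split visit records so each patient appears in exactly one set.

    `visits` is a list of dicts with a "patient_id" key (hypothetical
    layout, chosen only for this illustration).
    """
    # Sample at the patient level, not the visit level.
    patient_ids = sorted({v["patient_id"] for v in visits})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)

    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    # Every visit follows its patient into the same partition.
    train = [v for v in visits if v["patient_id"] not in test_ids]
    test = [v for v in visits if v["patient_id"] in test_ids]
    return train, test


# Toy data: 10 patients with 4 visits each.
visits = [{"patient_id": f"P{i % 10}", "visit_no": i} for i in range(40)]
train_visits, test_visits = patient_level_split(visits, test_fraction=0.2, seed=42)

train_patients = {v["patient_id"] for v in train_visits}
test_patients = {v["patient_id"] for v in test_visits}
print(len(train_visits), len(test_visits), train_patients & test_patients)
```

A visit-level random split would let earlier visits of a patient sit in the training set while later visits of the same patient sit in the evaluation set, which tends to inflate measured performance.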

Results:

For revisit prediction, TabM achieved the highest AUC-ROC across all horizons (0.891 at 30 days, 0.942 at 90 days, and 0.973 at 180 days), followed closely by XGBoost (0.884, 0.927, and 0.964, respectively). Disease prediction proved more challenging given the multiclass nature of the task. TabM achieved the highest top-3 accuracy (0.420 at 30 days, 0.626 at 90 days, and 0.635 at 180 days) and the highest standard accuracy at 90 and 180 days (0.494 and 0.492, respectively), while XGBoost achieved the highest AUC-ROC (0.666, 0.710, and 0.690 at 30, 90, and 180 days, respectively). Feature importance analysis revealed that clinical visit pattern features (total visits and visit frequency) dominated revisit prediction, whereas demographic features (age) and current diagnosis drove disease prediction.
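
For readers unfamiliar with the metric, top-3 accuracy counts a prediction as correct when the true diagnosis category is among the three highest-scoring classes. The sketch below shows the computation on made-up scores. The `top_k_accuracy` helper and the toy data are hypothetical and are not taken from the study.

```python
def top_k_accuracy(y_true, class_scores, k=3):
    """Fraction of samples whose true class is among the k top-scored classes."""
    hits = 0
    for true_label, scores in zip(y_true, class_scores):
        # Rank class indices from highest to lowest predicted score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(y_true)


# Toy example: 4 visits, 5 diagnosis categories (illustrative values only).
y_true = [0, 2, 4, 1]
class_scores = [
    [0.50, 0.20, 0.15, 0.10, 0.05],  # true class 0 is ranked 1st
    [0.40, 0.30, 0.20, 0.05, 0.05],  # true class 2 is ranked 3rd
    [0.40, 0.30, 0.20, 0.06, 0.04],  # true class 4 is ranked 5th
    [0.10, 0.60, 0.10, 0.10, 0.10],  # true class 1 is ranked 1st
]

top1 = top_k_accuracy(y_true, class_scores, k=1)
top3 = top_k_accuracy(y_true, class_scores, k=3)
print(top1, top3)  # 0.5 0.75
```

Top-3 accuracy is always at least as high as standard (top-1) accuracy, which is why it is often reported alongside it for tasks with many classes.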

Conclusions:

ML models using NHIS claims data can effectively predict hospital revisits and narrow diagnostic possibilities to clinically useful shortlists in a resource-limited hospital setting. TabM, a recent tabular deep learning architecture, demonstrated competitive or superior performance compared with gradient boosting methods, challenging assumptions about the limitations of neural networks on tabular health care data. These findings support the feasibility of deploying predictive analytics in sub-Saharan African health systems with modest data infrastructure.


Citation

Please cite as:

Morgan NKS, Annan-Noonoo P

Machine Learning for Predicting Patient Revisits and Future Diagnoses Using Electronic Health Claims Data: A Retrospective Cohort Study from Ghana

JMIR Preprints. 23/01/2026:92079

DOI: 10.2196/preprints.92079

URL: https://preprints.jmir.org/preprint/92079

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.