JMIR Preprints #68898: The Use of Machine Learning in Real-World Data: A Systematic Review of Disease Prediction and Management

The Use of Machine Learning in Real-World Data: A Systematic Review of Disease Prediction and Management

Norah Hamad Alhumaidi;
Doni Dermawan;
Hanin Farhana Kamaruzaman;
Nasser Alotaiq

ABSTRACT

Background:

Machine learning (ML) and big data analytics are revolutionizing health care, particularly in disease prediction, management, and personalized care. Vast amounts of real-world data (RWD) from sources such as electronic health records (EHRs), patient registries, and wearable devices give ML significant potential to improve clinical outcomes. However, challenges in data quality, transparency, and clinical integration remain.

Objective:

This study aims to systematically review the use of ML in real-world data for disease prediction and management, identifying the most common ML methods, disease types, study designs, and sources of real-world evidence (RWE).

Methods:

A systematic review was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to identify studies that used ML methods to analyze RWD for disease prediction and management. Data extraction covered the ML algorithms used, disease categories, study types, and sources of RWE, such as EHRs, patient registries, and wearable devices.
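
As a rough illustration of the extraction step described above, the sketch below tallies how often each algorithm, disease category, and data source appears across a set of included studies. It is a minimal example under stated assumptions, not the authors' procedure: the record layout, the `tally` helper, and the three sample rows are all hypothetical placeholders and do not come from the review.

```python
from collections import Counter

# Hypothetical extraction records, one dict per included study.
# These rows are illustrative placeholders, NOT data from the review.
studies = [
    {"id": "S1", "methods": ["RF", "LR"], "disease": "cardiovascular", "source": "EHR"},
    {"id": "S2", "methods": ["SVM"], "disease": "cancer", "source": "registry"},
    {"id": "S3", "methods": ["RF"], "disease": "cardiovascular", "source": "wearable"},
]


def tally(records, field):
    """Count how many studies report each value of `field`.

    List-valued fields (e.g. several algorithms in one study) are
    counted once per distinct value per study.
    """
    counts = Counter()
    for record in records:
        value = record[field]
        values = value if isinstance(value, list) else [value]
        counts.update(set(values))
    return counts


method_counts = tally(studies, "methods")
disease_counts = tally(studies, "disease")
source_counts = tally(studies, "source")

# Print each tally, most frequent value first.
for name, counts in [
    ("methods", method_counts),
    ("disease", disease_counts),
    ("source", source_counts),
]:
    print(name, counts.most_common())
```

Counting each value once per study keeps a study that reports the same algorithm twice from inflating the totals; a review with a different counting rule would adjust `tally` accordingly.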

Results:

The systematic review revealed that the most frequently employed machine learning methods were Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM). These methods were applied across various disease categories, with cardiovascular diseases, cancers, and neurological disorders being the most common. Real-world evidence primarily originated from EHRs, patient registries, and wearable devices, with a predominant focus on predictive modeling to improve clinical outcomes.

Conclusions:

ML and big data hold significant promise for enhancing healthcare through better disease prediction and management. However, data quality, model interpretability, and generalizability must be addressed to integrate ML models fully into clinical practice.

Citation

Please cite as:

Alhumaidi NH, Dermawan D, Kamaruzaman HF, Alotaiq N

The Use of Machine Learning for Analyzing Real-World Data in Disease Prediction and Management: Systematic Review

JMIR Med Inform 2025;13:e68898

DOI: 10.2196/68898

PMID: 40537090

PMCID: 12226786

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Nov 17, 2024

Date Accepted: Apr 22, 2025
