Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Nov 17, 2024
Date Accepted: Apr 22, 2025

The final, peer-reviewed published version of this preprint can be found here:

The Use of Machine Learning for Analyzing Real-World Data in Disease Prediction and Management: Systematic Review

Alhumaidi NH, Dermawan D, Kamaruzaman HF, Alotaiq N

JMIR Med Inform 2025;13:e68898

DOI: 10.2196/68898

PMID: 40537090

PMCID: 12226786

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

The Use of Machine Learning in Real-World Data: A Systematic Review of Disease Prediction and Management

  • Norah Hamad Alhumaidi; 
  • Doni Dermawan; 
  • Hanin Farhana Kamaruzaman; 
  • Nasser Alotaiq

ABSTRACT

Background:

Machine learning (ML) and big data analytics are transforming healthcare, particularly disease prediction, disease management, and personalized care. With vast amounts of real-world data (RWD) available from sources such as electronic health records (EHRs), patient registries, and wearable devices, ML offers significant potential to improve clinical outcomes. However, challenges related to data quality, transparency, and clinical integration remain.

Objective:

This study aims to systematically review the use of ML on RWD for disease prediction and management, identifying the most common ML methods, disease types, study designs, and sources of real-world evidence (RWE).

Methods:

A systematic review following the PRISMA guidelines was conducted to identify studies that used ML methods to analyze RWD for disease prediction and management. Data extraction focused on the ML algorithms used, disease categories, study types, and sources of RWE, such as EHRs, patient registries, and wearable devices.

Results:

The review found that the most frequently used ML methods were Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM). These methods were applied across various disease categories, most commonly cardiovascular diseases, cancers, and neurological disorders. RWE originated primarily from EHRs, patient registries, and wearable devices, with a predominant focus on predictive modeling to improve clinical outcomes.

Conclusions:

ML and big data hold significant promise for enhancing healthcare through better disease prediction and management. However, challenges in data quality, model interpretability, and generalizability must be addressed before ML models can be fully integrated into clinical practice.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.