Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Nov 17, 2024
Date Accepted: Apr 22, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
The Use of Machine Learning in Real-World Data: A Systematic Review of Disease Prediction and Management
ABSTRACT
Background:
Machine learning (ML) and big data analytics are revolutionizing healthcare, particularly in disease prediction, management, and personalized care. With vast amounts of real-world data (RWD) available from sources such as electronic health records (EHRs), patient registries, and wearable devices, ML offers significant potential to improve clinical outcomes. However, challenges remain in data quality, transparency, and clinical integration.
Objective:
This study aims to systematically review the use of ML on real-world data (RWD) for disease prediction and management, and to identify the most common ML methods, disease types, study designs, and sources of real-world evidence (RWE).
Methods:
A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to identify studies that applied ML methods to real-world data for disease prediction and management. For each included study, data were extracted on the ML algorithms used, the disease categories, the study types, and the sources of real-world evidence (RWE), such as electronic health records (EHRs), patient registries, and wearable devices.
Results:
The systematic review revealed that the most frequently employed machine learning methods were Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM). These methods were applied across various disease categories, with cardiovascular diseases, cancers, and neurological disorders being the most common. Real-world evidence primarily originated from EHRs, patient registries, and wearable devices, with a predominant focus on predictive modeling to improve clinical outcomes.
Conclusions:
ML and big data hold significant promise for enhancing healthcare through better disease prediction and management. However, data quality, model interpretability, and generalizability must be addressed to integrate ML models fully into clinical practice.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.