
Currently submitted to: JMIR Medical Informatics

Date Submitted: Mar 19, 2026
Open Peer Review Period: Apr 2, 2026 - May 28, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Bias in Machine Learning Models in Healthcare – a systematic review of systematic reviews

  • Paul Oni; 
  • Thomas Rielage; 
  • Anne-Kathrin Altevogt; 
  • Oliver Schoeffski

ABSTRACT

Background:

Over the past decade, the application of machine learning (ML) models to solving a wide range of healthcare challenges has grown rapidly. However, the translation and integration of these models into widespread applications in real-world clinical settings remain limited due to persistent issues especially relating to bias.

Objective:

This systematic review of systematic reviews aims to identify the most common types of bias that impede the widespread applicability and generalizability of machine learning models in healthcare.

Methods:

Using a standardized search strategy, relevant systematic reviews on the application of machine learning models in healthcare published between January 1, 2022, and June 1, 2025, were identified across three databases: PubMed, Embase, and MEDLINE. Only studies that examined the application of machine learning, including its subfield of deep learning, and that employed standardized tools for risk of bias assessment were included. Additional inclusion criteria required that studies adhere to the PRISMA guidelines for systematic reviews and be published in English. Risk of bias was assessed using a tailored and modified version of the AMSTAR 2 tool (A MeaSurement Tool to Assess systematic Reviews, version 2), while methodological quality was evaluated according to the Risk of Bias in Systematic Reviews (ROBIS) guidelines.

Results:

In total, 729 abstracts were identified, of which 60 studies met the inclusion criteria. Twenty-seven reviews applied machine learning models to prognostic tasks, 20 to diagnostic tasks, and 6 addressed both diagnostic and prognostic applications. Additionally, three reviews focused on disease management and three on decision support, while one review examined the use of machine learning for surgical skill assessment. The risk of bias in the primary studies was appraised using PROBAST (n=33), QUADAS-2 (n=17), the Cochrane Handbook for Systematic Reviews of Interventions (n=3), ROBINS-I (n=2), robvis (n=1), QUIPS (n=1), EPHPP (n=1), PROBAST and TRIPOD (n=1), and QUADAS-2 and PROBAST (n=1). The most frequently identified sources of bias included lack of external validation; analytical bias (e.g., inadequate handling of missing data, insufficient sample size, lack of internal validation, and poor reporting of model calibration); participant selection bias; bias related to the reference standard or ground truth; feature selection bias; and issues related to model interpretability and algorithmic bias.

Conclusions:

Machine learning has the potential to improve healthcare. However, realizing this potential at scale requires addressing the aforementioned sources of bias and establishing standardized methodological guidelines for future studies. This is essential to improving the generalizability and acceptability of ML models and to facilitating their safe integration into routine clinical practice.


Citation

Please cite as:

Oni P, Rielage T, Altevogt AK, Schoeffski O

Bias in Machine Learning Models in Healthcare – a systematic review of systematic reviews

JMIR Preprints. 19/03/2026:95728

DOI: 10.2196/preprints.95728

URL: https://preprints.jmir.org/preprint/95728


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.