
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 20, 2023
Open Peer Review Period: Mar 20, 2023 - May 15, 2023
Date Accepted: Nov 17, 2023

The final, peer-reviewed published version of this preprint can be found here:

The Reporting Quality of Machine Learning Studies on Pediatric Diabetes Mellitus: Systematic Review

Zrubka Z, Kertész G, Gulácsi L, Czere J, Hölgyesi Á, Motahari Nezhad H, Mosavi A, Kovács L, Butte AJ, Péntek M


J Med Internet Res 2024;26:e47430

DOI: 10.2196/47430

PMID: 38241075

PMCID: 10837761

Reporting Quality of Machine Learning Studies on Pediatric Diabetes Mellitus: A Systematic Review

  • Zsombor Zrubka; 
  • Gábor Kertész; 
  • László Gulácsi; 
  • János Czere; 
  • Áron Hölgyesi; 
  • Hossein Motahari Nezhad; 
  • Amir Mosavi; 
  • Levente Kovács; 
  • Atul J Butte; 
  • Márta Péntek

ABSTRACT

Background:

Pediatric diabetes care is among the leading areas adopting digital technologies and intelligent devices. However, there is growing concern about the transparency, replicability, bias, and overall validity of research in the field of medical artificial intelligence and machine learning.

Objective:

To systematically review the reporting quality of machine learning (ML) studies on pediatric diabetes mellitus.

Methods:

The PubMed and Web of Science databases were searched for the period 2016-2020. Studies were included if they reported the use of ML in children aged 2 to 18 years with diabetes mellitus, including studies of complications, screening studies, and in silico samples. In studies following the ML workflow of training, validation, and evaluation of results, reporting quality was assessed by pairs of independent reviewers using the Minimum Information About Clinical Artificial Intelligence Modeling (MI-CLAIM) checklist. Positive answers to the 17 binary items on sufficient reporting were qualitatively summarized and counted as a proxy measure of reporting quality. Synthesis of results included correlation analysis of reporting quality with the date of publication; differences by data type, subjects (human or in silico), research goals, level of code sharing, and the scientific field of the publication (medical or engineering) were tested via ANOVA or t tests. The association of MI-CLAIM ratings with expert judgments of clinical impact and reproducibility was also tested.
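The proxy scoring and the main statistical tests described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the study records are made up, and the item patterns, years, and code-sharing flags are hypothetical placeholders.

```python
# Illustrative sketch of the analysis: score each study by counting its
# sufficiently reported MI-CLAIM items, then test the correlation with
# publication year and the difference by code sharing. All data are invented.
from scipy.stats import pearsonr, ttest_ind

# Each study: 17 binary MI-CLAIM items (1 = sufficiently reported),
# a publication year, and whether any code of the ML pipeline was shared.
studies = [
    {"items": [1,1,0,1,0,0,1,0,0,1,0,0,0,0,0,1,0], "year": 2016, "code_shared": False},
    {"items": [1,1,1,1,0,1,1,0,1,1,0,0,1,0,0,1,1], "year": 2019, "code_shared": True},
    {"items": [1,1,0,1,1,0,1,0,0,1,0,1,0,0,1,1,0], "year": 2018, "code_shared": False},
    {"items": [1,1,1,1,1,1,1,0,1,1,1,0,1,0,0,1,1], "year": 2020, "code_shared": True},
]

# Proxy measure of reporting quality: count of sufficiently reported items.
scores = [sum(s["items"]) for s in studies]
years = [s["year"] for s in studies]

# Correlation of reporting quality with publication date.
r, p = pearsonr(years, scores)

# Two-sample t test: studies sharing any code vs no sharing.
shared = [sc for sc, s in zip(scores, studies) if s["code_shared"]]
unshared = [sc for sc, s in zip(scores, studies) if not s["code_shared"]]
t, p_t = ttest_ind(shared, unshared)
```

With the invented records above, the proxy scores and the positive year-quality correlation merely demonstrate the mechanics; the reported statistics (r=0.50, P=.02, etc.) come from the actual 21 assessed studies.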

Results:

After screening 1043 records, 28 studies were included. The sample size of the training cohorts ranged from 5 to 561. Six studies featured only in silico patients. In total, 61 unique ML techniques were applied. Reporting quality was low, with great variation among the 21 assessed studies. The number of items with sufficient reporting ranged from 4 to 12 (mean 7.43). Research questions and data characterization were most often reported adequately, whereas patient characteristics and model examination were least often. Reporting quality improved over time (r=0.50, P=.02). It was higher than average in prognostic biomarker and risk factor studies (P=.04) and lower in noninvasive hypoglycemia detection studies (P=.006); it was higher in studies published in medical versus engineering journals (P=.004) and in studies sharing any code of the ML pipeline versus no sharing (P=.003). Studies with human or in silico samples, or using various data types, did not differ in reporting quality. The association between expert judgments and MI-CLAIM ratings was not significant.

Conclusions:

The reporting quality of ML studies in the pediatric population with diabetes was generally low. Details important for clinicians, such as patient characteristics, comparison with state-of-the-art solutions, and model examination for valid, unbiased, and robust results, were often the weak points of reporting. To allow assessment of their clinical utility, the reporting standards of machine learning studies must evolve, and algorithms for this challenging population must become more transparent and replicable.


 Citation

Please cite as:

Zrubka Z, Kertész G, Gulácsi L, Czere J, Hölgyesi Á, Motahari Nezhad H, Mosavi A, Kovács L, Butte AJ, Péntek M

The Reporting Quality of Machine Learning Studies on Pediatric Diabetes Mellitus: Systematic Review

J Med Internet Res 2024;26:e47430

DOI: 10.2196/47430

PMID: 38241075

PMCID: 10837761


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.