Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 7, 2025
Date Accepted: Oct 14, 2025
Date Submitted to PubMed: Oct 14, 2025

The final, peer-reviewed published version of this preprint can be found here:

Critical Appraisal Tools for Evaluating Artificial Intelligence in Clinical Studies: Scoping Review

Cabello JB, Ruiz Garcia V, Torralba M, Maldonado Fernandez M, Ubeda MdM, Ansuategui E, Ramos-Ruperto L, Emparanza JI, Urreta I, Iglesias MT, Pijoan JI, Burls A

Critical Appraisal Tools for Evaluating Artificial Intelligence in Clinical Studies: Scoping Review

J Med Internet Res 2025;27:e77110

DOI: 10.2196/77110

PMID: 41359958

PMCID: 12685289

CRITICAL APPRAISAL TOOLS FOR ARTIFICIAL INTELLIGENCE CLINICAL STUDIES: A SCOPING REVIEW.

  • Juan B Cabello; 
  • Vicente Ruiz Garcia; 
  • Miguel Torralba; 
  • Miguel Maldonado Fernandez; 
  • Maria del Mar Ubeda; 
  • Eukene Ansuategui; 
  • Luis Ramos-Ruperto; 
  • Jose I Emparanza; 
  • Iratxe Urreta; 
  • Maria Teresa Iglesias; 
  • Jose I Pijoan; 
  • Amanda Burls

ABSTRACT

Background:

Health research using predictive and/or generative artificial intelligence (AI) is growing rapidly. As in traditional clinical studies, the way AI studies are conducted can introduce systematic errors. Translating this AI evidence into clinical practice and research requires critical appraisal tools for clinical decision makers and researchers.

Objective:

To identify existing tools for the critical appraisal of clinical studies that use AI and to examine the concepts and domains these tools explore.

Methods:

Inclusion criteria followed the PCC framework. Population (P): clinical studies of artificial intelligence. Concept (C): tools for critical appraisal and associated constructs such as quality, reporting, validity, risk of bias, and applicability. Context (C): clinical practice. In addition, bias classification studies and chatbot assessment studies were included. We searched medical and engineering databases (MEDLINE, EMBASE, CINAHL, PsycINFO, and IEEE). We included primary clinical research featuring tools for critical appraisal. Narrative and systematic reviews were included in the first screening phase and excluded in the second phase, after new tools had been identified through forward snowballing. We excluded nonhuman, computational, and mathematical research, as well as letters, opinion papers, and editorials. Screening was performed in Rayyan. Data were extracted by two observers, and discrepancies were resolved by discussion. The protocol was registered in advance on OSF (https://doi.org/10.17605/OSF.IO/ETYDS). We adhered to the PRISMA extension for Scoping Reviews (PRISMA-ScR) and to the PRISMA-S extension for reporting literature searches in systematic reviews.

Results:

We retrieved 4393 unique records for screening. After excluding 3803 records, 119 were selected for full-text screening, of which 59 were excluded. After the inclusion of 10 studies identified via other methods, a total of 70 records were included. Of these, 46 were reporting guidelines (including 15 tools for critical appraisal, 2 for study quality, and 2 for risk of bias). Nine papers focused on bias classification or mitigation. We found 15 chatbot assessment studies or systematic reviews of chatbot studies (6 and 9, respectively), which form a very heterogeneous group.

Conclusions:

The results depict a landscape of evidence tools in which reporting tools predominate, followed by critical appraisal tools, with few tools addressing risk of bias. The mismatch between concepts of bias in AI and in epidemiology should be considered in critical appraisal, especially regarding fairness and bias mitigation in AI. Finally, chatbot assessment is a vast and evolving field in which progress in design, reporting, and critical appraisal is necessary and urgent. Clinical Trial: https://doi.org/10.17605/OSF.IO/ETYDS




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.