Currently accepted at: Journal of Medical Internet Research
Date Submitted: May 27, 2025
Open Peer Review Period: May 27, 2025 - Jul 22, 2025
Date Accepted: Mar 31, 2026
This paper has been accepted and is currently in production.
It will appear shortly on 10.2196/78168
Evaluation Frameworks for Clinical Artificial Intelligence: A Scoping Review of Validation Strategies, Real-World Applicability, and Ethical Principles
ABSTRACT
Background:
Artificial intelligence is increasingly integrated into clinical practice to enhance decision-making, diagnosis, and patient care. However, the diversity and complexity of AI-based clinical decision support systems demand rigorous methodological and ethical evaluation to ensure their safety, effectiveness, and equity in real-world healthcare settings.
Objective:
To identify, characterize, and critically analyze existing evaluation systems and methodological frameworks used to assess AI models in clinical practice, with a focus on technical performance, clinical applicability, and bioethical considerations.
Methods:
We conducted a systematic review following PRISMA guidelines. We searched PubMed/MEDLINE, Embase, Scopus, and Web of Science from January 2013 to April 2024. We included studies describing evaluation frameworks or systems designed to assess AI-based clinical decision support systems in real-world clinical contexts. Data extraction included methodological characteristics, validation approaches, performance metrics, and ethical dimensions. The included frameworks were mapped and analyzed across five domains: validation strategy, reporting standards, clinical applicability, healthcare system integration, and ethical criteria.
Results:
A total of 24 articles were included. Most frameworks emphasized technical validation and performance metrics (e.g., accuracy, AUC), with fewer addressing prospective or external validation. Only a minority incorporated real-world implementation strategies or ethical dimensions such as transparency, equity, or patient autonomy. Regulatory guidance (e.g., from FDA or EU AI Act) was inconsistently referenced. Common gaps included lack of standardized outcome measures and insufficient stakeholder engagement, particularly from patients and healthcare providers.
Conclusions:
Current evaluation systems for AI models in clinical practice are heterogeneous and often incomplete, with limited emphasis on ethical considerations and health system integration. There is a critical need for standardized, multidimensional frameworks that encompass technical rigor, clinical relevance, and ethical accountability. A comprehensive and integrative approach is essential to ensure the safe, effective, and equitable deployment of AI in healthcare.
Trial Registration: PROSPERO ID 1019640
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be published under a CC-BY license, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.