
Accepted for/Published in: Interactive Journal of Medical Research

Date Submitted: Nov 19, 2023
Date Accepted: Jan 26, 2024
Date Submitted to PubMed: Jan 26, 2024

The final, peer-reviewed published version of this preprint can be found here:

A Preliminary Checklist (METRICS) to Standardize the Design and Reporting of Studies on Generative Artificial Intelligence–Based Models in Health Care Education and Practice: Development Study Involving a Literature Review

Sallam M, Barakat M, Sallam M


Interact J Med Res 2024;13:e54704

DOI: 10.2196/54704

PMID: 38276872

PMCID: 10905357

Warning: This is an author submission that has not been peer reviewed or edited. Preprints, unless they show as "accepted," should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

METRICS: Establishing a Preliminary Checklist to Standardize Design and Reporting of Artificial Intelligence-Based Studies in Healthcare

  • Malik Sallam; 
  • Muna Barakat; 
  • Mohammed Sallam

ABSTRACT

Background:

Adherence to evidence-based practice is indispensable in healthcare. Recently, the utility of artificial intelligence (AI)-based models in healthcare has been evaluated extensively. However, the lack of consensus guidelines for the design and reporting of findings in these studies poses challenges to the interpretation and synthesis of evidence.

Objective:

To propose a preliminary framework forming the basis of comprehensive guidelines to standardize reporting of AI-based studies in healthcare education and practice.

Methods:

A systematic literature review was conducted on Scopus, PubMed, and Google Scholar. Published records with "ChatGPT", "Bing", or "Bard" in the title were retrieved. The methodologies employed in the included records were carefully examined to identify common pertinent themes and gaps in reporting. A panel discussion followed to establish a unified, thorough reporting checklist. The finalized checklist was then tested on the included records by two independent raters, with Cohen's κ used to evaluate inter-rater reliability.

Results:

The final dataset that formed the basis for theme identification and analysis comprised a total of 34 records. The finalized checklist included nine pertinent themes collectively referred to as "METRICS": (1) Model used and its exact settings; (2) Evaluation approach for the generated content; (3) Timing of testing the model; (4) Transparency of the data source; (5) Range of tested topics; (6) Randomization of selecting the queries; (7) Individual factors in selecting the queries and inter-rater reliability; (8) Count of queries executed to test the model; and (9) Specificity of the prompts and language used. The overall mean METRICS score was 3.0±0.58. Inter-rater reliability was acceptable, with Cohen's κ ranging from 0.558 to 0.962 (P<.001 for all nine tested items). Per item, the highest average METRICS score was recorded for the "Model" item, followed by the "Specificity of the prompts and language used" item, while the lowest scores were recorded for the "Randomization of selecting the queries" item (classified as suboptimal) and the "Individual factors in selecting the queries and inter-rater reliability" item (classified as satisfactory).

Conclusions:

The findings highlight the need for standardized reporting of AI-based studies in healthcare, given the variability observed in their methodologies and reporting. The proposed METRICS checklist could be a helpful preliminary step toward establishing a universally accepted approach to standardizing reporting in AI-based studies in healthcare, a swiftly evolving research topic.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.