Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 22, 2024
Date Accepted: May 5, 2025

The final, peer-reviewed published version of this preprint can be found here:

Huang J, Lai H, Zhao W, Xia D, Bai C, Liu J, Liu J, Pan B, Tian J, Ge L

Large Language Model–Assisted Risk-of-Bias Assessment in Randomized Controlled Trials Using the Revised Risk-of-Bias Tool: Evaluation Study

J Med Internet Res 2025;27:e70450

DOI: 10.2196/70450

PMID: 40554779

PMCID: 12238788

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

LLM-Assisted Evidence Synthesis: Assessing Risk of Bias of Randomized Controlled Trials with RoB 2

  • Jiajie Huang
  • Honghao Lai
  • Weilong Zhao
  • Danni Xia
  • Chunyang Bai
  • Jianing Liu
  • Jiayi Liu
  • Bei Pan
  • Jinhui Tian
  • Long Ge

ABSTRACT

Background:

The revised risk-of-bias tool (RoB 2) addresses some limitations of the original version but also introduces challenges in its application. Large language models (LLMs) may assist in applying RoB 2. However, the appropriate methods and their reliability remain uncertain.

Objective:

This feasibility study aims to investigate the capability of LLMs to assess the risk of bias of randomized controlled trials (RCTs) using RoB 2.

Methods:

We systematically searched for Cochrane reviews that used RoB 2. The reviews were classified according to the effect of interest: the effect of adhering to the intervention or the effect of assignment to the intervention. From each category, 23 RCTs were randomly selected. The RoB 2 judgments reported in the Cochrane reviews were used as the external validation standard. Three experienced reviewers were recruited to assess the risk of bias of the 46 selected RCTs using RoB 2. The reviewer judgments for six RCTs were used to develop and optimize the prompt for the LLMs. The reviewer judgments for the remaining 40 trials served as the internal validation standard. Accuracy rates were calculated at both the domain and signaling-question levels to reflect accuracy; the consistent assessment rate and Cohen's κ were calculated to gauge consistency; and assessment time was recorded to measure efficiency.
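As an illustration of the agreement measures named above, the following minimal sketch computes an accuracy rate and an unweighted Cohen's κ for a set of domain-level judgments. It is not the authors' analysis code: the function names, the label strings ("low", "some", "high"), and the six example judgments are hypothetical, and the abstract does not state whether a weighted or unweighted κ was used.

```python
from collections import Counter


def accuracy_rate(predicted, reference):
    """Proportion of judgments that match the reference standard."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = accuracy_rate(rater_a, rater_b)  # observed agreement
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal label frequencies.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical judgments for six trials (illustrative data only).
llm_judgments = ["low", "low", "high", "some", "low", "high"]
reference_judgments = ["low", "some", "high", "some", "low", "low"]

print(f"accuracy rate: {accuracy_rate(llm_judgments, reference_judgments):.3f}")
print(f"Cohen's kappa: {cohen_kappa(llm_judgments, reference_judgments):.3f}")
```

The same two functions could be applied to repeated LLM runs on the same trial to estimate consistency, assuming that the consistent assessment rate is the proportion of identical judgments across runs; the abstract does not define it further.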

Results:

Compared with the Cochrane reviews, the LLMs' judgments had accuracy rates of 57.5% and 70% for Overall (assignment) and Overall (adhering), respectively. Compared with the reviewer judgments, the LLMs' accuracy rates for Overall (assignment) and Overall (adhering) were 65% and 70.0%, respectively. The average accuracy rates for the remaining six domains were 65.2% (95% CI, 57.6%-72.7%) and 74.2% (95% CI, 64.7%-83.9%) when compared with the Cochrane reviews and the reviewers, respectively. At the signaling-question level, the average accuracy rate was 83.2% (95% CI, 77.5%-88.9%), and the consistent assessment rate was 85.2% (95% CI, 85.15%-88.79%). Compared with the reviewers, the LLMs completed assessments 29.6 minutes (95% CI, 25.6-33.6) faster.

Conclusions:

LLMs were capable of rapidly assessing the risk of bias in RCTs using RoB 2 and exhibited a comparatively high level of accuracy. This suggests the potential utility of LLMs as adjunctive tools in the systematic review process.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.