Currently accepted at: JMIR Formative Research
Date Submitted: Dec 5, 2023
Open Peer Review Period: Dec 5, 2023 - Jan 31, 2024
Date Accepted: Jan 16, 2026
This paper has been accepted and is currently in production.
It will appear shortly at DOI 10.2196/55127
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
RoBuster: A Corpus Annotated with Risk of Bias Text Spans in Randomized Controlled Trials
ABSTRACT
Background:
Risk of bias (RoB) assessment of randomized controlled trials (RCTs) is vital to answering systematic review questions accurately. Manual RoB assessment for hundreds of RCTs is a cognitively demanding and lengthy process. Automation has the potential to assist reviewers in rapidly identifying text descriptions in RCTs that indicate potential risks of bias. However, no RoB text span-annotated corpus exists that could be used to fine-tune or evaluate large language models (LLMs), and there are no established guidelines for annotating RoB spans in RCTs.
Objective:
The revised Cochrane RoB Assessment 2 (RoB 2) tool provides comprehensive guidelines for RoB assessment; however, due to the inherent subjectivity of this tool, it cannot be directly used as RoB annotation guidelines. Our objective was to develop precise RoB text span annotation instructions that could address this subjectivity and thus aid the corpus annotation.
Methods:
We leveraged RoB 2 guidelines to develop visual instructional placards that serve as text annotation guidelines for RoB spans and risk judgments. Expert annotators employed these visual placards to annotate a dataset named RoBuster, consisting of 41 full-text RCTs from the domains of physiotherapy and rehabilitation. We report inter-annotator agreement (IAA) between two expert annotators for text span annotations before and after applying visual instructions on a subset (9 of 41 RCTs) of RoBuster. We also provide IAA on bias risk judgments using Cohen's kappa. Moreover, we utilized a portion of RoBuster (10 of 41 RCTs) to evaluate a large language model (GPT-3.5) on the challenging task of RoB span extraction, using a straightforward evaluation framework to demonstrate the utility of the corpus.
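The two agreement metrics named above can be sketched in a few lines. This is a minimal illustration, not code from the paper: it assumes spans are compared by exact (start, end, label) match, whereas the authors' matching criteria may differ.

```python
# Minimal sketch (not the paper's code) of the two IAA metrics:
# span-level F1 over exact-match spans, and Cohen's kappa on judgments.
from collections import Counter

def span_f1(spans_a, spans_b):
    """F1 between two sets of (start, end, label) tuples, exact match."""
    a, b = set(spans_a), set(spans_b)
    tp = len(a & b)  # spans both annotators marked identically
    if tp == 0:
        return 0.0
    precision = tp / len(a)
    recall = tp / len(b)
    return 2 * precision * recall / (precision + recall)

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement on paired categorical risk judgments."""
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if both annotators labeled at random
    # with their own marginal label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)
```

Note that kappa can be negative when observed agreement falls below chance, which is consistent with the -0.235 lower bound reported in the Results.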
Results:
We present a corpus of 41 RCTs with fine-grained text span annotations comprising 28,427 tokens belonging to 22 RoB classes. The IAA at the text span level, calculated using the F1 measure, varies from 0% to 90%, while Cohen's kappa for risk judgments ranges between -0.235 and 1.0. Employing visual instructions for annotation increases the IAA by more than 17 percentage points. GPT-3.5 shows promising but varied agreement with the expert annotations across the different bias questions.
Conclusions:
Despite comprehensive bias assessment guidelines and visual instructional placards, RoB annotation remains a complex task. Using visual placards for bias assessment and annotation enhances IAA compared to annotation without them; however, text annotation remains challenging for subjective questions and for questions whose supporting information is unavailable in the RCTs. Similarly, while GPT-3.5 demonstrates effectiveness, its accuracy diminishes on more subjective RoB questions and when relevant information is sparse.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.