Large Language Models for Supporting Clear Writing and Detecting Spin in Randomized Controlled Trials in Oncology: Feasibility Analysis
ABSTRACT
Background:
Randomized controlled trials (RCTs) are the gold standard for evaluating interventions in oncology, but their reporting can be subject to "spin": the presentation of results in ways that mislead readers about an intervention's true efficacy.
Objective:
To investigate whether large language models (LLMs) can provide a standardized approach to detecting spin, particularly in conclusions, where it most commonly occurs.
Methods:
We randomly sampled 250 two-arm oncology RCTs with a single primary endpoint from seven major medical journals, published between 2005 and 2023. Two authors independently annotated each trial as positive or negative based on whether it met its primary endpoint. Three commercial LLMs (GPT-3.5 Turbo, GPT-4o, and o1) were tasked with classifying trials as positive or negative when provided with (1) the conclusion only; (2) the methods and conclusion; (3) the methods, results, and conclusion; or (4) the title and full abstract. LLM performance was evaluated against the human annotations.
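The four input conditions can be illustrated with a minimal sketch. It assumes each abstract has already been split into labeled sections; the section keys, the `CONDITIONS` mapping, and the `build_inputs` helper are hypothetical names introduced for illustration and are not described in the source. The prompt wording and the model calls are deliberately left out, since the abstract does not specify them.

```python
from typing import Dict, List

# Hypothetical section keys; the source does not say how abstracts were segmented.
CONDITIONS: Dict[str, List[str]] = {
    "conclusion_only": ["conclusion"],
    "methods_conclusion": ["methods", "conclusion"],
    "methods_results_conclusion": ["methods", "results", "conclusion"],
    "title_full_abstract": ["title", "background", "methods", "results", "conclusion"],
}


def build_inputs(abstract: Dict[str, str]) -> Dict[str, str]:
    """Return one text input per condition, built from the available sections."""
    inputs: Dict[str, str] = {}
    for name, keys in CONDITIONS.items():
        # Skip sections that are missing or empty rather than failing.
        parts = [f"{key.capitalize()}: {abstract[key]}" for key in keys if abstract.get(key)]
        inputs[name] = "\n".join(parts)
    return inputs
```

Each resulting string would then be passed to a model with an instruction to answer "positive" or "negative", so that the same trial receives one label per condition.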
Results:
Of the 250 trials, 58.4% were positive and 41.6% negative. The o1 model demonstrated the highest performance across all conditions with F1 scores of 0.932 (conclusion only), 0.96 (methods and conclusion), 0.98 (methods, results, and conclusion), and 0.97 (title and abstract). Analysis of trials incorrectly classified as positive when the model was provided only with conclusions revealed shared patterns including absence of primary endpoint results, emphasis on subgroup improvements, or unclear distinction between primary and secondary endpoints.
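For readers unfamiliar with the metric, the following sketch shows how an F1 score and the list of trials incorrectly classified as positive could be computed from model labels and human annotations. It assumes that "positive" is treated as the target class, which the abstract does not state; the function names and the example trial identifiers are hypothetical.

```python
from typing import List, Sequence


def f1_score(truth: Sequence[str], pred: Sequence[str], positive: str = "positive") -> float:
    """F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(truth, pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def false_positives(trial_ids: Sequence[str], truth: Sequence[str],
                    pred: Sequence[str], positive: str = "positive") -> List[str]:
    """Trials labeled positive by the model but annotated otherwise by humans."""
    return [i for i, t, p in zip(trial_ids, truth, pred)
            if p == positive and t != positive]


# Toy data for illustration only; these are not results from the study.
ids = ["trial_a", "trial_b", "trial_c", "trial_d"]
human = ["positive", "positive", "negative", "negative"]
model = ["positive", "negative", "positive", "negative"]
print(f1_score(human, model))              # 0.5
print(false_positives(ids, human, model))  # ['trial_c']
```

Applied to the conclusion-only condition, the second function would return the trials whose conclusions led the model to a positive label despite a negative human annotation, which is the subset the error analysis above examines.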
Conclusions:
LLMs can effectively detect potential spin in oncology RCT reporting by identifying discrepancies between how trials are presented in conclusions versus full abstracts. This approach could serve as a supplementary tool for improving transparency in scientific reporting, though further development is needed to address more complex trial designs beyond those examined in this feasibility study.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.