JMIR Preprints #78221: Large Language Models for Supporting Clear Writing and Detecting Spin in Randomized Controlled Trials in Oncology: Feasibility Analysis

Large Language Models for Supporting Clear Writing and Detecting Spin in Randomized Controlled Trials in Oncology: Feasibility Analysis

Carole Koechli;
Fabio Dennstädt;
Christina Schröder;
Daniel M. Aebersold;
Robert Förster;
Daniel R. Zwahlen;
Paul Windisch

ABSTRACT

Background:

Randomized controlled trials (RCTs) are the gold standard for evaluating interventions in oncology, but their reporting can be subject to "spin," that is, presenting results in ways that mislead readers about the true efficacy of an intervention.

Objective:

To investigate whether large language models (LLMs) can provide a standardized approach to detecting spin, particularly in conclusions, where it most commonly occurs.

Methods:

We randomly sampled 250 two-arm oncology RCTs with a single primary endpoint, published between 2005 and 2023 in seven major medical journals. Two authors independently annotated each trial as positive or negative based on whether it met its primary endpoint. Three commercial LLMs (GPT-3.5 Turbo, GPT-4o, and o1) were tasked with classifying trials as positive or negative when provided with: 1) the conclusion only, 2) the methods and conclusion, 3) the methods, results, and conclusion, or 4) the title and full abstract. LLM performance was evaluated against the human annotations.
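The four input conditions can be illustrated with a minimal sketch. It assumes each abstract is already split into labeled sections; the section names, the `CONDITIONS` mapping, and the `build_input` helper are hypothetical and not taken from the study, and the exact prompt wording used by the authors is not given here.

```python
from typing import Dict

# Hypothetical mapping from condition name to the abstract sections shown to the model.
# The section keys are assumptions for illustration only.
CONDITIONS = {
    "conclusion_only": ["conclusion"],
    "methods_conclusion": ["methods", "conclusion"],
    "methods_results_conclusion": ["methods", "results", "conclusion"],
    "title_full_abstract": ["title", "background", "methods", "results", "conclusion"],
}


def build_input(abstract: Dict[str, str], condition: str) -> str:
    """Concatenate the sections that the given condition exposes to the model."""
    parts = []
    for section in CONDITIONS[condition]:
        text = abstract.get(section, "").strip()
        if text:  # skip sections that are missing or empty
            parts.append(f"{section.capitalize()}: {text}")
    return "\n".join(parts)


# Placeholder abstract; the text is not from any real trial.
example = {
    "title": "Drug A versus Drug B in an illustrative trial",
    "background": "Placeholder background.",
    "methods": "Placeholder methods.",
    "results": "Placeholder results.",
    "conclusion": "Placeholder conclusion.",
}

print(build_input(example, "methods_conclusion"))
```

The resulting string would then be sent to a model together with an instruction to answer "positive" or "negative"; that call is omitted because the abstract does not specify it.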

Results:

Of the 250 trials, 58.4% were positive and 41.6% negative. The o1 model demonstrated the highest performance across all conditions with F1 scores of 0.932 (conclusion only), 0.96 (methods and conclusion), 0.98 (methods, results, and conclusion), and 0.97 (title and abstract). Analysis of trials incorrectly classified as positive when the model was provided only with conclusions revealed shared patterns including absence of primary endpoint results, emphasis on subgroup improvements, or unclear distinction between primary and secondary endpoints.
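For readers unfamiliar with the metric, the following sketch shows how an F1 score can be computed from human and model labels. It assumes "positive" is treated as the target class and that a single (not macro-averaged) F1 is reported; the abstract does not state either detail, and the `f1_score` helper and the toy labels are illustrative only.

```python
from typing import Sequence


def f1_score(human: Sequence[str], model: Sequence[str], positive: str = "positive") -> float:
    """F1 of model labels against human labels, treating `positive` as the target class."""
    if len(human) != len(model):
        raise ValueError("human and model label lists must have the same length")
    pairs = list(zip(human, model))
    tp = sum(h == positive and m == positive for h, m in pairs)  # true positives
    fp = sum(h != positive and m == positive for h, m in pairs)  # false positives
    fn = sum(h == positive and m != positive for h, m in pairs)  # false negatives
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly labeled positive
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, not study data.
human_labels = ["positive", "positive", "negative", "negative"]
model_labels = ["positive", "negative", "positive", "negative"]
print(f1_score(human_labels, model_labels))
```

In this toy case there is one true positive, one false positive, and one false negative, so precision and recall are both 0.5.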

Conclusions:

LLMs can effectively detect potential spin in oncology RCT reporting by identifying discrepancies between how trials are presented in conclusions versus full abstracts. This approach could serve as a supplementary tool for improving transparency in scientific reporting, though further development is needed to address more complex trial designs beyond those examined in this feasibility study.
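One way to operationalize the discrepancy idea is sketched below, under the assumption that a trial is flagged when the conclusion-only classification is positive but the full-abstract classification is negative. The `flag_possible_spin` helper and the trial identifiers are hypothetical; the abstract does not describe a specific flagging rule.

```python
from typing import Dict, List


def flag_possible_spin(conclusion_labels: Dict[str, str], full_labels: Dict[str, str]) -> List[str]:
    """Return trial IDs labeled positive from the conclusion alone but negative from the full abstract."""
    flagged = []
    for trial_id, conclusion_label in conclusion_labels.items():
        full_label = full_labels.get(trial_id)  # None if the trial has no full-abstract label
        if conclusion_label == "positive" and full_label == "negative":
            flagged.append(trial_id)
    return sorted(flagged)


# Toy labels with made-up trial IDs, not study data.
conclusion_only = {"trial_a": "positive", "trial_b": "positive", "trial_c": "negative"}
full_abstract = {"trial_a": "positive", "trial_b": "negative", "trial_c": "negative"}
print(flag_possible_spin(conclusion_only, full_abstract))
```

A flagged trial would be a candidate for human review rather than a confirmed case of spin.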

Citation

Please cite as:

Koechli C, Dennstädt F, Schröder C, Aebersold DM, Förster R, Zwahlen DR, Windisch P

Large Language Models for Supporting Clear Writing and Detecting Spin in Randomized Controlled Trials in Oncology: Comparative Analysis of GPT Models and Prompts

JMIR Cancer 2026;12:e78221

DOI: 10.2196/78221

PMID: 41564336

PMCID: 12823016

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Cancer

Date Submitted: May 28, 2025

Date Accepted: Dec 16, 2025
