
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Oct 12, 2024
Date Accepted: Apr 28, 2025

The final, peer-reviewed published version of this preprint can be found here:

Large Language Models in Randomized Controlled Trials Design: Observational Study

Jin L, Ong JCL, Elangovan K, Ke Y, Pyle A, Ting DSW, Liu N

Large Language Models in Randomized Controlled Trials Design: Observational Study

J Med Internet Res 2025;27:e67469

DOI: 10.2196/67469

PMID: 41055064

PMCID: 12407223

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Large Language Models in Randomized Controlled Trials Design

  • Liyuan Jin; 
  • Jasmine Chiat Ling Ong; 
  • Kabilan Elangovan; 
  • Yuhe Ke; 
  • Alexandra Pyle; 
  • Daniel Shu Wei Ting; 
  • Nan Liu

ABSTRACT

Background:

Randomized controlled trials (RCTs) face challenges such as limited generalizability, insufficient recruitment diversity, and high failure rates, often due to restrictive eligibility criteria and inefficient patient selection. Large language models (LLMs) have shown promise in various clinical tasks, but their potential role in RCT design remains underexplored.

Objective:

This study investigates the ability of LLMs, specifically GPT-4-Turbo-Preview, to assist in designing RCTs that enhance generalizability, improve recruitment diversity, and reduce failure rates, while maintaining clinical safety and ethical standards.

Methods:

We conducted a non-interventional, observational study analyzing 20 parallel-arm RCTs, comprising 10 completed and 10 ongoing studies published after January 2024 to mitigate pretraining biases. The LLM was tasked with generating RCT designs based on input criteria, including eligibility, recruitment strategies, interventions, and outcomes. The accuracy of LLM-generated designs was quantitatively assessed by comparing them to clinically validated ground truth data from ClinicalTrials.gov. Qualitative assessments were performed using Likert scale ratings (1–3) for domains such as safety, accuracy, objectivity, pragmatism, inclusivity, and diversity.
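The quantitative comparison described above can be sketched as a per-domain agreement score between an LLM-generated design and the registry ground truth. This is a minimal illustration only: the domain names, item granularity, and set-overlap scoring rule are assumptions for exposition, not the authors' actual rubric.

```python
# Sketch: per-domain agreement between an LLM-generated RCT design and
# ClinicalTrials.gov ground truth. Domains and scoring rule are
# illustrative assumptions, not the study's actual evaluation protocol.

def component_accuracy(llm_design: dict, ground_truth: dict) -> dict:
    """Fraction of ground-truth items reproduced by the LLM, per domain."""
    scores = {}
    for domain in ("eligibility", "recruitment", "intervention", "outcomes"):
        truth_items = set(ground_truth.get(domain, []))
        llm_items = set(llm_design.get(domain, []))
        if not truth_items:
            continue  # no registry data for this domain; skip it
        scores[domain] = len(llm_items & truth_items) / len(truth_items)
    return scores

# Hypothetical example designs (items are invented for illustration).
llm = {"recruitment": ["multicenter", "adults 18+"],
       "intervention": ["drug A 10 mg daily"]}
truth = {"recruitment": ["multicenter", "adults 18+"],
         "intervention": ["drug A 10 mg daily", "placebo comparator"]}

print(component_accuracy(llm, truth))
# {'recruitment': 1.0, 'intervention': 0.5}
```

In practice, matching free-text criteria would require clinician adjudication rather than exact set overlap, which is one reason the study paired the quantitative comparison with Likert-scale expert ratings.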

Results:

The LLM achieved an overall accuracy of 72% in replicating RCT designs. Recruitment and intervention designs demonstrated high agreement with the ground truth, achieving 88% and 93% accuracy, respectively. However, LLMs showed lower accuracy in designing eligibility criteria (55%) and outcomes measurement (53%). Qualitative evaluations showed that LLM-generated designs scored above 2 points across all domains, indicating strong clinical alignment. In particular, LLMs enhanced diversity and pragmatism, which are key factors in improving RCT generalizability and addressing failure rates.

Conclusions:

LLMs, such as GPT-4-Turbo-Preview, have demonstrated potential in improving RCT design, particularly in recruitment and intervention planning, while enhancing generalizability and addressing diversity. However, expert oversight and regulatory measures are essential to ensure patient safety and ethical standards. The findings support further integration of LLMs into clinical trial design, although continued refinement is necessary to address limitations in eligibility and outcomes measurement.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be published under a CC BY license, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.