Accepted for/Published in: JMIR Research Protocols
Date Submitted: Dec 29, 2025
Date Accepted: Mar 3, 2026
Are LLM-based Chatbot Interventions Properly Controlled?: Protocol for a Methodological Review
ABSTRACT
Background:
Large language model (LLM)–based chatbots are rapidly being repurposed as patient-facing digital health tools. Their interactive, adaptive, and seemingly empathic behavior can heighten engagement and expectancy—nonspecific factors that complicate causal inference. Yet, comparator strategies in LLM trials are inconsistently defined and often under-matched (e.g., minimal education versus highly engaging chatbots), risking biased effect estimates and poor reproducibility.
Objective:
To systematically identify and categorize the control conditions used in interventional studies of LLM-based, patient-facing digital health interventions, and to evaluate their methodological appropriateness. Secondary aims are to describe variability by health domain and study design and to explore whether control type/quality relates to the direction of reported effects.
Methods:
This protocol follows PRISMA-P and is registered in PROSPERO. Eligible studies are interventional designs that evaluate LLM-based, patient-facing digital health interventions; any control condition is eligible (including no control, waitlist, treatment-as-usual, attention/education, active comparator, or sham digital control). We will search PubMed, PsycINFO, CENTRAL, CINAHL, and Scopus for records from January 1, 2023, onward. All records will be managed in Rayyan and screened by two independent reviewers. Dual, independent data extraction will target study context, intervention details, and control-arm characteristics (typology, rationale, matching to nonspecifics, blinding, reporting). No formal risk-of-bias assessments are planned, as the focus is on meta-research.
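As an illustration only, the control-arm typology and dual extraction described above could be represented along the following lines. This is a minimal sketch, not the review's actual extraction form: the class names, field names, and helper functions (`ControlType`, `ExtractionRecord`, `tally_control_types`, `find_disagreements`) are hypothetical, and only the control categories themselves are taken from the eligibility criteria.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ControlType(Enum):
    """Control categories named in the eligibility criteria."""
    NO_CONTROL = "no control"
    WAITLIST = "waitlist"
    TREATMENT_AS_USUAL = "treatment-as-usual"
    ATTENTION_EDUCATION = "attention/education"
    ACTIVE_COMPARATOR = "active comparator"
    SHAM_DIGITAL = "sham digital control"


@dataclass
class ExtractionRecord:
    """One reviewer's extraction for one study (hypothetical fields)."""
    study_id: str
    health_domain: str
    study_design: str
    control_type: ControlType
    rationale_reported: bool
    matched_on_nonspecifics: Optional[bool]  # None means not reported
    blinding_reported: bool


def tally_control_types(records):
    """Count how many studies used each control type."""
    return Counter(r.control_type for r in records)


def find_disagreements(reviewer_a, reviewer_b):
    """Return sorted study IDs whose records differ between two reviewers.

    Studies extracted by only one reviewer are ignored here.
    """
    b_by_id = {r.study_id: r for r in reviewer_b}
    return sorted(
        r.study_id
        for r in reviewer_a
        if r.study_id in b_by_id and b_by_id[r.study_id] != r
    )


# Invented example data, for illustration only.
reviewer_a = [
    ExtractionRecord("S1", "mental health", "RCT",
                     ControlType.WAITLIST, False, None, False),
    ExtractionRecord("S2", "diabetes", "RCT",
                     ControlType.ATTENTION_EDUCATION, True, False, True),
]
reviewer_b = [
    ExtractionRecord("S1", "mental health", "RCT",
                     ControlType.WAITLIST, False, None, False),
    ExtractionRecord("S2", "diabetes", "RCT",
                     ControlType.ACTIVE_COMPARATOR, True, False, True),
]

counts = tally_control_types(reviewer_a)
conflicts = find_disagreements(reviewer_a, reviewer_b)
```

In this sketch, `conflicts` lists the studies that would need reconciliation between the two extractors, and `counts` gives the kind of per-category summary the control typology is meant to support.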
Results:
At submission, the protocol is registered in PROSPERO. Scoping searches are complete; full screening and extraction have not yet commenced.
Conclusions:
This review will provide an empirical map of control practices in LLM chatbot trials and guidance for designing better-matched comparators, supporting more valid and interpretable evaluations as LLMs diffuse into patient care.
Clinical Trial:
PROSPERO ID: CRD420251246148
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.