JMIR Preprints #78289: Comparing the Accuracy of Large Language Model Responses versus Internet Searches to Common Questions About GLP1RA Therapy: An Exploratory Simulation Study

Comparing the Accuracy of Large Language Model Responses versus Internet Searches to Common Questions About GLP1RA Therapy: An Exploratory Simulation Study

Sarah Ying Tse Tan;
Gerald Gui Ren Sng;
Phong Ching Lee

ABSTRACT

Background:

Novel glucagon-like peptide 1 receptor agonists (GLP1RAs) for obesity treatment have generated much dialogue on digital media platforms. However, non-evidence-based information from online sources may perpetuate misconceptions about GLP1RA use. Large language models (LLMs) are a promising new digital avenue for patient education and could serve as an alternative source for answering questions about GLP1RA therapy.

Objective:

This study compared LLM (ChatGPT 4o) and internet (Google) search responses to simulated questions about GLP1RA therapy.

Methods:

Responses were graded by 2 independent evaluators on Safety, Consensus with Guidelines, Objectivity, Reproducibility, Relevance, and Explainability using a 5-point Likert scale. Mean scores were compared using an independent-samples t test. Qualitative observations were recorded.
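
As an illustration of the comparison described above, the following is a minimal sketch of an independent-samples t statistic. The abstract does not state whether the pooled-variance (Student) or unequal-variance (Welch) form was used, so the pooled form here is an assumption; the function name `independent_t` and the score lists are hypothetical and are not the study's data.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Return (t statistic, degrees of freedom) for two independent samples.

    Assumes the pooled-variance (Student) form of the test.
    """
    n1, n2 = len(a), len(b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means.
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2


# Hypothetical 5-point Likert ratings for one category (not the study's data).
llm_scores = [4, 4, 3, 5, 4, 4]
internet_scores = [3, 4, 2, 3, 4, 3]

t_stat, df = independent_t(llm_scores, internet_scores)
print(f"t = {t_stat:.3f}, df = {df}")
```

A p value would then be read from the t distribution with `df` degrees of freedom, for example with a statistics package, which is outside the scope of this sketch.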

Results:

LLM responses had significantly higher mean scores than internet responses in the "objectivity" (3.91 ± 0.63 vs 3.36 ± 0.80, p=0.038) and "reproducibility" (3.85 ± 0.49 vs 3.00 ± 0.97, p=0.007) categories. There was no significant difference in mean scores for "safety", "consensus", "relevance" and "explainability". However, LLM responses lacked up-to-date information on more contemporary concerns surrounding GLP1RA use, such as the impact on fertility and mental health.

Conclusions:

The study highlights the importance of healthcare provider communication, as both LLMs and internet searches have limitations and may perpetuate misconceptions about GLP1RAs.

Citation

Please cite as:

Tan SYT, Sng GGR, Lee PC

Accuracy of Large Language Model Responses Versus Internet Searches for Common Questions About Glucagon-Like Peptide-1 Receptor Agonist Therapy: Exploratory Simulation Study

JMIR Form Res 2025;9:e78289

DOI: 10.2196/78289

PMID: 41284989

PMCID: 12643393

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Formative Research

Date Submitted: May 29, 2025

Date Accepted: Nov 6, 2025
