Currently submitted to: JMIR Infodemiology
Date Submitted: May 13, 2026
Open Peer Review Period: May 26, 2026 - Jul 21, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Quality and Safety of YouTube Videos on GLP-1 Receptor Agonists and Reproductive Health: A Systematic Evaluation Using Validated Instruments and Artificial Intelligence
ABSTRACT
Background:
GLP-1 receptor agonists (GLP-1 RAs) — semaglutide, liraglutide, and tirzepatide — are among the most widely prescribed medications globally, with disproportionate uptake among women of reproductive age. The quality of YouTube content on GLP-1 RAs and reproductive health has not been previously characterized.
Objective:
This study aimed to evaluate the quality, accuracy, and misinformation burden of YouTube videos addressing GLP-1 RAs in a reproductive health context, and to assess the validity of large language model (LLM)-assisted quality scoring as a scalable surveillance tool.
Methods:
We conducted a PRISMA-compliant cross-sectional analysis of 137 YouTube videos retrieved via YouTube Data API v3 on 8 March 2026. Two physicians independently scored each video using four validated instruments: Global Quality Scale (GQS, 1-5), modified DISCERN (mDISCERN, 0-5), JAMA Benchmark Criteria (0-4), and a 3-point Medical Accuracy Scale. Videos were classified Useful or Misleading by consensus. A large language model (Claude Sonnet; temperature = 0) independently scored all videos blinded to human ratings.
Results:
Overall quality was below acceptable levels (mean GQS 2.60 ± 1.04; 65.7% of videos scored below 3), and 22.6% (n = 31) were classified Misleading. Short videos (≤ 180 s; 54.7%) had the lowest mean GQS (1.95) and the highest misinformation rate (29%). ROC analysis identified 212 s as the optimal duration threshold for quality (AUC = 0.892). Off-label content carried the highest misinformation rate (73%); preconception discontinuation protocols were absent from 61% of videos. Engagement metrics showed no meaningful correlation with quality (ρ = −0.141). The LLM showed strong rank-order agreement with human raters (GQS ρ = 0.845; mDISCERN ρ = 0.840) but exhibited systematic upward score inflation.
Conclusions:
YouTube content on GLP-1 RAs and reproductive health is predominantly of poor quality and frequently misleading. Short-format videos pose the greatest patient safety risk, critical preconception information is often absent, and viral reach does not select for accuracy. LLM-assisted evaluation is valid and scalable for rank-order surveillance but requires expert calibration before operational deployment.
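As an illustration of two statistics reported above, the following minimal Python sketch shows how high rank-order agreement (Spearman ρ) can coexist with systematic upward score inflation, and how a duration cutoff might be chosen from an ROC-style sweep. It is not the authors' analysis code. All scores and durations are synthetic toy values, the function names are hypothetical, and the use of Youden's J as the "optimal" criterion is an assumption, since the abstract does not state which criterion was applied.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_bias(llm, human):
    """Mean signed difference; a positive value means the LLM scores higher."""
    return mean(l - h for l, h in zip(llm, human))


def youden_threshold(durations, is_useful):
    """Sweep cutoffs for the rule 'duration >= cutoff => predicted useful'.

    Returns (cutoff, J) maximising J = sensitivity + specificity - 1.
    Assumes both classes are present and that longer videos tend to be better.
    """
    pos = sum(is_useful)
    neg = len(is_useful) - pos
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(durations)):
        tp = sum(1 for d, u in zip(durations, is_useful) if u and d >= c)
        tn = sum(1 for d, u in zip(durations, is_useful) if not u and d < c)
        j = tp / pos + tn / neg - 1
        if j > best_j:  # strict '>' keeps the smallest cutoff on ties
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Synthetic toy data (NOT study data): six videos scored on a 1-5 scale.
human_gqs = [1, 2, 2, 3, 4, 5]
llm_gqs = [2, 3, 3, 4, 5, 5]  # same ordering, shifted upward

rho = spearman(human_gqs, llm_gqs)
bias = mean_bias(llm_gqs, human_gqs)

# Synthetic toy data (NOT study data): durations in seconds, 1 = useful.
toy_durations = [60, 90, 150, 200, 240, 300, 420, 600]
toy_useful = [0, 0, 1, 0, 1, 1, 1, 1]
cutoff, j_stat = youden_threshold(toy_durations, toy_useful)

print(f"Spearman rho = {rho:.3f}, mean bias = {bias:+.2f}")
print(f"Best cutoff = {cutoff} s (Youden J = {j_stat:.2f})")
```

In the toy scores the LLM preserves the human ordering almost exactly while rating most videos one point higher, which is the pattern that makes rank-order surveillance usable but absolute scores unreliable without calibration.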
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.