Currently submitted to: JMIR Infodemiology

Date Submitted: May 13, 2026
Open Peer Review Period: May 26, 2026 - Jul 21, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Quality and Safety of YouTube Videos on GLP-1 Receptor Agonists and Reproductive Health: A Systematic Evaluation Using Validated Instruments and Artificial Intelligence

  • Çağlayan Biçer
  • Mürüvvet Biçer

ABSTRACT

Background:

GLP-1 receptor agonists (GLP-1 RAs), including semaglutide, liraglutide, and tirzepatide, are among the most widely prescribed medications globally, with disproportionate uptake among women of reproductive age. The quality of YouTube content on GLP-1 RAs and reproductive health has not previously been characterized.

Objective:

This study aimed to evaluate the quality, accuracy, and misinformation burden of YouTube videos addressing GLP-1 RAs in a reproductive health context, and to assess the validity of large language model (LLM)-assisted quality scoring as a scalable surveillance tool.

Methods:

We conducted a PRISMA-compliant cross-sectional analysis of 137 YouTube videos retrieved via YouTube Data API v3 on 8 March 2026. Two physicians independently scored each video using four validated instruments: Global Quality Scale (GQS, 1-5), modified DISCERN (mDISCERN, 0-5), JAMA Benchmark Criteria (0-4), and a 3-point Medical Accuracy Scale. Videos were classified Useful or Misleading by consensus. A large language model (Claude Sonnet; temperature = 0) independently scored all videos blinded to human ratings.
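As an illustration of the scoring scheme described above, the following minimal sketch shows how one rater's scores for a video could be recorded and range-checked. It is not the authors' code. The class and function names are hypothetical, the 1-3 coding of the 3-point Medical Accuracy Scale is an assumption (the abstract states only that it has three points), and the disagreement helper is simply one possible way to flag items for the consensus step.

```python
from dataclasses import dataclass

# Score ranges as stated in the Methods. The 1-3 coding of the 3-point
# Medical Accuracy Scale is an assumption; the abstract gives only "3-point".
RANGES = {
    "gqs": (1, 5),        # Global Quality Scale
    "mdiscern": (0, 5),   # modified DISCERN
    "jama": (0, 4),       # JAMA Benchmark Criteria
    "accuracy": (1, 3),   # Medical Accuracy Scale (assumed coding)
}


@dataclass(frozen=True)
class VideoRating:
    """One rater's scores for one video (hypothetical record layout)."""
    video_id: str
    gqs: int
    mdiscern: int
    jama: int
    accuracy: int

    def __post_init__(self):
        # Reject any score that falls outside its instrument's range.
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")


def disagreements(a: VideoRating, b: VideoRating) -> list[str]:
    """Return the instruments on which two raters gave different scores."""
    if a.video_id != b.video_id:
        raise ValueError("ratings refer to different videos")
    return [name for name in RANGES if getattr(a, name) != getattr(b, name)]
```

Two ratings of the same video can then be compared with `disagreements(rating_a, rating_b)` to list the instruments that would need discussion before a consensus classification.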

Results:

Overall quality was below the acceptable level (mean GQS 2.60 ± 1.04; 65.7% of videos scored below 3), and 22.6% (n = 31) were classified Misleading. Short videos (≤ 180 s; 54.7% of videos) had the lowest mean GQS (1.95) and the highest misinformation rate of any duration group (29%). ROC analysis identified 212 s as the optimal duration threshold for quality (AUC = 0.892). Among content categories, off-label content carried the highest misinformation rate (73%); preconception discontinuation protocols were absent from 61% of videos. Engagement metrics were uncorrelated with quality (ρ = −0.141). The LLM showed strong rank-order agreement with human raters (GQS ρ = 0.845; mDISCERN ρ = 0.840) but exhibited systematic upward score inflation.
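The two statistics reported above can be sketched in a few lines of standard-library Python. This is an illustration only, under stated assumptions: that ρ denotes Spearman's rank correlation, and that the optimal ROC threshold was chosen by maximising Youden's J (the abstract does not name the criterion). The function names are hypothetical and all data in the example are toy values, not study data.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def youden_threshold(durations, is_high_quality):
    """Return (threshold, J) maximising sensitivity + specificity - 1,
    predicting 'high quality' when duration >= threshold."""
    pos = sum(is_high_quality)
    neg = len(is_high_quality) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(durations)):
        tp = sum(1 for d, q in zip(durations, is_high_quality) if d >= t and q)
        fp = sum(1 for d, q in zip(durations, is_high_quality) if d >= t and not q)
        j = tp / pos - fp / neg  # sensitivity - (1 - specificity)
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy data: the second rater scores every video exactly one point higher.
human = [1, 2, 3, 4]
llm = [2, 3, 4, 5]
print(spearman_rho(human, llm))                    # rank-order agreement
print(mean(l - h for h, l in zip(human, llm)))     # mean upward shift
```

The toy example shows why rank-order agreement and calibration are separate questions: a rater who adds a constant to every score preserves the ranking perfectly while still inflating every value, which is the pattern the Results describe for the LLM.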

Conclusions:

YouTube content on GLP-1 RAs and reproductive health is predominantly of poor quality and frequently misleading. Short-format videos pose the greatest patient safety risk, critical preconception information is frequently absent, and viral reach does not select for accuracy. LLM-assisted evaluation is valid and scalable for rank-order surveillance but requires expert calibration before operational deployment.


 Citation

Please cite as:

Biçer Ç, Biçer M

Quality and Safety of YouTube Videos on GLP-1 Receptor Agonists and Reproductive Health: A Systematic Evaluation Using Validated Instruments and Artificial Intelligence

JMIR Preprints. 13/05/2026:101296

DOI: 10.2196/preprints.101296

URL: https://preprints.jmir.org/preprint/101296


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.