JMIR Preprints #94855: Self-Reported Health Outcomes in Metabolic Health YouTube Comments: Cross-Sectional Study of Rule-Based NLP Framework Development and Validation

Self-Reported Health Outcomes in Metabolic Health YouTube Comments: Cross-Sectional Study of Rule-Based NLP Framework Development and Validation

Ricardo Ribeiro; Aneesh Zutshi

ABSTRACT

Background:

YouTube is increasingly used for Healthcasting, the sharing of evidence-based dietary and lifestyle interventions by expert researchers and clinicians. In the metabolic health domain, channels focused on Therapeutic Carbohydrate Restriction (TCR) have accumulated audiences of millions. A distinctive feature of these channels is the comment section, where viewers share first-person accounts of health changes: weight loss, normalised biomarkers, and reversed chronic conditions. At scale, these comments constitute a unique source of real-world outcome data. However, extracting structured health information from hundreds of thousands of unstructured comments with the precision required for outcomes research presents significant computational challenges.

Objective:

To develop and validate a precision-optimised computational framework for systematically extracting self-reported health outcomes from Healthcasting YouTube comments, and to characterise the nature, distribution, and channel-level variation of reported outcomes across a large-scale metabolic health corpus.

Methods:

We collected 209,661 comments from 110 videos across 11 TCR-focused Healthcasting channels (37,742 unique authors; 2013–2026). A four-phase methodology was employed: (1) exploratory corpus characterisation; (2) iterative development of a 35-aspect hierarchical health outcome ontology; (3) a precision-optimised rule-based classification pipeline with manual validation (n=500) and negative-sample recall estimation (n=105); and (4) Aspect-Based Sentiment Analysis using dual-model LLM consensus coding.

Results:

The framework identified 6,671 positive health outcome reports (3.18% prevalence), achieving 97.6% precision (95% CI: 95.7%–98.6%) and estimated 16.5% recall (95% CI: 11.6%–23.6%). Outcomes extended well beyond weight loss: pain and inflammation reduction (17.0%), type 2 diabetes improvement (14.6%), skin health (11.8%), and psychological well-being (11.0%), with 2,032 outcomes spanning 18 named disease conditions. Over half (50.3%) spanned multiple research objectives simultaneously. Significant channel-level variation was observed (χ²=3,509, p<0.001), with positive outcome rates ranging from 1.14% to 8.06% (OR=7.61). A complementary Aspect-Based Sentiment Analysis confirmed a positive-to-negative ratio of 4.6:1, with negative experiences (11.9% of health-related comments) primarily involving gastrointestinal adaptation and cardiovascular concerns.
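The headline figures above can be sanity-checked from the counts given in the abstract. The following is a minimal sketch, not the authors' analysis code: it assumes that 97.6% precision on the n=500 validation sample corresponds to 488 correct classifications, uses a plain Wilson score interval (the abstract does not state which interval method was used, so the bounds may differ slightly from the reported ones), and computes the odds ratio from the rounded channel rates. The function names are illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def odds_ratio(p_high, p_low):
    """Odds ratio between two proportions."""
    return (p_high / (1 - p_high)) / (p_low / (1 - p_low))


# Prevalence: positive outcome reports divided by all collected comments.
prevalence = 6671 / 209661

# Precision interval; 488 of 500 is an assumed count implied by 97.6%.
low, high = wilson_interval(488, 500)

# Odds ratio between the highest and lowest channel-level rates (rounded inputs).
or_estimate = odds_ratio(0.0806, 0.0114)

print(f"prevalence = {prevalence:.2%}")
print(f"precision 95% CI = {low:.1%} to {high:.1%}")
print(f"odds ratio = {or_estimate:.2f}")
```

With rounded inputs the odds ratio comes out close to, but not exactly at, the reported 7.61; the authors presumably worked from the underlying counts.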

Conclusions:

Healthcasting YouTube comment sections contain a substantial, structured signal of self-reported health outcomes amenable to systematic computational extraction. The framework generates a high-confidence corpus of 6,510 estimated true positives across 35 health aspects, documenting the breadth and scale of metabolic health improvement reported by users of TCR-focused expert content. These findings provide a validated methodological foundation for AI-augmented digital health platform design.
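The figure of 6,510 estimated true positives appears to follow from multiplying the number of positive reports by the validated precision. A brief sketch under that assumption (the abstract does not spell out the calculation, and the variable names are illustrative):

```python
import math

positive_reports = 6671      # reports flagged by the framework (from the Results)
precision = 0.976            # validated precision on the n=500 sample

# Expected number of flagged reports that are genuine outcome reports.
expected = positive_reports * precision

# Truncating to a whole report reproduces the abstract's figure;
# rounding to nearest would give one more.
estimated_true_positives = math.floor(expected)

# Range implied by the reported 95% CI for precision (95.7% to 98.6%).
lower_bound = math.floor(positive_reports * 0.957)
upper_bound = math.floor(positive_reports * 0.986)

print(estimated_true_positives, lower_bound, upper_bound)
```

Because recall is estimated at only 16.5%, this count is a conservative floor on the outcome reports present in the corpus rather than an estimate of their total number.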

Citation

Please cite as:

Ribeiro R, Zutshi A

Self-Reported Health Outcomes in Metabolic Health YouTube Comments: Cross-Sectional Study and Rule-Based Natural Language Processing Framework Development and Validation

J Med Internet Res 2026;28:e94855

DOI: 10.2196/94855

PMID: 42077206

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 7, 2026

Open Peer Review Period: Mar 9, 2026 - May 4, 2026

Date Accepted: May 1, 2026

Date Submitted to PubMed: May 4, 2026
