Currently submitted to: JMIR Research Protocols
Date Submitted: May 21, 2026
Open Peer Review Period: May 21, 2026 - Jul 16, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Algorithmic Noise Adjacency in Cantonese-Language Dementia Education on YouTube: Protocol for a Socio-Technical Audit of Recommendation Environments
ABSTRACT
Background:
Older Chinese American adults with limited English proficiency frequently rely on culturally and linguistically tailored online media platforms for dementia education and health literacy. Existing digital health evaluation frameworks primarily assess the internal quality and accuracy of educational materials while overlooking the recommendation environments through which users encounter health information. Commercial recommendation systems may expose users to medically unverified or commercially motivated health content adjacent to evidence-based dementia education resources.
Objective:
This protocol establishes a reproducible computational framework for auditing recommendation environments surrounding Cantonese-language dementia education videos on YouTube. The study introduces the concept of algorithmic noise adjacency, defined operationally as the concentration of recommendation nodes whose informational utility diverges from evidence-based dementia education objectives within local recommendation neighborhoods.
Methods:
The protocol uses an automated socio-technical audit framework centered on 2 Cantonese-language dementia education videos previously examined in longitudinal digital outreach research. A Selenium WebDriver pipeline driving a headless Chromium browser with fingerprinting mitigations will simulate 1200 independent browsing sessions distributed uniformly across a 90-day window. Sessions will be routed through rotating residential proxy infrastructure localized to Southern California Chinese American communities. During each session, the top 10 sidebar recommendation videos adjacent to the source clinical asset (the anchor video) will be extracted. Recommendation metadata will undergo structured semantic classification into 5 mutually exclusive categories: (A) verified public health and clinical infrastructure; (B) diaspora culture and entertainment; (C) commercially motivated health content lacking established clinical validation; (D) unverified alternative medicine; and (E) ambiguous or unclassified baseline noise. A random 10% sample will be double-coded, with consensus adjudication, to establish inter-rater reliability. Generalized linear mixed-effects models (GLMMs) with logit link functions, session-level random intercepts, and reciprocal rank slot-position weighting will evaluate recommendation characteristics while accounting for clustering within browsing sessions. Primary outcome measures include the Noise Adjacency Ratio (NAR), slot-weighted NAR, recommendation recurrence density, and neighborhood entropy. Secondary analyses will evaluate temporal recommendation drift and cross-session recommendation variability.
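The abstract names the primary outcome measures but does not give their formulas, so the following Python sketch is only one plausible reading. It assumes that categories C and D count as noise, that the NAR is the share of noise-labeled videos among a session's extracted recommendations, that slot weights are the reciprocal of the sidebar rank (1/1, 1/2, ..., 1/10), and that neighborhood entropy is the Shannon entropy of the category distribution. The function names and the example session are hypothetical.

```python
import math
from collections import Counter

# Assumption: only categories C and D are treated as "noise".
# The protocol may define the noise set differently (e.g., including E).
NOISE_CATEGORIES = {"C", "D"}


def noise_adjacency_ratio(labels):
    """Unweighted share of noise-labeled recommendations in one session."""
    if not labels:
        raise ValueError("labels must contain at least one recommendation")
    noisy = sum(1 for category in labels if category in NOISE_CATEGORIES)
    return noisy / len(labels)


def slot_weighted_nar(labels):
    """NAR with reciprocal-rank weights: slot 1 weighs 1, slot 2 weighs 1/2, ..."""
    if not labels:
        raise ValueError("labels must contain at least one recommendation")
    weights = [1.0 / rank for rank in range(1, len(labels) + 1)]
    noisy_weight = sum(
        w for w, category in zip(weights, labels) if category in NOISE_CATEGORIES
    )
    return noisy_weight / sum(weights)


def neighborhood_entropy(labels):
    """Shannon entropy (in bits) of the category distribution in one session."""
    if not labels:
        raise ValueError("labels must contain at least one recommendation")
    total = len(labels)
    counts = Counter(labels)
    return -sum((k / total) * math.log2(k / total) for k in counts.values())


# Hypothetical session: category labels for sidebar slots 1 through 10.
session = ["A", "C", "B", "B", "D", "A", "E", "C", "B", "A"]

if __name__ == "__main__":
    print("NAR:", noise_adjacency_ratio(session))
    print("Slot-weighted NAR:", slot_weighted_nar(session))
    print("Neighborhood entropy (bits):", neighborhood_entropy(session))
```

Under these assumptions, a noise-labeled video in slot 1 contributes 10 times the weight of one in slot 10, which is why the slot-weighted NAR can differ noticeably from the unweighted ratio for the same session.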
Results:
Protocol development and pilot automation testing were finalized in May 2026. Automation stress testing, semantic calibration, and proxy validation are scheduled for September 2026. Data collection is projected to occur over a 90-day interval following deployment of the finalized extraction architecture.
Conclusions:
This protocol proposes a structural informatics auditing framework for minority-language digital health ecosystems. By shifting evaluation from isolated content quality toward the surrounding recommendation neighborhoods, the study may provide digital health researchers with a reproducible methodology for characterizing health-information exposure conditions among linguistically isolated populations.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.