Accepted for/Published in: JMIR Infodemiology

Date Submitted: May 21, 2025
Date Accepted: Jan 9, 2026

The final, peer-reviewed published version of this preprint can be found here:

Komsany A, Al Zoubi O, Sebaaly L, Harrison G, Soroka O, ElKefi S, Scales D, Phillips E, Pinheiro LC, Ismail I, Chebli P

Leveraging AI for Analysis of Digital Health Information on Cancer Prevention Among Arab Youth and Adults: Content Analysis

JMIR Infodemiology 2026;6:e77888

DOI: 10.2196/77888

PMID: 41663094

PMCID: 12930147

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Leveraging AI for Content Analysis of Digital Health Information on Cancer Prevention Among Arabic-Speaking Youth and Adults

  • Alia Komsany; 
  • Obada Al Zoubi; 
  • Laetitia Sebaaly; 
  • Gabrielle Harrison; 
  • Orysya Soroka; 
  • Safa ElKefi; 
  • David Scales; 
  • Erica Phillips; 
  • Laura C Pinheiro; 
  • Israa Ismail; 
  • Perla Chebli

ABSTRACT

Background:

As TikTok becomes a growing source of health information, Arabic-language content remains largely unexamined. Cancer misinformation and lack of accessible, culturally relevant content may contribute to disparities in health knowledge, behaviors, and outcomes. AI tools such as large language models (LLMs) offer scalable solutions for content analysis, yet their utility in Arabic health communication remains underexplored.

Objective:

To characterize and evaluate the quality of Arabic-language TikTok videos on cancer prevention and explore the use of LLMs for scalable content analysis.

Methods:

We used the TikTok Research Application Programming Interface (API) and a Generative Pre-trained Transformer (GPT)-assisted keyword strategy to collect 1,800 Arabic-language TikTok videos (2021–2024). After transcription and preprocessing, the most-viewed 25% of videos (n=30) were manually coded for content type, cancer type, uploader identity, tone, scientific citation, and disclaimers. Video quality was assessed using the Patient Education Materials Assessment Tool for Audiovisual Materials (PEMAT-A/V) for understandability and actionability, and the Global Quality Scale (GQS) for overall quality. GPT-4 was used to generate AI annotations, which were compared with human coding for select variables.
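The selection of the most-viewed videos described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `top_viewed`, the record fields `id` and `views`, and the toy view counts are all hypothetical, and rounding the cutoff up is an assumption.

```python
import math


def top_viewed(videos, fraction=0.25):
    """Return the most-viewed `fraction` of videos, highest view count first.

    `videos` is a list of dicts with a numeric "views" field (hypothetical schema).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Rank all videos by view count, largest first.
    ranked = sorted(videos, key=lambda v: v["views"], reverse=True)
    # Assumption: round the cutoff up so at least one video is kept.
    keep = math.ceil(len(ranked) * fraction)
    return ranked[:keep]


# Toy records for illustration only; these are not data from the study.
sample = [
    {"id": "a", "views": 1200},
    {"id": "b", "views": 50},
    {"id": "c", "views": 98000},
    {"id": "d", "views": 430},
    {"id": "e", "views": 7600},
    {"id": "f", "views": 15000},
    {"id": "g", "views": 310},
    {"id": "h", "views": 2200},
]

selected = top_viewed(sample)
selected_ids = [v["id"] for v in selected]
```

With eight toy records and a fraction of 0.25, two records are retained for manual coding.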

Results:

From an initial pool of 320 Arabic-language TikTok videos on cancer prevention, the 30 most-viewed videos were analyzed. Together, these videos amassed 21.6 million views. Diet and alternative therapies were the most common topics (50%), with recommendations to reduce hydrogenated oils, increase fruit and vegetable intake, and use traditional remedies such as garlic and black seed. Only 6.6% of videos cited scientific literature. General cancer (53%), breast cancer (17%), and cervical cancer (14%) were most frequently mentioned. Doctors led 30% of videos and tended to produce higher-quality content, with higher Global Quality Scale scores (median GQS = 4 vs 3, p = .06), although this difference did not reach statistical significance. Over half of the videos had low understandability (53%) and low actionability (60%). Emotionally framed content had the highest engagement across likes and shares, although this did not reach statistical significance (p = .08 and p = .05, respectively). However, emotional tone was significantly associated with lower GQS (p = .01). GPT-4 showed perfect agreement with human coders for cancer type (κ = 1.0) and strong agreement for GQS (κ = 0.94), but low agreement for tone classification (κ = 0.15), owing to misclassification of emotional delivery from text-only input.
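The agreement statistic reported above (κ) compares observed agreement between two coders with the agreement expected by chance. The following is a minimal sketch of unweighted Cohen's kappa. It is an illustration only: the abstract does not state whether a weighted variant was used for the ordinal GQS ratings, the function name `cohens_kappa` is hypothetical, and the labels are toy values, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Kappa is undefined (0/0) when both raters use one identical label
        # throughout; returning 1.0 here is a convention, not a derivation.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels for illustration only; these are not data from the study.
human_type = ["breast", "general", "cervical", "general"]
model_type = ["breast", "general", "cervical", "general"]
kappa_type = cohens_kappa(human_type, model_type)  # identical labels

human_tone = ["emotional", "neutral", "emotional", "neutral"]
model_tone = ["neutral", "neutral", "emotional", "emotional"]
kappa_tone = cohens_kappa(human_tone, model_tone)  # agreement at chance level
```

In the first toy pair the coders agree on every item, so kappa is 1. In the second they agree on half the items, which is exactly what chance predicts for these marginals, so kappa is 0.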

Conclusions:

Arabic-language TikTok content on cancer prevention is highly engaging but varies in quality. AI-assisted tools show strong potential for scalable, multilingual health content analysis, but they remain limited in interpreting nuanced audiovisual features such as tone.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.