
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 19, 2025
Date Accepted: Oct 15, 2025

The final, peer-reviewed published version of this preprint can be found here:

Leveraging Large Language Models to Identify Engagement-Driving Features in Vaping-Related TikTok Videos: Cross-Sectional Study

Xie Z, Korrapolu NK, Dubey A, Song L, Xu C, Wilson KM, Cupertino A, Li D

J Med Internet Res 2025;27:e76265

DOI: 10.2196/76265

PMID: 41264868

PMCID: 12634013

Warning: This is an author submission that is not peer-reviewed or edited. Preprints, unless they show as "accepted", should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Leveraging large language models to identify engagement-driving features in vaping-related TikTok videos: a cross-sectional study

  • Zidian Xie; 
  • Nanda Kishore Korrapolu; 
  • Amisha Dubey; 
  • Luchuan Song; 
  • Chenliang Xu; 
  • Karen M. Wilson; 
  • AnaPaula Cupertino; 
  • Dongmei Li

ABSTRACT

Background:

Electronic cigarette (e-cigarette) use is prevalent among youth and young adults. TikTok, a social media platform popular with this age group, has been used to disseminate e-cigarette-related videos, most of which are promotional.

Objective:

We aimed to identify key features of e-cigarette-related TikTok videos associated with high user engagement to inform the design of future videos for vaping prevention campaigns.

Methods:

We collected 1,487 e-cigarette-related TikTok videos and their metadata using the TikTok API (Application Programming Interface). We applied two large language models (LLMs), GPT-4 and Video-LLaMA, to extract video features (e.g., promotional content, background, gender, lifestyle, talking, cartoon, vaping tricks, and containing emojis) from these videos. To assess how accurately the two models identified these features, we randomly selected and hand-coded 25 videos. We used generalized linear models with identity link functions to identify video features significantly associated with TikTok user engagement, defined as (likes + shares + comments)/views.

Results:

Compared with the Video-LLaMA model, the GPT-4 model showed higher accuracy in identifying video features (83%-100% vs 24%-88%). Notably, videos with backgrounds in cars, private spaces, or shops had significantly higher user engagement than those set in public spaces. Moreover, videos featuring young adults, smoking or vaping, talking, vape tricks, emojis, or funny and silly content showed higher user engagement. Conversely, videos with promotional content or featuring e-cigarettes had lower engagement.

Conclusions:

TikTok video features such as background setting, the presence of young adults, talking, and containing emojis were associated with substantially higher user engagement. These insights can guide the design of compelling videos for vaping prevention campaigns that aim to improve social media user engagement.

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.