Accepted for/Published in: JMIR Infodemiology
Date Submitted: Apr 4, 2025
Open Peer Review Period: Apr 16, 2025 - Jun 11, 2025
Date Accepted: Jul 20, 2025
Data Mining Trauma: An AI-Assisted Qualitative Study of Cyber Victimization on Reddit
ABSTRACT
Background:
Cyber victimization exposes teens to numerous risks. Their developmental stage often leaves them unaware of potential dangers, making them susceptible to psychological distress. Despite this vulnerability, methods for identifying teens at risk of cyber victimization within healthcare settings are limited, as is research that explores their experiences of cyber victimization. The purpose of this study was to analyze how teens describe experiences of cyber victimization on the social media platform Reddit using data mining.
Objective:
This study aimed to examine how teens on Reddit describe and discuss their experiences of cyber victimization, using data mining and computational analysis of unsolicited data.
Methods:
This computational qualitative study used data mining, Word Adjacency Graph (WAG) modeling, and thematic analysis to analyze Reddit users' discussions of cyber victimization. Posts from 2012 to 2023 in the subreddits r/cyberbullying and r/bullying were eligible for inclusion. GPT-4, an advanced artificial intelligence language model, summarized posts and assisted in cluster labeling. Posts were reviewed to remove irrelevant content and duplicates. User anonymity was maintained throughout the study.
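As a rough illustration of the general idea behind a word adjacency graph, the minimal Python sketch below counts how often two words appear next to each other across a set of posts and stores each pair as a weighted edge. It is not the study's WAG implementation, which is not described in this abstract. The function name `build_word_adjacency_graph`, the tokenization rule, the stopword handling, and the sample posts are all assumptions made for illustration.

```python
import re
from collections import Counter


def build_word_adjacency_graph(posts, stopwords=frozenset()):
    """Return a Counter mapping unordered word pairs to adjacency counts.

    Hypothetical helper: each key is a sorted (word_a, word_b) tuple and each
    value is the number of times the two words occurred side by side.
    """
    edges = Counter()
    for post in posts:
        # Lowercase and keep alphabetic tokens; drop stopwords first, so
        # adjacency is measured over the remaining content words.
        tokens = [
            t for t in re.findall(r"[a-z']+", post.lower())
            if t not in stopwords
        ]
        for left, right in zip(tokens, tokens[1:]):
            if left != right:  # skip self-loops from repeated words
                edges[tuple(sorted((left, right)))] += 1
    return edges


# Invented sample text, not data from the study.
sample_posts = [
    "They kept sending me hateful messages online",
    "The hateful messages made me feel anxious",
]
graph = build_word_adjacency_graph(sample_posts, stopwords={"the", "me"})
strongest_pair, weight = graph.most_common(1)[0]
```

In a sketch like this, groups of words joined by heavy edges could then be passed to a community detection step to form candidate clusters for human review. Whether the study clustered its graph this way is not stated in the abstract.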
Results:
A total of 13,381 posts from 3,283 Reddit users were analyzed, with 5.07% originating between 2012 and 2018 and 94.93% between 2019 and 2023. The WAG modeling approach identified 38 clusters, of which 35 were deemed relevant to cyber victimization experiences. Two clusters containing irrelevant material were excluded. Six overarching themes emerged: (1) psychological impact, (2) coping and healing, (3) protecting yourself online, (4) protecting yourself offline, (5) victimization across various settings, and (6) seeking meaning and understanding.
Conclusions:
The study highlights the effectiveness of data mining and AI in analyzing large public data sets for qualitative research. These methods can inform future studies on risky internet behavior, victimization, and assessment strategies in healthcare settings.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.