Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: May 16, 2023
Date Accepted: Oct 27, 2023
Understanding mental health issues in different subdomains in social networking services: computational analysis of text-based Reddit posts
ABSTRACT
Background:
Users increasingly use social networking services (SNSs) to share their feelings and emotions. People with mental disorders can also use SNSs to seek advice on mental health issues. One such SNS is Reddit, where users can freely discuss these matters in subreddits dedicated to specific health diagnoses.
Objective:
In this study, we analyze the distinctive linguistic characteristics of user posts in specific mental disorder subreddit channels (depression, anxiety, bipolar, borderline personality disorder, schizophrenia, autism, and mental health). We also confirm that these linguistic differences can be learned by machine learning models.
Methods:
We used various statistical analysis methods, including one-way analyses of variance (ANOVAs) and subsequent post hoc tests. We also extracted textual features from the posts of each subreddit channel using bidirectional encoder representations from transformers (BERT) and applied three supervised and unsupervised clustering methods to these features, to verify that our dataset is suitable for further machine learning or deep learning tasks.
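The one-way ANOVA mentioned above compares the mean of a numeric feature across several groups. As a minimal sketch, assuming a hypothetical per-post feature (for example, word count) grouped by subreddit, the F statistic can be computed from the between-group and within-group sums of squares. The function name `one_way_anova` and the toy values are illustrative only, not the authors' code or data; in practice `scipy.stats.f_oneway` returns the F statistic with its p-value, and `scipy.stats.tukey_hsd` is one option for the post hoc pairwise comparisons.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a one-way ANOVA.

    groups: dict mapping a group label (e.g. a subreddit name) to a list of
    numeric feature values, one per post. Illustrative sketch only; it does
    not compute a p-value.
    """
    samples = [list(values) for values in groups.values()]
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = mean(x for s in samples for x in s)
    group_means = [mean(s) for s in samples]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, group_means))
    # Within-group sum of squares: spread of posts around their own group mean.
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, group_means) for x in s)

    df_between = k - 1
    df_within = n_total - k
    # F = mean square between / mean square within (undefined if ss_within is 0).
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Eta squared: share of total variance explained by group membership.
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared
```

For instance, with made-up values `{"depression": [1, 2, 3], "anxiety": [4, 5, 6], "autism": [7, 8, 9]}`, the group means are 2, 5, and 8, giving F = 27.0 with (2, 6) degrees of freedom and eta squared = 0.9. A significant F only says that some group means differ, which is why post hoc tests are needed to locate which subreddit pairs differ.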
Results:
The results indicate that there are notable linguistic differences among the channels, consistent with the findings of prior research. The findings reveal that patients with each mental health issue show different lexical and semantic patterns throughout their online social networking activities. Furthermore, distinctive features of each subreddit class could be successfully captured through supervised and unsupervised clustering methods using the extracted BERT embeddings of the textual posts.
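The abstract does not name the three clustering methods, so the following is only a hedged illustration of the general idea: cluster post embeddings without using their labels, then check how well the clusters line up with the subreddit each post came from. It swaps in plain k-means (Lloyd's algorithm with farthest-point initialization) and cluster purity as the agreement measure. The 2D toy vectors stand in for BERT embeddings (768-dimensional for BERT-base), and the names `kmeans` and `purity` are hypothetical. With real data, scikit-learn's `KMeans` and `adjusted_rand_score` would be the more usual choices.

```python
import math
from collections import Counter


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization.

    points: list of equal-length numeric vectors (e.g. post embeddings).
    Returns the cluster index assigned to each point.
    """
    # Seeding: start from the first point, then repeatedly add the point
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    assign = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assign = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def purity(assign, labels):
    """Fraction of points whose cluster's majority label matches their own label."""
    clusters = {}
    for cluster_id, label in zip(assign, labels):
        clusters.setdefault(cluster_id, []).append(label)
    matched = sum(Counter(ls).most_common(1)[0][1] for ls in clusters.values())
    return matched / len(labels)
```

On three well-separated toy groups labeled "depression", "anxiety", and "autism", this recovers one cluster per label and a purity of 1.0. Real embeddings overlap far more, so purity below 1.0 is expected, and a chance-corrected score such as the adjusted Rand index is less flattering to degenerate solutions.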
Conclusions:
By analyzing mental health-related textual posts with statistical, natural language processing (NLP), and machine learning techniques, our approach provides insights into recent lexical usage and the linguistic characteristics of patients with specific mental health issues. This information can inform clinicians about a patient's mental health in diagnostic terms and aid online intervention. Our findings can also promote research that applies linguistic analysis and machine learning approaches to patients with mental health issues, helping to identify and detect mentally vulnerable groups online. The dataset used in this study is also publicly available online.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.