JMIR Preprints #49074: Understanding mental health issues in different subdomains in social networking services: computational analysis of text-based reddit posts


Understanding mental health issues in different subdomains in social networking services: computational analysis of text-based reddit posts

Seoyun Kim;
Junyeop Cha;
Dongjae Kim;
Eunil Park

ABSTRACT

Background:

People increasingly use social networking services (SNSs) to share their feelings and emotions. People with mental disorders can also use SNSs to seek advice on mental health issues. One such SNS is Reddit, where users can freely discuss these matters on the relevant diagnosis-specific subreddits.

Objective:

In this study, we analyze distinctive linguistic characteristics of user posts on specific mental disorder subreddit channels (depression, anxiety, bipolar, borderline personality disorder, schizophrenia, autism, and mental health). We also confirm that these differences in linguistic formulations can be learned through a machine learning process.

Methods:

We used several statistical analysis methods, including one-way analyses of variance and subsequent post hoc tests. We also extracted textual features from the posts of each subreddit channel using bidirectional encoder representations from transformers (BERT) and applied three supervised and unsupervised clustering methods to these features, to confirm that our dataset is suitable for further machine learning or deep learning tasks.
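As an illustration of the one-way analysis of variance mentioned above, the following is a minimal sketch rather than the authors' actual pipeline. The function name `one_way_anova_f`, the three groups, and the feature values are all hypothetical; they stand in for a per-post linguistic measure compared across subreddit channels. The sketch computes only the F statistic and its degrees of freedom; a p-value and the post hoc comparisons would normally come from a statistics library.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")

    # Grand mean over every observation in every group.
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]

    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up feature values for three hypothetical subreddit channels.
channel_a = [1.0, 2.0, 3.0]
channel_b = [2.0, 3.0, 4.0]
channel_c = [5.0, 6.0, 7.0]

f_stat, df_b, df_w = one_way_anova_f([channel_a, channel_b, channel_c])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value indicates that the group means differ more than the within-group spread would suggest, which is the condition under which post hoc pairwise tests become informative.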

Results:

The results indicate that there are notable linguistic differences among the channels, consistent with the findings of prior research. The findings reveal that patients with each mental health issue show different lexical and semantic patterns throughout their online social networking activities. Furthermore, distinctive features of each subreddit class could be successfully captured through supervised and unsupervised clustering methods using the extracted BERT embeddings of the textual posts.

Conclusions:

By analyzing textual posts related to mental health issues using statistical, natural language processing (NLP), and machine learning techniques, our approach provides insights into recent lexical usage and the linguistic characteristics of patients with specific mental health issues. These insights can inform clinicians about a patient's mental health in diagnostic terms and aid online intervention. Our findings can further promote research involving linguistic analysis and machine learning approaches for patients with mental health issues by identifying and detecting mentally vulnerable groups of people online. The dataset used in this study is publicly available online.

Citation

Please cite as:

Kim S, Cha J, Kim D, Park E

Understanding Mental Health Issues in Different Subdomains of Social Networking Services: Computational Analysis of Text-Based Reddit Posts

J Med Internet Res 2023;25:e49074

DOI: 10.2196/49074

PMID: 38032730

PMCID: 10722371


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 16, 2023

Date Accepted: Oct 27, 2023
