Accepted for/Published in: JMIR Mental Health

Date Submitted: Feb 8, 2024
Date Accepted: Mar 29, 2024

The final, peer-reviewed published version of this preprint can be found here:

Bauer B, Norel R, Leow A, Rached ZA, Wen B, Cecchi G

Using Large Language Models to Understand Suicidality in a Social Media–Based Taxonomy of Mental Health Disorders: Linguistic Analysis of Reddit Posts

JMIR Ment Health 2024;11:e57234

DOI: 10.2196/57234

PMID: 38771256

PMCID: 11112053

Using Large Language Models to Understand Suicidality in a Social Media-based Taxonomy of Mental Health Disorders

  • Brian Bauer
  • Raquel Norel
  • Alex Leow
  • Zad Abi Rached
  • Bo Wen
  • Guillermo Cecchi

ABSTRACT

Background:

Several challenges make suicide risk identification difficult. Suicide is transdiagnostic: it occurs across many different diagnoses as well as in the absence of any mental health diagnosis. Further, many different pathways can lead to suicide (equifinality). With the rapid increase in the use of online platforms such as Reddit, people experiencing mental health symptoms have new outlets for sharing experiences, seeking support, and engaging in discussion regarding their mental health. Platforms like Reddit provide unique opportunities for studying the experiences and perspectives of individuals at risk of suicide in the context of other mental pathologies and stressors.

Objective:

This study aims to contribute to our understanding of suicide risk by analyzing posts from an online community dedicated to providing support for individuals in crisis (the "Suicide Watch" subreddit).

Methods:

To understand natural language use during public online discussions around topics related to suicidality, we used large language model-based sentence embeddings to extract the latent linguistic dimensions of user postings from several mental health-related subreddit channels, with a focus on suicidality. We then applied dimensionality reduction to these sentence embeddings, allowing them to be summarized and visualized in a lower-dimensional Euclidean space for downstream analyses. We analyzed 2.9 million posts extracted from 30 subreddit channels, including Suicide Watch, between October 1 and December 31, 2022, and during the same period in 2010.
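
The two-step pipeline described above (embed each post, then reduce the embeddings to a few dimensions) can be illustrated with a minimal sketch. The abstract does not name the embedding model or the reduction technique, so everything here is an assumption made for illustration: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real LLM sentence embedding, principal component analysis (via power iteration) is swapped in as one possible dimensionality reduction, and the example texts are made-up placeholders rather than real Reddit posts.

```python
import hashlib
import math


def toy_embed(text, dim=16):
    """Hypothetical stand-in for an LLM sentence embedding (assumption).

    Hashes each token into one of `dim` buckets and L2-normalizes the counts.
    A real pipeline would call a pretrained sentence-embedding model here.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = hashlib.md5(token.encode("utf-8")).digest()[0] % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-dimension mean so PCA sees zero-mean data."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def leading_direction(rows, iters=200):
    """Power iteration on X^T X: approximates the direction of largest variance."""
    dim = len(rows[0])
    w = [float(i + 1) for i in range(dim)]  # fixed, non-uniform start vector
    norm = math.sqrt(dot(w, w))
    w = [v / norm for v in w]
    for _ in range(iters):
        scores = [dot(row, w) for row in rows]  # X w
        nxt = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(dot(nxt, nxt))
        if norm == 0.0:  # no variance left to explain
            break
        w = [v / norm for v in nxt]
    return w


def pca_project(rows, k=2):
    """Project rows onto their first k principal directions (PCA is an assumption)."""
    centered = center(rows)
    residual = centered
    components = []
    for _ in range(k):
        w = leading_direction(residual)
        components.append(w)
        # Deflate: remove the variance already explained by w.
        deflated = []
        for row in residual:
            p = dot(row, w)
            deflated.append([v - p * wj for v, wj in zip(row, w)])
        residual = deflated
    return [[dot(row, w) for w in components] for row in centered]


# Placeholder texts for illustration only; not real posts.
posts = [
    "feeling isolated and disconnected lately",
    "looking for support from people who understand",
    "started therapy this week and feel hopeful",
    "asking for advice on managing panic attacks",
]
coords = pca_project([toy_embed(p) for p in posts], k=2)
for post, (x, y) in zip(posts, coords):
    print(f"{x:+.3f} {y:+.3f}  {post}")
```

Each post ends up as a point in a two-dimensional Euclidean space that can be plotted or clustered. With a pretrained embedding model in place of `toy_embed`, nearby points would reflect semantic similarity rather than shared hash buckets.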

Results:

Our results showed that, in line with existing theories of suicide, posters in the suicidality community ("Suicide Watch") predominantly wrote about feelings of disconnection, burdensomeness, hopelessness, desperation, resignation, and trauma. Further, we identified distinct latent linguistic dimensions (well-being, seeking support, and severity of distress) among all mental health subreddits, and the resulting subreddit clusters were in line with a statistically driven diagnostic classification system, namely the Hierarchical Taxonomy of Psychopathology (HiTOP), by mapping onto its proposed superspectra.

Conclusions:

Overall, our findings provide data-driven support for several language-based theories of suicide as well as dimensional classification systems for mental health disorders. Ultimately, this novel combination of natural language processing techniques can assist researchers in gaining deeper insights about emotions and experiences shared online, and may aid in the validation and refutation of different mental health theories.

