Accepted for/Published in: JMIR Mental Health

Date Submitted: Feb 8, 2024
Date Accepted: Mar 29, 2024

The final, peer-reviewed published version of this preprint can be found here:

Using Large Language Models to Understand Suicidality in a Social Media–Based Taxonomy of Mental Health Disorders: Linguistic Analysis of Reddit Posts

Bauer B, Norel R, Leow A, Rached ZA, Wen B, Cecchi G

Using Large Language Models to Understand Suicidality in a Social Media–Based Taxonomy of Mental Health Disorders: Linguistic Analysis of Reddit Posts

JMIR Ment Health 2024;11:e57234

DOI: 10.2196/57234

PMID: 38771256

PMCID: 11112053

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Suicidality in a Social Media-based Taxonomy of Mental Health Disorders

  • Brian Bauer
  • Raquel Norel
  • Alex Leow
  • Zad Abi Rached
  • Bo Wen
  • Guillermo Cecchi

ABSTRACT

Background:

This study aimed to understand natural language use during public online discussions of topics related to suicidality.

Objective:

We used large language model-based sentence embedding to extract the latent linguistic dimensions of user postings drawn from several mental health-related subreddit channels, with a focus on suicidality. We then applied dimensionality reduction to these sentence embeddings, allowing them to be summarized and visualized in a lower-dimensional Euclidean space for downstream analyses.
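The dimensionality-reduction step described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not name the reduction method, so principal component analysis (computed here by power iteration) is an assumption, and small random placeholder vectors stand in for real LLM sentence embeddings. All function names are hypothetical.

```python
import math
import random


def toy_embeddings(seed=42):
    """Placeholder 5-dimensional vectors in two groups (NOT real sentence embeddings)."""
    rng = random.Random(seed)
    group_a = [[1.0 + rng.gauss(0, 0.1) for _ in range(5)] for _ in range(10)]
    group_b = [[-1.0 + rng.gauss(0, 0.1) for _ in range(5)] for _ in range(10)]
    return group_a + group_b


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue


def pca_project(rows, k=2):
    """Project rows onto their first k principal components (assumed method: PCA)."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for c in range(k):
        v, lam = top_eigenvector(cov, seed=c)
        components.append(v)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * comp[j] for j in range(d)) for comp in components]
            for r in centered]


if __name__ == "__main__":
    for point in pca_project(toy_embeddings(), k=2)[:3]:
        print(point)
```

In practice, the placeholder vectors would be replaced by embeddings from whichever sentence-embedding model is used, and a library implementation of the chosen reduction method would normally be preferred over hand-written power iteration.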

Methods:

We analyzed 2.9 million posts extracted from 30 subreddit channels, including Suicide Watch, posted between October 1 and December 31, 2022, and during the same period in 2010.

Results:

Our results showed that, in line with existing theories of suicide, posters in the suicidality community ("Suicide Watch") predominantly wrote about feelings of disconnection, burdensomeness, hopelessness, desperation, resignation, and trauma. Further, we identified distinct latent linguistic dimensions (well-being, seeking support, and severity of distress) among all mental health subreddits. The resulting subreddit clusters were in line with a statistically driven diagnostic classification system, namely the Hierarchical Taxonomy of Psychopathology (HiTOP), mapping onto its proposed superspectra.

Conclusions:

Overall, our findings provide data-driven support for several language-based theories of suicide as well as dimensional classification systems for mental health disorders. Ultimately, this novel combination of natural language processing techniques can help researchers gain deeper insights into emotions and experiences shared online, and may aid in the validation or refutation of different mental health theories.

Per the author's request the PDF is not available.