Accepted for/Published in: JMIR Mental Health
Date Submitted: Feb 8, 2024
Date Accepted: Mar 29, 2024
Using Large Language Models to Understand Suicidality in a Social Media-based Taxonomy of Mental Health Disorders
ABSTRACT
Background:
Suicide risk identification is difficult for several reasons. Suicide occurs across many different diagnoses, as well as in the absence of any mental health diagnosis (i.e., it is transdiagnostic), and many different pathways can lead to suicide (equifinality). With the rapid increase in the use of online platforms such as Reddit, people experiencing mental health symptoms have new outlets for sharing experiences, seeking support, and engaging in discussion about their mental health. Platforms like Reddit provide unique opportunities for studying the experiences and perspectives of individuals at risk of suicide in the context of other mental pathologies and stressors.
Objective:
This study aims to contribute to our understanding of suicide risk by analyzing posts from an online community dedicated to providing support for individuals in crisis (i.e., "Suicide Watch" subreddit).
Methods:
To understand natural language use during public online discussions of topics related to suicidality, we used large language model-based sentence embeddings to extract the latent linguistic dimensions of user posts from several mental health-related subreddits, with a focus on suicidality. We then applied dimensionality reduction to these sentence embeddings, allowing them to be summarized and visualized in a lower-dimensional Euclidean space for downstream analyses. We analyzed 2.9 million posts extracted from 30 subreddits, including "Suicide Watch", between October 1, 2022, and December 31, 2022, and from the same period in 2010.
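The reduction step described above can be illustrated with a minimal sketch. It assumes the sentence embeddings have already been computed, and it uses principal component analysis (PCA) via power iteration as a stand-in; the abstract does not name the reduction technique, so PCA is an assumption here and not necessarily the authors' method. The toy vectors and all function names (`center`, `top_component`, `pca_project`) are hypothetical and exist only for illustration.

```python
import math
import random


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-dimension mean from every embedding."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def top_component(rows, iters=200, seed=0):
    """Power iteration on X^T X to find the leading principal direction."""
    rng = random.Random(seed)
    d = len(rows[0])
    v = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iters):
        # w = X^T (X v), computed without forming the covariance matrix.
        proj = [dot(r, v) for r in rows]
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(d)]
        norm = math.sqrt(dot(w, w))
        if norm == 0:
            break  # no variance left in the data
        v = [x / norm for x in w]
    return v


def pca_project(embeddings, k=2):
    """Project embeddings onto their first k principal components."""
    rows = center(embeddings)
    residual = [r[:] for r in rows]
    components = []
    for i in range(k):
        v = top_component(residual, seed=i)
        components.append(v)
        # Deflate: remove the variance already explained by v.
        residual = [
            [x - dot(r, v) * vj for x, vj in zip(r, v)] for r in residual
        ]
    return [[dot(r, c) for c in components] for r in rows]


# Hypothetical 4-dimensional "embeddings" for six posts from two communities.
# Real sentence embeddings would have hundreds of dimensions.
toy_embeddings = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [1.1, 1.0, 0.1, 0.1],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
    [0.1, 0.1, 1.1, 1.0],
]

if __name__ == "__main__":
    for point in pca_project(toy_embeddings, k=2):
        print([round(x, 3) for x in point])
```

In this toy setup, the first coordinate separates the two groups of posts, which is the kind of low-dimensional structure that can then be visualized or clustered. A nonlinear method could be swapped in for `pca_project` without changing the surrounding steps.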
Results:
Our results showed that, in line with existing theories of suicide, posters in the suicidality community ("Suicide Watch") predominantly wrote about feelings of disconnection, burdensomeness, hopelessness, desperation, resignation, and trauma. Further, we identified distinct latent linguistic dimensions (well-being, seeking support, and severity of distress) among all mental health subreddits, and the resulting subreddit clusters were in line with a statistically driven diagnostic classification system, namely the Hierarchical Taxonomy of Psychopathology (HiTOP), by mapping onto its proposed superspectra.
Conclusions:
Overall, our findings provide data-driven support for several language-based theories of suicide as well as dimensional classification systems for mental health disorders. Ultimately, this novel combination of natural language processing techniques can assist researchers in gaining deeper insights about emotions and experiences shared online, and may aid in the validation and refutation of different mental health theories.