Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jun 28, 2019
Date Accepted: Sep 15, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Detection of Suicidality among Opioid Users on Reddit: A Machine Learning Based Approach
ABSTRACT
Background:
In recent years, both suicide and overdose rates have risen. Many individuals who struggle with opioid use disorder and chronic pain are prone to suicidal ideation, which may result in overdose. However, fatal overdoses are difficult to classify as intentional or unintentional. Intentional overdose is difficult to detect, partly because of a lack of predictors and partly because social stigma pushes individuals away from seeking help; they may instead use online means to articulate their concerns.
Objective:
Our goal is to predict suicidality among opioid users on Reddit by analyzing their posts with machine learning. This will help us better understand the rationale of these users and provide new insights into the opioid epidemic.
Methods:
Reddit posts between June 2017 and June 2018 were collected from r/suicidewatch, r/depression, a set of opioid-related subreddits, and a set of control subreddits. We first classified suicidal versus nonsuicidal language, and then classified users with versus without opioid usage. Suicidality among opioid users was then predicted by taking the intersection. Using the subreddit names as labels, a convolutional neural network (CNN) was trained on 50,000 posts to classify suicidality versus nonsuicidality, and then used to predict suicidality in 39,000 unlabeled opioid-related posts. A second model was trained on 60,000 posts to classify users with versus without opioid usage, and then used to predict opioid usage in 25,820 unlabeled posts from r/suicidewatch. A logistic regression model was also built as a baseline for performance comparison.
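The cross-prediction design described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the CNN for a tiny bag-of-words logistic regression written with the Python standard library, and every post, subreddit key, function name, and hyperparameter below is a hypothetical placeholder chosen only to show how subreddit names can serve as labels and how each model is then applied to the other group's unlabeled posts.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

def train(posts, positive_subreddits, epochs=200, lr=0.5):
    """Fit a bag-of-words logistic regression by stochastic gradient ascent.

    posts is a list of (subreddit, text) pairs. The label is 1 when the
    subreddit is in positive_subreddits and 0 otherwise, so the subreddit
    name acts as the label (an assumption mirroring the abstract).
    """
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for subreddit, text in posts:
            y = 1.0 if subreddit in positive_subreddits else 0.0
            counts = Counter(tokenize(text))
            p = sigmoid(bias + sum(weights[t] * c for t, c in counts.items()))
            err = y - p  # gradient of the log-likelihood w.r.t. the logit
            bias += lr * err
            for t, c in counts.items():
                weights[t] += lr * err * c
    return weights, bias

def predict(model, text):
    """Return the estimated probability of the positive class."""
    weights, bias = model
    counts = Counter(tokenize(text))
    return sigmoid(bias + sum(weights[t] * c for t, c in counts.items()))

# Hypothetical toy posts; the study itself used tens of thousands of posts.
CONTROL = [("control", "great game last night with friends"),
           ("control", "new recipe for dinner tonight")]
SUICIDE = [("suicidewatch", "i feel hopeless and cannot go on"),
           ("suicidewatch", "everything feels hopeless and empty lately")]
OPIOID = [("opiates", "took oxycodone again this morning"),
          ("opiates", "withdrawal from heroin is rough")]

# Model 1: suicidal vs. nonsuicidal language. Model 2: opioid use vs. none.
suicidality_model = train(SUICIDE + CONTROL, {"suicidewatch"})
opioid_model = train(OPIOID + CONTROL, {"opiates"})

# Apply each model to posts from the other group, which carry no label
# for the property that model predicts.
unlabeled_opioid_posts = ["oxycodone withdrawal and i feel hopeless",
                          "oxycodone refill last night for my knee"]
unlabeled_suicidewatch_posts = ["hopeless after heroin withdrawal",
                                "alone tonight with no friends"]

flagged_opioid = [t for t in unlabeled_opioid_posts
                  if predict(suicidality_model, t) > 0.5]
flagged_suicidewatch = [t for t in unlabeled_suicidewatch_posts
                        if predict(opioid_model, t) > 0.5]
```

Posts that end up in `flagged_opioid` or `flagged_suicidewatch` are the ones in which both signals co-occur, which is the intersection the Methods refer to. The 0.5 threshold is an arbitrary choice for this sketch.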
Results:
The baseline slightly outperformed the CNN in classifying suicidal versus nonsuicidal posts (F=0.96) and presence versus absence of opioid use (F=0.98). The CNN achieved F=0.94 for classifying suicidal versus nonsuicidal posts and F=0.96 for determining the presence of opioid usage. For prediction of suicidality in unlabeled opioid-related posts and of opioid usage in unlabeled r/suicidewatch posts, the CNN notably outperformed the baseline.
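For readers unfamiliar with the metric, the sketch below shows how an F score is computed, assuming that "F" here denotes the usual F1 score (the harmonic mean of precision and recall); the abstract does not state this explicitly. The function name and the confusion counts are hypothetical and are not taken from the study.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from confusion counts."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return 0.0  # no true positives: define F1 as 0 instead of dividing by 0
    return 2 * precision * recall / (precision + recall)

# Made-up counts for illustration only: 96 true positives,
# 4 false positives, 4 false negatives.
example_f1 = f1_score(96, 4, 4)
```

With these illustrative counts, precision and recall are both 96/100, so the harmonic mean equals the same value.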
Conclusions:
Opioid use is linked not only to the risk of unintentional overdose but also to suicide risk. Social media such as Reddit has properties that can aid machine learning and provides personal-level data that cannot be obtained elsewhere. We demonstrate that it is possible to collect data for an out-of-sample target subject and to use posts concerning both suicidal ideation and opioid misuse to examine possible intentional overdose using neural networks, which learn flexibly without definite parameters or hard-coded features.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.