Accepted for/Published in: JMIR Infodemiology
Date Submitted: May 19, 2025
Date Accepted: Jan 13, 2026
Automated Risk Assessment of Opioid Use: Analysis Using Pre-Trained Transformers on Social Media Data
ABSTRACT
Background:
The illegal use of opioids has emerged as a major global public health concern, contributing to widespread addiction and a growing number of overdose-related deaths. In response, the U.S. federal government has invested billions of dollars in combating the opioid epidemic through addiction treatment, overdose prevention, and law enforcement initiatives. Despite these efforts, there remains an urgent need for advanced automated tools capable of detecting opioid overdose cases and identifying the risk levels of the substances used. Such tools can enable faster, more effective responses and reduce the need for human intervention. Social media platforms, particularly Reddit, have become valuable sources of self-reported data on opioid misuse, offering rich insights into user experiences and symptoms.
Objective:
This research aims to design and propose an advanced automated tool for detecting opioid overdose risks and classifying opioid-related substances into high-risk and low-risk categories by analyzing social media posts.
Methods:
This study makes 4 key contributions. First, we created a unique, manually annotated dataset from Reddit posts, in which opioid-related substances were labeled according to their risk levels based on user experience and contextual indicators. Second, we created detailed annotation guidelines to ensure consistency and accuracy in labeling opioid misuse patterns. Third, we designed a BioBERT-based classification framework enhanced with a custom attention mechanism to capture relevant semantic information for improved risk-level prediction. The performance of this model was evaluated using 5-fold cross-validation and compared against multiple baseline models, including traditional supervised learning, deep learning, and transfer learning approaches. A total of 12 experiments were conducted to benchmark the model's effectiveness. Finally, we performed a paired t-test on the cross-validation results of the BioBERT model and the strongest baseline model, XGBoost, to statistically validate the observed performance improvements.
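The paired t-test on matched cross-validation folds can be illustrated with a minimal sketch. It is not the authors' code: the function name `paired_t_statistic` is hypothetical, and the fold scores are made-up placeholders chosen only to make the example runnable. It assumes 5 folds (4 degrees of freedom) and a two-sided test at the 0.05 level, for which the critical value of the t distribution is 2.776.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on matched fold scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models need exactly one score per fold")
    # Work on the per-fold differences, since each fold is evaluated by both models.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    if sd_diff == 0:
        raise ValueError("all fold differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical placeholder fold scores, NOT results reported in the paper.
model_folds = [0.99, 0.98, 0.99, 1.00, 0.99]
baseline_folds = [0.97, 0.97, 0.96, 0.98, 0.97]

t_stat, df = paired_t_statistic(model_folds, baseline_folds)

# Two-sided critical value of the t distribution for alpha = 0.05 and df = 4.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4
print(f"t = {t_stat:.3f}, df = {df}, significant at 0.05: {significant}")
```

Pairing by fold matters here: both models are scored on the same held-out split, so testing the per-fold differences removes the variation between folds that an unpaired test would treat as noise.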
Results:
The proposed BioBERT model with a custom attention mechanism achieved an average cross-validation score of 0.99, outperforming the best baseline, XGBoost, which achieved 0.97, a 2.06% relative improvement. A paired t-test on the cross-validation scores confirmed that this improvement is statistically significant, indicating that the superior performance of the BioBERT-attention model is unlikely to be due to random variation and reflects a genuine improvement in the detection of opioid risk levels.
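The 2.06% figure follows from the two reported averages, as the short sketch below shows. The helper name `relative_improvement` is hypothetical; only the values 0.99 and 0.97 come from the abstract.

```python
def relative_improvement(new_score, baseline_score):
    """Percentage gain of new_score over baseline_score, relative to the baseline."""
    return (new_score - baseline_score) / baseline_score * 100


# Average cross-validation scores reported in the abstract.
pct = relative_improvement(0.99, 0.97)
print(f"{pct:.2f}%")  # prints "2.06%"
```

The absolute gain is 0.02; dividing by the baseline score of 0.97 expresses it relative to the baseline.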
Conclusions:
This study highlights the potential of leveraging social media data and advanced NLP models to build reliable systems for opioid risk detection. The BioBERT-attention model demonstrates state-of-the-art performance and statistical robustness, offering a powerful tool to support timely intervention and harm reduction strategies in the ongoing battle against the opioid crisis.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.