Accepted for/Published in: JMIR Infodemiology

Date Submitted: May 19, 2025
Date Accepted: Jan 13, 2026

The final, peer-reviewed published version of this preprint can be found here:

Ahmad M, Orji R, Amjad M, Siddique A, Kubysheva N, Batyrshin I, Sidorov G

Automated Risk Assessment of Opioid Use: Analysis Using Pre-Trained Transformers on Social Media Data

JMIR Infodemiology 2026;6:e77783

DOI: 10.2196/77783

PMID: 41711388

Automated Risk Assessment of Opioid Use: Analysis Using Pre-Trained Transformers on Social Media Data

  • Muhammad Ahmad
  • Rita Orji
  • Maaz Amjad
  • Abubakar Siddique
  • Nailya Kubysheva
  • Ildar Batyrshin
  • Grigori Sidorov

ABSTRACT

Background:

The illegal use of opioids has emerged as a major global public health concern, contributing to widespread addiction and a growing number of overdose-related deaths. In response, the U.S. federal government has invested billions of dollars in combating the opioid epidemic through addiction treatment, overdose prevention, and law enforcement initiatives. Despite these efforts, there remains an urgent need for advanced automated tools capable of detecting opioid overdose cases and identifying the risk levels of the substances used. Such tools can enable faster, more effective responses and reduce the need for human intervention. Social media platforms, particularly Reddit, have become valuable sources of self-reported data on opioid misuse, offering rich insights into user experiences and symptoms.

Objective:

This research aims to design an advanced automated tool for detecting opioid overdose risks and classifying opioid-related substances into high-risk and low-risk categories by analyzing social media posts.

Methods:

This study makes 4 key contributions. First, we created a unique, manually annotated dataset from Reddit posts, in which opioid-related substances were labeled according to their risk levels based on user experience and contextual indicators. Second, we created detailed annotation guidelines to ensure consistency and accuracy in labeling opioid misuse patterns. Third, we designed a BioBERT-based classification framework enhanced with a custom attention mechanism to capture relevant semantic information for improved risk-level prediction. The performance of this model was evaluated using 5-fold cross-validation and compared against multiple baseline models, including traditional supervised learning, deep learning, and transfer learning approaches. A total of 12 experiments were conducted to benchmark the model's effectiveness. Finally, we performed a paired t-test on the cross-validation results of the BioBERT model and the strongest baseline model, XGBoost, to statistically validate the observed performance improvements.
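The 5-fold cross-validation protocol described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the helper names (`k_fold_indices`, `cross_validate`), the toy posts and labels, and the majority-class stand-in model are all assumptions made for illustration, and any real classifier (such as a BioBERT-based model) would be supplied through the hypothetical `fit` and `score` callables.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs so each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding keeps fold sizes within one sample of each other.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

def cross_validate(texts, labels, fit, score, k=5):
    """Return one score per fold. `fit` and `score` are hypothetical callables."""
    fold_scores = []
    for train_idx, test_idx in k_fold_indices(len(texts), k):
        model = fit([texts[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        fold_scores.append(score(model,
                                 [texts[i] for i in test_idx],
                                 [labels[i] for i in test_idx]))
    return fold_scores

# Stand-in "model": always predicts the most frequent training label.
def fit_majority(texts, labels):
    return max(set(labels), key=labels.count)

def accuracy(model, texts, labels):
    return sum(1 for y in labels if y == model) / len(labels)

# Toy data, invented for illustration only (not the study's dataset).
texts = [f"post {i}" for i in range(20)]
labels = ["high"] * 14 + ["low"] * 6

scores = cross_validate(texts, labels, fit_majority, accuracy, k=5)
print(scores)
```

The per-fold scores returned here are the kind of paired values on which a paired t-test between two models can later be run, provided both models are evaluated on the same folds.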

Results:

The proposed BioBERT model with a custom attention mechanism achieved an average cross-validation score of 0.99, outperforming the best baseline, XGBoost, which achieved 0.97, a 2.06% relative improvement. A paired t-test on the cross-validation scores confirmed that this improvement is statistically significant, indicating that the superior performance of the BioBERT-attention model is unlikely to be due to random variation and instead reflects a genuine improvement in the detection of opioid risk levels.
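The arithmetic behind the relative improvement, and the form of a paired t-test on per-fold scores, can be sketched as follows. The two reported averages (0.99 and 0.97) come from the abstract; the per-fold scores below are hypothetical placeholders chosen only to match those averages, since the abstract does not report fold-level results. The function names are likewise assumptions.

```python
import math
from statistics import mean, stdev

def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0

def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Reported averages from the abstract.
print(round(relative_improvement(0.99, 0.97), 2))  # 2.06

# HYPOTHETICAL per-fold scores, invented for illustration only.
biobert_folds = [0.99, 0.98, 0.99, 1.00, 0.99]
xgboost_folds = [0.97, 0.97, 0.96, 0.98, 0.97]

t_stat = paired_t_statistic(biobert_folds, xgboost_folds)

# Two-sided critical value of Student's t for alpha = 0.05 and
# 4 degrees of freedom (5 folds minus 1).
T_CRITICAL_DF4 = 2.776
significant = abs(t_stat) > T_CRITICAL_DF4
print(t_stat, significant)
```

With only 5 folds the test has 4 degrees of freedom, so the conclusion depends heavily on how consistent the fold-by-fold differences are, not just on the gap between the two averages.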

Conclusions:

This study highlights the potential of leveraging social media data and advanced NLP models to build reliable systems for opioid risk detection. The BioBERT-attention model demonstrates state-of-the-art performance and statistical robustness, offering a powerful tool to support timely intervention and harm reduction strategies in the ongoing battle against the opioid crisis.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.