Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Mar 21, 2022
Open Peer Review Period: Mar 21, 2022 - May 16, 2022
Date Accepted: Sep 7, 2022
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Assessment of Mentions of Adverse Drug Events in Social Media Through Natural Language Processing
ABSTRACT
Background:
Adverse drug reactions are a significant concern in both clinical practice and public health monitoring. Multiple measures have been put in place to strengthen postmarketing surveillance of adverse drug effects and improve drug safety. These include implementing spontaneous reporting systems and developing automated natural language processing (NLP) systems that extract evidence of adverse drug events (ADEs) from electronic health records and social media data; this evidence can then be investigated further as possible adverse reactions.
Objective:
While using social media for this purpose has potential, it is not clear whether it is a reliable source for this information. Our work aims to 1) develop NLP approaches to identify ADEs on social media and 2) assess the reliability of social media data to identify ADEs.
Methods:
We propose a collocated long short-term memory (LSTM) network model with attentive pooling and aggregated contextual representations generated by a pretrained model. We applied the model to large-scale Twitter data to identify ADE-related tweets. We then conducted a qualitative content analysis of those tweets to assess the reliability of social media data as a means of collecting such information.
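As an illustration of the attentive pooling step only, the following is a minimal pure-Python sketch of one common formulation (additive attention over LSTM hidden states). Each hidden state h_t is scored as s_t = v · tanh(W h_t), the scores are normalized with a softmax, and the tweet representation is the attention-weighted sum of the hidden states. The weight matrix W, the vector v, and the function names are hypothetical; the abstract does not specify the authors' exact formulation, and the LSTM encoder and the pretrained contextual representations are omitted here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def matvec(W: List[Vector], x: Vector) -> Vector:
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(hidden: List[Vector], W: List[Vector], v: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical additive attentive pooling over a sequence of hidden states.

    hidden: one hidden-state vector per token (e.g., LSTM outputs).
    W, v:   attention parameters (assumed learned elsewhere; fixed here).
    Returns the pooled vector and the attention weights.
    """
    # Score each time step: s_t = v . tanh(W h_t)
    scores = [
        sum(v_k * math.tanh(u_k) for v_k, u_k in zip(v, matvec(W, h)))
        for h in hidden
    ]
    # Normalize scores into attention weights that sum to 1
    alpha = softmax(scores)
    # Weighted sum of hidden states, dimension by dimension
    dim = len(hidden[0])
    pooled = [sum(a * h[j] for a, h in zip(alpha, hidden)) for j in range(dim)]
    return pooled, alpha


if __name__ == "__main__":
    states = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    vec, weights = attentive_pool(states, identity, [1.0, 1.0])
    print("weights:", weights)
    print("pooled:", vec)
```

In this toy run the middle token has the largest score, so it receives the largest weight and dominates the pooled vector; in a trained model, W and v would be learned so that tokens signalling an ADE receive higher weight.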
Results:
The model outperforms the variant without contextual representations in both the validation and evaluation phases. The content analysis of ADE tweets shows that ADE-related discussions referred to seven ADE themes, of which mental health-related, sleep-related, and pain-related ADEs were the most frequently discussed. We also contrast known adverse drug reactions with those mentioned in tweets.
Conclusions:
We observe a distinct improvement in the model when contextual information is used. However, the results reveal weak generalizability of current systems to unseen data. Additional research is needed to fully utilize social media data and to improve the robustness and reliability of NLP systems. The content analysis, on the other hand, shows that Twitter covers a sufficiently wide range of ADEs, as well as the known adverse reactions of the drugs mentioned in tweets. Our work demonstrates that social media can be a reliable data source for collecting ADE mentions.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.