Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Dec 7, 2022
Date Accepted: Mar 29, 2023
Transferability Based on Drug Structure Similarity in Automatic Classification of Noncompliant Drug Use on Social Media: Natural Language Processing Approach
ABSTRACT
Background:
In recent years, inappropriate drug use, known as medication noncompliance, has become a growing concern as the distribution and sale of drugs on the internet have increased. We therefore aimed to monitor improper drug use on social media. Because constructing a corpus for such monitoring is costly, we examined whether a corpus built for one drug can be reused, through transfer learning, for drugs with similar chemical structures.
Objective:
We implemented multilabel classification of social media texts with respect to medication noncompliance. In addition, we used the chemical similarity of the drugs to test whether transfer learning across drug-specific sub-corpora is feasible.
Methods:
We used the MediA corpus for medication noncompliance, in which tweets mentioning 20 different drugs are labeled as Noncompliant use/mention, Noncompliant sale, General use, or General mention. The classification model for tweets about a specific drug was trained by transfer learning in two settings: on tweets about one other drug (single sub-corpus transfer learning) and on tweets about several other drugs (multi-sub-corpus incremental learning). Using drug structure similarity, we evaluated whether particular sub-corpora were more effective for transfer learning.
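As an illustration of how source sub-corpora might be ordered by structural similarity, the following is a minimal sketch rather than the authors' implementation. It assumes each drug is represented by a set of "on" fingerprint bits and uses the Tanimoto coefficient, the similarity measure named in the Conclusions. The drug names, bit sets, and the helper functions `tanimoto` and `rank_sources` are hypothetical; the abstract does not state which fingerprint type was used.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: |A ∩ B| / (|A| + |B| - |A ∩ B|)."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_sources(
    target: str, fingerprints: Dict[str, FrozenSet[int]]
) -> List[Tuple[str, float]]:
    """Rank every other drug by descending similarity to the target drug."""
    target_fp = fingerprints[target]
    scores = [
        (name, tanimoto(target_fp, fp))
        for name, fp in fingerprints.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy fingerprints (sets of on-bit indices), not real drug data.
fingerprints = {
    "drug_a": frozenset({1, 2, 3, 4}),
    "drug_b": frozenset({2, 3, 4, 5}),
    "drug_c": frozenset({7, 8}),
}

# Candidate source sub-corpora for drug_a, most similar first.
ranking = rank_sources("drug_a", fingerprints)
```

Under this sketch, the top-ranked drug's sub-corpus would be the first candidate for single sub-corpus transfer learning, and sub-corpora could be added in ranked order for the incremental setting.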
Results:
A weak correlation of 0.278 was observed between the structural similarity of drugs and classification performance. When the number of sub-corpora was small, the model trained by transfer learning on sub-corpora of structurally similar drugs outperformed the model trained on randomly added sub-corpora.
Conclusions:
The results suggest that structural similarity improves classification performance for messages about unknown drugs when the training corpus covers only a few drugs. Conversely, there appears to be little need to consider Tanimoto structural similarity when the corpus covers a sufficient variety of drugs.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.