Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jan 9, 2026
Date Accepted: May 28, 2026
Methods for Detecting Suspicious Information from Individual Transactions of Pharmaceutical Products via Twitter (now X): A Retrospective Observational Study
ABSTRACT
Background:
Individual transactions of pharmaceutical products via social networking sites (SNSs) and the wider internet are considered an inappropriate distribution route in Japan and might serve as a guise for illicit business-to-consumer activities. Such transactions may lead to the misuse of pharmaceutical products and require more active monitoring and guidance to ensure pharmaceutical security and quality assurance.
Objective:
This study aimed to develop a method to accurately detect SNS posts that possibly involve individual transactions of pharmaceutical products, using text data from X (formerly Twitter), the primary platform for such activities in Japan.
Methods:
Text mining was applied to 1,389 text posts that were suspected of involving individual pharmaceutical transactions and had been recorded by the Counterfeit and Illegal Drugs Information Center. Using the hashtag “#Okusuri mogumogu,” commonly associated with trading psychotropic pharmaceuticals, and our web crawler program, we collected 7,499 tweets posted in 2022 and 6,461 tweets posted from January 1, 2023, to March 31, 2023. Each post was manually reviewed to determine whether it was related to individual pharmaceutical transactions, and the relevant posts were categorized. Using the 2022 dataset, we extracted relevant words from these posts and summarized their occurrences and frequencies. A decision tree model was then generated using the 2022 dataset and validated using the 2023 dataset to evaluate the reliability of detecting transaction-related posts.
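The word-screening step can be illustrated with a minimal sketch. It assumes a Pearson chi-square test on a 2x2 table (word present or absent versus manual label); the abstract does not state which test was used, so this is an illustration rather than the authors' procedure. The function names, the whitespace tokenization, and the toy posts are hypothetical; real Japanese text would first need to be segmented with a morphological analyzer.

```python
from collections import Counter

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_05 = 3.841


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so no association can be tested
    return n * (a * d - b * c) ** 2 / denom


def word_associations(posts, labels):
    """Return {word: chi-square} for word presence versus label (1 = suspicious).

    Assumption: each post is already tokenized and joined by spaces.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_counts, neg_counts = Counter(), Counter()
    for text, label in zip(posts, labels):
        for word in set(text.split()):  # count each word once per post
            (pos_counts if label else neg_counts)[word] += 1
    scores = {}
    for word in set(pos_counts) | set(neg_counts):
        a = pos_counts[word]   # suspicious posts containing the word
        b = n_pos - a          # suspicious posts without it
        c = neg_counts[word]   # other posts containing the word
        d = n_neg - c          # other posts without it
        scores[word] = chi_square_2x2(a, b, c, d)
    return scores


# Hypothetical toy data, far too small for the chi-square approximation to hold.
posts = ["yuzuri DM", "yuzuri", "nemui", "nemui DM"]
labels = [1, 1, 0, 0]
scores = word_associations(posts, labels)
flagged = sorted(w for w, s in scores.items() if s > CRITICAL_05)
```

The statistic does not indicate direction: in this toy example "nemui" is flagged because it appears only in non-suspicious posts, so a real screen would also compare the word's frequency in the two groups.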
Results:
Web crawling with the hashtag “#Okusuri mogumogu” identified 7,499 posts from 2022 and 6,461 posts from the first three months of 2023. The number of detectable posts increased closer to the crawl date, indicating that SNS posts may frequently disappear due to deletion. Among the 3,228 words extracted from the 2022 dataset, 452 were significantly associated with posts suspected of involving individual transactions. Highly indicative terms included kyuu (request), yuzuri (transfer), DM (direct message), and transaction-related hashtags. The Chi-square Automatic Interaction Detector model demonstrated stable discriminative performance (training area under the curve [AUC] 0.84; test AUC 0.83; Gini 0.65). When the model was applied to the 2023 dataset, 82.31% of posts classified as suspicious were consistent with manual annotation, indicating reasonable generalizability despite linguistic fragmentation and the presence of partial word forms characteristic of Japanese text.
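The reported metrics can be related to one another with a short sketch. It assumes the Gini coefficient is the common rescaling 2 × AUC − 1, which the abstract does not confirm; under that convention an AUC of 0.83 corresponds to a Gini of about 0.66, close to the reported 0.65, and the small gap may reflect rounding or a different definition. It also treats the 82.31% agreement as the share of flagged posts that annotators also judged suspicious (precision). All function names and toy values are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc_value):
    """Gini coefficient under the assumed convention Gini = 2 * AUC - 1."""
    return 2.0 * auc_value - 1.0


def precision(predicted, actual):
    """Share of posts flagged as suspicious that the manual label confirms."""
    confirmed = [a for p, a in zip(predicted, actual) if p == 1]
    if not confirmed:
        return 0.0
    return sum(confirmed) / len(confirmed)


# Hypothetical toy scores and labels (1 = suspicious by manual annotation).
toy_scores = [0.9, 0.8, 0.4, 0.3]
toy_labels = [1, 0, 1, 0]
toy_auc = auc(toy_scores, toy_labels)
toy_gini = gini_from_auc(toy_auc)
toy_precision = precision([1, 1, 0, 1], [1, 0, 0, 1])
```

The pairwise loop is quadratic in the number of posts, which is acceptable for a few thousand posts; a rank-based computation would be preferable for larger datasets.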
Conclusions:
We identified key terms linked to individual pharmaceutical transactions using transaction-related tags, text mining, and machine learning, and developed a predictive model. This approach might help prevent inappropriate online transactions of pharmaceutical products.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.