JMIR Preprints #65481: AI-Powered Drug Classification and Indication Mapping for Pharmacoepidemiologic Studies

AI-Powered Drug Classification and Indication Mapping for Pharmacoepidemiologic Studies

Benjamin Alexander Ogorek;
Thomas Patrick Rhoads;
Eric Scott Finkelman;
Isaac Rogelio Rodriguez-Chavez

ABSTRACT

Background:

Pharmacoepidemiologic studies require Anatomical Therapeutic Chemical (ATC) drug classification from real-world data sources. ATC classification enables standardized analysis of drug utilization patterns and safety monitoring, ultimately promoting rational drug use and improving health outcomes. Proprietary tools for this purpose are expensive, while free tools lack generalizability. Large language models (LLMs), such as GPT-4o, offer a cost-effective alternative because they can explain their choice of a drug’s ATC code and return the output in a structured format.

Objective:

This paper seeks to establish LLMs as an assistive technology for the drug classification task, a prerequisite for good pharmacoepidemiologic research. This requires developing AI prompts and data processing procedures and showing that the resulting accuracy, efficiency, and effectiveness are as good as or better than those of established methods.

Methods:

Patients residing in the US and Canada with medication scheduled through a smart medication dispenser called “spencer SmartHub” (Spencer Health Solutions, Inc., Morrisville, NC) were included in this study if they had a scheduled medication refill in 2024 and consented to the use of their data for research. An AI prompt requesting the best and next-best 2nd level ATC codes from de-identified daily dose strings was developed iteratively with expert guidance on clinical research, digital medicine, and regulatory affairs. An initial prompt was created to ensure that aspirin at various doses would be classified as either an analgesic or an antithrombotic. Once this succeeded, the prompt was applied to a pilot sample of 20 daily dose strings and the responses were graded by the expert. The prompt was revised for as long as more than one response was incorrect. The prompt was then applied to an inference sample of n=200 daily dose strings, drawn without replacement. Finite population inference was carried out on the proportions of correct and approximately correct ATC drug classifications. All errors made by the algorithm were reviewed.
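
As a rough illustration of the sampling and response-checking steps described above, the sketch below draws a simple random sample without replacement and validates a model reply that carries a best and a next-best 2nd level ATC code. The reply layout (`best|next_best`), the delimiter, and the function names are assumptions made for illustration; the abstract does not specify the prompt's actual output format. The only ATC fact relied on is that a 2nd level code is one letter followed by two digits (for example, N02 for analgesics and B01 for antithrombotic agents).

```python
import random
import re

# A 2nd level ATC code is one anatomical-group letter plus two digits, e.g. "N02".
ATC_LEVEL2 = re.compile(r"[A-Z][0-9]{2}")


def draw_sample(dose_strings, n, seed=0):
    """Simple random sample of n items, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(dose_strings), n)


def parse_reply(reply, delimiter="|"):
    """Split a hypothetical 'best|next_best' reply and validate both codes.

    Returns (best, next_best). Raises ValueError on a malformed reply so that
    delimiter or format errors are caught rather than silently stored.
    """
    parts = [p.strip().upper() for p in reply.split(delimiter)]
    if len(parts) != 2:
        raise ValueError(f"expected 2 codes, got {len(parts)}: {reply!r}")
    for code in parts:
        if not ATC_LEVEL2.fullmatch(code):
            raise ValueError(f"not a 2nd level ATC code: {code!r}")
    return parts[0], parts[1]
```

Under these assumptions, `parse_reply("B01 | N02")` returns `("B01", "N02")`, while a reply containing a longer code such as `B01AC06` or the wrong number of fields raises `ValueError` and can be set aside for manual review.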

Results:

There were 3,371 de-identified patients who met the inclusion criteria, 2,908 (86%) residing in Canada and 463 (14%) residing in the United States. These patients contributed 12,294 daily dose strings. The initial prompt, which used few-shot learning and concise output, was unable to distinguish between aspirin’s analgesic and antithrombotic therapeutic uses. A revised prompt using chain-of-thought reasoning succeeded and achieved 100% correctness on the pilot sample of n=20. In the inferential sample, a proportion of 0.96 (80% CI 0.943-0.978) of responses were deemed correct by the expert; the approximately correct designation was never used. The most common mistakes were classifying dietary supplements as medications, mistaking the identity of a drug, and failing to follow delimiter instructions.
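
For readers who want to see how a finite population interval of this kind can be computed, the sketch below applies the textbook normal-approximation estimator for a proportion under simple random sampling without replacement, including the finite population correction. The abstract does not state the authors' exact estimator, so this is an assumption; the count of 192 correct out of 200 is likewise inferred from the reported proportion of 0.96. The resulting interval is close to the reported one but need not match it exactly.

```python
import math
from statistics import NormalDist


def fpc_proportion_ci(correct, n, N, level=0.80):
    """Normal-approximation CI for a proportion under simple random sampling
    without replacement from a finite population of size N."""
    p = correct / n
    fpc = 1 - n / N  # finite population correction
    se = math.sqrt(fpc * p * (1 - p) / (n - 1))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


# 192 of 200 correct is an assumed count consistent with the reported 0.96;
# N is the number of daily dose strings reported in the abstract.
p, lo, hi = fpc_proportion_ci(correct=192, n=200, N=12294, level=0.80)
print(f"proportion={p:.2f}, 80% CI=({lo:.3f}, {hi:.3f})")
```

Because only 200 of 12,294 strings were sampled, the correction factor is close to 1 and narrows the interval only slightly.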

Conclusions:

GPT-4o offers an accurate, efficient, and effective approach to augmenting real-world drug databases with ATC drug classes, giving all research teams access to a powerful tool that satisfies a key prerequisite of pharmacoepidemiologic analysis using real-world data from across the globe.

Citation

Please cite as:

Ogorek BA, Rhoads TP, Finkelman ES, Rodriguez-Chavez IR

AI-Powered Drug Classification and Indication Mapping for Pharmacoepidemiologic Studies: Prompt Development and Validation

JMIR AI 2025;4:e65481

DOI: 10.2196/65481

PMID: 40505126

PMCID: 12203024

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR AI

Date Submitted: Aug 16, 2024

Date Accepted: Apr 14, 2025
