Accepted for/Published in: JMIR AI

Date Submitted: Aug 16, 2024
Date Accepted: Apr 14, 2025

The final, peer-reviewed published version of this preprint can be found here:

AI-Powered Drug Classification and Indication Mapping for Pharmacoepidemiologic Studies: Prompt Development and Validation

Ogorek BA, Rhoads TP, Finkelman ES, Rodriguez-Chavez IR

JMIR AI 2025;4:e65481

DOI: 10.2196/65481

PMID: 40505126

PMCID: 12203024

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

AI-Powered Drug Classification and Indication Mapping for Pharmacoepidemiologic Studies

  • Benjamin Alexander Ogorek
  • Thomas Patrick Rhoads
  • Eric Scott Finkelman
  • Isaac Rogelio Rodriguez-Chavez

ABSTRACT

Background:

Pharmacoepidemiologic studies require Anatomic Therapeutic Chemical (ATC) drug classification from real-world data sources. These studies enable standardized analysis of drug utilization patterns and safety monitoring, ultimately promoting rational drug use and improving health outcomes. Proprietary tools for this purpose are expensive, while free tools lack generalizability. Large language models (LLMs), such as GPT-4o, offer a cost-effective alternative because they can explain their choice of a drug’s ATC code and return the output in a structured format.

Objective:

This paper seeks to establish LLMs as an assisting technology in the drug classification task, a prerequisite to good pharmacoepidemiologic research. This requires developing AI prompts and data processing procedures and showing that the resulting accuracy, efficiency, and effectiveness are as good as or better than those of established methods.

Methods:

Patients residing in the US and Canada with medication scheduled through a smart medication dispenser called “spencer SmartHub” (Spencer Health Solutions, Inc., Morrisville, NC) were included in this study if they had a scheduled medication refill in 2024 and consented to the use of their data for research. An AI prompt requesting the best and next-best second-level ATC codes from de-identified daily-dose strings was developed iteratively with expert guidance on clinical research, digital medicine, and regulatory affairs. An initial prompt was created to ensure that aspirin at various doses would be classified as either an analgesic or an antithrombotic. Once that succeeded, the prompt was applied to a pilot sample of 20 daily-dose strings and graded by the expert. The prompt was revised for as long as the pilot sample produced more than one incorrect response. The prompt was then applied to an inference sample of n=200 daily-dose strings, drawn without replacement. Finite population inference was carried out on the proportions of correct and approximately correct ATC drug classifications. All errors made by the algorithm were reviewed.
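The sampling and output-checking steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the seed, and the "best|next_best" reply format with a "|" delimiter are hypothetical. The only facts assumed are that second-level ATC codes consist of one anatomical main group letter followed by two digits (for example, N02 for analgesics and B01 for antithrombotic agents).

```python
import random
import re

# Second-level ATC codes: one of the 14 anatomical main group letters
# followed by two digits, e.g. N02 (analgesics), B01 (antithrombotic agents).
ATC_LEVEL2 = re.compile(r"^[ABCDGHJLMNPRSV]\d{2}$")


def draw_inference_sample(dose_strings, n=200, seed=0):
    """Draw a simple random sample without replacement.

    The seed is an illustrative choice for reproducibility, not a value
    taken from the study.
    """
    rng = random.Random(seed)
    return rng.sample(list(dose_strings), n)


def parse_response(text, delimiter="|"):
    """Parse a hypothetical 'best|next_best' model reply.

    Returns a (best, next_best) tuple, or None when the reply does not
    follow the delimiter instructions or contains a malformed code.
    """
    parts = [p.strip().upper() for p in text.split(delimiter)]
    if len(parts) != 2 or not all(ATC_LEVEL2.match(p) for p in parts):
        return None
    return parts[0], parts[1]
```

Returning None for malformed replies makes delimiter failures easy to count and route to manual review rather than silently accepting them.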

Results:

There were 3,371 de-identified patients who met the inclusion criteria, 2,908 (86%) residing in Canada and 463 (14%) residing in the United States. This resulted in 12,294 daily-dose strings. The initial prompt, which used few-shot learning and concise output, was unable to distinguish between aspirin’s analgesic and antithrombotic therapeutic uses. A revised prompt using chain-of-thought reasoning succeeded and achieved 100% correctness on the pilot sample of n=20. In the inferential sample, a proportion of 0.96 (80% CI 0.943-0.978) was deemed correct by the expert, and the approximately correct designation was never used. The top mistakes were incorrectly classifying dietary supplements as medications, mistaking the identity of a drug, and incorrectly following delimiter instructions.
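For readers unfamiliar with finite population inference, the following is a minimal sketch of one textbook approach: a normal-approximation interval for a proportion with a finite population correction under simple random sampling without replacement. It is an assumption that this resembles the authors' calculation; the abstract does not state the exact method, so the result should be expected to be close to, but not necessarily identical to, the reported interval. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def fpc_proportion_ci(p_hat, n, N, level=0.80):
    """Normal-approximation CI for a proportion with finite population correction.

    p_hat: sample proportion, n: sample size, N: population size.
    Uses the usual variance estimator for simple random sampling
    without replacement: p(1-p)/(n-1) * (1 - n/N).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    se = math.sqrt(p_hat * (1 - p_hat) / (n - 1) * (1 - n / N))
    return p_hat - z * se, p_hat + z * se


# Values taken from the abstract: 0.96 correct in n=200 of 12,294 strings.
lo, hi = fpc_proportion_ci(0.96, n=200, N=12294)
print(f"80% CI: {lo:.3f} to {hi:.3f}")
```

Because only 200 of 12,294 strings were sampled, the correction factor is close to 1 and narrows the interval only slightly.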

Conclusions:

GPT-4o offers an accurate, efficient and effective drug classification approach to augment real-world drug databases with ATC drug classes, giving all research teams access to a powerful tool to satisfy a key prerequisite of pharmacoepidemiologic analysis using real-world data from across the globe.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.