Identifying Deprescribing Opportunities with Large Language Models in Older Adults: Retrospective Cohort Study
ABSTRACT
Background:
Polypharmacy, the concurrent use of multiple medications, is prevalent among older adults and is associated with an increased risk of adverse drug events, including falls. Deprescribing, the systematic process of discontinuing potentially inappropriate medications (PIMs), aims to mitigate these risks. However, practical application of deprescribing criteria in emergency settings remains limited due to time constraints and the complexity of the criteria.
Objective:
This study evaluates the performance of a large language model (LLM)-based pipeline in identifying deprescribing opportunities for older emergency department (ED) patients with polypharmacy, utilizing 3 different sets of criteria: Beers, Screening Tool of Older People’s Prescriptions (STOPP), and GEMS-Rx. It further evaluates LLM confidence calibration and its ability to improve recommendation performance.
Methods:
We conducted a retrospective cohort study of older adults presenting to the ED of a large academic medical center in the northeastern United States from January to March 2022. A convenience sample of 100 patients (712 total oral medications) was randomly selected for detailed analysis. The LLM pipeline consisted of two steps: (1) filtering for high-yield deprescribing criteria based on patients' medication lists, and (2) applying these criteria to both structured and unstructured patient data to recommend deprescribing. Model performance was assessed by comparing model recommendations with those of trained medical students, with discrepancies adjudicated by board-certified ED physicians. Selective prediction, a method that allows a model to abstain from low-confidence predictions to improve overall reliability, was applied to assess the model's confidence and decision-making thresholds.
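The selective prediction approach described above can be illustrated with a minimal sketch: predictions below a confidence threshold are withheld, and performance is then reported as coverage (the fraction of cases answered) and selective accuracy on the answered cases. All names and values here are illustrative assumptions, not the study's actual pipeline or data.

```python
# Minimal sketch of selective prediction via confidence thresholding.
# `predictions`, `confidences`, `labels`, and `threshold` are hypothetical
# names for illustration only.

def selective_predict(predictions, confidences, labels, threshold=0.8):
    """Abstain on predictions below the confidence threshold; return
    coverage (fraction answered) and accuracy on answered cases."""
    answered = [(p, y) for p, c, y in zip(predictions, confidences, labels)
                if c >= threshold]
    coverage = len(answered) / len(predictions)
    accuracy = (sum(p == y for p, y in answered) / len(answered)
                if answered else 0.0)
    return coverage, accuracy

# Toy example: with well-calibrated confidences, accuracy on the retained
# predictions should rise as coverage falls.
preds = [1, 0, 1, 1, 0]
confs = [0.95, 0.6, 0.9, 0.55, 0.85]
labels = [1, 0, 0, 1, 0]
cov, acc = selective_predict(preds, confs, labels, threshold=0.8)
```

Poor calibration, as observed in this study, shows up in exactly this computation: raising the threshold lowers coverage without a corresponding gain in selective accuracy.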
Results:
The LLM achieved high accuracy in identifying deprescribing criteria (PPV: 0.83; NPV: 0.93) relative to medical students, but showed limitations in making specific deprescribing recommendations (PPV: 0.47; NPV: 0.93). Adjudication revealed that while the model excelled at identifying when a deprescribing criterion related to one of the patient's medications, it often struggled to determine whether that criterion applied to the specific case due to complex inclusion/exclusion criteria (54.5% of errors) and ambiguous clinical contexts (e.g., missing information; 39.3% of errors). Selective prediction only marginally improved LLM performance due to poorly calibrated confidence estimates.
Conclusions:
This study highlights the potential of LLMs to support deprescribing decisions in the ED by effectively filtering relevant criteria. However, challenges remain in applying these criteria to complex clinical scenarios, as the LLM demonstrated poor performance on more intricate decision-making tasks, with its reported confidence often failing to align with its actual success in these cases. The findings underscore the need for clearer deprescribing guidelines, improved LLM calibration for real-world use, and better integration of human-AI workflows to balance AI recommendations with clinician judgment.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.