JMIR Preprints #76128: Comparative Analysis of Generative Artificial Intelligence Systems in Solving Clinical Pharmacy Problems: A Commentary on AI's Performance on the Clinical Pharmacy

Comparative Analysis of Generative Artificial Intelligence Systems in Solving Clinical Pharmacy Problems: A Commentary on AI's Performance on the Clinical Pharmacy

Lulu Li;
Aifeng Wang;
Pengqiang Du;
Xiaojing Huang;
Hongwei Zhao;
Ming Ni;
Meng Yan

ABSTRACT

Background:

In recent years, the implementation of artificial intelligence (AI) in health care has been progressively transforming medical fields. However, a gap remains between technological potential and practical application, which calls for a scientific evaluation system. Although some studies have begun to assess generative AI dialogue systems in clinical applications, these efforts are largely limited to testing individual models on single tasks; they lack horizontal comparisons across multiple dialogue models and validation of continuous decision chains in real clinical scenarios. As generative AI systems play an increasingly extensive role in medicine and pharmacy, more research is needed to explore this area.

Objective:

To systematically evaluate and compare the performance of eight mainstream generative AI systems, both domestic and international, across four core clinical pharmacy practice scenarios: medication consultation, medication education, prescription review, and case analysis with pharmaceutical care. This study aims to quantitatively assess their capabilities in addressing common clinical pharmacy practice problems.

Methods:

Assessment questions were systematically extracted from medication consultation clinic records, real clinical cases, and clinical pharmacist standardized training examination databases. Three researchers tested eight different generative AI systems on the same day using standardized "inquiry prompts." A double-blind scoring design was employed: six clinical pharmacists with extensive clinical experience rated the AI responses on a 0-10 scale across six dimensions (accuracy, rigor, applicability, logical coherence, conciseness, and universality). Statistical analysis used one-way analysis of variance (ANOVA) to compare score differences between systems, followed by multiple comparison tests for significant results, and intraclass correlation coefficients (ICCs) to assess inter-rater consistency. Systematic descriptive evaluations of the AI-generated responses were also conducted.
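
As a rough illustration of the one-way ANOVA step described above, the following minimal Python sketch computes the F statistic for scores grouped by AI system. The system labels and scores are hypothetical placeholders, not data from this study, and the abstract does not say which software the authors used. The p-value would then come from the F distribution with the returned degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of scores per AI system.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical 0-10 scores from six raters for three made-up systems.
scores = {
    "system_a": [9, 8, 9, 8, 9, 8],
    "system_b": [7, 8, 7, 7, 8, 7],
    "system_c": [5, 6, 5, 6, 5, 6],
}

f_stat, df_between, df_within = one_way_anova_f(list(scores.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F value only indicates that at least one system differs from the others; identifying which pairs differ is the job of the multiple comparison tests mentioned in the Methods.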

Results:

DeepSeek-R1 demonstrated the best overall performance across all four task categories. Qwen, GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-Pro performed slightly worse than DeepSeek-R1. Doubao and Kimi showed inconsistent performance, while ERNIE Bot performed worst. Comprehensive evaluation indicated that responses from existing generative AI systems still have limitations and should serve as clinical reference tools rather than as an independent basis for clinical decisions. Inter-rater consistency analysis showed good agreement (ICC>0.75) in evaluating medication consultation, medication education, and prescription review. However, the lowest consistency level (ICC=0.70) was observed in assessing the conciseness of case analysis and pharmaceutical care, reflecting notable differences among raters in how they applied evaluation standards to these complex issues.
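
For readers unfamiliar with how an ICC such as those reported above is obtained, here is a minimal Python sketch. It assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in the Shrout and Fleiss notation; the abstract does not state which ICC form the authors used, and the ratings shown are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per rated response,
             each row holding one score per rater.
    """
    n = len(ratings)       # number of rated responses
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-response mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical ratings: rater 2 always scores one point higher than rater 1.
offset_ratings = [[1, 2], [3, 4], [5, 6]]
icc_offset = icc_2_1(offset_ratings)
print(f"ICC(2,1) = {icc_offset:.3f}")
```

Because this form measures absolute agreement, a rater who is consistently one point more generous pulls the ICC below 1 even though the rank order of responses is identical.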

Conclusions:

The DeepSeek-R1 model demonstrates significant potential as a decision-support tool in clinical pharmacy practice. Overall, however, current generative AI systems still require systematic improvement in their ability to handle multidimensional, complex clinical pharmacy problems.

Clinical Trial: none

Citation

Please cite as:

Li L, Wang A, Du P, Huang X, Zhao H, Ni M, Yan M

Comparative Analysis of Generative Artificial Intelligence Systems in Solving Clinical Pharmacy Problems: Mixed Methods Study

JMIR Med Inform 2025;13:e76128

DOI: 10.2196/76128

PMID: 40705654

PMCID: 12288765

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 16, 2025

Date Accepted: Jun 17, 2025
