Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 16, 2025
Date Accepted: Jun 17, 2025

The final, peer-reviewed published version of this preprint can be found here:

Comparative Analysis of Generative Artificial Intelligence Systems in Solving Clinical Pharmacy Problems: Mixed Methods Study

Li L, Wang A, Du P, Huang X, Zhao H, Ni M, Yan M

JMIR Med Inform 2025;13:e76128

DOI: 10.2196/76128

PMID: 40705654

PMCID: 12288765

Comparative Analysis of Generative Artificial Intelligence Systems in Solving Clinical Pharmacy Problems: A Commentary on AI's Performance in Clinical Pharmacy

  • Lulu Li; 
  • Aifeng Wang; 
  • Pengqiang Du; 
  • Xiaojing Huang; 
  • Hongwei Zhao; 
  • Ming Ni; 
  • Meng Yan

ABSTRACT

Background:

In recent years, the implementation of artificial intelligence (AI) has been progressively transforming health care. However, a gap remains between its technological potential and its practical application, which calls for a scientific evaluation system. Although some studies have begun to assess generative AI dialogue systems in clinical settings, most are limited to testing a single model on a single task; they lack horizontal comparisons across multiple dialogue models and do not validate continuous decision chains in real clinical scenarios. As generative AI systems play an increasingly extensive role in medicine and pharmacy, further research in this area is needed.

Objective:

To systematically evaluate and compare the performance of eight mainstream generative AI systems, developed both in China and internationally, across four core clinical pharmacy practice scenarios: medication consultation, medication education, prescription review, and case analysis with pharmaceutical care. This study aims to quantitatively assess their ability to address common problems in clinical pharmacy practice.

Methods:

Assessment questions were systematically extracted from medication consultation clinic records, real clinical cases, and clinical pharmacist standardized training examination databases. Three researchers tested eight different generative AI systems on the same day using standardized "inquiry prompts." A double-blind scoring design was used: six clinical pharmacists with extensive clinical experience rated each AI response on a 0-10 scale in six dimensions (accuracy, rigor, applicability, logical coherence, conciseness, and universality). One-way analysis of variance (ANOVA) was used to compare scores between systems, followed by multiple comparison tests where differences were significant, and intraclass correlation coefficients (ICCs) were calculated to assess inter-rater consistency. The AI-generated responses were also evaluated systematically in descriptive terms.
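The between-system comparison described in the Methods can be illustrated with a minimal sketch of the one-way ANOVA F statistic. This is an illustrative assumption, not the authors' analysis code: the function name `one_way_anova` and the data layout (one list of 0-10 scores per AI system) are hypothetical, and the numbers in the usage line are toy values, not study data. The sketch returns only the F statistic and its degrees of freedom; a p-value would come from the F distribution with those degrees of freedom (for example, `scipy.stats.f_oneway` returns both). The post hoc multiple comparison step is not shown because the abstract does not name the test that was used.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: one list of scores per AI system (hypothetical data layout).
    """
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    means = [fmean(g) for g in groups]
    all_scores = [x for g in groups for x in g]
    grand_mean = fmean(all_scores)

    # Between-group sum of squares: how far each system's mean lies from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of scores around their own system's mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    if df_within == 0 or ss_within == 0:
        raise ValueError("within-group variance is zero or undefined")

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy example with two hypothetical systems (not study data)
print(one_way_anova([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```

A large F means the system means differ by more than the rater-to-rater scatter within each system would explain; it says nothing about which pairs differ, which is why a significant result is followed by multiple comparison tests.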

Results:

DeepSeek-R1 demonstrated the best overall performance across all four task categories. Qwen, GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-Pro performed slightly worse than DeepSeek-R1. Doubao and Kimi performed inconsistently, and ERNIE Bot performed the worst. The overall evaluation indicated that responses from current generative AI systems still have limitations and should serve as clinical reference tools rather than as an independent basis for clinical decisions. Inter-rater consistency was good (ICC>0.75) for medication consultation, medication education, and prescription review. The lowest consistency (ICC=0.70) was observed for the conciseness dimension in case analysis and pharmaceutical care, reflecting substantial differences among raters in how they interpreted the evaluation standards for these complex problems.
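The ICC values reported in the Results can be made concrete with a short sketch. The abstract does not state which ICC form was used, so this assumes ICC(2,1) in the Shrout and Fleiss scheme (two-way random effects, absolute agreement, single rater). The function name `icc_2_1` and the table layout (rows are rated AI responses, columns are raters) are hypothetical, and the ratings are toy values rather than study data.

```python
from statistics import fmean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: n rows (rated AI responses) x k columns (raters); assumed layout.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with at least 2 rows and 2 raters")

    grand = fmean([x for row in ratings for x in row])
    row_means = [fmean(row) for row in ratings]        # per-response means
    col_means = [fmean(col) for col in zip(*ratings)]  # per-rater means

    # Partition the total sum of squares into responses, raters, and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC undefined: ratings show no variance")
    return (ms_rows - ms_error) / denom


# Toy table: 3 responses rated by 2 raters; rater 2 is always one point higher
print(round(icc_2_1([[1, 2], [3, 4], [5, 6]]), 3))  # 0.889
```

In the toy table the two raters rank the responses identically, but the second rater is systematically one point more generous. Because ICC(2,1) measures absolute agreement, that offset pulls the coefficient below 1, whereas a consistency-type ICC would ignore it. Against the threshold quoted in the abstract, 0.889 would still count as good agreement (ICC>0.75).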

Conclusions:

The DeepSeek-R1 model shows significant potential as a decision-support tool in clinical pharmacy practice. Overall, however, current generative AI systems still require systematic improvement in their ability to handle complex, multidimensional clinical pharmacy problems.

Clinical Trial: none



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.