Previously submitted to: Journal of Medical Internet Research (no longer under consideration since Jan 28, 2026)
Date Submitted: Jul 31, 2025
A Comparison of Large Language Models in Support for Different Stakeholders against the Fentanyl Crisis: Performance Evaluation of Multiple Models
ABSTRACT
Background:
The fentanyl crisis is an urgent public health challenge in which knowledge gaps contribute to rising overdose mortality. Large language models (LLMs) have shown great potential in medical fields such as telemedicine and health education, but their benefits for different stakeholders in combating the fentanyl crisis warrant further investigation.
Objective:
This study aims to systematically evaluate differences in the quality of real-time fentanyl-related guidance provided by six LLMs to users, first responders, clinicians, and policymakers; to clarify the strengths and weaknesses of each LLM across four major scenarios (identifying fentanyl, implementing emergency rescue, clinical diagnosis and treatment, and public health decision-making); and to provide an evidence base for building precise, reliable, and multilingual LLM-based fentanyl crisis intervention tools that reduce the risk of overdose deaths caused by knowledge gaps.
Methods:
We compared six LLMs (ChatGPT 3.5, Gemini 1.5 Flash, YouChat Smart, Copilot, Perplexity, and Luzia) on their ability to answer fentanyl-related questions. Two experts scored the models' performance in each scenario, and scores were analyzed using analysis of variance (ANOVA), linear mixed models (LMMs), and Cohen's kappa as a test of inter-rater agreement.
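The analysis described above can be sketched as follows. This is a hypothetical illustration only: all rater scores and per-question means below are invented placeholders, not data from the study, and the LMM step is omitted for brevity. It shows Cohen's kappa for agreement between two expert raters and a one-way ANOVA across the four question types.

```python
# Hypothetical sketch of the scoring analysis: Cohen's kappa for
# inter-rater agreement, then a one-way ANOVA across question types.
# All numbers are invented placeholders, not study data.
from collections import Counter
from scipy.stats import f_oneway

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)   # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Invented 1-5 ratings by two experts for one model's eight answers.
rater_a = [5, 4, 4, 3, 5, 2, 4, 3]
rater_b = [5, 4, 3, 3, 5, 2, 4, 4]
kappa = cohens_kappa(rater_a, rater_b)

# Invented mean scores per question type for one model.
user_q      = [4.5, 4.0, 4.2]
first_aid_q = [3.0, 3.5, 2.8]
clinical_q  = [3.8, 4.1, 3.6]
policy_q    = [2.9, 3.2, 3.0]
f_stat, p_value = f_oneway(user_q, first_aid_q, clinical_q, policy_q)
print(f"kappa={kappa:.2f}, F={f_stat:.1f}, p={p_value:.4f}")
```

A kappa near or above 0.6 is conventionally read as substantial agreement, and a significant ANOVA would motivate the pairwise model comparisons reported in the Results.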
Results:
LLM performance differed significantly between question types (P<.05 by ANOVA), and the LMM confirmed that ChatGPT outperformed all other models across categories, with the largest effect sizes found when comparing ChatGPT to Gemini 1.5 Flash (formerly Bard) and Copilot (formerly Bing Chat). Individually, Gemini performed well on user-related questions but was relatively weak on first-aid-related questions. Luzia (on WhatsApp) performed moderately on first-aid-related questions but poorly on clinical and policy-making ones. Perplexity scored relatively high on clinical questions, but its overall consistency was poor. YouChat Smart and Copilot scored low in all scenarios and showed poor stability.
Conclusions:
LLMs can provide real-time guidance for users, first responders, clinicians, and policymakers, with performance differing between models across question types. The choice of LLM for answering fentanyl-related questions should therefore be based on the specific scenario.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.