JMIR Preprints #103308: Evaluating Search-Enabled Large Language Model Interfaces for Medication Counseling in Secondary Stroke Prevention: A Multi-Metric Comparative Study

Evaluating Search-Enabled Large Language Model Interfaces for Medication Counseling in Secondary Stroke Prevention: A Multi-Metric Comparative Study

Zhi Wang;
Yi Zhu;
Lan Xu;
Jingshi Wang

ABSTRACT

Background:

Large language models (LLMs) are increasingly used by patients seeking medication advice. Their quality for secondary stroke prevention counseling has not been well characterized.

Objective:

To compare five widely used, search-enabled consumer LLM interfaces on patient-facing medication counseling for secondary stroke prevention, using fourteen evaluation metrics that cover safety, clinical accuracy, information quality, readability, empathy, actionability, and test-retest stability (operationalized as lexical text stability).

Methods:

A 56-item English-language question bank was developed from current stroke prevention guidelines and submitted to five consumer LLM interfaces (ChatGPT, Claude, Gemini, DeepSeek, Doubao) through their official web interfaces on May 1, 2026. The same questions were resubmitted on May 8, 2026, to assess test-retest stability. Each system was accessed from a logged-in account over a US-based connection, with web search enabled. Responses were independently rated by two blinded raters. Non-parametric tests with Benjamini-Hochberg correction were applied.
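The Benjamini-Hochberg correction named above can be sketched as follows. This is a minimal illustration of the standard step-up procedure, not the authors' analysis code; the function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that each
    # adjusted value is no larger than the one ranked above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Illustrative p-values only (assumed, not study results).
example = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(example)
```

A comparison is then called significant when its adjusted p-value falls below the chosen false discovery rate, for example 0.05.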

Results:

Clinical accuracy was high and uniform across models (mean 4.44-4.52/5; Friedman p = 0.578). Gemini, DeepSeek, and Doubao scored significantly higher than ChatGPT and Claude on EQIP (70.2-70.7 vs. 63.6-64.1; p < 0.001) and DISCERN (p < 0.001). All models produced text that was substantially harder to read than commonly used patient-education benchmarks (FKGL 11.1-14.4 vs. benchmark <=6; FRES 33.4-46.8 vs. benchmark >=60). ChatGPT had the highest unsafe response rate (14.3% vs. 7.1-10.7%).
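For readers unfamiliar with the two readability scores, the sketch below shows the standard Flesch Reading Ease (FRES) and Flesch-Kincaid Grade Level (FKGL) formulas. The abstract does not say which tool the authors used to compute them, so this is only an illustration; the function names and the example counts are assumptions. The functions take word, sentence, and syllable counts as inputs because syllable counting itself is tool-dependent.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard FRES formula: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard FKGL formula: approximate US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a 100-word response (not study data).
fres = flesch_reading_ease(words=100, sentences=5, syllables=170)
fkgl = flesch_kincaid_grade(words=100, sentences=5, syllables=170)
```

Longer sentences and more syllables per word lower FRES and raise FKGL, which is why a response can miss both benchmarks at once.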

Conclusions:

In this controlled evaluation of researcher-generated questions, the tested search-enabled LLM interfaces produced broadly accurate responses for secondary stroke prevention medication counseling, but weaknesses in readability, source transparency, and safety indicate that readability optimization, source-attribution prompting, and clinical review are needed before patient-facing use.

Citation

Please cite as:

Wang Z, Zhu Y, Xu L, Wang J

Evaluating Search-Enabled Large Language Model Interfaces for Medication Counseling in Secondary Stroke Prevention: A Multi-Metric Comparative Study

JMIR Preprints. 02/06/2026:103308

DOI: 10.2196/preprints.103308

URL: https://preprints.jmir.org/preprint/103308

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Currently submitted to: Journal of Medical Internet Research

Date Submitted: Jun 2, 2026

Open Peer Review Period: Jun 3, 2026 - Jul 29, 2026

(currently open for review)
