Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Aug 2, 2025
Open Peer Review Period: Aug 15, 2025 - Oct 10, 2025
Date Accepted: Feb 24, 2026
Multidimensional Evaluation of AI Chatbot Responses to a Standardized Patient Query on MOGAD: A Blinded Expert Analysis
ABSTRACT
Background:
Large language model-based chatbots are increasingly used by the public to access medical information. While these tools offer considerable potential in terms of accessibility and scalability, their accuracy, transparency, and clarity remain insufficiently evaluated for rare and diagnostically complex conditions such as myelin oligodendrocyte glycoprotein antibody-associated disease (MOGAD).
Objective:
This study aimed to evaluate the quality, comprehensibility, transparency, and readability of responses generated by widely used AI chatbot platforms in response to a standardized, patient-centered question about MOGAD.
Methods:
We conducted a cross-sectional content analysis using the query: “What is MOGAD, and how is MOGAD treated?” Ten widely used chatbot platforms were selected to reflect diversity in architecture, access model, and functional design. Responses were collected on the same day, anonymized, and independently evaluated by seven blinded neurologists. Validated instruments were used, including DISCERN (treatment quality), the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P; understandability), and the Web Resource Rating (WRR; citation transparency), along with two readability metrics: the Flesch–Kincaid Grade Level (FKGL) and the Coleman–Liau Index (CLI). Chatbots were also compared by access type (free vs paid) and functional focus (conversation-based vs search-based). Inter-rater reliability was assessed using intraclass correlation coefficients (ICCs).
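The manuscript does not state which software was used to compute the readability indices, but both rest on simple surface statistics of the text. A minimal Python sketch applying the standard FKGL and Coleman–Liau formulas, with a naive vowel-group syllable heuristic (an assumption for illustration, not the study's method), might look like:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: each run of consecutive vowels counts as one
    # syllable, with a minimum of one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    n_letters = sum(len(w) for w in words)
    n_syllables = sum(count_syllables(w) for w in words)

    # Flesch-Kincaid Grade Level:
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    fkgl = 0.39 * (n_words / sentences) + 11.8 * (n_syllables / n_words) - 15.59

    # Coleman-Liau Index: 0.0588*L - 0.296*S - 15.8, where L is letters
    # per 100 words and S is sentences per 100 words.
    L = n_letters / n_words * 100
    S = sentences / n_words * 100
    cli = 0.0588 * L - 0.296 * S - 15.8

    return {"FKGL": round(fkgl, 2), "CLI": round(cli, 2)}

print(readability("MOGAD is a rare autoimmune condition. Treatment often "
                  "includes corticosteroids and other immunotherapies."))
```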
Results:
Significant differences were observed across platforms in DISCERN, PEMAT-P, and WRR scores (all p < 0.001). Paid chatbots demonstrated higher treatment quality (p = 0.020) and greater citation transparency (p = 0.001) than free chatbots. Search-based models produced more understandable responses than conversation-based ones (p = 0.035). However, no chatbot response met the recommended readability threshold for public-facing health communication (FKGL < 8). Inter-rater agreement was excellent across all expert-rated measures (ICC ≥ 0.838).
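For context, two-way ICCs of this kind are available in standard statistical packages. A hypothetical sketch using the open-source pingouin library (the manuscript does not specify which software was used, and the ratings below are invented purely for illustration):

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format ratings: one row per (chatbot, rater) pair.
df = pd.DataFrame({
    "chatbot": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "rater":   ["r1", "r2", "r3"] * 3,
    "discern": [58, 61, 57, 42, 45, 41, 70, 68, 72],
})

# Returns a table of ICC variants (ICC1..ICC3k) with 95% CIs.
icc = pg.intraclass_corr(data=df, targets="chatbot",
                         raters="rater", ratings="discern")
print(icc[["Type", "ICC", "CI95%"]])
```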
Conclusions:
AI chatbot responses to patient queries about MOGAD vary widely in quality, clarity, and transparency. These findings highlight the need for structured benchmarking, transparent evaluation frameworks, and thoughtful oversight in the use of generative AI tools for digital health communication, particularly in the context of rare and clinically complex diseases.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be published under a CC BY license, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.