Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Aug 2, 2025
Open Peer Review Period: Aug 15, 2025 - Oct 10, 2025
Date Accepted: Feb 24, 2026

The final, peer-reviewed published version of this preprint can be found here:

Evaluation of AI Chatbot Responses to a Standardized Patient Query on Myelin Oligodendrocyte Glycoprotein Antibody–Associated Disease: Cross-Sectional Content Analysis

Sönmez MT, Yetkin MF, Mehdiyev DA, Çelik ND, Ercan MB, Öztürk P, Akboğa YE, Koç ER, Mungan S

JMIR Med Inform 2026;14:e81720

DOI: 10.2196/81720

PMID: 42054566

Warning: This is an author submission that is not peer-reviewed or edited. Preprints, unless they show as "accepted", should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Multidimensional Evaluation of AI Chatbot Responses to a Standardized Patient Query on MOGAD: A Blinded Expert Analysis

  • Meryem Tuba Sönmez; 
  • Mehmet Fatih Yetkin; 
  • Duygu Arslan Mehdiyev; 
  • Nazlı Durmaz Çelik; 
  • Merve Bahar Ercan; 
  • Pınar Öztürk; 
  • Yeşim Eylev Akboğa; 
  • Emine Rabia Koç; 
  • Semra Mungan

ABSTRACT

Background:

Large language model-based chatbots are increasingly used by the public to access medical information. While these tools offer considerable potential in terms of accessibility and scalability, their accuracy, transparency, and clarity remain insufficiently evaluated for rare and diagnostically complex conditions such as myelin oligodendrocyte glycoprotein antibody-associated disease (MOGAD).

Objective:

This study aimed to evaluate the quality, comprehensibility, transparency, and readability of responses generated by widely used AI chatbot platforms in response to a standardized, patient-centered question about MOGAD.

Methods:

We conducted a cross-sectional content analysis using the query: “What is MOGAD, and how is MOGAD treated?” Ten widely used chatbot platforms were selected to reflect diversity in architecture, access model, and functional design. Responses were collected on the same day, anonymized, and independently evaluated by seven blinded neurologists. Validated instruments were used, including DISCERN (treatment quality), PEMAT-P (understandability), Web Resource Rating (WRR; citation transparency), and two readability metrics: Flesch–Kincaid Grade Level (FKGL) and Coleman–Liau Index (CLI). Chatbots were also compared by access type (free vs paid) and functional focus (conversation-based vs search-based). Inter-rater reliability was assessed using intraclass correlation coefficients (ICCs).
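Both readability metrics named above are closed-form formulas over sentence, word, letter, and syllable counts. The Python sketch below applies the standard published coefficients for FKGL and CLI. It is illustrative only: the abstract does not say which readability tool the study used, the function names are hypothetical, and the tokenization (sentences split on `.`, `!`, `?`; words as runs of letters) and the vowel-group syllable counter are simplifying assumptions, so scores will differ somewhat from dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllable count by counting vowel groups (heuristic, not a dictionary)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing silent "e" (e.g. "make"), but keep "-le" endings (e.g. "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int, int]:
    """Return (sentences, words, letters, syllables) using simplified tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    letters = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), letters, syllables


def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    n_sent, n_words, _, n_syll = text_stats(text)
    return 0.39 * (n_words / n_sent) + 11.8 * (n_syll / n_words) - 15.59


def coleman_liau_index(text: str) -> float:
    """CLI = 0.0588 * L - 0.296 * S - 15.8, with L and S measured per 100 words."""
    n_sent, n_words, n_letters, _ = text_stats(text)
    L = n_letters / n_words * 100  # average letters per 100 words
    S = n_sent / n_words * 100     # average sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8
```

Under these assumptions, a chatbot response meets the public-facing target used in the study when `flesch_kincaid_grade(response) < 8`.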

Results:

Significant differences were observed across platforms in DISCERN, PEMAT-P, and WRR scores (all p < 0.001). Paid chatbots demonstrated higher treatment quality (p = 0.020) and citation transparency (p = 0.001) than free versions. Search-based models produced more understandable responses than conversation-based ones (p = 0.035). However, none of the chatbot responses met the recommended readability threshold for public-facing health communication (FKGL < 8). Inter-rater agreement was excellent across all expert-rated measures (ICC ≥ 0.838).
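The abstract does not state which ICC form was used. As an illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater, in the Shrout and Fleiss formulation) from an items by raters matrix using two-way ANOVA mean squares. Other forms such as ICC(2,k) or ICC(3,k) use different formulas, and the function name `icc_2_1` is hypothetical.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an n x k matrix: one row per rated item (target), one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular matrix with at least 2 targets and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: targets (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Classic Shrout and Fleiss (1979) example: 6 targets rated by 4 judges.
    example = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(example), 2))  # prints 0.29
```

Absolute agreement penalizes systematic offsets between raters (for example, one neurologist consistently scoring higher), which is why the rater mean square appears in the denominator.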

Conclusions:

AI chatbot responses to patient queries about MOGAD vary widely in quality, clarity, and transparency. These findings highlight the need for structured benchmarking, transparent evaluation frameworks, and thoughtful oversight in the use of generative AI tools for digital health communication, particularly in the context of rare and clinically complex diseases.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.