Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Oct 23, 2024
Date Accepted: Mar 12, 2025

The final, peer-reviewed published version of this preprint can be found here:

Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study

Wei B, Yao LL, Hu X, Hu YX, Rao J, Ji Y, Dong ZE, Duan YC, Wu Xr

Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study

J Med Internet Res 2025;27:e67883

DOI: 10.2196/67883

PMID: 40209226

PMCID: 12022522

Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients with Ocular Myasthenia Gravis

  • Bin Wei
  • Li-Li Yao
  • Xin Hu
  • Yu-Xiang Hu
  • Jie Rao
  • Yu Ji
  • Zhuo-Er Dong
  • Yi-Chong Duan
  • Xiao-rong Wu

ABSTRACT

Background:

In recent years, the rapid advancements in deep learning and artificial intelligence have highlighted the potential applications of large language models (LLMs) across various fields. In the medical domain, LLMs have emerged as promising tools for facilitating communication and providing educational resources to both healthcare professionals and patients. However, the effectiveness, accuracy, and applicability of these models in clinical practice remain under investigation. In China, where the patient population is large and medical resources are relatively constrained, the use of LLMs to improve patient education and disease management presents a crucial research opportunity. Ocular myasthenia gravis (OMG) is a common neuromuscular junction disorder that primarily affects the extraocular muscles, leading to symptoms such as ptosis and diplopia. Early diagnosis and effective management of OMG are critical to preventing progression to generalized myasthenia gravis (GMG). However, due to the limited time physicians can spend with patients, many patients rely on the Internet for medical information, where they may encounter inaccurate or misleading advice. This study aims to evaluate the effectiveness of LLMs in providing health education to Chinese patients with OMG by exploring real-world patient interactions with LLMs and examining the potential of these models in delivering medical information effectively and safely.

Objective:

To evaluate the effectiveness of various large language models (LLMs) in providing health education to Chinese patients with ocular myasthenia gravis (OMG).

Methods:

The study was conducted in two phases. First phase: 130 choice ophthalmology exam questions were input into five different LLMs, and their performance was compared with that of undergraduates, master's students, and ophthalmology residents. Additionally, 23 common OMG-related patient questions were posed to four LLMs, and their responses were evaluated by ophthalmologists across five domains. Second phase: 20 patients with OMG interacted with two LLMs from the first phase, each patient asking three questions. Patients assessed the responses for satisfaction and readability, while ophthalmologists evaluated the responses again using the same five domains.

Results:

ChatGPT o1-preview achieved the highest accuracy rate (73.1%) on the 130 ophthalmology exam questions, outperforming the other LLMs as well as human comparison groups such as undergraduates and master's students. For the 23 common OMG-related patient questions, ChatGPT o1-preview scored highest in correctness (4.44), completeness (4.44), helpfulness (4.47), and safety (4.6). In the readability assessments, Gemini provided the easiest-to-understand responses, while GPT-4o produced the most complex responses, suitable for readers with higher education levels. In the second phase with 20 OMG patients, ChatGPT o1-preview received higher satisfaction scores than ERNIE 3.5 (4.40 vs. 3.89, P = .002), although ERNIE 3.5's responses were rated slightly more readable (4.31 vs. 4.03, P = .01).

Conclusions:

LLMs like ChatGPT o1-preview show significant potential in enhancing patient education for OMG. Addressing challenges such as misinformation risk, readability issues, and ethical considerations is crucial for their effective and safe integration into clinical practice.


Per the author's request the PDF is not available.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.