JMIR Preprints #74299: Development and Validation of a Large Language Model-Powered Chatbot for Neurosurgery: A Mixed-Methods Study on Enhancing Perioperative Patient Education

Development and Validation of a Large Language Model-Powered Chatbot for Neurosurgery: A Mixed-Methods Study on Enhancing Perioperative Patient Education

Chung Man Ho;
Shaowei Guan;
Prudence Kwan-Lam Mok;
Candice H.W Lam;
Wai Ying Ho;
Calvin Hoi-Kwan Mak;
Harry Qin;
Arkers Kwan Ching Wong;
Vivian Hui

ABSTRACT

Background:

Perioperative education is crucial for optimizing outcomes in neuroendovascular procedures, where inadequate understanding can heighten patient anxiety and hinder care plan adherence. Current education models, reliant on traditional consultations and printed materials, often lack scalability and personalization. AI-powered chatbots have demonstrated efficacy in various healthcare contexts, but their role in neuroendovascular perioperative support remains underexplored. Given the complexity of neuroendovascular procedures and the need for continuous, tailored patient education, AI chatbots have the potential to offer personalized perioperative guidance within this specialty.

Objective:

The aim of this study is to develop, validate, and assess NeuroBot, an AI-driven system that utilizes large language models (LLMs) with Retrieval-Augmented Generation (RAG) to deliver timely, accurate, and evidence-based responses to patient inquiries in neurosurgery, ultimately improving the effectiveness of patient education.

Methods:

A mixed-methods approach was employed, consisting of three phases. In the first phase, internal validation, we compared the performance of Assistant API, ChatGPT, and Qwen by evaluating their responses to 306 bilingual neuroendovascular-related questions. The accuracy, relevance, and completeness of the responses were rated using a Likert scale and compared using ANOVA and paired t-tests. In the second phase, external validation, ten neurosurgical experts rated the responses generated by NeuroBot using the same evaluation metrics applied in the internal validation phase. The consistency of their ratings was measured using the intraclass correlation coefficient (ICC). In the third phase, a qualitative study was conducted through interviews with 18 healthcare providers to identify key themes related to NeuroBot's usability and perceived benefits. Thematic analysis was performed using NVivo, and inter-rater reliability was confirmed using Cohen's kappa.

Results:

The Assistant API outperformed both ChatGPT and Qwen, achieving a mean accuracy score of 5.28/6 (95% CI 5.21-5.35), a statistically significant difference (P<.001). External expert ratings for NeuroBot demonstrated significant improvements, with scores of 5.70/6 (95% CI 5.46-5.94) for accuracy, 5.58/6 (95% CI 5.45-5.94) for relevance, and 2.70/3 (95% CI 2.73-2.97) for completeness. Qualitative insights highlighted NeuroBot's potential to reduce staff workload, enhance patient education, and deliver evidence-based responses.

Conclusions:

NeuroBot, which leverages LLMs with the RAG technique, demonstrates the potential of LLM-based chatbots in perioperative neuroendovascular care, offering scalable and continuous support. By integrating domain-specific knowledge, NeuroBot simplifies communication between professionals and patients while ensuring patients have 24/7 access to reliable, evidence-based information. Further refinement and research will enhance NeuroBot’s ability to foster patient-centered communication, optimize clinical outcomes, and advance AI-driven innovations in healthcare delivery.

Citation

Please cite as:

Ho CM, Guan S, Mok PKL, Lam CH, Ho WY, Mak CHK, Qin H, Wong AKC, Hui V

Development and Validation of a Large Language Model–Powered Chatbot for Neurosurgery: Mixed Methods Study on Enhancing Perioperative Patient Education

J Med Internet Res 2025;27:e74299

DOI: 10.2196/74299

PMID: 40663377

PMCID: 12308165

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 21, 2025

Date Accepted: Jun 19, 2025
