Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Feb 7, 2026
Date Accepted: May 18, 2026
Evaluation of Five Large Language Models for Parental Education in Pediatric Anesthesia: Reliability and Readability Study
ABSTRACT
Background:
Large Language Models (LLMs) are increasingly utilized in healthcare to generate detailed medical responses. However, their performance in providing reliable and readable information for pediatric anesthesia remains unclear.
Objective:
To evaluate the reliability and readability of LLM responses to parental inquiries regarding pediatric anesthesia.
Methods:
On December 14, 2025, five LLMs (DeepSeek-V3.2, ChatGPT-5, Gemini 2.5 Flash, Copilot, and Perplexity) accessed via official web-based interfaces were evaluated. Thirty-three parental inquiries from multiple authoritative sources were used for zero-shot prompting to generate responses. Two blinded senior anesthesiologists independently assessed the reliability using the DISCERN instrument, Ensuring Quality Information for Patients (EQIP) tool, Journal of the American Medical Association (JAMA) benchmark, and Global Quality Score (GQS). Readability was evaluated using six automated indices.
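The abstract does not name the six automated readability indices. As an illustration only, the sketch below computes the Flesch-Kincaid Grade Level, which is assumed here to be representative of the kind of index used. The function names and the vowel-group syllable counter are hypothetical simplifications, not the authors' tooling; published calculators use more refined syllable rules and will give somewhat different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, drop a likely silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Heuristic assumption: a trailing 'e' (but not 'le') is usually silent.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    # Hypothetical sentence, not taken from the study's model responses.
    sample = "Your child must not eat before the operation."
    print(round(flesch_kincaid_grade(sample), 2))
```

A score above 6 on this index would indicate text more complex than the sixth-grade level the abstract cites as the recommended target.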
Results:
Perplexity showed superior reliability on DISCERN (median 41; P<.05), yet no model achieved a “good” rating. Crucially, qualitative analysis revealed safety hazards, such as Perplexity’s misleading binary summary regarding breastfeeding, which contradicted preoperative fasting protocols. Gemini exhibited structural-quality dissociation, achieving the highest EQIP (median 90; P<.001) despite lower GQS (median 3). Transparency was universally poor (JAMA median ≤1), with DeepSeek and ChatGPT showing a “floor effect”. ChatGPT had superior readability, but all models exceeded the recommended sixth-grade complexity level.
Conclusions:
Current LLMs are insufficient as standalone resources for parental education in pediatric anesthesia. Structural-quality dissociation, poor transparency, and poor readability pose safety risks. Consequently, strict review by clinical professionals is mandatory until future models simultaneously ensure clinical reliability and optimize patient-centered readability.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.