Currently submitted to: JMIR AI
Date Submitted: Apr 21, 2026
Open Peer Review Period: Apr 28, 2026 - Jun 23, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Clinical Accuracy and Safety of a Locally Hosted Large Language Model for Pediatric E-Consults: A Blinded Multi-Subspecialty Evaluation
ABSTRACT
Background:
Electronic consultations (e-consults) improve access to pediatric subspecialty care, particularly in rural settings, but rising consult volume contributes to subspecialist documentation burden, creating interest in whether large language models can safely assist with draft response generation.
Objective:
To evaluate the clinical utility, safety, and accuracy of a locally hosted, open-source large language model (LLM) in drafting pediatric subspecialty e-consult responses.
Methods:
We compared AI-generated consult drafts (Qwen3-30B, hospital-hosted) with human subspecialist-written e-consults for 50 real pediatric cases. Blinded pediatric subspecialists (n=50 case ratings) and generalists (n=20 case ratings) assessed accuracy, appropriateness, communication quality, and safety using structured rating instruments. Reviewer free-text comments underwent thematic analysis.
Results:
Among 50 cases, 60% of AI-generated drafts were rated as providing reasonable medical advice, compared with 98% of physician-authored consults. False statements were identified in 39% of AI drafts, incorrect details in 58%, and potentially harmful omissions in 30%. Despite these errors, 70% of AI drafts were considered safe and potentially useful as initial drafts under specialist oversight. Performance varied by subspecialty: neurology drafts were most frequently rated reasonable (90%), whereas infectious disease and endocrinology drafts were rated lower (40%-60%). Generalists found AI drafts understandable and comfortable to act upon in 80% of cases.
Conclusions:
While locally hosted LLMs show promise as drafting assistants to improve efficiency, high rates of clinical inaccuracies preclude their autonomous use. Specialty-specific guardrails and rigorous human oversight remain essential for safe implementation.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.