Currently submitted to: JMIR AI
Date Submitted: Jan 30, 2026
Open Peer Review Period: Feb 10, 2026 - Apr 7, 2026 (currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Language-Level Safety Risk in AI-Generated Patient-Facing Messages: A Feasibility Simulation Study Using the SAFE-AI Message Guard Framework
ABSTRACT
Background:
Health systems increasingly deploy large language models (LLMs) to draft patient-facing messages, including patient portal replies and follow-up communications. While these tools may improve efficiency, safety failures often arise not from obvious factual errors but from how content is framed—diagnostic language that exceeds clinical scope, false certainty that minimizes legitimate concerns, or fabricated evidence presented as authoritative. These language-level risks remain poorly characterized and are not routinely addressed within clinical governance workflows.
Objective:
This study aimed to estimate the prevalence and types of language-level safety risks in AI-generated patient-facing messages and to assess the feasibility of a structured, clinician-led governance approach for identifying and acting on these risks prior to message delivery.
Methods:
We conducted a single-reviewer simulation feasibility study evaluating 200 AI-generated patient-facing messages representative of common patient portal and follow-up communication scenarios. Messages were generated using GPT-4 (OpenAI) and evaluated using the SAFE-AI Message Guard framework, a clinician-informed operational governance model for identifying language-level safety risks across four domains: (1) clinical scope violations involving non-delegable diagnostic determinations, (2) overconfidence or false reassurance through absolutist language, (3) hallucinated specifics including fabricated guidelines, statistics, or citations, and (4) bias, minimization, or ethical concerns. Messages could receive multiple flags across domains. A board-certified psychiatric-mental health nurse practitioner (PMHNP-BC) assigned severity classifications (high: block or mandatory rewrite required; medium: clinician review recommended; low: log for monitoring only) and recommended workflow actions for each flagged message. This study used only simulated AI-generated messages; no human subjects or protected health information were involved.
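For readers considering a similar pre-delivery gate, the following minimal Python sketch illustrates how the four risk domains, three severity tiers, and workflow actions described above could be encoded. It is an illustration only, not a component of the SAFE-AI Message Guard framework: all identifiers (Flag, MessageReview, workflow_action, rewrite_possible) are hypothetical, and the automated choice between blocking and mandatory rewrite shown here is an assumption; in the study, a clinician reviewer made that determination.

# Illustrative sketch only; hypothetical names, not the authors' software.
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class RiskDomain(Enum):
    SCOPE_VIOLATION = "clinical scope violation"           # non-delegable diagnostic determinations
    OVERCONFIDENCE = "overconfidence or false reassurance" # absolutist language
    HALLUCINATED_SPECIFICS = "hallucinated specifics"      # fabricated guidelines, statistics, citations
    BIAS_OR_ETHICS = "bias, minimization, or ethical concern"

class Severity(Enum):
    HIGH = 3    # block or mandatory rewrite required
    MEDIUM = 2  # clinician review recommended
    LOW = 1     # log for monitoring only

@dataclass
class Flag:
    domain: RiskDomain
    severity: Severity
    rationale: str  # reviewer's free-text justification

@dataclass
class MessageReview:
    message_id: str
    flags: List[Flag] = field(default_factory=list)  # a message may carry multiple flags

    def workflow_action(self, rewrite_possible: bool = True) -> str:
        # Triage on the most severe flag assigned by the reviewer.
        if not self.flags:
            return "allow"
        worst = max(f.severity.value for f in self.flags)
        if worst == Severity.HIGH.value:
            # Hypothetical rule; in the study this choice was made by the clinician reviewer.
            return "mandatory_rewrite" if rewrite_possible else "block"
        if worst == Severity.MEDIUM.value:
            return "clinician_review"
        return "log_only"

# Example: one message with two flags, triaged before patient delivery.
review = MessageReview("msg-042", [
    Flag(RiskDomain.OVERCONFIDENCE, Severity.HIGH, "promises a benign outcome"),
    Flag(RiskDomain.HALLUCINATED_SPECIFICS, Severity.MEDIUM, "cites a nonexistent guideline"),
])
print(review.workflow_action())  # -> "mandatory_rewrite"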
Results:
Of 200 messages evaluated, 102 (51.0%) received at least one language-level risk flag. At the message level, 80 messages (40.0%) were classified as high severity, requiring blocking or mandatory rewrite before patient delivery. Workflow actions were distributed as follows: 20 messages (10.0%) blocked, 20 (10.0%) required mandatory rewrite, 11 (5.5%) recommended for clinician review, and 149 (74.5%) allowed to proceed. At the flag level, 126 risk flags were assigned across the 102 flagged messages (mean 1.24 flags per flagged message). The most frequently present risk category at the message level was overconfidence/false reassurance (24 messages), followed by scope violations (20 messages), hallucinated specifics (16 messages), and bias/ethical risk (3 messages). By flag-level severity, 80 flags (63.5%) were high severity and 46 (36.5%) were medium severity; no low-severity flags were assigned.
Conclusions:
Language-level safety risk in AI-generated patient-facing messages is frequent and clinically meaningful, affecting more than half of evaluated messages. A structured, clinician-defined governance framework can feasibly identify scope violations, overconfidence, and hallucinated content, providing an auditable mechanism to reduce the likelihood of unsafe messages reaching patients. Health systems deploying generative AI for patient communication should incorporate language-level safety evaluation into governance workflows. Multi-reviewer validation studies and development of automated detection methods are needed before operational deployment at scale.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.