Currently submitted to: JMIR AI
Date Submitted: Jun 16, 2026
Open Peer Review Period: Jun 18, 2026 - Aug 13, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Consumer-Health Questions Trigger Safety-Filter Fallback in Claude Fable 5: A Point-in-Time Audit
ABSTRACT
Background:
Audits of medical large language models target unsafe answers, not benign non-answers. Claude Fable 5 incorporates a safety classifier that reroutes flagged requests to a fallback model (Claude Opus 4.8) before any answer is generated. Whether this fallback applies to routine consumer-health questions, which contain biomedical terminology, has not been examined.
Objective:
To characterize how often Claude Fable 5 routes consumer-health questions to its fallback model and whether fallback rates vary by clinical domain and question type.
Methods:
We entered the alphabetically first 500 unique questions from the HealthSearchQA consumer health-search benchmark into Claude Fable 5 once each (9–12 June 2026), coding each response as answered or routed to fallback. Two reviewers coded independently (94.8% agreement; Cohen kappa=0.90); 26 disagreements were resolved by a third reviewer. Gemini 2.5 Flash served as a comparator. Fallback rates are reported with Wilson 95% confidence intervals; chi-square tests examined association with domain and question type.
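The agreement statistics in the Methods can be sanity-checked with a short sketch of Cohen's kappa for two raters and a binary code. The abstract does not report the reviewers' cross-tabulation, so the four cell counts below are hypothetical, chosen only so that they sum to 500 with 26 disagreements; the function name `cohen_kappa_2x2` is likewise an illustrative assumption, not the authors' code.

```python
def cohen_kappa_2x2(both_yes: int, a_only: int, b_only: int, both_no: int) -> tuple[float, float]:
    """Return (observed agreement, Cohen's kappa) for two raters and a binary code."""
    n = both_yes + a_only + b_only + both_no
    if n == 0:
        raise ValueError("table is empty")
    observed = (both_yes + both_no) / n
    # Marginal "yes" rates for each rater.
    a_yes = (both_yes + a_only) / n
    b_yes = (both_yes + b_only) / n
    # Agreement expected by chance if the raters coded independently.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# HYPOTHETICAL cell counts (not reported in the abstract): 500 items, 26 disagreements.
observed, kappa = cohen_kappa_2x2(both_yes=230, a_only=13, b_only=13, both_no=244)
print(f"agreement={observed:.1%}, kappa={kappa:.2f}")
```

With roughly balanced marginals, chance agreement is close to 50%, which is why 94.8% raw agreement corresponds to a kappa near 0.90.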
Results:
Fable 5 routed 243 of 500 questions (48.6%; 95% CI 44.2–53.0) to fallback. Fallback varied by clinical domain (chi-square P<0.001), highest in oncology (88.9%), reproductive health (79.3%), and infectious disease (68.0%), and by question type (chi-square P<0.001), highest for prognosis-framed questions (63.1%) and lowest for treatment questions (0.0%). Gemini 2.5 Flash answered 218 of the 243 fallback-triggering prompts (89.7%). Terms most associated with fallback included "survive" (95%), "cancer" (90%), and "cured" (87%).
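The Wilson interval for the overall fallback rate can be recomputed from the two counts reported above (243 of 500). The following is a minimal sketch using only the standard library; the function name `wilson_interval` and the use of z = 1.959964 for a two-sided 95% interval are assumptions of this example, not details taken from the manuscript.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts as reported in the Results: 243 fallback responses out of 500 questions.
low, high = wilson_interval(243, 500)
print(f"{low:.1%} to {high:.1%}")  # 44.2% to 53.0%
```

The same function applies to the per-domain and per-question-type rates once their numerators and denominators are known; the abstract gives only percentages for those subgroups.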
Conclusions:
Fallback on benign consumer-health prompts is a measurable safety property; in this audit it tracked disease vocabulary and question framing rather than unsafe intent. Medical AI audits should report fallback rates alongside answer quality.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.