JMIR Preprints #104856: Consumer-Health Questions Trigger Safety-Filter Fallback in Claude Fable 5: A Point-in-Time Audit

Consumer-Health Questions Trigger Safety-Filter Fallback in Claude Fable 5: A Point-in-Time Audit

Yosef Adiniaev;
Mahmud Omar;
Yiftach Barash;
Olga R Brook;
Alon Gorenshtein;
Eyal Klang

ABSTRACT

Background:

Audits of medical large language models target unsafe answers, not benign non-answers. Claude Fable 5 incorporates a safety classifier that reroutes flagged requests to a fallback model (Claude Opus 4.8) before any answer is generated. Whether this fallback applies to routine consumer-health questions, which contain biomedical terminology, has not been examined.

Objective:

To characterize how often Claude Fable 5 routes consumer-health questions to its fallback model and whether fallback rates vary by clinical domain and question type.

Methods:

We entered the alphabetically first 500 unique questions from the HealthSearchQA consumer health-search benchmark into Claude Fable 5 once each (9–12 June 2026), coding each response as answered or routed to fallback. Two reviewers coded independently (94.8% agreement; Cohen kappa=0.90); 26 disagreements were resolved by a third reviewer. Gemini 2.5 Flash served as a comparator. Fallback rates are reported with Wilson 95% confidence intervals; chi-square tests examined association with domain and question type.
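As a rough illustration of how the reported agreement figures relate to one another, the sketch below recomputes observed agreement from the 26 disagreements among 500 items and then applies the standard Cohen kappa formula. The chance-agreement term is an assumption: the abstract does not report each reviewer's marginal rates, so the sketch supposes both reviewers coded fallback at the final adjudicated rate (243/500). The function names are illustrative and not taken from the manuscript.

```python
def chance_agreement_binary(rate_a: float, rate_b: float) -> float:
    """Expected agreement by chance for two raters on a yes/no code."""
    return rate_a * rate_b + (1 - rate_a) * (1 - rate_b)


def cohen_kappa(p_observed: float, p_expected: float) -> float:
    """Cohen kappa from observed and chance-expected agreement."""
    if not 0 <= p_expected < 1:
        raise ValueError("p_expected must be in [0, 1)")
    return (p_observed - p_expected) / (1 - p_expected)


n_items = 500
n_disagreements = 26
p_observed = (n_items - n_disagreements) / n_items  # 474 / 500

# ASSUMPTION (not reported in the abstract): both reviewers' marginal
# fallback rates equal the final adjudicated rate of 243/500.
assumed_rate = 243 / 500
p_expected = chance_agreement_binary(assumed_rate, assumed_rate)

kappa = cohen_kappa(p_observed, p_expected)
print(f"observed agreement = {p_observed:.1%}, kappa = {kappa:.2f}")
```

Under that assumption the result rounds to the reported kappa of 0.90; different reviewer marginals would shift the value slightly.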

Results:

Fable 5 routed 243 of 500 questions (48.6%; 95% CI 44.2–53.0) to fallback. Fallback rates varied by clinical domain (chi-square P<0.001) and were highest in oncology (88.9%), reproductive health (79.3%), and infectious disease (68.0%). They also varied by question type (chi-square P<0.001), ranging from 63.1% for prognosis-framed questions to 0.0% for treatment questions. Gemini 2.5 Flash answered 218 of the 243 fallback-triggering prompts (89.7%). Terms most associated with fallback included "survive" (95%), "cancer" (90%), and "cured" (87%).
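The headline interval can be checked with a short calculation. The following is a minimal sketch of the Wilson score interval applied to the reported count of 243 fallbacks among 500 questions. The function name and the z value for a two-sided 95% interval are choices made for this example, not details from the manuscript.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion.

    z defaults to the standard normal quantile for a two-sided 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts taken from the Results paragraph: 243 of 500 routed to fallback.
low, high = wilson_interval(243, 500)
print(f"{243 / 500:.1%} (95% CI {low:.1%} to {high:.1%})")
# prints "48.6% (95% CI 44.2% to 53.0%)"

# Comparator proportion: 218 of the 243 fallback-triggering prompts answered.
comparator_rate = 218 / 243
print(f"comparator answered {comparator_rate:.1%}")
```

The Wilson interval is generally preferred over the simple normal approximation because it stays within 0 to 1 and behaves better for proportions near the extremes, such as the domain-level rates reported here.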

Conclusions:

Benign fallback on consumer-health prompts is a measurable safety property. In this audit, it tracked disease vocabulary and question framing rather than unsafe intent. Medical AI audits should report fallback rates alongside answer quality.

Citation

Please cite as:

Adiniaev Y, Omar M, Barash Y, Brook OR, Gorenshtein A, Klang E

Consumer-Health Questions Trigger Safety-Filter Fallback in Claude Fable 5: A Point-in-Time Audit

JMIR Preprints. 16/06/2026:104856

DOI: 10.2196/preprints.104856

URL: https://preprints.jmir.org/preprint/104856

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Currently submitted to: JMIR AI

Date Submitted: Jun 16, 2026

Open Peer Review Period: Jun 18, 2026 - Aug 13, 2026

(currently open for review)
