Currently submitted to: JMIR AI

Date Submitted: Jun 16, 2026
Open Peer Review Period: Jun 18, 2026 - Aug 13, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Consumer-Health Questions Trigger Safety-Filter Fallback in Claude Fable 5: A Point-in-Time Audit

  • Yosef Adiniaev
  • Mahmud Omar
  • Yiftach Barash
  • Olga R Brook
  • Alon Gorenshtein
  • Eyal Klang

ABSTRACT

Background:

Audits of medical large language models target unsafe answers, not benign non-answers. Claude Fable 5 incorporates a safety classifier that reroutes flagged requests to a fallback model (Claude Opus 4.8) before any answer is generated. Whether this fallback applies to routine consumer-health questions, which contain biomedical terminology, has not been examined.

Objective:

To characterize how often Claude Fable 5 routes consumer-health questions to its fallback model and whether fallback rates vary by clinical domain and question type.

Methods:

We entered the alphabetically first 500 unique questions from the HealthSearchQA consumer health-search benchmark into Claude Fable 5 once each (9–12 June 2026), coding each response as answered or routed to fallback. Two reviewers coded independently (94.8% agreement; Cohen kappa=0.90); 26 disagreements were resolved by a third reviewer. Gemini 2.5 Flash served as a comparator. Fallback rates are reported with Wilson 95% confidence intervals; chi-square tests examined association with domain and question type.
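For readers less familiar with the two statistics named in the Methods, their standard textbook definitions are sketched here. The symbols (p_o for observed agreement, p_e for chance-expected agreement, p-hat for the observed proportion, n for the sample size, z for the normal quantile) are notation introduced for this note and are not taken from the manuscript. The abstract does not report p_e, so only p_o is filled in from the stated 26 disagreements among 500 questions.

```latex
% Cohen's kappa: agreement corrected for chance.
% p_o follows from the reported 26 disagreements in 500 coded responses.
\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \frac{500 - 26}{500} = 0.948
\]

% Wilson score interval for a binomial proportion \hat{p} = x/n
% (z \approx 1.96 for a 95% interval).
\[
\frac{\hat{p} + \dfrac{z^2}{2n}}{1 + \dfrac{z^2}{n}}
\;\pm\;
\frac{z}{1 + \dfrac{z^2}{n}}
\sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n} + \frac{z^2}{4n^2}}
\]
```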

Results:

Fable 5 routed 243 of 500 questions (48.6%; 95% CI 44.2–53.0) to fallback. Fallback varied by clinical domain (chi-square P<0.001), highest in oncology (88.9%), reproductive health (79.3%), and infectious disease (68.0%), and by question type (chi-square P<0.001), highest for prognosis-framed questions (63.1%) and lowest for treatment questions (0.0%). Gemini 2.5 Flash answered 218 of the 243 fallback-triggering prompts (89.7%). Terms most associated with fallback included "survive" (95%), "cancer" (90%), and "cured" (87%).
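The headline interval can be reproduced from the reported counts alone. The following is a minimal sketch, assuming the standard Wilson score formula with z of about 1.96. The function name `wilson_interval` and its parameters are illustrative and are not taken from the authors' analysis code.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Return the Wilson score interval (low, high) for a binomial proportion.

    Illustrative helper; z defaults to the two-sided 95% normal quantile.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts as reported in the Results: 243 of 500 questions routed to fallback.
fallback_rate = 243 / 500
low, high = wilson_interval(243, 500)
print(f"Fallback rate: {fallback_rate:.1%} (95% CI {low:.1%} to {high:.1%})")

# Comparator share as reported: 218 of the 243 fallback-triggering prompts answered.
comparator_share = 218 / 243
print(f"Comparator answered: {comparator_share:.1%}")
```

Under these assumptions the script reproduces the reported 48.6% (95% CI 44.2 to 53.0) and 89.7%. The Wilson interval is a common choice for proportions because it stays within 0 and 1 and behaves well when rates approach either bound, as several of the domain-level rates here do.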

Conclusions:

Benign fallback on consumer-health prompts is a measurable safety property; in this audit it tracked disease vocabulary and question framing rather than unsafe intent. Medical AI audits should report fallback rates alongside answer quality.


Citation

Please cite as:

Adiniaev Y, Omar M, Barash Y, Brook OR, Gorenshtein A, Klang E

Consumer-Health Questions Trigger Safety-Filter Fallback in Claude Fable 5: A Point-in-Time Audit

JMIR Preprints. 16/06/2026:104856

DOI: 10.2196/preprints.104856

URL: https://preprints.jmir.org/preprint/104856


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.