Currently submitted to: JMIR AI

Date Submitted: May 22, 2026
Open Peer Review Period: May 25, 2026 - Jul 20, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Detecting and Mitigating AI Bias in Healthcare: Development and Validation of a Unified Multi-Stage Framework

  • Chetneti Srisaan
  • Ruj Mateedulsatit

ABSTRACT

Background:

AI-driven clinical systems can improve diagnosis, prognosis, and resource allocation, but they may reproduce disparities encoded in historical healthcare data. Existing mitigation methods typically target a single source of bias, while clinical datasets often contain interacting representation, proxy, integrity, and temporal biases.

Objective:

This study develops and evaluates a unified multi-stage framework for detecting and mitigating multiple forms of bias in structured healthcare machine learning data.

Methods:

We designed a compositional pipeline, D_clean = T_temp(T_int(T_proxy(T_repr(D)))), in which each stage conditions on the corrected output of the previous stage. To address cross-dataset heterogeneity, all datasets were first mapped into a prespecified harmonized clinical-concept space with explicit missing-concept masks. The five anchor features used for alignment were race, sex, age_group, income_proxy, and n_prior_visits. The final harmonized representation contained 37 clinical concepts plus 37 corresponding binary mask indicators, yielding a 74-dimensional model input after categorical expansion and mask concatenation. The primary model was trained on the Diabetes 130-US Hospitals dataset. External validation used CMS SynPUF for readmission prediction and NHANES for stage-level fairness and distributional stress testing rather than unsupported direct outcome transfer. Integrity bias was assessed with distributional tests appropriate to each variable type; Benford-style leading-digit analysis was restricted to unbounded count or charge-like variables and was not applied to bounded physiological laboratory values such as HbA1c.
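
The composition and the masked harmonization step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-entry concept list, the mask convention (1 = concept missing), the zero fill value, and the helper names harmonize, compose, and tagged_stage are all assumptions made for illustration, and the four stages are identity placeholders that only record the order in which they run.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]
Dataset = List[Dict[str, float]]
Stage = Callable[[Dataset], Dataset]

# Hypothetical subset of concepts; the abstract reports 37 in the full space.
CONCEPTS = ["age_group", "n_prior_visits", "hba1c"]


def harmonize(record: Record) -> Dict[str, float]:
    """Map a raw record to concept values plus one binary mask per concept."""
    out: Dict[str, float] = {}
    for concept in CONCEPTS:
        value = record.get(concept)
        # Assumed convention: fill missing values with 0.0 and set mask = 1.0.
        out[concept] = 0.0 if value is None else float(value)
        out[concept + "_mask"] = 1.0 if value is None else 0.0
    return out


def compose(*stages: Stage) -> Stage:
    """Apply stages left to right, each one receiving the previous output."""
    return lambda data: reduce(lambda d, stage: stage(d), stages, data)


def tagged_stage(name: str, log: List[str]) -> Stage:
    """Placeholder stage: copies the data and logs that it ran."""
    def stage(data: Dataset) -> Dataset:
        log.append(name)
        return [dict(row) for row in data]
    return stage


# D_clean = T_temp(T_int(T_proxy(T_repr(D)))): T_repr runs first, T_temp last.
log: List[str] = []
pipeline = compose(
    tagged_stage("T_repr", log),
    tagged_stage("T_proxy", log),
    tagged_stage("T_int", log),
    tagged_stage("T_temp", log),
)

raw = [{"age_group": 3, "hba1c": None}]  # n_prior_visits absent from the source
d_clean = pipeline([harmonize(r) for r in raw])
print(log)
print(d_clean[0])
```

Each concept contributes one value column and one mask column, so a dataset that lacks a concept still produces an input of the same width, which is what lets a single model consume sources with different feature coverage.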

Results:

On the primary Diabetes 130-US Hospitals test split, the proposed pipeline improved AUC from 0.798 to 0.812 and reduced Demographic Parity Difference (DPD) from 0.134 to 0.052. The DPD reduction was statistically significant (bootstrap 95% CI -0.094 to -0.069; paired permutation P < .001). On CMS SynPUF after harmonized concept mapping, DPD decreased from 0.141 to 0.066. NHANES stage-level validation showed improved representation balance and proxy attenuation, while HbA1c integrity checks were evaluated with bounded-variable distributional baselines rather than Benford's Law. Mixture-of-experts processing isolated 7,938 of 101,766 records (7.8%) flagged for integrity concerns and improved fairness without discarding records.
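
For readers unfamiliar with the fairness metric, the following Python sketch shows one common way to compute DPD and a percentile bootstrap interval for its change. The abstract does not give the authors' exact definition or resampling scheme, so the max-minus-min definition over groups, the joint resampling of records, and the function names dpd and bootstrap_dpd_change are assumptions; the predictions are synthetic toy data, not study results.

```python
import random
from typing import Dict, List, Sequence, Tuple


def dpd(preds: Sequence[int], groups: Sequence[str]) -> float:
    """Largest gap in positive-prediction rate between any two groups."""
    totals: Dict[str, int] = {}
    positives: Dict[str, int] = {}
    for pred, group in zip(preds, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + pred
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


def bootstrap_dpd_change(
    before: Sequence[int],
    after: Sequence[int],
    groups: Sequence[str],
    n_boot: int = 500,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile 95% interval for DPD(after) - DPD(before).

    Records are resampled jointly so both models are scored on the same
    bootstrap sample, which keeps the comparison paired.
    """
    rng = random.Random(seed)
    n = len(groups)
    diffs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        g = [groups[i] for i in idx]
        d_after = dpd([after[i] for i in idx], g)
        d_before = dpd([before[i] for i in idx], g)
        diffs.append(d_after - d_before)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Synthetic example: two groups of 100 records each.
groups = ["A"] * 100 + ["B"] * 100
before = [1] * 60 + [0] * 40 + [1] * 30 + [0] * 70  # rates 0.60 vs 0.30
after = [1] * 50 + [0] * 50 + [1] * 45 + [0] * 55   # rates 0.50 vs 0.45

dpd_before = dpd(before, groups)
dpd_after = dpd(after, groups)
ci_low, ci_high = bootstrap_dpd_change(before, after, groups)
print(dpd_before, dpd_after, (ci_low, ci_high))
```

An interval that lies entirely below zero, as reported above for the primary test split, indicates that the reduction in DPD is unlikely to be a resampling artifact.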

Conclusions:

A coordinated data-centric pipeline can improve both fairness and predictive performance when dataset heterogeneity, variable-specific integrity assumptions, and subgroup-specific processing are made explicit. The revised framework resolves the methodological risk of unsupported zero-shot transfer by introducing a harmonized concept layer and reporting outcome validation only where the prediction task and feature space are aligned.


Citation

Please cite as:

Srisaan C, Mateedulsatit R

Detecting and Mitigating AI Bias in Healthcare: Development and Validation of a Unified Multi-Stage Framework

JMIR Preprints. 22/05/2026:102146

DOI: 10.2196/preprints.102146

URL: https://preprints.jmir.org/preprint/102146


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.