Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jan 25, 2025
Open Peer Review Period: Jan 25, 2025 - Mar 22, 2025
Date Accepted: Apr 3, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Detecting, Characterizing and Mitigating Implicit and Explicit Racial Biases in Healthcare Datasets with Subgroup Learnability
ABSTRACT
Background:
The growing adoption of diagnostic and prognostic algorithms in healthcare has led to concerns about the perpetuation of algorithmic bias against disadvantaged groups of individuals. Deep learning methods to detect and mitigate bias have revolved around modifying models, optimization strategies, and threshold calibration with varying levels of success and tradeoffs. However, there have been limited substantive efforts to address bias at the level of the data used to generate algorithms in healthcare datasets.
Objective:
We introduce a simple metric, AEquity, that uses a learning-curve approximation to detect and mitigate bias via guided dataset collection or relabeling.
Methods:
We demonstrate this metric on two well-known examples, chest radiographs and healthcare cost utilization, and use it to detect novel biases in the National Health and Nutrition Examination Survey.
Results:
We demonstrate that using AEquity to guide data-centric collection for each diagnostic finding in the chest radiograph dataset decreased bias by between 29% and 96.5%, as measured by differences in area under the curve. When we examined Black patients on Medicaid, at the intersection of race and socioeconomic status, we found that AEquity-based interventions reduced bias across a number of fairness metrics: overall false negative rate by 33.3% (absolute bias reduction = 1.88 × 10⁻¹; 95% CI 1.4 × 10⁻¹ to 2.5 × 10⁻¹; bias reduction 33.3%, 95% CI 26.6%-40.0%); precision bias by 94.6% (absolute bias reduction = 7.50 × 10⁻²; 95% CI 7.48 × 10⁻² to 7.51 × 10⁻²; bias reduction 94.6%, 95% CI 94.5%-94.7%); and false discovery rate by 94.5% (absolute bias reduction = 3.50 × 10⁻²; 95% CI 3.49 × 10⁻² to 3.50 × 10⁻²). Similarly, AEquity-guided data collection reduced bias by up to 80% on mortality prediction with the National Health and Nutrition Examination Survey (absolute bias reduction = 0.08; 95% CI 0.07-0.09). Additionally, we benchmark against balanced empirical risk minimization and calibration and show that AEquity-guided data collection outperforms both standard approaches. Moreover, we demonstrate that AEquity works across model classes: fully connected networks, convolutional neural networks such as ResNet-50, transformer architectures such as ViT-B/16 (an 86 million parameter Vision Transformer), and nonparametric methods such as LightGBM.
Conclusions:
In short, we demonstrate that AEquity is a robust tool by applying it to different datasets and algorithms, performing intersectional analyses, and measuring its effectiveness with respect to a range of traditional fairness metrics.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.