Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 3, 2023
Date Accepted: May 4, 2024

The final, peer-reviewed published version of this preprint can be found here:

Sex-Based Performance Disparities in Machine Learning Algorithms for Cardiac Disease Prediction: Exploratory Study

Straw I, Rees G, Nachev P

Sex-Based Performance Disparities in Machine Learning Algorithms for Cardiac Disease Prediction: Exploratory Study

J Med Internet Res 2024;26:e46936

DOI: 10.2196/46936

PMID: 39186324

PMCID: 11384168

Sex-Based Performance Disparities in Machine Learning Algorithms for Cardiac Disease Prediction: An Exploratory Study

  • Isabel Straw
  • Geraint Rees
  • Parashkev Nachev

ABSTRACT

BACKGROUND The presence of bias in AI systems has garnered increased attention over the past decade, with inequities in algorithmic performance being exposed across the fields of criminal justice, education, and welfare services. In healthcare, the inequitable performance of medical algorithms across demographic groups may widen health inequalities. Here we identify and characterise bias in cardiology algorithms, looking specifically at algorithms used in the management of heart failure.

METHODS Stage 1 involved a literature search of PubMed and Web of Science for key terms relating to cardiac machine learning (ML) algorithms. Articles that built ML models to predict cardiac disease were evaluated for their focus on demographic bias in model performance, and open-source datasets were retained for our own investigation. Two open-source datasets were identified: (i) the UCI Heart Failure Dataset and (ii) the UCI Coronary Artery Disease Dataset. In Stage 2, we reproduced existing algorithms that have been reported for these datasets and tested them for sex biases in algorithm performance. Particular attention was paid to disparities in the False Negative Rate (FNR), due to the clinical significance of underdiagnosis and missed opportunities for treatment. A range of bias remediation techniques were implemented and assessed for their efficacy in reducing inequities, including dataset balancing, sex-specific feature selection and Fair Adversarial Gradient Tree Boosting.

RESULTS In Stage 1, our literature search returned 127 articles, of which 60 met the criteria for full review. Of these, only three papers highlighted sex differences in algorithm performance. In the papers that reported sex, there was a consistent underrepresentation of females in the datasets. No papers investigated racial or ethnic differences. In Stage 2, we reproduced algorithms reported in the literature, achieving mean accuracies of 84.24% (3.51 SD) for Dataset 1 and 85.72% (1.75 SD) for Dataset 2 (Random Forest models). For Dataset 1, the FNR was significantly higher for females in 13 out of 16 experiments (-17.81% to -3.37%, p<0.05). A smaller disparity in the False Positive Rate (FPR), with higher rates for males, was significant in 13 out of 16 experiments (-0.48% to +9.77%, p<0.05). We observed an overprediction of disease for males (higher FPR) and an underprediction of disease for females (higher FNR). Sex differences in feature importance suggest that feature selection needs to be demographically tailored.

DISCUSSION Our research exposes a significant gap in cardiac ML research, highlighting that the underperformance of algorithms for female patients has been overlooked in the published literature. Our study quantifies sex disparities in algorithmic performance and explores several sources of bias. We found an underrepresentation of females in the datasets used to train algorithms, identified sex biases in model error rates, and demonstrated that a series of remediation techniques were unable to address the inequities present.
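The sex-stratified error rates described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (error_rates, sex_disparity), the "F"/"M" group labels, the toy data, and the male-minus-female sign convention are all assumptions made for illustration. Under that assumed convention, a negative FNR gap means the FNR is higher for females, and a positive FPR gap means the FPR is higher for males.

```python
from collections import Counter


def error_rates(y_true, y_pred):
    """Return (FNR, FPR) for binary labels, where 1 means disease present."""
    counts = Counter(zip(y_true, y_pred))
    tp, fn = counts[(1, 1)], counts[(1, 0)]
    tn, fp = counts[(0, 0)], counts[(0, 1)]
    # FNR = FN / (FN + TP): share of true cases the model missed.
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    # FPR = FP / (FP + TN): share of healthy cases flagged as diseased.
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return fnr, fpr


def sex_disparity(y_true, y_pred, sex):
    """Per-sex (FNR, FPR) plus the gap in each rate.

    Assumed convention (hypothetical): gap = male rate - female rate.
    """
    rates = {}
    for group in ("F", "M"):
        idx = [i for i, s in enumerate(sex) if s == group]
        rates[group] = error_rates(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    fnr_gap = rates["M"][0] - rates["F"][0]
    fpr_gap = rates["M"][1] - rates["F"][1]
    return rates, fnr_gap, fpr_gap


# Toy labels and predictions, invented purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0]
sex = ["F"] * 8 + ["M"] * 8

rates, fnr_gap, fpr_gap = sex_disparity(y_true, y_pred, sex)
print(rates, fnr_gap, fpr_gap)
```

In a study of this kind, the gaps would be computed on held-out predictions across repeated experiments and then tested for statistical significance; the sketch covers only the rate calculation itself.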


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.