Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Mar 3, 2023
Date Accepted: May 4, 2024
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Sex-related Performance Disparities in Cardiac Machine Learning Algorithms: A Quantitative Evaluation of Fairness
ABSTRACT
Background:
Inequitable performance of medical algorithms applied to different demographic groups may widen health inequalities, negatively affecting marginalised groups. Here we identify and characterise bias in cardiology algorithms, looking specifically at algorithms used in the management of heart failure.
Objective:
To evaluate performance disparities in machine learning algorithms deployed to predict cardiac disease.
Methods:
We searched the literature using PubMed and Web of Science for key terms relating to cardiac machine learning algorithms. The search returned 129 articles, within which we identified open-source datasets that authors had used to build their algorithms, so that we could recreate and investigate those algorithms ourselves. Two open-source datasets were identified: (i) the UCI Heart Failure Dataset and (ii) the UCI Coronary Artery Disease Dataset. We reproduced existing algorithms reported for these datasets and tested for sex biases in algorithm performance. We paid particular attention to the False Negative Rate (FNR), which reflects missed diagnoses. We implemented a range of remediation techniques and evaluated their efficacy in reducing performance inequities.
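As an illustration of the sex-stratified error analysis described above, the following minimal sketch computes the FNR and FPR separately for each sex and the gap between them. It is not the authors' code: the function name `rates_by_group`, the toy labels, and the sign convention (male minus female) are all assumptions made for this example.

```python
from collections import defaultdict


def rates_by_group(y_true, y_pred, groups):
    """Return {group: {"FNR": ..., "FPR": ...}} for binary labels (1 = disease)."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]  # patients who truly have the disease
        negatives = c["fp"] + c["tn"]  # patients who truly do not
        rates[group] = {
            # FNR: share of true cases the model missed
            "FNR": c["fn"] / positives if positives else float("nan"),
            # FPR: share of healthy patients flagged as diseased
            "FPR": c["fp"] / negatives if negatives else float("nan"),
        }
    return rates


# Toy, made-up labels purely for illustration (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
sex = ["M", "M", "M", "M", "M", "M", "M", "M", "F", "F", "F", "F"]

rates = rates_by_group(y_true, y_pred, sex)

# Assumed sign convention: male rate minus female rate.
fnr_gap = rates["M"]["FNR"] - rates["F"]["FNR"]
fpr_gap = rates["M"]["FPR"] - rates["F"]["FPR"]
print(rates, fnr_gap, fpr_gap)
```

Under this assumed convention, a negative FNR gap means the model misses more true cases among females, and a positive FPR gap means it flags more healthy males as diseased.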
Results:
We reproduced the accuracy of algorithms reported in the literature, achieving mean accuracies of 84.24% (3.51 SD) for Dataset 1 and 85.72% (1.75 SD) for Dataset 2. For Dataset 1, the FNR was significantly higher for females in 15 of 16 experiments (-17.81% to +1.55%, p<0.05). A smaller disparity in the False Positive Rate (FPR), which was higher for males, was significant in 14 of 16 experiments (+2.11% to +9.77%, p<0.05). We observed an overprediction of disease for males (higher FPR) and an underprediction of disease for females (higher FNR). Sex differences in feature importance illustrated that feature selection needs to be demographically tailored.
Conclusions:
Our research identifies a previously unknown sex disparity in the performance of cardiac algorithms. We found an underrepresentation of females in the datasets used to train algorithms, identified sex biases in existing models and demonstrated that a series of remediation techniques were unable to address the inequities present.
Clinical Trial: N/A
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.