Accepted for/Published in: JMIR Medical Informatics
Date Submitted: May 6, 2024
Date Accepted: Aug 17, 2024
Enhancing Bias Assessment for Complex Term Groups in Language Embedding Models: Methods Study
ABSTRACT
Background:
Artificial intelligence (AI) is rapidly being adopted to build products and aid decision-making across industries. However, AI systems have been shown to exhibit, and even amplify, biases, causing growing concern worldwide. Thus, investigating methods of measuring and mitigating bias within these AI-powered tools is necessary.
Objective:
In natural language processing (NLP) applications, the Word Embedding Association Test (WEAT) is a popular method of measuring bias in input embeddings, a common target for bias measurement in AI. However, WEAT has known limitations (ie, its non-robust measure of bias and its reliance on predefined, limited groups of words or sentences), which may lead to inadequate measurements and evaluations of bias. Thus, this study takes a new approach to modifying this popular measure of bias, with a focus on making it more robust and applicable to other domains.
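For context, WEAT quantifies bias as an effect size comparing the mean cosine similarities of two target word sets (X, Y) with two attribute sets (A, B), normalized by the pooled standard deviation of the associations. The following is a minimal sketch of that standard computation (function and variable names are illustrative, not taken from the paper):

```python
import numpy as np

def association(w, A, B):
    """Mean cosine similarity of word vector w to attribute set A,
    minus its mean cosine similarity to attribute set B."""
    cos = lambda u, v: float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """WEAT effect size: difference in mean association of target sets X and Y
    with attributes A and B, normalized by the pooled SD of all associations."""
    assoc = [association(w, A, B) for w in list(X) + list(Y)]
    return (np.mean(assoc[:len(X)]) - np.mean(assoc[len(X):])) / np.std(assoc, ddof=1)
```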
Methods:
In this study, we introduce SD-WEAT, a modified version of WEAT that uses the standard deviation (SD) of multiple permutations of the WEAT test to calculate bias in input embeddings. With SD-WEAT, we evaluated the biases and stability of several language embedding models, including GloVe, Word2Vec, and BERT.
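The core idea can be sketched as follows, building on the weat_effect_size sketch above: repeatedly draw attribute groups from a larger pool, compute a WEAT effect size for each draw, and summarize bias across runs via the SD. Note that the random-partitioning scheme shown here is an assumption for illustration; the paper defines the exact permutation procedure.

```python
import random
import numpy as np

def sd_weat(X, Y, attribute_pool, n_runs=1000, group_size=8, seed=0):
    """SD-WEAT sketch: compute WEAT effect sizes over many randomly drawn
    attribute groups and report the SD (and mean) across runs.
    Assumes len(attribute_pool) >= 2 * group_size; the splitting scheme
    is illustrative, not necessarily the paper's exact method."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_runs):
        # Randomly partition the attribute pool into two groups A and B,
        # removing the need to pre-define fixed attribute sets.
        pool = list(attribute_pool)
        rng.shuffle(pool)
        A, B = pool[:group_size], pool[group_size:2 * group_size]
        effects.append(weat_effect_size(X, Y, A, B))
    return float(np.std(effects, ddof=1)), float(np.mean(effects))
```

Because the statistic aggregates over many runs rather than a single fixed test, a single outlier attribute set has limited influence on the final score.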
Results:
This method produces results comparable to those of WEAT, with strong correlations between the methods' bias scores/effect sizes (r=0.786) and P values (r=0.776), while addressing some of WEAT's largest limitations. More specifically, SD-WEAT is more accessible because it removes the need to predefine attribute groups, and because it measures bias over multiple runs rather than one, it reduces the impact of outliers and sample size. Furthermore, SD-WEAT was found to be more consistent and reliable than its predecessor.
Conclusions:
SD-WEAT shows promise for robustly measuring bias in the input embeddings fed to AI language models.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.