Leveraging Temporal Trends for Training Contextual Word Embeddings to Address Bias in Biomedical Applications: Development Study
ABSTRACT
Background:
Women have been underrepresented in clinical trials for many years. Machine learning models trained on clinical trial abstracts may capture and amplify the biases present in the data. In particular, word embeddings are models that represent words as vectors and serve as the building block of most natural language processing (NLP) systems. When word embeddings are trained on clinical trial abstracts, predictive models built on top of them can exhibit gender performance gaps.
Objective:
To capture temporal trends in clinical trials through temporal distribution matching on contextual word embeddings (specifically, BERT) and explore its effect on the bias manifested in downstream tasks.
Methods:
We present TeDi-BERT, a method that harnesses the temporal trend of increasing inclusion of women in clinical trials to train contextual word embeddings (specifically, BERT). We implement temporal distribution matching through an adversarial classifier that attempts to distinguish older from newer clinical trial abstracts based on their embeddings; this matching acts as a form of domain adaptation from older to more recent clinical trials. We evaluate our model on two clinical tasks: prediction of unplanned readmission to the intensive care unit (ICU) and prediction of hospital length of stay. We also provide an algorithmic analysis of the proposed method.
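To illustrate the adversarial temporal distribution matching described above, the following is a minimal sketch in PyTorch, assuming a standard gradient-reversal formulation of adversarial domain adaptation. The class and variable names (`GradReverse`, `TemporalAdversary`) are hypothetical, and random tensors stand in for BERT [CLS] embeddings; this is not the authors' implementation.

```python
import torch
from torch import nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses and scales gradients backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flipping the gradient sign makes the encoder *maximize* the
        # adversary's loss, pushing old and new abstracts toward
        # indistinguishable embedding distributions.
        return -ctx.lambd * grad_output, None


class TemporalAdversary(nn.Module):
    """Classifier that tries to tell old from new trial abstracts apart
    using their sentence embeddings (hypothetical architecture)."""

    def __init__(self, hidden=768, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.head = nn.Sequential(
            nn.Linear(hidden, 128), nn.ReLU(), nn.Linear(128, 2)
        )

    def forward(self, emb):
        return self.head(GradReverse.apply(emb, self.lambd))


# Toy usage: random stand-ins for BERT [CLS] embeddings of 4 abstracts.
emb = torch.randn(4, 768, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1])  # 0 = older trial, 1 = newer trial
adversary = TemporalAdversary()
loss = nn.functional.cross_entropy(adversary(emb), labels)
loss.backward()  # reversed gradients flow back toward the encoder
```

In a full training loop, `emb` would come from the BERT encoder being fine-tuned, so the reversed gradients update the encoder itself while the adversary's head is trained normally.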
Results:
In readmission prediction, TeDi-BERT achieved an area under the curve (AUC) of 0.64 for female patients versus 0.62 for the baseline (P<.001), and 0.66 versus 0.64 for male patients (P<.001). In length of stay regression, TeDi-BERT achieved a mean absolute error (MAE) of 4.56 for female patients versus 4.62 (P<.001) and 4.54 versus 4.60 for male patients (P<.001).
Conclusions:
In both clinical tasks, TeDi-BERT improved performance for female patients, as expected; notably, it also improved performance for male patients. These results show that accuracy for one gender need not be traded away for bias reduction; rather, good science improves clinical results for all. Contextual word embedding models trained to capture temporal trends can help mitigate the effects of biases that change over time in the training data.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.