Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Apr 2, 2022
Date Accepted: Jul 8, 2022
Backdoor Attack on Machine Learning Models of Electronic Health Records: Exploiting Missing Value Patterns
ABSTRACT
Background:
A backdoor attack controls the output of an ML model in two stages. First, the attacker poisons the training dataset, implanting a backdoor into the victim's trained model. Second, at test time, the attacker adds an imperceptible pattern, called a trigger, to any input, forcing the victim's model to output the attacker's intended values instead of true predictions or decisions. While backdoor attacks pose a serious threat to the reliability of any machine learning-based medical diagnostics, existing backdoor attacks that directly change input values are relatively easy to detect.
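To make the two stages concrete, the following is a minimal, generic sketch of data poisoning and trigger application; this is not the authors' code, and the dictionary-style trigger, the helper apply_trigger, and the default poisoning rate are illustrative assumptions (the 0.4% rate mirrors the figure reported in the Results):

```python
# Generic backdoor-attack sketch (illustrative only; the trigger here is a
# placeholder pattern, not the paper's missing-value trigger).
import numpy as np

def apply_trigger(X, trigger):
    """Overwrite the trigger positions. `trigger` maps feature indices
    to the fixed values that form the backdoor pattern."""
    X = X.copy()
    for j, v in trigger.items():
        X[:, j] = v
    return X

def poison_dataset(X, y, trigger, target_label, poison_rate=0.004):
    """Stage 1: stamp `trigger` onto a small random subset of training
    inputs and relabel them with the attacker's `target_label`."""
    X, y = X.copy(), y.copy()
    idx = np.random.choice(len(X), int(len(X) * poison_rate), replace=False)
    X[idx] = apply_trigger(X[idx], trigger)
    y[idx] = target_label
    return X, y

# Stage 2: at test time, apply_trigger(x_test, trigger) steers the poisoned
# model toward target_label, while clean inputs are classified normally.
```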
Objective:
The goal of this study is to propose and evaluate a robust backdoor attack on ML models that predict mortality from electronic health records (EHRs). We show that our backdoor attack grants the attacker full control over classification outcomes for safety-critical tasks such as mortality prediction, highlighting the importance of safe AI research in the medical field.
Methods:
To this end, we present a trigger generation method based on the missing value patterns of EHR data. Compared with existing approaches that introduce noise into the medical record, the proposed backdoor attack makes it simple to construct triggers without prior knowledge. To avoid detection by manual inspection, we employ variational autoencoders (VAEs) to learn the missing value patterns of normal EHR data and to generate trigger data that resembles them.
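As an illustration of this idea, here is a minimal sketch assuming PyTorch and a binary missingness-mask representation of each record; the class name MaskVAE and all layer sizes are hypothetical choices, not taken from the paper:

```python
# Hypothetical sketch: a VAE over binary missingness masks of EHR records.
# Triggers are sampled from the learned distribution so their missing-value
# patterns resemble those of normal records.
import torch
import torch.nn as nn

class MaskVAE(nn.Module):
    def __init__(self, n_features: int, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, n_features), nn.Sigmoid(),  # per-feature missing prob.
        )

    def forward(self, mask: torch.Tensor):
        h = self.encoder(mask)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.decoder(z), mu, logvar

def vae_loss(recon, mask, mu, logvar):
    # Standard VAE objective: reconstruction BCE + KL divergence.
    bce = nn.functional.binary_cross_entropy(recon, mask, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld

# After training on masks of normal records, sample a trigger mask:
# model = MaskVAE(n_features=48)
# z = torch.randn(1, 16)
# trigger_mask = (model.decoder(z) > 0.5).float()
```

Because the sampled mask is drawn from the distribution of normal missingness patterns, the resulting trigger is hard to distinguish from ordinary incomplete records on manual inspection.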
Results:
We evaluated the proposed backdoor attack on four ML models (logistic regression [LR], multilayer perceptron [MLP], long short-term memory [LSTM], and gated recurrent unit [GRU]) that predict in-hospital mortality using a public EHR dataset. Results showed that the proposed technique achieved a high attack success rate (97%-99%) with a low poisoning rate (0.4% of the training dataset). In addition, classification performance on normal EHR data remained close to that of the non-poisoned models (AUC-ROC reduced by only 0.0078), making the presence of the poison difficult to detect.
Conclusions:
To the best of our knowledge, this is the first study to propose a backdoor attack that uses the missing value information of tabular data as a trigger. Through extensive experiments, we demonstrated that our backdoor attack can inflict severe damage on medical ML classifiers in practice.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.