Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jul 24, 2022
Date Accepted: Oct 8, 2022
Federated Learning on ICD-10 Classification: A Deep Contextualized Language Model with a Label Attention Mechanism
ABSTRACT
Background:
Automatic coding of clinical text documents using the International Classification of Diseases, 10th revision (ICD-10) supports statistical analysis and reimbursement. With the development of natural language processing (NLP) models, new transformer architectures with attention mechanisms have outperformed previous models. Although multicenter training may increase model performance and external validity, the privacy of clinical documents must be protected. Federated learning makes it possible to train a model on multicenter data without sharing the data themselves.
Objective:
This study aims to use federated learning to train a multilabel ICD-10 classification model on multicenter clinical text.
Methods:
Text data from discharge notes in electronic medical records were collected from three medical centers: Far Eastern Memorial Hospital (FEMH), National Taiwan University Hospital (NTUH), and Taipei Veterans General Hospital (VGHTPE). After comparing the performance of several BERT variants, PubMedBERT was chosen for the word embeddings. Nonalphanumeric characters were retained because this preprocessing yielded better model performance than removing them. To make the model's outputs explainable, we added a label attention mechanism to the architecture. Models were trained separately on each hospital's local data and jointly with federated learning, and the federated and locally trained models were compared on a testing set composed of data from all three hospitals. The micro F1-score was used to evaluate model performance across the three centers.
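The federated training described above can be illustrated with FedAvg-style weight averaging, in which each site trains locally and only model parameters (not clinical text) are aggregated. This is a minimal sketch under assumed simplifications: parameters are plain NumPy arrays keyed by name, and the three sites and their sample counts are placeholders, not the actual hospital datasets.

```python
import numpy as np

def fedavg(local_weights, sample_counts):
    """FedAvg aggregation: average per-site parameters, weighted by
    each site's number of training samples. Only weights are shared,
    so the raw clinical documents never leave the local site.

    local_weights: list of dicts mapping parameter name -> np.ndarray
    sample_counts: list of ints, training-set size at each site
    """
    total = sum(sample_counts)
    averaged = {}
    for name in local_weights[0]:
        # Weighted sum of the same parameter tensor across sites.
        averaged[name] = sum(
            w[name] * (n / total) for w, n in zip(local_weights, sample_counts)
        )
    return averaged

# Toy round with three hypothetical sites and one shared parameter "w".
sites = [{"w": np.array([1.0, 2.0])},
         {"w": np.array([3.0, 4.0])},
         {"w": np.array([5.0, 6.0])}]
counts = [1, 1, 2]  # the last site contributes twice the weight
global_weights = fedavg(sites, counts)
```

In practice each round would broadcast `global_weights` back to the sites for another pass of local training; the aggregation step itself is just this weighted average.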
Results:
The F1-scores of PubMedBERT, RoBERTa, Clinical BERT, and BioBERT were 0.735, 0.692, 0.711, and 0.721, respectively. The F1-score of the model with nonalphanumeric characters retained was 0.8120, versus 0.7875 with those characters removed, an increase of 0.0245 (3.11%). The F1-scores on the testing set were 0.6142, 0.4472, 0.5353, and 0.2522 for the federated learning, FEMH, NTUH, and VGHTPE models, respectively. The label attention architecture made predictions explainable by highlighting the input words associated with each predicted code.
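The micro F1-scores reported above are computed by pooling true positives, false positives, and false negatives across all ICD-10 labels before taking precision and recall, so frequent codes weigh more heavily than rare ones. A minimal sketch of that metric on binary label-indicator matrices (the label matrices here are toy placeholders):

```python
import numpy as np

def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multilabel classification.

    y_true, y_pred: (n_samples, n_labels) binary indicator arrays.
    Counts are pooled over every (sample, label) cell, then a single
    precision/recall pair is combined into one F1 value.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.logical_and(y_true, y_pred).sum()
    fp = np.logical_and(~y_true, y_pred).sum()
    fn = np.logical_and(y_true, ~y_pred).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: 2 documents, 3 candidate codes.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
score = micro_f1(y_true, y_pred)
```

This matches scikit-learn's `f1_score(..., average="micro")` for multilabel inputs and is the standard choice when the code distribution is highly imbalanced, as it is for ICD-10.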
Conclusions:
Federated learning can train an ICD-10 classification model on multicenter clinical text while preserving data privacy. The federated model outperformed the models trained only on local data.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.