Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Sep 19, 2023
Date Accepted: Aug 17, 2024
Multifaceted NLP task-based evaluation of BERT models for bilingual (Korean and English) clinical notes: Comparative Analysis
ABSTRACT
Background:
The Bidirectional Encoder Representations from Transformers (BERT) model has gained widespread use in clinical applications, such as patient classification and disease prediction. However, prior studies have had certain limitations. First, these studies often emphasized application development without thoroughly assessing the model's comprehension of clinical context. Second, comparative research on BERT models using medical documents from non-English-speaking countries has been lacking, raising concerns about whether BERT models trained on English clinical notes are applicable in non-English contexts. To address these gaps, our study aimed to identify the most effective BERT model for non-English clinical notes.
Objective:
This study sought to evaluate the contextual understanding abilities of various BERT models when applied to mixed Korean and English clinical notes. Our primary objective was to identify the BERT model that excels in understanding the context of such documents.
Methods:
Leveraging data from 164,460 patients at a South Korean tertiary hospital, we compared BERT-base, BERT for Biomedical Text Mining (BioBERT), Korean BERT (KoBERT), and Multilingual BERT (M-BERT). We further pretrained these models to improve their contextual comprehension and then compared them across seven distinct fine-tuning tasks.
Results:
Model performance varied by task and token usage. First, BERT-base and BioBERT excelled in tasks using [CLS] token embeddings, such as document classification, demonstrating effective document pattern recognition even with few Korean tokens in their vocabularies. Second, M-BERT performed best on reading comprehension (RC) tasks, with better results when fewer words were replaced by [UNK] tokens. Third, M-BERT excelled in the knowledge inference task, correctly inferring disease names from 63 candidate names when given documents in which the disease names had been replaced with [MASK] tokens.
Conclusions:
This study highlights the effectiveness of different BERT models in a multilingual clinical domain. We anticipate that our findings will significantly benefit researchers working in the clinical field or conducting language-based investigations.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be published under a CC BY license, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.