Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jan 31, 2024
Date Accepted: Sep 1, 2024
Disambiguate Clinical Abbreviation by One-to-all Classification: Algorithm Development and Validation Study
ABSTRACT
Background:
Electronic Medical Records (EMRs) serve as a comprehensive repository of patient data, including textual medical records such as surgical and imaging reports. They are highly useful in Clinical Decision Support Systems (CDSS), but the widespread use of ambiguous and unstandardized abbreviations in clinical documents poses challenges for Natural Language Processing (NLP) in CDSS. Efficient abbreviation disambiguation methods are needed for effective information extraction.
Objective:
This study aims to enhance the One-to-All (OTA) framework for clinical abbreviation expansion, which uses a single model to predict multiple meanings of abbreviations. The objective is to improve OTA by developing context-candidate pairs and optimizing word embeddings in BERT, and to evaluate the model's efficacy in expanding clinical abbreviations using real data.
Methods:
Three datasets were used: MSH WSD, UMN, and CYCH, the last from Ditmanson Medical Foundation Chia-Yi Christian Hospital. Texts containing polysemous words and abbreviations were preprocessed and formatted for BERT. Two pre-trained models, ClinicalBERT and BlueBERT, were fine-tuned, and dataset pairs for training and testing were generated following the method of Huang et al. (2019).
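As an illustration of the context-candidate pairing idea, the following minimal sketch pairs one clinical sentence with each candidate expansion of an abbreviation and marks the correct one. The function name, the sense inventory, the example sentence, and the binary labeling scheme are all assumptions made for illustration; the abstract does not specify the exact pair format or labels used in the study.

```python
# Hypothetical sense inventory; the abbreviations and expansions are illustrative only.
SENSE_INVENTORY = {
    "RA": ["rheumatoid arthritis", "right atrium", "room air"],
}


def build_context_candidate_pairs(context, abbreviation, candidates, gold=None):
    """Pair one context with every candidate expansion of an abbreviation.

    Assumed labeling: 1 for the gold expansion, 0 otherwise.
    When no gold expansion is given (inference time), labels are None.
    """
    pairs = []
    for candidate in candidates:
        label = None if gold is None else int(candidate == gold)
        pairs.append(
            {
                "context": context,  # first text segment
                "candidate": f"{abbreviation}: {candidate}",  # second text segment
                "label": label,
            }
        )
    return pairs


pairs = build_context_candidate_pairs(
    context="Patient saturating 98% on RA.",
    abbreviation="RA",
    candidates=SENSE_INVENTORY["RA"],
    gold="room air",
)
for pair in pairs:
    print(pair)
```

Under this reading, each pair would be fed to a BERT sequence classifier as two text segments, and at inference time the candidate with the highest positive-class score would be chosen as the expansion.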
Results:
The BERT sequence classification model (BlueBERT) achieved macro and micro accuracies of 95.41% and 95.16%, respectively, on the MSH WSD dataset. On the UMN dataset, the OTA method outperformed the other models, with accuracies of 98.40% (macro) and 98.22% (micro). The model fine-tuned with the CYCH dataset performed better than the model trained without it.
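The distinction between the two reported accuracies can be sketched as follows, assuming the definitions common in abbreviation disambiguation work: micro accuracy pools all test instances, while macro accuracy averages the per-abbreviation accuracies. The abstract does not define the two terms, so these definitions, the function name, and the toy records are assumptions.

```python
from collections import defaultdict


def micro_macro_accuracy(records):
    """Compute (micro, macro) accuracy from (abbreviation, gold, predicted) tuples.

    Assumed definitions:
      micro = correct instances / all instances
      macro = mean of per-abbreviation accuracies
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for abbreviation, gold, predicted in records:
        total[abbreviation] += 1
        correct[abbreviation] += int(gold == predicted)
    if not total:
        raise ValueError("records must not be empty")
    micro = sum(correct.values()) / sum(total.values())
    macro = sum(correct[a] / total[a] for a in total) / len(total)
    return micro, macro


# Toy data: "RA" has 3 of 4 correct, "PE" has 1 of 1 correct.
records = [
    ("RA", "room air", "room air"),
    ("RA", "right atrium", "rheumatoid arthritis"),
    ("RA", "right atrium", "right atrium"),
    ("RA", "rheumatoid arthritis", "rheumatoid arthritis"),
    ("PE", "pulmonary embolism", "pulmonary embolism"),
]
micro, macro = micro_macro_accuracy(records)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

With these definitions, frequent abbreviations dominate the micro figure, whereas every abbreviation counts equally in the macro figure, which is why the two numbers can differ on the same test set.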
Conclusions:
This research demonstrates the potential of automated models for abbreviation disambiguation and expansion in clinical texts, which could improve clinical staff efficiency and research effectiveness. The study validates the practicality of the OTA approach, especially in cross-hospital scenarios, and represents a pioneering effort in using context-candidate pairs with a BERT model for word sense disambiguation and clinical abbreviation expansion, with promising results on the MSH WSD and UMN datasets.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.