Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jan 13, 2020
Open Peer Review Period: Jan 13, 2020 - Jan 23, 2020
Date Accepted: Apr 10, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
AlphaBERT: An extractive summarization model based on a character-level token and Bidirectional Encoder Representations from Transformers (BERT)
ABSTRACT
Background:
Doctors must care for many patients simultaneously, and finding and reviewing every patient's medical history is time-consuming. Deep learning methods, such as models based on Bidirectional Encoder Representations from Transformers (BERT), are useful for summarization. To address the problem of medical terminology, the BioBERT model is also included in this study and its performance is compared. However, a heavy model is difficult to deploy on the outdated, resource-limited computers used across several hospitals. Adopting character-level tokens in BERT is one solution to this problem.
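As a rough illustration of why character-level tokens shrink the model (a minimal sketch only; the actual AlphaBERT vocabulary and special tokens are not specified in this abstract), a character vocabulary needs on the order of 100 embedding rows, versus roughly 30,000 for a standard WordPiece vocabulary:

```python
# Minimal sketch of character-level tokenization for a BERT-style model.
# Assumption: printable-ASCII characters plus BERT-style special tokens;
# the real AlphaBERT vocabulary may differ.
SPECIALS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
CHARS = [chr(c) for c in range(32, 127)]  # printable ASCII
VOCAB = {tok: i for i, tok in enumerate(SPECIALS + CHARS)}

def encode(text):
    """Map each character to a token id; unknown characters become [UNK]."""
    ids = [VOCAB["[CLS]"]]
    ids += [VOCAB.get(ch, VOCAB["[UNK]"]) for ch in text]
    ids.append(VOCAB["[SEP]"])
    return ids

print(len(VOCAB))               # 100 rows in the embedding table
print(encode("CHF, s/p CABG"))  # every character, including punctuation, is a token
```

Because the input embedding table accounts for a large share of BERT's parameters (roughly 30,000 × hidden-size weights), replacing it with a ~100-entry character vocabulary removes most of that block.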
Objective:
We aim to build an extractive summarization model for diagnoses in hospital information systems and to provide a website service that can operate with limited computing resources.
Methods:
We collected diagnoses from the National Taiwan University Hospital Integrated Medical Database (NTUH-iMD) and used extractive summaries highlighted by experienced doctors as labels. We used a BERT-based structure with a two-stage training method: we adopted character-level tokens to reduce the model size, pretrained the model on randomly masked characters in the diagnoses and ICD sets, and then fine-tuned it with the summary labels. We cleaned up the prediction results by averaging the probabilities over each whole word so that the character-level tokens would not produce fragmented words. We evaluated model performance with the ROUGE score and built a questionnaire website to collect feedback from more doctors on each summary proposal.
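The word-level cleanup could look like the following sketch (hypothetical function and variable names; it assumes one extraction probability per character and whitespace-delimited words):

```python
import re

def smooth_to_words(text, char_probs):
    """Average per-character probabilities over each whitespace-delimited
    word so that a word is selected or dropped as a unit."""
    assert len(text) == len(char_probs)
    smoothed = list(char_probs)
    for m in re.finditer(r"\S+", text):          # span of each word
        s, e = m.span()
        mean_p = sum(char_probs[s:e]) / (e - s)
        smoothed[s:e] = [mean_p] * (e - s)
    return smoothed

text = "acute MI"
probs = [0.9, 0.2, 0.8, 0.7, 0.3, 0.0, 0.6, 0.9]
print(smooth_to_words(text, probs))
# "acute" collapses to 0.58 for all five characters and "MI" to 0.75,
# so thresholding can no longer split a word in the middle.
```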
Results:
The areas under the receiver operating characteristic curve (AUROCs) of the summary proposals were 0.941, 0.928, 0.899, and 0.933 for BioBERT, BERT, LSTM, and the proposed model, respectively. The corresponding ROUGE-L scores were 0.697, 0.711, 0.648, and 0.678. The mean (standard deviation) critic scores from doctors were 2.232 (0.832), 2.134 (0.877), 2.207 (0.844), 1.927 (0.910), and 2.126 (0.874) for the reference-by-doctor labels, BioBERT, BERT, LSTM, and the proposed model, respectively. In pairwise paired t-tests, LSTM differed significantly from the reference (p<.001), BERT (p=.001), BioBERT (p<.001), and the proposed model (p=.002); no other pairwise differences were significant.
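The pairwise paired t-tests could be reproduced along these lines (a sketch only: the score arrays below are synthetic stand-ins drawn to match the reported means and an approximate pooled standard deviation, not the study data; assumes SciPy):

```python
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic critic scores per summary; in the study, each diagnosis was
# rated for the reference and for every model, which is what pairs the samples.
means = {"reference": 2.232, "BioBERT": 2.134, "BERT": 2.207,
         "LSTM": 1.927, "proposed": 2.126}
scores = {name: rng.normal(mu, 0.87, size=200) for name, mu in means.items()}

for a, b in combinations(scores, 2):  # every pair of rated systems
    t, p = stats.ttest_rel(scores[a], scores[b])
    print(f"{a:9s} vs {b:9s}: t={t:6.2f}, p={p:.3f}")
```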
Conclusions:
Using character-level tokens in a BERT model can greatly decrease the model size without significantly reducing performance on the diagnosis summarization task. A well-developed deep learning model can enhance doctors' abilities and advance medical research by making extensive unstructured free-text notes usable. Clinical Trial: None
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.