Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Feb 25, 2022
Date Accepted: Jun 12, 2022
Automatic International Classification of Diseases Coding System: Deep Contextualized Language Model with Rule-Based Approaches
ABSTRACT
Background:
This research builds a coding system for automatic multi-label text classification of the tenth revision of the International Classification of Diseases (ICD-10), using a contextual language model and a rule-based preprocessing algorithm to improve the correctness of disease coding.
Objective:
This study aimed to combine natural language processing (NLP) with rule-based approaches to develop a more accurate and explainable ICD-10 auto-coding system.
Methods:
We retrieved electronic health records (EHRs) from a tertiary referral center and applied data mining and NLP techniques, including Word2Vec, XLNet, BERT, and AttentionXML, to implement an ICD-10 auto-coding system. Furthermore, we optimized the original contextual language model with rule-based preprocessing approaches derived from the coding rules used by disease classifiers.
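As a rough illustration of what a rule-based preprocessing step in front of a language model might look like, the sketch below keeps only the sentences of a clinical note that mention a keyword from a hand-written list. The keyword list, the function name `extract_keyword_sentences`, and the sample note are hypothetical and are not taken from the study; the actual coding rules are not described in this abstract.

```python
import re

# Hypothetical keyword list for illustration only; the study's real
# coding rules are not given in the abstract.
KEYWORDS = {"diagnosis", "fracture", "diabetes", "surgery"}


def extract_keyword_sentences(note: str, keywords=KEYWORDS) -> str:
    """Keep only sentences that mention at least one keyword (case-insensitive)."""
    # Naive sentence split: break on whitespace that follows '.', '!' or '?'.
    sentences = re.split(r"(?<=[.!?])\s+", note.strip())
    kept = [
        s for s in sentences
        if any(re.search(rf"\b{re.escape(k)}\b", s, re.IGNORECASE) for k in keywords)
    ]
    return " ".join(kept)


note = "Patient seen today. Diagnosis: type 2 diabetes. Family came along."
filtered = extract_keyword_sentences(note)
```

The filtered text, rather than the full note, would then be passed to the language model. A production system would likely need a clinical sentence splitter and a curated vocabulary instead of this naive split.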
Results:
Contextualized language models outperformed non-contextualized language models on the multi-label classification task. Combining the BioBERT pretrained model with the rule-based preprocessing method, our system achieved F1-scores of 0.77 and 0.69 for predicting ICD-10 Clinical Modification (ICD-10-CM) and ICD-10 Procedure Coding System (ICD-10-PCS) codes, respectively. A keyword extractor and a combination code filter improved the prediction of ICD-10-CM codes; a keyword extractor together with the inclusion of surgical methods and examination reports improved the prediction of ICD-10-PCS codes.
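For readers unfamiliar with how an F1-score is computed when each record can carry several codes, the following minimal sketch computes a micro-averaged F1 over sets of labels. The abstract does not state which averaging scheme the study used, so micro-averaging is an assumption here, and the function name `micro_f1` and the toy records are illustrative only.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over records, each record being a set of codes."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # codes correctly predicted
        fp += len(pred - truth)   # codes predicted but not assigned
        fn += len(truth - pred)   # codes assigned but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with two records (assumed data, not from the study).
truth = [{"E11.9", "I10"}, {"J18.9"}]
preds = [{"E11.9"}, {"J18.9", "I10"}]
score = micro_f1(truth, preds)  # 2 true positives, 1 false positive, 1 false negative
```

In this toy case precision and recall are both 2/3, so the F1-score is also 2/3.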
Conclusions:
The model that combines the pretrained contextualized language model with the rule-based preprocessing method outperforms the model that uses the contextualized language model alone on the multi-label classification task. This work highlights the importance of the coding rules used by disease classifiers when building rule-based algorithms into the coding system.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.