JMIR Preprints #37557: Automatic International Classification of Diseases Coding System: Deep Contextualized Language Model with Rule-Based Approaches

Automatic International Classification of Diseases Coding System: Deep Contextualized Language Model with Rule-Based Approaches

Pei-Fu Chen;
Kuan-Chih Chen;
Wei-Chih Liao;
Feipei Lai;
Tai-Liang He;
Sheng-Che Lin;
Wei-Jen Chen;
Chi-Yu Yang;
Yu-Cheng Lin;
I-Chang Tsai;
Chi-Hao Chiu;
Shu-Chih Chang;
Fang-Ming Hung

ABSTRACT

Background:

This research aims to build a coding system for automatic multi-label text classification of codes from the tenth revision of the International Classification of Diseases (ICD-10), using a contextual language model and a rule-based preprocessing algorithm to improve the correctness of disease coding.

Objective:

This study aimed to combine natural language processing (NLP) and rule-based approaches to develop a more accurate and explainable ICD-10 auto-coding system.

Methods:

We retrieved electronic health records (EHRs) from a tertiary referral center and applied data mining and NLP techniques, including Word2Vec, XLNet, BERT, and AttentionXLM, to implement an ICD-10 auto-coding system. Furthermore, we optimized the original contextual language model through rule-based preprocessing approaches derived from the coding rules of disease classifiers.
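As a rough illustration of what rule-based preprocessing ahead of a language model can look like, the following minimal sketch keeps only the sentences of a clinical note that mention a keyword. The keyword list, the function name `extract_keyword_sentences`, and the sentence-splitting rule are all assumptions made for this example; the abstract does not specify the authors' actual rules.

```python
import re

# Hypothetical keyword list for illustration only; the authors' actual
# coding rules are not described in the abstract.
KEYWORDS = ("diagnosis", "fracture", "diabetes", "pneumonia")


def extract_keyword_sentences(note, keywords=KEYWORDS):
    """Keep only the sentences of a clinical note that mention a keyword."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", note.strip())
    lowered = tuple(k.lower() for k in keywords)
    # Case-insensitive substring match against each sentence.
    kept = [s for s in sentences if any(k in s.lower() for k in lowered)]
    return " ".join(kept)


note = "Patient admitted on Monday. Type 2 diabetes with neuropathy. Family visited."
filtered = extract_keyword_sentences(note)
print(filtered)
```

In a pipeline of this kind, the filtered text rather than the full note would be passed to the classifier, which shortens the input and makes it easier to see which passages drove a predicted code.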

Results:

The contextualized language model outperformed the non-contextualized language model on the multi-label classification task. By combining the BioBERT pretrained model with the rule-based preprocessing method, our model achieved F1-scores of 0.77 and 0.69 for predicting ICD-10 Clinical Modification (ICD-10-CM) and ICD-10 Procedure Coding System (ICD-10-PCS) codes, respectively. We improved the prediction of ICD-10-CM codes with a keyword extractor and a combination code filter, and the prediction of ICD-10-PCS codes with a keyword extractor and by including surgical methods and examination reports.
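For readers unfamiliar with how an F1-score is computed when each record can carry several codes, here is a minimal sketch of a micro-averaged F1 over sets of labels. The abstract does not state which averaging scheme was used, so micro-averaging is an assumption here, and the function name `micro_f1` and the example code sets are illustrative only.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over per-record sets of true and predicted codes."""
    tp = fp = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        tp += len(true & pred)   # codes predicted and correct
        fp += len(pred - true)   # codes predicted but not in the gold set
        fn += len(true - pred)   # gold codes that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative gold and predicted code sets for two records.
gold = [{"E11.9", "I10"}, {"J18.9"}]
pred = [{"E11.9"}, {"J18.9", "I10"}]
score = micro_f1(gold, pred)
print(round(score, 3))
```

Here two codes are correct, one is spurious, and one is missed, so precision and recall are both 2/3 and the F1-score is also 2/3.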

Conclusions:

In the multi-label classification task, our model combining the pretrained contextualized language model with the rule-based preprocessing method outperformed the model using the contextualized language model alone. This work highlights the importance of building the coding rules of disease classifiers into the coding system as a rule-based algorithm.

Citation

Please cite as:

Chen PF, Chen KC, Liao WC, Lai F, He TL, Lin SC, Chen WJ, Yang CY, Lin YC, Tsai IC, Chiu CH, Chang SC, Hung FM

Automatic International Classification of Diseases Coding System: Deep Contextualized Language Model With Rule-Based Approaches

JMIR Med Inform 2022;10(6):e37557

DOI: 10.2196/37557

PMID: 35767353

PMCID: 9282222

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Feb 25, 2022

Date Accepted: Jun 12, 2022
