JMIR Preprints #58278: Evaluating an NLP-Driven, AI-Assisted ICD-10-CM Coding System for Diagnosis-Related Groups: A Feasibility Study in a Real Hospital Environment

Evaluating an NLP-Driven, AI-Assisted ICD-10-CM Coding System for Diagnosis-Related Groups: A Feasibility Study in a Real Hospital Environment

Hong-Jie Dai;
Chen-Kai Wang;
Chien-Chang Chen;
Chong-Sin Liou;
An-Tai Lu;
Chia-Hsin Lai;
Bo-Tsz Shian;
Cheng-Rong Ke;
William Yu Chung Wang;
Tatheer Hussain Mir;
Mutiara S. Simanjuntak;
Hao-Yun Kao;
Ming-Ju Tsai;
Vincent S. Tseng

ABSTRACT

Background:

International Classification of Diseases (ICD) codes are widely used to describe diagnostic information, but manual coding relies heavily on human reading and interpretation of written material, which can be expensive, time-consuming, and error-prone. The transition from ICD-9 to ICD-10 has made the coding process more complex, highlighting the need for automated approaches that improve coding efficiency and accuracy. Furthermore, because inaccurate coding can result in substantial financial losses for hospitals, a thorough and precise assessment of the output of natural language processing (NLP)-driven auto-coding systems is critical to safeguarding the accuracy of Taiwan Diagnosis Related Groups (Tw-DRGs).

Objective:

This study aims to evaluate the feasibility of applying an ICD-10-CM (ICD, 10th Revision, Clinical Modification) auto-coding system that automatically determines diagnoses and their corresponding codes from free-text discharge summaries, in order to facilitate the assessment of Tw-DRGs, specifically the determination of the principal diagnosis and the major diagnostic categories (MDCs).

Methods:

Using patient discharge summaries from Kaohsiung Medical University Chung-Ho Memorial Hospital (KMUCHH) collected from April 2019 to December 2020 as a reference dataset, we developed artificial intelligence (AI)-assisted ICD-10-CM coding systems based on deep learning models. We then built a web-based user interface for the AI-assisted coding system and deployed the system into the workflow of the certified coding specialists (CCSs) at KMUCHH. The data used for the Tw-DRG assessment were manually curated by a CCS, who determined the principal diagnosis and MDC from discharge summaries collected at KMUCHH from February 2023 to April 2023.

Results:

Evaluation was conducted on both the reference dataset and real hospital data to assess performance in ICD-10-CM coding and in determining the principal diagnosis and MDC for Tw-DRGs. Among all implemented methods, the Generative Pre-trained Transformer 2 (GPT-2)-based model achieved the highest F-score of 0.667 (0.851 for the top-50 codes) on the KMUCHH test set and a slightly lower F-score of 0.621 on real hospital data. Cohen’s kappa analysis of MDC agreement between the models and the CCS showed that the overall average kappa value for GPT-2 (0.714) was 0.122 higher than that of the hierarchical attention network (0.592). GPT-2 demonstrated superior agreement with the CCS across six MDC categories, with an average kappa value exceeding 0.81, underscoring the effectiveness of the developed AI-assisted coding system in supporting the work of CCSs.
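For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how an F-score over multi-label code assignments and Cohen's kappa between two raters can be computed. It is illustrative only and is not the authors' evaluation code: the function names, the choice of micro-averaging for the F-score, and the toy codes and MDC labels are all assumptions made for this example.

```python
from collections import Counter


def micro_f1(gold, pred):
    """Micro-averaged F-score over per-document sets of codes.

    Assumption: micro-averaging is used here for illustration; the
    abstract does not state which averaging scheme was applied.
    """
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # codes in both sets
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # predicted but not gold
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # gold but not predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: ICD-10-CM code sets for two discharge summaries.
gold_codes = [{"I10", "E11.9"}, {"J18.9"}]
pred_codes = [{"I10"}, {"J18.9", "N39.0"}]
f1 = micro_f1(gold_codes, pred_codes)

# Hypothetical MDC labels assigned by a model and by a coding specialist.
model_mdc = ["05", "05", "04", "06"]
ccs_mdc = ["05", "04", "04", "06"]
kappa = cohen_kappa(model_mdc, ccs_mdc)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is a more conservative measure than simple accuracy when a few MDCs dominate the data.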

Conclusions:

We introduced an NLP-driven, AI-assisted coding system that assists CCSs in ICD-10-CM coding by offering coding references through a user interface. Compared with the CCS in the context of Tw-DRGs, the system showed potential to reduce manual workload and expedite Tw-DRG assessment. The consistent performance across the reference dataset and real hospital data supports the effectiveness of the NLP-driven, AI-assisted coding system in supporting the work of CCSs in both ICD-10-CM coding and the judgment of Tw-DRGs.

Citation

Please cite as:

Dai HJ, Wang CK, Chen CC, Liou CS, Lu AT, Lai CH, Shian BT, Ke CR, Wang WYC, Mir TH, Simanjuntak MS, Kao HY, Tsai MJ, Tseng VS

Evaluating a Natural Language Processing–Driven, AI-Assisted International Classification of Diseases, 10th Revision, Clinical Modification, Coding System for Diagnosis Related Groups in a Real Hospital Environment: Algorithm Development and Validation Study

J Med Internet Res 2024;26:e58278

DOI: 10.2196/58278

PMID: 39302714

PMCID: 11452756

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 13, 2024

Open Peer Review Period: Mar 11, 2024 - May 6, 2024

Date Accepted: Jul 17, 2024
