Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Mar 13, 2024
Open Peer Review Period: Mar 11, 2024 - May 6, 2024
Date Accepted: Jul 17, 2024
Evaluating an NLP-Driven, AI-Assisted ICD-10-CM Coding System for Diagnosis-Related Groups: A Feasibility Study in a Real Hospital Environment
ABSTRACT
Background:
International Classification of Diseases (ICD) codes are widely used to describe diagnostic information, but manual coding relies heavily on human reading and interpretation of written material, which can be expensive, time-consuming, and error-prone. The transition from ICD-9 to ICD-10 has made the coding process more complex, highlighting the need for automated approaches that improve coding efficiency and accuracy. Furthermore, because inaccurate coding can cause substantial financial losses for hospitals, a thorough and precise assessment of the output of natural language processing (NLP)-driven auto-coding systems is critical to safeguarding the accuracy of Taiwan Diagnosis Related Groups (Tw-DRGs).
Objective:
This study aims to evaluate the feasibility of applying an ICD-10-CM (clinical modification) auto-coding system, which automatically determines diagnoses and their codes from free-text discharge summaries, to facilitate the assessment of Tw-DRGs, specifically the determination of the principal diagnosis and the major diagnostic category (MDC).
Methods:
Using patient discharge summaries from Kaohsiung Medical University Chung-Ho Memorial Hospital (KMUCHH) from April 2019 to December 2020 as a reference dataset, we developed artificial intelligence (AI)-assisted ICD-10-CM coding systems based on deep learning models. We then constructed a web-based user interface for the AI-assisted coding system and deployed the system into the workflow of the certified coding specialists (CCSs) at KMUCHH. The data used for the assessment of Tw-DRGs were manually curated by a CCS, who determined the principal diagnosis and MDC from discharge summaries collected at KMUCHH from February 2023 to April 2023.
Results:
Evaluation was conducted using both the reference dataset and real hospital data to assess performance in determining ICD-10-CM codes, the principal diagnosis, and the MDC for Tw-DRGs. Among all implemented methods, the generative pre-trained transformer-2 (GPT-2)-based model achieved the highest F-score of 0.667 (0.851 for the top-50 codes) on the KMUCHH test set and a slightly lower F-score of 0.621 on real hospital data. Cohen's Kappa evaluation of the agreement on MDC between the models and the CCS revealed that the overall average Kappa value for GPT-2 (0.714) was 0.122 higher than that of the hierarchical attention network (0.592). GPT-2 demonstrated superior agreement with the CCS across six categories of MDC, with an average Kappa value exceeding 0.81, underscoring the effectiveness of the developed AI-assisted coding system in supporting the work of CCSs.
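For readers unfamiliar with the two metrics reported above, the following is a minimal Python sketch of how an F-score over predicted code sets and Cohen's Kappa between two raters can be computed. It is an illustration only, not the study's evaluation code: the function names are hypothetical, micro-averaging of the F-score is an assumption (the abstract does not state the averaging scheme), and the codes and MDC labels in the usage example are toy data rather than study data.

```python
from collections import Counter


def micro_f_score(gold, predicted):
    """Micro-averaged F-score over per-document sets of codes.

    Assumption: micro-averaging; the abstract does not specify the scheme.
    """
    tp = fp = fn = 0
    for gold_codes, pred_codes in zip(gold, predicted):
        gold_codes, pred_codes = set(gold_codes), set(pred_codes)
        tp += len(gold_codes & pred_codes)   # codes both agree on
        fp += len(pred_codes - gold_codes)   # predicted but not in gold
        fn += len(gold_codes - pred_codes)   # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy data, not taken from the study.
    gold_codes = [{"I10", "E11.9"}, {"J18.9"}]
    model_codes = [{"I10"}, {"J18.9"}]
    print(round(micro_f_score(gold_codes, model_codes), 3))

    ccs_mdc = ["05", "05", "04", "06"]
    model_mdc = ["05", "04", "04", "06"]
    print(round(cohen_kappa(ccs_mdc, model_mdc), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is a more conservative measure than simple accuracy when a few MDCs dominate the data.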
Conclusions:
We introduced an NLP-driven, AI-assisted coding system that assists CCSs in ICD-10-CM coding by offering coding references through a user interface. When compared with the CCS in the context of Tw-DRGs, our system demonstrated potential to reduce the manual workload and expedite Tw-DRG assessment. The consistency of the reported performance affirms the effectiveness of the NLP-driven, AI-assisted coding system in supporting the work of CCSs in both ICD-10-CM coding and the judgment of Tw-DRGs.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.