Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Mar 21, 2022
Date Accepted: Dec 4, 2022
Development of an end-to-end NLP application for prediction of medical case coding complexity
ABSTRACT
Background:
Medical coding is the process that converts clinical documentation into standard medical codes. Codes are used for several key purposes in a hospital, such as insurance reimbursement and hospital performance analysis. Optimizing the accuracy and efficiency of medical coding is therefore crucial. With the rapid growth of natural language processing (NLP) technologies, several commercial rule-based and machine-learning-based solutions have been proposed to aid medical coding by automatically suggesting relevant codes for a medical case. However, their effectiveness is still limited to simple cases, and it is not yet clear how much value they can bring in improving coding efficiency and accuracy.
Objective:
This study proposes an alternative approach for improving medical coding efficiency. Based on an analysis of how the coding team at Lausanne University Hospital, Switzerland, organizes its work, we develop an end-to-end multimodal machine-learning-based application that predicts coding complexity in the pre-coding phase. The goal is to enable a more efficient redistribution of coding tasks according to the levels of expertise within the coding unit, and thereby to minimize coding errors and improve coding throughput.
Methods:
We collected 2060 cases rated by coders from 1 (simplest) to 4 (most complex) to train and evaluate our machine learning (ML) approach. We asked two expert coders to rate 62 of the 2060 cases to serve as the gold standard. The agreement between the two experts is used as the benchmark for model evaluation. A case contains both clinical text and the patient’s metadata from the hospital electronic health record. We extracted text features and metadata features, concatenated them, and fed them into an ML model. We built two models. The first was trained with cross-validation on 1751 cases and tested on 309 cases to assess the predictive power and generalizability of the proposed approach. The second was trained on 1998 cases and tested on the gold standard to validate the best model's performance against the human benchmark.
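The feature concatenation step described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the bag-of-words text features, the vocabulary, and the metadata field names (`age`, `length_of_stay_days`) are assumptions chosen only to show how text and metadata features might be joined into a single input vector.

```python
from collections import Counter

def text_features(text, vocabulary):
    """Bag-of-words counts over a fixed vocabulary (an assumed stand-in for the study's text features)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]

def metadata_features(metadata, numeric_keys):
    """Numeric metadata fields in a fixed order; missing fields default to 0.0."""
    return [float(metadata.get(key, 0.0)) for key in numeric_keys]

def build_feature_vector(case, vocabulary, numeric_keys):
    """Concatenate text features and metadata features into one model input."""
    return (text_features(case["text"], vocabulary)
            + metadata_features(case["metadata"], numeric_keys))

# Hypothetical toy case; the field names and values are illustrative only.
vocabulary = ["fracture", "sepsis", "diabetes"]
numeric_keys = ["age", "length_of_stay_days"]
case = {
    "text": "Patient admitted with sepsis and diabetes ; sepsis resolved",
    "metadata": {"age": 71, "length_of_stay_days": 12},
}

vector = build_feature_vector(case, vocabulary, numeric_keys)
print(vector)  # [0.0, 2.0, 1.0, 71.0, 12.0]
```

The first three entries come from the text and the last two from the metadata, so any downstream classifier sees both modalities in one vector.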
Results:
Our first model achieves a macro-F1 score of 0.51 and an accuracy of 0.59 on the 4-level rating. It distinguishes well between simple (complexity 1-2) and complex (complexity 3-4) cases, with a macro-F1 score of 0.65 and an accuracy of 0.71. Our second model achieves 61% agreement with the experts’ ratings and a macro-F1 score of 0.62 on the gold standard, while the two experts agree with each other on 66% of cases, with a macro-F1 score of 0.67.
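For readers less familiar with these metrics, the following Python sketch shows how accuracy (equivalently, the agreement ratio between two raters) and macro-F1 are commonly computed, and how 4-level ratings can be collapsed into the simple/complex split. The labels below are made-up toy data, not the study's ratings, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases where both ratings match (also the raw agreement ratio)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all classes that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def to_binary(levels):
    """Collapse complexity 1-2 into 'simple' and 3-4 into 'complex'."""
    return ["complex" if level >= 3 else "simple" for level in levels]

# Toy ratings (illustrative only).
y_true = [1, 2, 3, 4, 1, 2, 3, 4]
y_pred = [1, 1, 3, 4, 1, 2, 4, 4]

four_level_acc = accuracy(y_true, y_pred)
four_level_f1 = macro_f1(y_true, y_pred)
binary_acc = accuracy(to_binary(y_true), to_binary(y_pred))
binary_f1 = macro_f1(to_binary(y_true), to_binary(y_pred))

print(four_level_acc, round(four_level_f1, 4))  # 0.75 0.7333
print(binary_acc, binary_f1)                    # 1.0 1.0
```

In this toy example both errors stay within the same half of the scale (2 rated as 1, 3 rated as 4), which is why the binary scores are higher than the 4-level ones; the same effect is consistent with the gap between the 4-level and simple/complex results reported above.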
Conclusions:
We proposed a multimodal modeling approach that leverages information from both clinical text and patients’ metadata to predict the complexity of coding a case in the pre-coding phase. The proposed approach yields an NLP model whose performance is comparable with that of human expert coders. By integrating this model into the hospital coding system, coders’ workloads will be better allocated, and domain experts will receive better decision support.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.