Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Oct 22, 2024
Date Accepted: May 7, 2025
A comparative cross-sectional study of Natural Language Processing and ICD-10 Coding for detecting bleeding events in discharge summaries
ABSTRACT
Background:
Bleeding adverse drug events (ADEs), particularly among older adult patients on antithrombotic therapy, are a significant concern in hospital settings. These events often go undetected using traditional rule-based methods relying on structured data from electronic medical records, underscoring the need for more effective detection approaches.
Objective:
This study aimed to develop and evaluate a natural language processing (NLP) model to accurately detect and categorise bleeding events in older adult inpatients’ discharge summaries. Specifically, it sought to identify ADEs related to antithrombotic therapy and to compare the NLP model’s performance with Boolean algorithms based on International Classification of Diseases, 10th Revision (ICD-10) codes.
Methods:
Clinicians manually annotated 400 discharge summaries, comprising 65,706 sentences, into four categories: ‘no bleeding’, ‘clinically significant bleeding’, ‘severe bleeding’, and ‘history of bleeding’. The dataset was divided into a training set (70%, 45,994 sentences) and a test set (30%, 19,712 sentences). These annotations were used to train and validate two detection models: an NLP model using binary logistic regression and support vector machine classifiers, and a rule-based model using ICD-10 codes specific to bleeding ADEs. Due to the class imbalance, where the majority of sentences fell into the ‘no bleeding’ category, a class-weighting strategy was applied to enhance the NLP model’s sensitivity to minority classes, such as ‘severe bleeding’. We assessed both models’ performance using accuracy, precision, recall, F1 score, and the area under the curve (AUC) from receiver operating characteristic (ROC) analysis. Manual annotations served as the gold standard.
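The class-weighting strategy and the 70/30 split described above can be illustrated with a minimal sketch. The abstract does not state which weighting formula was used; the example below assumes the common inverse-frequency ("balanced") heuristic, in which each class is weighted by n_samples / (n_classes × class count). The function names and the toy label distribution are hypothetical and are not taken from the study.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: this 'balanced' heuristic stands in for the study's
    unspecified class-weighting strategy.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def split_sizes(n_total, train_fraction=0.7):
    """Return (train, test) sentence counts for a given training fraction."""
    n_train = round(n_total * train_fraction)
    return n_train, n_total - n_train


# Hypothetical toy distribution, chosen only to show the effect of imbalance.
toy_labels = (
    ["no bleeding"] * 90
    + ["clinically significant bleeding"] * 6
    + ["history of bleeding"] * 3
    + ["severe bleeding"] * 1
)
weights = balanced_class_weights(toy_labels)

# Rare classes receive larger weights than the dominant 'no bleeding' class.
for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label}: {weight:.2f}")

# The reported split sizes follow from 70% of 65,706 sentences.
print(split_sizes(65706))
```

Under this heuristic, errors on a rare class such as ‘severe bleeding’ are penalised more heavily during training than errors on the majority class, which is the general mechanism by which class weighting raises sensitivity to minority classes.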
Results:
The NLP model outperformed the rule-based model across all metrics. It achieved macro-averages of 0.81 for accuracy and 0.80 for F1 score, with precision scores of 0.92 and 0.70 for severe and clinically significant bleeding, respectively. The ROC curve analysis showed strong diagnostic performance, with an AUC of 0.94 for distinguishing clinically significant from severe bleeding, minimising false positives while maintaining a true positive rate of 98% for irrelevant cases. The rule-based model, while effective at identifying clinically significant bleeding with a precision of 0.94, had significant limitations in detecting severe bleeding (recall: 0.03). Its reliance on ICD-10 codes for classification limited its ability to capture nuanced clinical events, especially those involving historical or overlapping bleeding conditions.
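For readers less familiar with the metrics reported above, the following minimal sketch shows how per-class precision, recall, and F1 score, and their macro-average, can be computed from gold-standard and predicted sentence labels. The function names and the toy labels are hypothetical; the example does not reproduce the study's data or results.

```python
def per_class_scores(gold, predicted):
    """Compute precision, recall, and F1 for each label (one-vs-rest)."""
    labels = sorted(set(gold) | set(predicted))
    pairs = list(zip(gold, predicted))
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_average(scores, metric):
    """Unweighted mean of a metric across classes."""
    return sum(s[metric] for s in scores.values()) / len(scores)


# Hypothetical toy labels for six sentences (not study data).
gold = ["none", "none", "none", "severe", "severe", "significant"]
pred = ["none", "none", "severe", "severe", "none", "significant"]

scores = per_class_scores(gold, pred)
macro_f1 = macro_average(scores, "f1")
print(scores["severe"])
print(f"macro F1: {macro_f1:.2f}")
```

Because the macro-average gives every class equal weight regardless of its frequency, a model that misses most sentences in a rare class (as with the rule-based model's recall of 0.03 for severe bleeding) is penalised even if it performs well on the majority class.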
Conclusions:
This study highlights the potential of NLP models to enhance bleeding ADE detection in electronic medical record (EMR) data, offering a more accurate and nuanced alternative to traditional ICD-10-based methods. The NLP model’s ability to process unstructured clinical narratives and distinguish overlapping bleeding conditions makes it a valuable tool for improving patient safety and supporting clinical decision-making. Future work should focus on refining temporal reasoning capabilities and expanding datasets to ensure generalisability across diverse healthcare settings.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.