JMIR Preprints #67837: A comparative cross-sectional study of Natural Language Processing and ICD-10 Coding for detecting bleeding events in discharge summaries


Frederic Gaspar;
Mehdi Zayerne;
Claire Coumau;
Elliott Bertrand;
Marie Bettex;
Marie Annick Le Pogam;
Chantal Csajka

ABSTRACT

Background:

Bleeding adverse drug events (ADEs), particularly among older adult patients on antithrombotic therapy, are a significant concern in hospital settings. These events often go undetected by traditional rule-based methods that rely on structured data from electronic medical records, underscoring the need for more effective detection approaches.

Objective:

This study aimed to develop and evaluate a natural language processing (NLP) model to accurately detect and categorise bleeding events in the discharge summaries of older adult inpatients. Specifically, the model was intended to identify bleeding ADEs related to antithrombotic therapy, and its performance was compared with that of Boolean algorithms based on International Classification of Diseases, 10th Revision (ICD-10) codes.
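The abstract does not list the ICD-10 codes or the Boolean logic used by the comparator. As a minimal sketch of what such a rule-based detector can look like, the Python below flags a discharge summary when its coded diagnoses contain a bleeding code AND an external-cause code for an adverse effect of an anticoagulant. The code prefixes, the AND combination and the function names (`has_any`, `flag_bleeding_ade`) are hypothetical illustrations, not the study's algorithm.

```python
from typing import Iterable, Tuple

# Illustrative ICD-10 bleeding prefixes (assumption, NOT the study's code list):
# K92.2 GI haemorrhage, unspecified; I61 intracerebral haemorrhage;
# R04 haemorrhage from respiratory passages; R31 unspecified haematuria.
BLEEDING_PREFIXES: Tuple[str, ...] = ("K92.2", "I61", "R04", "R31")

# Illustrative external-cause prefix (assumption): Y44.2, anticoagulants
# causing adverse effects in therapeutic use.
ANTICOAGULANT_ADE_PREFIXES: Tuple[str, ...] = ("Y44.2",)


def has_any(codes: Iterable[str], prefixes: Tuple[str, ...]) -> bool:
    """Return True if any normalised code starts with one of the prefixes."""
    # str.startswith accepts a tuple and matches if any prefix matches.
    return any(code.strip().upper().startswith(prefixes) for code in codes)


def flag_bleeding_ade(codes: Iterable[str]) -> bool:
    """Hypothetical Boolean rule: a bleeding code AND an anticoagulant ADE code."""
    code_list = list(codes)  # materialise so the codes can be scanned twice
    return has_any(code_list, BLEEDING_PREFIXES) and has_any(
        code_list, ANTICOAGULANT_ADE_PREFIXES
    )


print(flag_bleeding_ade(["K92.2", "Y44.2", "I10"]))  # True
print(flag_bleeding_ade(["K92.2", "I10"]))           # False
```

A conjunction like this can only flag a subset of the summaries that a bleeding-code-only rule would flag; which combination the authors actually used is not stated in the abstract.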

Methods:

Clinicians manually annotated 400 discharge summaries, comprising 65,706 sentences, using four categories: ‘no bleeding’, ‘clinically significant bleeding’, ‘severe bleeding’ and ‘history of bleeding’. These annotations were used to train and validate two detection models: (1) an NLP model using binary logistic regression and support vector machine classifiers, and (2) a rule-based model using ICD-10 codes specific to bleeding ADEs. We assessed the performance of both models using accuracy, precision, recall, F1 score and the area under the receiver operating characteristic (ROC) curve (AUC), with the manual annotations serving as the gold standard.
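The abstract reports AUC from ROC analysis without describing the computation. One standard way to obtain the area under the ROC curve for a single binary classifier is the rank (Mann-Whitney U) formulation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below assumes 0/1 labels and one real-valued classifier score per example; applying it one-vs-rest to each of the four categories is an assumption about the study's setup, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(scores: Sequence[float]) -> List[float]:
    """1-based ranks of scores; tied scores share the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied scores starting at i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (ties count 1/2)."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos + n_neg != len(labels):
        raise ValueError("labels must be 0 or 1")
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    ranks = average_ranks(scores)
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2  # U statistic for positives
    return u / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The rank formulation runs in O(n log n), which keeps it practical at the scale of tens of thousands of sentences, whereas comparing every positive with every negative grows quadratically.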

Results:

The NLP model outperformed the rule-based model, especially in identifying ‘clinically significant bleeding’ and ‘severe bleeding’. It achieved a macro-averaged accuracy of 0.81 and a macro-averaged F1 score of 0.80. It also demonstrated high precision in distinguishing current bleeding ADEs from past ones, with a high true positive rate and few false positives.
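The results above are macro-averages, but the abstract does not define macro-averaged accuracy. One common reading, assumed here, is to treat each category one-vs-rest, compute accuracy, precision, recall and F1 from its confusion counts, and take the unweighted mean across categories so that each category weighs equally regardless of how many sentences it contains. The label strings and toy predictions below are hypothetical, and setting precision to 0 for a never-predicted category is a convention, not something the abstract specifies.

```python
from typing import Dict, Hashable, Sequence

METRICS = ("accuracy", "precision", "recall", "f1")


def per_class_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    labels: Sequence[Hashable],
) -> Dict[Hashable, Dict[str, float]]:
    """One-vs-rest accuracy, precision, recall and F1 for each label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    results: Dict[Hashable, Dict[str, float]] = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0  # 0 if never predicted
        recall = tp / (tp + fn) if tp + fn else 0.0  # 0 if never present
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[c] = {"accuracy": (tp + tn) / n, "precision": precision,
                      "recall": recall, "f1": f1}
    return results


def macro_average(results: Dict[Hashable, Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each metric across labels."""
    return {m: sum(r[m] for r in results.values()) / len(results) for m in METRICS}


# Hypothetical toy data mirroring the four annotation categories.
labels = ["no bleeding", "clinically significant", "severe", "history"]
y_true = ["no bleeding", "no bleeding", "clinically significant",
          "severe", "history", "no bleeding"]
y_pred = ["no bleeding", "clinically significant", "clinically significant",
          "severe", "no bleeding", "no bleeding"]
macro = macro_average(per_class_metrics(y_true, y_pred, labels))
print(round(macro["f1"], 3))  # 0.583
```

Because one-vs-rest accuracy counts true negatives, it tends to sit above the F1 score for the same classifier, which is worth keeping in mind when comparing the two macro-averages.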

Conclusions:

This study highlights a significant advance in using artificial intelligence for healthcare, with the NLP model surpassing traditional ICD-10 coding for detecting bleeding ADEs in electronic medical records. The NLP model provides a more precise tool for clinical decision-making involving older adult patients on antithrombotic therapy.

Citation

Please cite as:

Gaspar F, Zayerne M, Coumau C, Bertrand E, Bettex M, Le Pogam MA, Csajka C. Natural Language Processing and ICD-10 Coding for Detecting Bleeding Events in Discharge Summaries: Comparative Cross-Sectional Study. JMIR Med Inform 2025;13:e67837

DOI: 10.2196/67837

PMID: 40882207

PMCID: 12396801


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 22, 2024

Date Accepted: May 7, 2025
