Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 22, 2024
Date Accepted: May 7, 2025

The final, peer-reviewed published version of this preprint can be found here:

Natural Language Processing and ICD-10 Coding for Detecting Bleeding Events in Discharge Summaries: Comparative Cross-Sectional Study

Gaspar F, Zayerne M, Coumau C, Bertrand E, Bettex M, Le Pogam MA, Csajka C

JMIR Med Inform 2025;13:e67837

DOI: 10.2196/67837

PMID: 40882207

PMCID: 12396801

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

A comparative cross-sectional study of Natural Language Processing and ICD-10 Coding for detecting bleeding events in discharge summaries

  • Frederic Gaspar
  • Mehdi Zayerne
  • Claire Coumau
  • Elliott Bertrand
  • Marie Bettex
  • Marie Annick Le Pogam
  • Chantal Csajka

ABSTRACT

Background:

Bleeding adverse drug events (ADEs), particularly among older adult patients on antithrombotic therapy, are a significant concern in hospital settings. These events often go undetected using traditional rule-based methods relying on structured data from electronic medical records, underscoring the need for more effective detection approaches.

Objective:

This study aimed to develop and evaluate a natural language processing (NLP) model to accurately detect and categorise bleeding events in older adult inpatients’ discharge summaries. Specifically, the model was designed to identify ADEs related to antithrombotic therapy, and its performance was compared with that of Boolean algorithms based on International Classification of Diseases, 10th Revision (ICD-10) codes.
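
As a rough illustration of what a Boolean ICD-10 algorithm of this kind might look like, the sketch below flags a hospital stay when any coded diagnosis matches a bleeding-related code prefix. The prefix list and the function name `flags_bleeding` are assumptions chosen for illustration only; the abstract does not give the study's actual code set or rule logic.

```python
# Hypothetical, illustrative subset of bleeding-related ICD-10 codes.
# This is NOT the code list used in the study.
BLEEDING_PREFIXES = ("I61", "K92.2", "R04.0", "D62")


def flags_bleeding(diagnosis_codes):
    """Return True if any coded diagnosis starts with a bleeding-related prefix."""
    # Normalise case and whitespace so "k92.2 " matches "K92.2".
    normalized = (code.strip().upper() for code in diagnosis_codes)
    return any(code.startswith(BLEEDING_PREFIXES) for code in normalized)
```

A rule like this can only see what was coded at discharge, which is one plausible reason a text-based model might detect events that structured codes miss.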

Methods:

Clinicians manually annotated 400 discharge summaries, comprising 65,706 sentences, into four categories: ‘no bleeding’, ‘clinically significant bleeding’, ‘severe bleeding’ and ‘history of bleeding’. These annotations were used to train and validate two detection models: an NLP model using binary logistic regression and support vector machine classifiers, and a rule-based model using ICD-10 codes specific to bleeding ADEs. We assessed both models’ performance against the manual annotations, which served as the gold standard, using accuracy, precision, recall, F1 score and the area under the curve (AUC) from receiver operating characteristic (ROC) analysis.
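
To make the evaluation metrics concrete, the following minimal sketch computes per-class precision, recall and F1 score, their macro-averages, and overall accuracy from gold-standard and predicted labels. The function name `macro_metrics` and the example labels are hypothetical; the abstract does not state which software the authors used, and AUC is omitted here because it requires predicted scores rather than hard labels.

```python
from collections import Counter


def macro_metrics(gold, predicted):
    """Return (per_class, macro, accuracy) for two equal-length label lists.

    per_class maps each label to (precision, recall, f1);
    macro is the unweighted mean of those three values across labels.
    """
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[g] += 1  # missed the true class g

    per_class = {}
    for label in labels:
        pred_pos = tp[label] + fp[label]
        true_pos = tp[label] + fn[label]
        precision = tp[label] / pred_pos if pred_pos else 0.0
        recall = tp[label] / true_pos if true_pos else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = (precision, recall, f1)

    n = len(labels)
    macro = tuple(sum(m[i] for m in per_class.values()) / n for i in range(3))
    accuracy = sum(tp.values()) / len(gold)
    return per_class, macro, accuracy


# Toy example with made-up labels (not study data).
gold = ["none", "none", "significant", "severe"]
pred = ["none", "significant", "significant", "severe"]
per_class, macro, accuracy = macro_metrics(gold, pred)
```

Macro-averaging weights every category equally, so a rare class such as ‘severe bleeding’ counts as much as the far more common ‘no bleeding’ class.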

Results:

The NLP model outperformed the rule-based model, especially in identifying ‘clinically significant’ and ‘severe bleeding’. The NLP model achieved macro-averages of 0.81 for accuracy and 0.80 for the F1 score. It also demonstrated high precision in distinguishing current bleeding ADEs from past ones, with a strong true positive rate and minimal false positives.

Conclusions:

This study highlights a significant advance in using artificial intelligence for healthcare, with the NLP model surpassing traditional ICD-10 coding for detecting bleeding ADEs in electronic medical records. The NLP model provides a more precise tool for clinical decision-making involving older adult patients on antithrombotic therapy.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.