Accepted for/Published in: JMIR Formative Research
Date Submitted: Sep 5, 2024
Date Accepted: Apr 7, 2025
Identification of major bleeding events in postoperative patients with malignant tumors in Chinese electronic medical records: development and validation of coding algorithms
ABSTRACT
Background:
Postoperative bleeding is a serious complication following abdominal tumor surgery, but it is often not clearly diagnosed and documented in clinical practice in China. Previous studies have relied on manual interpretation of medical records to determine the presence of postoperative bleeding in patients, which is time-consuming and laborious. It remains unclear whether machine learning can play a role in processing a large volume of medical text to identify postoperative bleeding effectively.
Objective:
To develop a machine learning tool for identifying postoperative patients with major bleeding from data in an electronic medical record system.
Methods:
This study used data available in the National Health and Medical Big Data (Eastern) Center in Jiangsu Province, China. From this database, we randomly selected the medical records of 2,000 patients who underwent in-hospital tumor resection surgery between January 2018 and December 2021. Physicians classified each note as present or absent for a major bleeding event during the postoperative hospital stay. Features were engineered from bleeding expressions, high-frequency related expressions, and quantitative logical judgments. Logistic regression (LR), K-nearest neighbor (KNN), and convolutional neural network (CNN) models were developed and trained on the 1,600-note training set. The main outcomes were accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for each model.
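The abstract names three feature sources (bleeding expressions, high-frequency related expressions, and quantitative logical judgments) without listing them. The sketch below illustrates one way such features could be extracted from a Chinese clinical note. Everything specific in it is a hypothetical assumption, not the authors' lexicon or rules: the term lists, the one-character negation check, the hemoglobin text pattern, and the 20 g/L drop threshold.

```python
import re
from dataclasses import dataclass

# Hypothetical expression lists (Chinese terms, English glosses); NOT the study's lexicon.
BLEEDING_TERMS = ["出血", "血肿", "渗血", "便血", "呕血"]  # bleeding, hematoma, oozing, bloody stool, hematemesis
RELATED_TERMS = ["输血", "止血", "再次手术"]  # transfusion, hemostasis, reoperation
NEGATION_CHARS = "无未"  # hypothetical rule: "no"/"not" directly before a term negates it

# Hypothetical lab pattern, e.g. "血红蛋白 95g/L" (hemoglobin 95 g/L).
HB_PATTERN = re.compile(r"血红蛋白\s*(\d+(?:\.\d+)?)\s*g/L")
HB_DROP_THRESHOLD = 20.0  # g/L; illustrative cut-off only


def _term_pattern(terms):
    # Match any listed term unless the preceding character is a negation marker.
    alternatives = "|".join(map(re.escape, terms))
    return re.compile(rf"(?<![{NEGATION_CHARS}])(?:{alternatives})")


BLEEDING_RE = _term_pattern(BLEEDING_TERMS)
RELATED_RE = _term_pattern(RELATED_TERMS)


@dataclass
class NoteFeatures:
    bleeding_mentions: int  # affirmed bleeding expressions
    related_mentions: int   # affirmed bleeding-related expressions
    hb_drop_flag: int       # 1 if hemoglobin fell by >= threshold within the note


def extract_features(note: str) -> NoteFeatures:
    hb_values = [float(v) for v in HB_PATTERN.findall(note)]
    # Largest fall from any earlier value to any later value, in order of appearance.
    max_drop = max(
        (earlier - later
         for i, earlier in enumerate(hb_values)
         for later in hb_values[i + 1:]),
        default=0.0,
    )
    return NoteFeatures(
        bleeding_mentions=len(BLEEDING_RE.findall(note)),
        related_mentions=len(RELATED_RE.findall(note)),
        hb_drop_flag=int(max_drop >= HB_DROP_THRESHOLD),
    )


if __name__ == "__main__":
    note = "术后第1天血红蛋白 120g/L，术后第3天血红蛋白 95g/L，引流管见渗血，予输血2U。无呕血。"
    print(extract_features(note))
```

On the example note, "渗血" and "输血" are counted, "无呕血" is treated as negated, and the fall from 120 to 95 g/L sets the quantitative flag. A single preceding character is a crude negation rule. A real pipeline would need broader negation scopes and more robust lab-value parsing before these features were passed to a classifier such as LR or KNN.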
Results:
Major bleeding was present in 4.31% of the training set and 4.75% of the test set. In the training set, the LR model had a sensitivity of 1.0000 and a specificity of 0.8152, whereas the CNN model had a sensitivity of 0.9710 and a specificity of 0.9027. Both the LR and CNN models also performed well in terms of sensitivity and specificity in the test set. Although the KNN model had high specificity in both the training and test sets, its sensitivity was very low in both.
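The five outcome measures are all derived from a 2x2 confusion matrix of physician labels against model predictions. The sketch below is a minimal stdlib implementation using the standard definitions. The function names and the sample labels are hypothetical and illustrative only, not study data. Because major bleeding occurs in fewer than 5% of notes here, the `ppv_from_rates` helper applies Bayes' rule to show how strongly positive predictive value depends on prevalence, even when sensitivity and specificity are both high.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int
    fp: int
    tn: int
    fn: int


def confusion_counts(y_true, y_pred) -> ConfusionCounts:
    # Labels are 1 (major bleeding present) or 0 (absent).
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def _ratio(num, den):
    # Undefined ratios (e.g. no predicted positives) are returned as NaN.
    return num / den if den else float("nan")


def evaluate(y_true, y_pred) -> dict:
    c = confusion_counts(y_true, y_pred)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


def ppv_from_rates(sensitivity, specificity, prevalence):
    # Bayes' rule: PPV depends on prevalence, not only on sensitivity and specificity.
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical labels: 2 bleeding notes out of 10.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(evaluate(y_true, y_pred))
    print(round(ppv_from_rates(0.9, 0.9, 0.05), 3))  # 0.321
```

For example, at a prevalence of 5%, a hypothetical classifier with 90% sensitivity and 90% specificity would have a PPV of only about 0.32. At prevalences like those reported here, small differences in specificity therefore matter considerably for how many flagged notes need manual review.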
Conclusions:
Both the LR and CNN methods performed well in identifying major bleeding in postoperative patients with malignant tumors from electronic medical records, with high sensitivity and specificity.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.