Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: May 26, 2021
Date Accepted: Nov 8, 2021
Sequential Data-Based Patient Similarity Framework and its Application in Patient Outcome Prediction
ABSTRACT
Background:
The sequential information stored in electronic medical records (EMR) is valuable for patient outcome prediction but is rarely used for patient similarity measurement because of its unevenness, irregularity, and heterogeneity.
Objective:
Our study aimed to propose a patient similarity framework based on sequential and cross-sectional information stored in the EMR system for subsequent patient outcome prediction.
Methods:
Timestamped event sequences were used to compute the sequence similarity using edit distance, and temporal signals were used to compute the trend similarity using dynamic time warping (DTW) and Haar decomposition. We also extracted cross-sectional information, namely demographics, laboratory tests, and radiological reports, for further similarity calculation. We then validated the effectiveness of the proposed similarity framework by constructing k-nearest neighbors (kNN) classifiers to predict mortality and readmission for patients with acute myocardial infarction, based on a public dataset and a private dataset. We made the predictions at three time points (at admission, seven days after admission, and at discharge) to provide early alarms of patient outcomes. Several state-of-the-art models were built as baselines for comparison.
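The two sequential distance measures named above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the unit edit costs, the absolute-difference local cost in DTW, and the distance-to-similarity mapping are all assumptions made for illustration.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two event sequences.

    Assumes unit cost for insertion, deletion, and substitution.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def dtw_distance(s: Sequence[float], t: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two temporal signals.

    Assumes absolute difference as the local cost and no warping window.
    """
    inf = float("inf")
    n, m = len(s), len(t)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # advance in s only
                                 d[i][j - 1],      # advance in t only
                                 d[i - 1][j - 1])  # advance in both
    return d[n][m]


def to_similarity(distance: float) -> float:
    """Hypothetical mapping from a distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + distance)
```

For example, with hypothetical event labels, `edit_distance(["admit", "ecg", "pci"], ["admit", "pci"])` counts the single missing event, and `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because DTW can align the repeated value, which is what makes it suited to signals sampled at uneven rates. A kNN classifier could then rank patients by a combination of such similarities.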
Results:
Based on the public dataset, the kNN model including the sequence similarity and DTW-based trend similarity (KNNED) and the kNN model built on the sequence similarity (KNNE) showed the highest average AUCs of 0.883 and 0.623 for mortality and readmission prediction, respectively. Based on the private dataset, KNNED performed best with an average AUC of 0.954 for mortality prediction when using all information from admission to discharge, exceeding the baseline models. KNNED also showed the highest average AUC of 0.902 when using information available up to seven days after admission to predict mortality. When the mortality prediction was made using information at admission, the random forest model performed best with an average AUC of 0.864. All models’ performance for mortality prediction increased across the three time points. For readmission prediction, the KNNE model performed best (AUC, 0.656) when using all available information during a hospitalization episode.
Conclusions:
The proposed method helped address the challenge of sequential similarity calculation for uneven EMR data and helped improve the predictive performance and early alarm of patient outcomes.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.