Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 29, 2023
Date Accepted: Apr 30, 2024

The final, peer-reviewed published version of this preprint can be found here:

Early Detection of Pulmonary Embolism in a General Patient Population Immediately Upon Hospital Admission Using Machine Learning to Identify New, Unidentified Risk Factors: Model Development Study

Ben Yehuda O, Itelman E, Vaisman A, Segal G, Lerner B

J Med Internet Res 2024;26:e48595

DOI: 10.2196/48595

PMID: 39079116

PMCID: 11322683

Early Detection of Pulmonary Embolism in a General Patient Population Immediately Upon Hospital Admission Using Machine Learning Enables Identification of New, Unidentified Risk Factors

  • Ori Ben Yehuda; 
  • Edward Itelman; 
  • Adva Vaisman; 
  • Gad Segal; 
  • Boaz Lerner

ABSTRACT

Background:

Underdiagnosis or late identification of pulmonary embolism (PE), a potentially lethal thrombosis of one or more pulmonary arteries, is a major challenge confronting modern medicine worldwide.

Objective:

We aim to develop accurate and informative models that identify patients at high risk for PE upon hospital admission, before the first clinical checkup is made, using only information available from the patient's medical history.

Methods:

We trained a random forest (RF) to detect PE at the earliest possible point in hospitalization, upon the patient's hospital admission. We obtained a 13-year data set of 46,639 patients (1,942 PE and 44,697 non-PE) admitted to all internal departments of a tertiary medical center, including patient demographics, prior diagnoses, and chronic medications. Our first proposed method for remedying the data imbalance sets the decision threshold (the probability above which a patient is classified as positive for PE) equal to the minority-to-majority class ratio. Our second method trains as many classifiers as the inverse of this ratio, each on a balanced set of PE and randomly selected control patients, and then averages performance over the ensemble on a balanced test set. Then, to identify significant features across experiments, we propose a nonparametric statistical test that compares the feature importance lists obtained from the RF model over several data permutations. Finally, we suggest a supervised clustering method to identify informative clusters that relate patients' demographic and clinical characteristics on hospital admission to PE and may help improve care.
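The two imbalance remedies described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the RF itself is omitted (any model that outputs a probability of PE could supply `probs`), and drawing disjoint control subsets without replacement is an assumption, since the text says only that the controls are random.

```python
import random
from typing import List, Sequence, Tuple


def ratio_threshold(n_pos: int, n_neg: int) -> float:
    """Decision threshold equal to the minority-to-majority class ratio."""
    return n_pos / n_neg


def classify(probs: Sequence[float], threshold: float) -> List[int]:
    """Label a patient PE-positive (1) when the predicted P(PE) exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probs]


def balanced_training_sets(
    pos_ids: Sequence[int], neg_ids: Sequence[int], seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Pair all PE patients with floor(n_neg / n_pos) disjoint random control subsets.

    Hypothetical helper. Assumes controls outnumber PE patients and are drawn
    without replacement, so every subset is exactly balanced and a few
    leftover controls are left unused.
    """
    n_pos = len(pos_ids)
    k = len(neg_ids) // n_pos  # ensemble size ~ inverse of the class ratio
    controls = list(neg_ids)
    random.Random(seed).shuffle(controls)  # reproducible random control selection
    return [(list(pos_ids), controls[i * n_pos:(i + 1) * n_pos]) for i in range(k)]


if __name__ == "__main__":
    t = ratio_threshold(1942, 44697)
    print(round(t, 4))  # 0.0434
    print(classify([0.02, 0.05, 0.3], t))  # [0, 1, 1]
    sets = balanced_training_sets(range(1942), range(1942, 46639))
    print(len(sets))  # 23
```

With the cohort sizes above, the threshold is about 0.043 and the ensemble has 23 members, each trained on 1,942 PE and 1,942 control patients; under the disjoint-sampling assumption, 31 controls go unused. Training each member and building the balanced test set are left out of the sketch.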

Results:

The models produced by both imbalance-handling methods predicted PE based on age, sex, body mass index, past clinical PE events, chronic lung disease, past thrombotic events, and anticoagulant use, achieving a geometric mean of approximately 80%, an informative performance measure for imbalanced data. Although only about 4% of the patients had a final diagnosis of PE, we found two 5-cluster schemes, each with one or two clusters in which over 61% of patients were positive for PE. In the first scheme, the high-PE cluster included 36% of all PE patients, who were characterized by a definitive past PE diagnosis and a six-fold and three-fold higher prevalence of deep vein thrombosis and pneumonia, respectively, than patients in the other clusters. In the second scheme, two clusters (one of only males and one of only females) included patients who all had a past PE diagnosis and a relatively high prevalence of pneumonia, and a third cluster included only patients with a past diagnosis of pneumonia.
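The quantities reported in the Results, the geometric mean and the per-cluster PE statistics, can be computed as sketched below. The sketch assumes the usual definition of the geometric mean as the square root of sensitivity times specificity; the function names and the toy clustering are hypothetical, not the study's code or data. The first call shows why the measure suits imbalanced data: a rule that never predicts PE would be about 96% accurate on this cohort, yet its geometric mean is 0.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def geometric_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """G-mean = sqrt(sensitivity * specificity)."""
    sensitivity = tp / (tp + fn)  # fraction of PE patients flagged
    specificity = tn / (tn + fp)  # fraction of non-PE patients cleared
    return math.sqrt(sensitivity * specificity)


def cluster_pe_stats(
    cluster_ids: Sequence[int], labels: Sequence[int]
) -> Dict[int, Tuple[float, float]]:
    """Per cluster: (share of its patients with PE, share of all PE patients it holds).

    Hypothetical helper; labels use 1 = PE, 0 = non-PE, and at least one PE
    patient is assumed so the second share is defined.
    """
    total_pe = sum(labels)
    sizes = Counter(cluster_ids)
    pe_counts = Counter(c for c, y in zip(cluster_ids, labels) if y == 1)
    return {c: (pe_counts[c] / sizes[c], pe_counts[c] / total_pe) for c in sizes}


if __name__ == "__main__":
    # Trivial "always non-PE" rule on the cohort sizes from the Methods:
    # accuracy is about 96%, but sensitivity is 0, so the G-mean is 0.
    print(geometric_mean(tp=0, fn=1942, tn=44697, fp=0))  # 0.0
    # Toy clustering with hypothetical labels.
    print(cluster_pe_stats([0, 0, 0, 1, 1, 1, 1, 1], [1, 1, 0, 1, 0, 0, 0, 0]))
```

In the toy example, cluster 0 is two-thirds PE and holds two-thirds of all PE patients, mirroring how statements such as "over 61% positive" and "36% of all PE patients" describe a cluster from two different angles.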

Conclusions:

Despite the highly imbalanced data and the reliance on information from the patient's medical history alone, our models accurately and informatively identified patients at high risk for PE upon hospital admission, before the first clinical checkup was made.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.