Currently submitted to: JMIR Medical Informatics
Date Submitted: Mar 18, 2025
Open Peer Review Period: Mar 26, 2025 - May 21, 2025
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints, unless they show as "accepted", should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Interpretable Machine Learning Model for Pulmonary Hypertension Risk Prediction: A Retrospective Cohort Study
ABSTRACT
Background:
Pulmonary hypertension (PH) is a progressive disorder characterized by elevated pulmonary artery pressure and increased pulmonary vascular resistance, ultimately leading to right heart failure. Early detection is critical for improving patient outcomes.
Objective:
To develop and evaluate a novel, interpretable machine learning-based diagnostic model for PH.
Methods:
A diagnostic model for the early detection of PH was developed in two steps. First, recursive feature elimination (RFE) was used to select the most relevant echocardiographic variables, which were then combined into a composite ultrasound index using machine learning algorithms, including XGBoost. Second, this ultrasound index was combined with clinical variables selected by least absolute shrinkage and selection operator (LASSO) regression to construct a logistic regression diagnostic model. Model performance was evaluated with receiver operating characteristic (ROC) curves for discrimination, calibration plots for agreement between predicted and observed risk, and decision curve analysis (DCA) for clinical utility.
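The two-step pipeline described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: the cohort is synthetic (`make_classification`), scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, an L1-penalized logistic regression (`LogisticRegressionCV`) stands in for the LASSO step, and keeping 5 echocardiographic variables is an arbitrary choice. The training-set ultrasound index is taken from out-of-fold predictions so that the final logistic model is not fitted on in-sample boosted scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict, train_test_split

# Hypothetical synthetic cohort: columns 0-11 play the role of
# echocardiographic variables, columns 12-19 of clinical variables.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)
echo, clinical = X[:, :12], X[:, 12:]

echo_tr, echo_te, clin_tr, clin_te, y_tr, y_te = train_test_split(
    echo, clinical, y, test_size=0.3, stratify=y, random_state=0)

# Step 1a: RFE keeps the 5 echo variables ranked highest by the
# booster's feature importances (5 is an illustrative choice).
rfe = RFE(GradientBoostingClassifier(random_state=0), n_features_to_select=5)
rfe.fit(echo_tr, y_tr)
echo_tr_sel, echo_te_sel = rfe.transform(echo_tr), rfe.transform(echo_te)

# Step 1b: composite ultrasound index = boosted-model probability of PH.
# Out-of-fold predictions on the training set avoid passing overfit,
# in-sample scores to the second-stage model.
booster = GradientBoostingClassifier(random_state=0)
index_tr = cross_val_predict(booster, echo_tr_sel, y_tr, cv=5,
                             method="predict_proba")[:, 1]
booster.fit(echo_tr_sel, y_tr)
index_te = booster.predict_proba(echo_te_sel)[:, 1]

# Step 2a: L1-penalized logistic regression selects clinical variables
# (real laboratory values should be standardized first).
lasso = LogisticRegressionCV(Cs=10, cv=5, penalty="l1", solver="liblinear")
lasso.fit(clin_tr, y_tr)
selected = np.flatnonzero(lasso.coef_[0] != 0)

# Step 2b: final diagnostic model = logistic regression on the
# ultrasound index plus the selected clinical variables.
Z_tr = np.column_stack([index_tr, clin_tr[:, selected]])
Z_te = np.column_stack([index_te, clin_te[:, selected]])
final = LogisticRegression(max_iter=1000).fit(Z_tr, y_tr)
p_te = final.predict_proba(Z_te)[:, 1]

auc = roc_auc_score(y_te, p_te)
print(f"clinical variables kept: {selected.tolist()}")
print(f"held-out AUC: {auc:.3f}")
```

Collapsing the echocardiographic block into a single index keeps the final logistic model small, so its coefficients (one for the index, one per retained clinical variable) stay directly interpretable.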
Results:
Machine learning identified key echocardiographic and clinical predictors, and the XGBoost model achieved high area under the ROC curve (AUC), sensitivity, and specificity. LASSO regression identified critical clinical variables, including prothrombin time activity and serum cystatin C. The combined diagnostic model showed high predictive accuracy, with an AUC of 0.997. Calibration analysis showed close agreement between predicted and observed outcomes, and decision curve analysis supported the model's clinical value, particularly at higher risk thresholds.
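Decision curve analysis compares a model's net benefit with the default strategies of treating all patients or treating none across a range of threshold probabilities pt. The sketch below implements the standard net-benefit formula, NB = TP/n - (FP/n) * pt/(1 - pt), in plain Python; the abstract does not say which DCA implementation the study used, and the function names and toy labels/probabilities are hypothetical, not study data. Because false positives are weighted by the threshold odds pt/(1 - pt), their penalty grows as the threshold rises, which is where a highly specific model separates most clearly from treat-all.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    flagged = [t for t, p in zip(y_true, y_prob) if p >= threshold]
    tp = sum(1 for t in flagged if t == 1)
    fp = len(flagged) - tp
    odds = threshold / (1.0 - threshold)  # harm weight of one false positive
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Net benefit of treating every patient, regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


# Hypothetical toy data: 3 PH cases among 10 patients.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1, 0.1, 0.05, 0.3, 0.2]

for pt in (0.2, 0.5, 0.8):
    print(f"pt={pt:.1f}  model={net_benefit(y, p, pt):+.3f}  "
          f"treat-all={net_benefit_treat_all(y, pt):+.3f}  treat-none=+0.000")
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the treat-all curve and zero (treat none).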
Conclusions:
This non-invasive model supports early diagnosis of PH and demonstrates strong predictive accuracy. It facilitates early intervention and personalized treatment, with potential applications in broader cardiovascular disease management.
Clinical Trial:
The study was approved by the Research Ethics Commission of Wuhan Zhongnan Hospital (Approval No. 2023185), which waived the requirement for informed consent.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.