Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jun 12, 2025
Date Accepted: Mar 31, 2026
Multimodal Data Fusion of Echocardiogram Images and Electronic Medical Records for Heart Disease Screening with Explainable Analysis
ABSTRACT
Background:
Echocardiography is a fundamental imaging modality for the diagnosis of heart disease, but its interpretation remains operator-dependent and lacks standardized, data-driven decision support. While artificial intelligence (AI) has shown potential in enhancing diagnostic accuracy, the integration of longitudinal clinical data remains underexplored.
Objective:
This study aims to develop an explainable AI framework that integrates multimodal data—including echocardiogram (ECHO) images and longitudinal electronic medical records (EMRs)—to enhance the screening and interpretability of heart disease diagnosis.
Methods:
We retrospectively analyzed data from 5,884 patients, including 1,470 with confirmed heart disease. The proposed model combines spatial features extracted from echocardiography with temporal features derived from EMRs. These features were fed into a multimodal deep learning architecture equipped with a fusion module to identify key diagnostic cues. We further applied post hoc explainability methods to interpret model predictions.
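The abstract does not specify the exact network architecture, so the following is only a minimal sketch of one possible fusion design in PyTorch: a convolutional encoder for echocardiogram frames, a GRU encoder for longitudinal EMR sequences, and a concatenation-based fusion head. All layer sizes, the number of EMR features, and the fusion strategy are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a multimodal fusion classifier (assumed design, not the
# authors' exact architecture): CNN for echo frames, GRU for EMR sequences,
# concatenation fusion, and a sigmoid screening head.
import torch
import torch.nn as nn

class EchoEncoder(nn.Module):
    """Extracts spatial features from a single echocardiogram frame."""
    def __init__(self, out_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(32, out_dim)

    def forward(self, x):                      # x: (batch, 1, H, W)
        h = self.backbone(x).flatten(1)        # (batch, 32)
        return self.proj(h)                    # (batch, out_dim)

class EmrEncoder(nn.Module):
    """Encodes a longitudinal EMR visit sequence to capture temporal features."""
    def __init__(self, n_features: int, out_dim: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, out_dim, batch_first=True)

    def forward(self, x):                      # x: (batch, time, n_features)
        _, h = self.gru(x)                     # h: (1, batch, out_dim)
        return h.squeeze(0)                    # (batch, out_dim)

class FusionClassifier(nn.Module):
    """Fuses image and EMR embeddings and predicts heart-disease probability."""
    def __init__(self, img_dim: int = 128, emr_dim: int = 64, n_emr_features: int = 8):
        super().__init__()
        self.echo = EchoEncoder(img_dim)
        self.emr = EmrEncoder(n_emr_features, emr_dim)
        self.head = nn.Sequential(
            nn.Linear(img_dim + emr_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, echo_img, emr_seq):
        z = torch.cat([self.echo(echo_img), self.emr(emr_seq)], dim=1)
        return torch.sigmoid(self.head(z))     # screening probability

# Toy forward pass: two patients, one 112x112 frame each, five EMR visits with
# eight (hypothetical) features per visit.
model = FusionClassifier()
prob = model(torch.randn(2, 1, 112, 112), torch.randn(2, 5, 8))
print(prob.shape)                              # torch.Size([2, 1])
```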
Results:
The proposed model achieved an AUC of 0.8361 on the independent test set. Attention-based visualization revealed that the model consistently attended to clinically relevant regions within echocardiographic images, including valvular structures and abnormal flow patterns. For the EMR modality, feature importance derived from the XGBoost model indicated that age, registration years, gender, and blood pressure were the most influential predictors of heart disease onset, aligning with established cardiovascular risk factors and enhancing model interpretability.
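As a brief illustration of the EMR interpretability step, the sketch below shows how gain-based feature importance could be read from an XGBoost classifier. The feature names and synthetic data are placeholders assumed for demonstration; the study's actual EMR variables and preprocessing are not described in the abstract.

```python
# Illustrative sketch (assumed, not the study's pipeline): fit an XGBoost
# classifier on tabular EMR features and rank them by importance.
import numpy as np
from xgboost import XGBClassifier

# Hypothetical EMR feature names echoing the predictors reported as influential.
feature_names = ["age", "registration_years", "gender", "systolic_bp", "diastolic_bp"]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))   # synthetic EMR features
y = rng.integers(0, 2, size=500)                 # synthetic heart-disease labels

clf = XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss")
clf.fit(X, y)

# Rank features by importance, mirroring the interpretation reported for the
# EMR modality.
for name, score in sorted(zip(feature_names, clf.feature_importances_),
                          key=lambda p: p[1], reverse=True):
    print(f"{name}: {score:.3f}")
```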
Conclusions:
The integration of echocardiographic and EMR data using an explainable AI framework enables accurate and interpretable screening for heart disease. This study underscores the potential of multimodal deep learning in improving diagnostic workflows and enhancing clinician trust through transparent model behavior.
Citation
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.