Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 6, 2024
Date Accepted: Apr 17, 2025
Large language models and artificial neural networks for assessing one-year mortality in patients with myocardial infarction: analysis from the MIMIC-IV database
ABSTRACT
Background:
Qwen-2 and Llama-3 are high-performance, open-source large language models (LLMs) available online. The artificial neural network (ANN) algorithm derived from the SWEDEHEART registry, termed SWEDEHEART-AI, can predict prognosis after acute myocardial infarction (AMI).
Objective:
We aimed to evaluate these three models in predicting one-year all-cause mortality in critically ill patients with AMI.
Methods:
The MIMIC-IV database is a publicly available critical care dataset. From it, we included 2,758 patients who were admitted for AMI for the first time and discharged alive. SWEDEHEART-AI calculated each patient’s death rate from 21 clinical variables. Qwen-2 and Llama-3 analyzed each patient’s discharge record and directly output a value between 0 and 1, given to one decimal place, representing the probability of death within one year. Actual mortality was verified from the follow-up data. The predictive performance of the three models was assessed and compared using Harrell’s C statistic (C-index), the area under the receiver operating characteristic curve (AUROC), calibration plots, Kaplan-Meier curves, and decision curve analysis.
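For readers unfamiliar with the C-index, the following is a minimal sketch of Harrell's concordance index, assuming right-censored follow-up data given as a time, an event indicator (1 = death), and a predicted risk for each patient. The function name `harrell_c_index` is hypothetical and this is not the authors' analysis code. To keep it simple, pairs with tied follow-up times are skipped. Tied predicted risks count as one half, which matters when predictions are rounded to one decimal place, as the LLM outputs were.

```python
from typing import Sequence


def harrell_c_index(time: Sequence[float], event: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data (simplified sketch).

    A pair (i, j) is comparable when subject i died (event[i] == 1) and had a
    strictly shorter follow-up time than subject j. The pair is concordant when
    subject i also received the higher predicted risk; tied risks score 0.5.
    Pairs with tied follow-up times are skipped in this sketch.
    """
    if not (len(time) == len(event) == len(risk)):
        raise ValueError("time, event and risk must have the same length")
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot be the earlier death in a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    # Earlier deaths receive higher risks, so every comparable pair is concordant.
    print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [0.9, 0.7, 0.5, 0.1]))  # 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Libraries such as lifelines and scikit-survival provide tested implementations with fuller tie handling.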
Results:
SWEDEHEART-AI demonstrated significant discrimination in predicting one-year all-cause death in AMI patients, with a higher C-index than Qwen-2 and Llama-3 (0.72 [95% CI 0.69-0.74] vs 0.65 [0.62-0.67] vs 0.56 [0.53-0.58], respectively; P<.001 for both comparisons). SWEDEHEART-AI yielded high and stable AUROC values in time-dependent ROC analysis. The death rates calculated by SWEDEHEART-AI were positively correlated with actual mortality, and the three risk classes derived from this model were well separated on the Kaplan-Meier curves. Calibration plots indicated that SWEDEHEART-AI tended to overestimate mortality risk (observed-to-expected ratio 0.478). At risk thresholds below 19%, SWEDEHEART-AI showed a positive net benefit that was larger than that of the LLMs.
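The observed-to-expected ratio and the net benefit reported above are both simple quantities. The following is a minimal sketch of each, assuming a binary one-year death indicator for every patient and one predicted death probability per patient. This simplification ignores censoring, which a survival-based decision curve would handle, for example with Kaplan-Meier estimates. The function names are hypothetical and this is not the authors' analysis code. Net benefit at threshold pt uses the standard formula NB = TP/n - FP/n × pt/(1 - pt), where patients with predicted risk at or above pt are treated as high risk.

```python
from typing import Sequence


def observed_to_expected(outcome: Sequence[int], predicted: Sequence[float]) -> float:
    """Observed deaths divided by the sum of predicted death probabilities.

    A ratio below 1 means the model predicted more deaths than occurred
    (overestimation); above 1 means underestimation.
    """
    expected = sum(predicted)
    if expected == 0:
        raise ValueError("sum of predicted probabilities is zero")
    return sum(outcome) / expected


def net_benefit(outcome: Sequence[int], predicted: Sequence[float],
                threshold: float) -> float:
    """Net benefit of flagging patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcome)
    flagged = [(y, p) for y, p in zip(outcome, predicted) if p >= threshold]
    tp = sum(1 for y, _ in flagged if y == 1)  # flagged patients who died
    fp = sum(1 for y, _ in flagged if y == 0)  # flagged patients who survived
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcome: Sequence[int], threshold: float) -> float:
    """Reference strategy that flags every patient regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    deaths = [1, 0, 0, 0]
    risks = [0.5, 0.5, 0.5, 0.5]
    print(observed_to_expected(deaths, risks))  # 0.5 (overestimation)
    for pt in (0.05, 0.10, 0.19):
        print(pt, net_benefit(deaths, risks, pt), net_benefit_treat_all(deaths, pt))
```

A decision curve plots these values across a range of thresholds. A model is useful at a given threshold when its net benefit exceeds both the treat-all curve and the treat-none strategy, whose net benefit is 0.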
Conclusions:
SWEDEHEART-AI performed best in predicting one-year all-cause mortality in AMI patients. Among the LLMs, Qwen-2 outperformed Llama-3 and showed moderate predictive value.
Clinical Trial: Not applicable. The study was performed in accordance with the Declaration of Helsinki. The review committee of the Massachusetts Institute of Technology and Beth Israel Deaconess Medical Center approved access to the MIMIC-IV database. Because these data were de-identified, the study was exempt from ethical approval and informed consent requirements.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.