Accepted for/Published in: JMIR Mental Health
Date Submitted: Dec 12, 2025
Date Accepted: Mar 17, 2026
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Machine Learning for Comparative Antidepressant Selection in Major Depressive Disorder: A Systematic Review
ABSTRACT
Background:
Major depressive disorder affects 322 million individuals worldwide, yet antidepressant selection relies on trial and error, with response rates of 42%-53%. Current artificial intelligence (AI) models focus on individual treatments rather than comparative selection across therapeutic options. This limits their clinical utility, because treatment decisions require comparing expected outcomes across multiple medications to identify the optimal choice for each patient.
Objective:
This systematic review aimed to evaluate AI models that examined two or more pharmacological interventions for predicting treatment outcomes in major depressive disorder, facilitating comparative treatment selection between medications or medication classes.
Methods:
We systematically searched PubMed, Scopus, and Web of Science from January 2015 to March 2025, following PRISMA guidelines. We included studies that examined adult patients (aged ≥18 years) with major depressive disorder (MDD), used AI models to predict treatment outcomes, and examined two or more pharmacological interventions to enable comparative treatment selection. We extracted data on modeling strategies, validation methods, and performance metrics for comparative prediction.
Results:
From 1,902 initial records, 15 studies met the inclusion criteria. Dataset sample sizes ranged from 49 to 77,226 participants. STAR*D was the most frequently used dataset (20.0% of studies), followed by GENDEP and EMBARC (13.3% each). Studies compared 2 to 15 antidepressants using two main modeling approaches: separate drug-specific models trained independently for each medication (n=5 studies) or unified frameworks using clustering or trajectory methods to enable cross-treatment comparison (n=10 studies). Performance varied substantially, with Area Under the Curve (AUC) values ranging from 0.59 to 0.95 for classification tasks and accuracies between 62% and 95.1%. Only 4 studies (26.7%) conducted external validation on independent datasets. Depression severity was assessed using standardized scales in 86.7% of studies, though response and remission definitions varied considerably.
Conclusions:
Current AI models for antidepressant selection face critical limitations for clinical translation. Most studies lack the capability to provide patient-level comparative predictions needed for treatment selection, whether using clustering-based or drug-specific approaches. Key barriers include limited external validation, overreliance on a few established datasets, absence of explainability, and methodological heterogeneity preventing evidence synthesis. Future research should prioritize unified comparative frameworks with calibrated individual-level predictions, rigorous external validation, and transparent methodologies aligned with clinical workflows.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.