Accepted for/Published in: JMIR AI
Date Submitted: Jul 30, 2025
Open Peer Review Period: Aug 1, 2025 - Sep 26, 2025
Date Accepted: Dec 5, 2025
Date Submitted to PubMed: Dec 8, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Accelerating Discovery of Leukemia Inhibitors with AI-Driven QSAR Modeling
ABSTRACT
Background:
Leukemia treatment remains a major challenge in oncology. While thiadiazolidinone (TDZD) analogs show potential to inhibit leukemia cell proliferation, they often lack sufficient potency and selectivity. Traditional drug discovery struggles to explore the vast chemical landscape efficiently, highlighting the need for innovative computational strategies. Machine learning (ML)-enhanced quantitative structure-activity relationship (QSAR) modeling offers a promising route to identify and optimize inhibitors with improved activity and specificity.
Objective:
To develop and validate an integrated machine learning–enhanced QSAR modeling workflow for the rational design and prediction of Thiadiazolidinone (TDZD) analogs with improved anti-leukemia activity, by systematically evaluating molecular descriptors and algorithmic approaches to identify key determinants of potency and guide future inhibitor optimization.
Methods:
We analyzed 35 TDZD derivatives with confirmed anti-leukemia activity, removing outliers to improve data quality. Using Schrödinger MAESTRO, we calculated 220 molecular descriptors (1D–4D). Seventeen ML models, including Random Forests, XGBoost, and Neural Networks, were trained on 70% of the data and tested on the remaining 30%, using stratified sampling. Models were optimized via hyperparameter tuning and 5-fold cross-validation, and assessed with 12 metrics, including mean squared error (MSE) and the coefficient of determination (R²), together with SHAP values for feature attribution.
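The data partitioning described above can be illustrated with a minimal sketch. It assumes that "stratified sampling" means drawing the test set from bins of similar activity, which the abstract does not specify; the function names, the number of bins, and the activity values are hypothetical placeholders, not the study's code or data.

```python
import random

def stratified_split(activities, test_fraction=0.3, n_bins=5, seed=0):
    """Split compound indices into train/test, sampling from activity bins.

    Assumption: stratification is done on binned activity values.
    """
    rng = random.Random(seed)
    # Rank compounds by activity and cut the ranking into contiguous bins.
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    size = -(-len(order) // n_bins)  # ceiling division
    bins = [order[i:i + size] for i in range(0, len(order), size)]
    train_idx, test_idx = [], []
    for members in bins:
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)

def k_fold_indices(indices, k=5, seed=0):
    """Yield (fit_idx, val_idx) pairs for k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for held_out in range(k):
        fit_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield fit_idx, folds[held_out]

# Placeholder activity values for 35 hypothetical compounds (NOT the study's data).
placeholder_rng = random.Random(42)
activities = [placeholder_rng.uniform(4.0, 8.0) for _ in range(35)]

train_idx, test_idx = stratified_split(activities)
folds = list(k_fold_indices(train_idx, k=5))
print(len(train_idx), len(test_idx), [len(val) for _, val in folds])
```

With 35 compounds in five bins of seven, this sketch holds out two compounds per bin, giving 25 training and 10 test compounds (about 29%, the closest whole-compound approximation to 30% under these binning choices). Cross-validation folds are drawn only from the training indices so the test set stays untouched during tuning.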
Results:
Ensemble methods, especially LightGBM and Random Forest, showed superior predictive performance (LightGBM: MSE = 0.00063 ± 0.00012; R² = 0.971 ± 0.0084). Isotonic Regression ranked second, outperforming baseline models by over 15% in explained variance. SHAP analysis identified hydrogen bond acceptor count (r_qp_accptHB), electronic properties, and solubility as key features for anti-leukemia activity.
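For readers less familiar with the reported metrics, the following sketch shows how MSE and R² are conventionally computed and how a "mean ± spread" summary can be formed. It assumes the ± values are standard deviations across repeated evaluations such as cross-validation folds, which the abstract does not state; all numbers below are placeholders, not results from the study.

```python
from statistics import mean, stdev

def mse(y_true, y_pred):
    """Mean squared error: average squared difference between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def summarize(scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return mean(scores), stdev(scores)

# Placeholder observed and predicted activities (NOT the study's data).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(mse(y_true, y_pred), r2(y_true, y_pred))  # approximately 0.025 and 0.98

# Hypothetical per-fold R² values, summarized as mean and standard deviation.
fold_r2 = [0.96, 0.97, 0.98]
r2_mean, r2_sd = summarize(fold_r2)
print(f"R² = {r2_mean:.3f} ± {r2_sd:.3f}")
```

An R² close to 1 means the residual error is small relative to the spread of the observed activities, which is why it is usually reported alongside MSE rather than instead of it.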
Conclusions:
Integrating ML with QSAR modeling helps refine candidate leukemia inhibitors and improves prediction accuracy while revealing the underlying mechanisms of activity. This approach accelerates identification of potent compounds and offers a pathway to overcome therapeutic resistance in leukemia.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.