Accepted for/Published in: JMIR AI

Date Submitted: Jul 30, 2025
Open Peer Review Period: Aug 1, 2025 - Sep 26, 2025
Date Accepted: Dec 5, 2025
Date Submitted to PubMed: Dec 8, 2025

The final, peer-reviewed published version of this preprint can be found here:

Accelerating Discovery of Leukemia Inhibitors Using AI-Driven Quantitative Structure-Activity Relationship: Algorithm Development and Validation

Kakraba S, Agyemang EF, Shmookler Reis RJ

JMIR AI 2026;5:e81552

DOI: 10.2196/81552

PMID: 41358925

PMCID: 12892034

Accelerating Discovery of Leukemia Inhibitors with AI-Driven QSAR Modeling

  • Samuel Kakraba
  • Edmund Fosu Agyemang
  • Robert J. Shmookler Reis

ABSTRACT

Background:

Leukemia treatment remains a major challenge in oncology. Although thiadiazolidinone (TDZD) analogs show potential to inhibit leukemia cell proliferation, they often lack sufficient potency and selectivity. Traditional drug discovery struggles to explore the vast chemical space efficiently, which motivates computational strategies. Quantitative structure-activity relationship (QSAR) modeling enhanced with machine learning (ML) offers a promising route to identify and optimize inhibitors with improved activity and specificity.

Objective:

To develop and validate an integrated ML-enhanced QSAR modeling workflow for the rational design and prediction of TDZD analogs with improved anti-leukemia activity. The workflow systematically evaluates molecular descriptors and algorithmic approaches to identify key determinants of potency and to guide future inhibitor optimization.

Methods:

We analyzed 35 TDZD derivatives with confirmed anti-leukemia activity and removed outliers to improve data quality. Using Schrödinger MAESTRO, we calculated 220 molecular descriptors (1D–4D). Seventeen ML models, including Random Forests, XGBoost, and Neural Networks, were trained on 70% of the data and tested on the remaining 30%, using stratified sampling. Model performance was assessed with 12 metrics, including mean squared error (MSE), the coefficient of determination (R²), and SHapley Additive exPlanations (SHAP) values, and models were optimized via hyperparameter tuning and 5-fold cross-validation. Additional analyses, including train-test gap assessment, comparison to baseline linear models, and cross-validation stability analysis, were performed to distinguish genuine learning from overfitting.
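The evaluation protocol described above (a 70/30 split, 5-fold cross-validation on the training portion, a train-test gap, and a comparison to a trivial baseline) can be sketched in a few lines. This is a minimal, standard-library-only illustration and not the authors' pipeline: the synthetic data, the single-descriptor least-squares model, the plain random split (standing in for stratified sampling), and every function name are assumptions made for the example.

```python
import random
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares on one descriptor; returns a predict function."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return lambda xs: [intercept + slope * xi for xi in xs]


def cross_val_r2(x, y, k, rng):
    """k-fold cross-validation: fit on k-1 folds, score R² on the held-out fold."""
    idx = list(range(len(x)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_line([x[i] for i in train], [y[i] for i in train])
        scores.append(r2([y[i] for i in fold], model([x[i] for i in fold])))
    return scores


# Synthetic stand-in for 35 compounds (assumption: NOT the study's data).
rng = random.Random(0)
n = 35
x = [rng.uniform(0.0, 1.0) for _ in range(n)]
y = [0.5 + 2.0 * xi + rng.gauss(0.0, 0.05) for xi in x]

# 70/30 split (plain random here; the abstract reports stratified sampling).
order = list(range(n))
rng.shuffle(order)
n_train = int(0.7 * n)
train_idx, test_idx = order[:n_train], order[n_train:]
x_tr, y_tr = [x[i] for i in train_idx], [y[i] for i in train_idx]
x_te, y_te = [x[i] for i in test_idx], [y[i] for i in test_idx]

model = fit_line(x_tr, y_tr)
r2_train, r2_test = r2(y_tr, model(x_tr)), r2(y_te, model(x_te))
mse_train, mse_test = mse(y_tr, model(x_tr)), mse(y_te, model(x_te))

# Train-test gap: test metric minus training metric.
delta_r2 = r2_test - r2_train
delta_mse = mse_test - mse_train

# Trivial baseline: always predict the training-set mean activity.
r2_baseline = r2(y_te, [mean(y_tr)] * len(y_te))

# Cross-validation stability on the training portion only.
cv_scores = cross_val_r2(x_tr, y_tr, 5, rng)

print(f"R2 train={r2_train:.3f} test={r2_test:.3f} gap={delta_r2:+.3f}")
print(f"MSE gap={delta_mse:+.5f}  baseline R2={r2_baseline:.3f}")
print(f"5-fold CV R2: mean={mean(cv_scores):.3f} "
      f"min={min(cv_scores):.3f} max={max(cv_scores):.3f}")
```

With only 35 compounds, each held-out fold contains four or five molecules, so the spread of the per-fold scores is at least as informative as their mean.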

Results:

Ensemble methods, especially LightGBM and Random Forest, showed superior predictive performance (LightGBM: MSE = 0.00063 ± 0.00012; R² = 0.971 ± 0.0084). Training-to-test performance degradation was modest (ΔR² = −0.01, ΔMSE = +0.000126), suggesting genuine pattern learning rather than memorization. Isotonic Regression ranked second, outperforming baseline models by over 15% in explained variance. SHAP analysis revealed that the most influential features contributing to anti-leukemia activity were global molecular shape (r_qp_glob; mean SHAP value = 0.52), weighted polar surface area (r_qp_WPSA; ~0.50), polarizability (r_qp_QPpolrz; ~0.49), partition coefficient (r_qp_QPlogPC16; ~0.48), solvent-accessible surface area (r_qp_SASA; ~0.48), hydrogen bond donor count (r_qp_donorHB; ~0.48), and the sum of topological distances between oxygen and chlorine atoms (i_desc_Sum_of_topological_distances_between_O.Cl; ~0.47). These parameters highlight the importance of steric complementarity and the three-dimensional arrangement of functional groups. Aqueous solubility (r_qp_QPlogS; ~0.47) and hydrogen bond acceptor count (r_qp_accptHB; ~0.44) were also among the top ten features. The significance of these descriptors was consistent across multiple algorithmic models, including Random Forest, XGBoost, and partial least squares (PLS) approaches.
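To make the feature ranking above concrete, the sketch below computes exact Shapley values for a toy model by enumerating feature coalitions, then ranks features by their mean absolute value across samples. Everything here is hypothetical: the toy model, the four placeholder descriptors, and the data are invented for illustration. Aggregating by mean absolute SHAP value is a common convention, and whether the figures reported in the abstract were aggregated this way is an assumption. Real workflows typically rely on a dedicated library instead of brute-force enumeration, which scales as 2^n in the number of features.

```python
from itertools import combinations
from math import factorial
from statistics import mean


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a single baseline point.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical activity model with one interaction term; ignores z[3]."""
    return 0.6 * z[0] - 0.3 * z[1] + 0.2 * z[0] * z[2]


# Placeholder descriptor names and invented values (NOT the study's data).
names = ["descriptor_a", "descriptor_b", "descriptor_c", "descriptor_d"]
X = [
    [0.2, 1.0, 0.5, 3.0],
    [0.8, 0.4, 0.1, 1.0],
    [0.5, 0.9, 0.9, 2.0],
    [0.1, 0.2, 0.3, 4.0],
]

# Baseline: the per-feature mean over the samples.
baseline = [mean(col) for col in zip(*X)]

shap_rows = [shapley_values(toy_model, row, baseline) for row in X]

# Global importance: mean absolute Shapley value per feature.
importance = {
    name: mean(abs(phi[j]) for phi in shap_rows)
    for j, name in enumerate(names)
}
ranking = sorted(names, key=lambda name: importance[name], reverse=True)

for name in ranking:
    print(f"{name}: {importance[name]:.4f}")
```

Two properties are worth checking in any such computation: the values for one sample sum to the difference between the model output at that sample and at the baseline, and a feature the model never uses (here `descriptor_d`) receives an importance of zero.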

Conclusions:

Integrating advanced ML with QSAR modeling enables systematic analysis of structure-activity relationships in TDZD analogs on this dataset. While ensemble methods capture complex patterns with high internal validation metrics, external validation on independent compounds and prospective experimental testing are essential before broad therapeutic claims can be made. This work provides a methodological foundation and identifies molecular features for future validation efforts.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.