Accepted for/Published in: JMIR Formative Research

Date Submitted: Oct 23, 2024
Date Accepted: Mar 4, 2025

The final, peer-reviewed published version of this preprint can be found here:

Chan FY, Ku YE, Lie WN, Chen HY

Web-Based Explainable Machine Learning-Based Drug Surveillance for Predicting Sunitinib- and Sorafenib-Associated Thyroid Dysfunction: Model Development and Validation Study

JMIR Form Res 2025;9:e67767

DOI: 10.2196/67767

PMID: 40209178

PMCID: 12005597

Online Explainable Machine Learning-based Drug Surveillance for Predicting Sunitinib- and Sorafenib-Associated Thyroid Dysfunction: A Multicenter Retrospective Study

  • Fan-Ying Chan
  • Yi-En Ku
  • Wen-Nung Lie
  • Hsiang-Yin Chen

ABSTRACT

Background:

Unlike models built on single-snapshot data, which only identify high-risk patients, machine learning models trained on time-series data can predict upcoming adverse events, helping clinicians act in time during cancer therapy.

Objective:

This study used a time-series data collection method to develop and validate machine learning models for predicting sunitinib- and sorafenib-associated thyroid dysfunction.

Methods:

Time-series data for patients first prescribed sunitinib or sorafenib were collected from a deidentified clinical research database. Logistic regression, random forest, Adaptive Boosting, Light Gradient Boosting Machine, and Gradient Boosting Decision Tree (GBDT) were used to develop the models. Prediction performance was compared using accuracy, precision, recall, F1 score, the area under the receiver operating characteristic curve (AUROC), and the area under the precision-recall curve (AUPRC). The optimal threshold of the best-performing model was selected based on the maximum F1 score. SHapley Additive exPlanations (SHAP) were applied to understand feature importance and contributions for the cohort and for each patient.
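As a minimal sketch of the threshold-selection step described above, the following Python snippet scans candidate thresholds and keeps the one with the highest F1 score. It is an illustration only, not the authors' implementation: the function names `f1_at_threshold` and `best_f1_threshold` are hypothetical, and the labels and predicted probabilities are made-up toy values rather than study data.

```python
from typing import Sequence, Tuple


def f1_at_threshold(y_true: Sequence[int], y_prob: Sequence[float],
                    threshold: float) -> float:
    """F1 score when a case is predicted positive if its probability >= threshold."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    fn = sum(1 for y, p in pairs if p < threshold and y == 1)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(y_true: Sequence[int],
                      y_prob: Sequence[float]) -> Tuple[float, float]:
    """Try every distinct predicted probability as a threshold; return (threshold, F1)."""
    candidates = sorted(set(y_prob))
    scored = [(f1_at_threshold(y_true, y_prob, t), t) for t in candidates]
    best_f1, best_t = max(scored, key=lambda pair: pair[0])
    return best_t, best_f1


# Toy, made-up labels (1 = event) and model scores; NOT data from the study.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.20, 0.90]

threshold, f1 = best_f1_threshold(y_true, y_prob)
print(f"selected threshold = {threshold:.2f}, F1 = {f1:.3f}")
```

In practice the scan would be run on held-out predictions from the fitted model, so that the chosen threshold is not tuned on the same data used to report final performance.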

Results:

The training cohort contained 609 patients, and the temporal validation cohort contained 198 patients. The GBDT model without resampling outperformed the other models, with an AUPRC of 0.600, an AUROC of 0.876, and an F1 score of 0.583 after threshold adjustment. The SHAP analysis revealed that the most important features were a higher cholesterol level, longer summed days of medication use, and clear cell adenocarcinoma histology. The final model was further integrated into a web-based application.

Conclusions:

The model can serve as an explainable adverse drug reaction surveillance system for predicting sunitinib- and sorafenib-associated thyroid dysfunction.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.