Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 18, 2022
Date Accepted: Dec 31, 2022
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Can a single variable predict early dropout from digital health interventions? A comparison of predictive models from large randomized trials
ABSTRACT
Background:
A single generalizable metric that accurately predicts early dropout from digital health interventions could readily inform intervention targets and treatment augmentations that boost retention and intervention outcomes. We recently identified a type of early dropout from digital health interventions for smoking cessation: users who log in during the first week of the intervention and not thereafter (called “one-week users”). These users had a substantially lower smoking cessation rate with our iCanQuit stop-smoking app than users who used the app for longer periods.
Objective:
To explore whether login count data, analyzed with standard statistical methods, can accurately predict whether an individual will become an iCanQuit early dropout (i.e., one-week user), and to validate the approach using other statistical methods and randomized trial data from three other digital interventions for smoking cessation (combined N randomized = 4529).
Methods:
Standard logistic regression models were used to predict early dropouts (i.e., one-week users) in iCanQuit as well as in the National Cancer Institute’s QuitGuide cessation app, the WebQuit.org cessation intervention website, and the Smokefree.gov cessation intervention website. The main predictors were the number of times a participant logged in on each of the first seven days following randomization. Model performance was assessed with the area under the receiver operating characteristic curve (AUC), and the logistic regression models were compared with decision tree, support vector machine (SVM), and neural network models. We also examined whether a limited number of commonly collected baseline demographic variables (e.g., age, education) might improve this prediction.
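The modeling approach described above can be illustrated with a minimal sketch. It is not the authors' code or data: the users are synthetic, the daily login rates in `simulate_user` are invented for illustration, the model is fit with plain gradient descent, and the train/test split is an assumption, since the abstract does not say how the AUC was validated. The sketch only shows the general shape of the analysis: seven daily login counts as predictors, a logistic regression, and an AUC computed on held-out users.

```python
import math
import random


def poisson(rng, rate):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    limit, k, product = math.exp(-rate), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def simulate_user(rng):
    """Return (seven daily login counts, label) for one synthetic user.

    Label 1 = early dropout ("one-week user"). The rates are hypothetical.
    """
    dropout = rng.random() < 0.4
    if dropout:
        rates = [2.0, 1.0, 0.5, 0.3, 0.2, 0.1, 0.1]
    else:
        rates = [2.0, 1.5, 1.5, 1.2, 1.2, 1.0, 1.0]
    return [poisson(rng, r) for r in rates], int(dropout)


def predict_proba(weights, x):
    """Logistic model: weights[0] is the intercept, the rest match x."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            err = predict_proba(weights, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
users = [simulate_user(rng) for _ in range(1000)]
train, test = users[:600], users[600:]

weights = fit_logistic([x for x, _ in train], [y for _, y in train])
test_scores = [predict_proba(weights, x) for x, _ in test]
test_auc = auc(test_scores, [y for _, y in test])
print(f"Held-out AUC on synthetic data: {test_auc:.2f}")
```

Because the synthetic dropouts were built to log in less after day 1, the AUC here says nothing about real interventions; the value depends entirely on the invented rates.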
Results:
The AUC (95% confidence interval) for the logistic regression model using only the login count variables from the first seven days was 0.94 (0.90, 0.97) for iCanQuit; 0.88 (0.83, 0.93) for QuitGuide; 0.85 (0.80, 0.88) for WebQuit.org; and 0.60 (0.54, 0.66) for Smokefree.gov. Replacing the logistic regression models with more complex decision tree, SVM, or neural network models did not significantly increase the AUC, nor did including additional baseline variables as predictors. The sensitivity and specificity were generally good, and were excellent for iCanQuit (i.e., 0.91 and 0.85, respectively, at the 0.5 classification threshold).
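For readers less familiar with the threshold-based metrics reported above, the following sketch shows how sensitivity and specificity follow from predicted probabilities at a 0.5 cutoff. It assumes that the early dropout (one-week user) is the positive class, which the abstract does not state explicitly, and the probabilities and labels are made-up values, not study data.

```python
def sensitivity_specificity(probabilities, labels, threshold=0.5):
    """Return (sensitivity, specificity) when p >= threshold predicts class 1."""
    tp = fn = tn = fp = 0
    for p, y in zip(probabilities, labels):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1  # dropout correctly flagged
        elif y == 1:
            fn += 1  # dropout missed
        elif predicted == 0:
            tn += 1  # continuing user correctly left unflagged
        else:
            fp += 1  # continuing user wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted dropout probabilities and true labels (1 = one-week user).
probs = [0.92, 0.81, 0.40, 0.65, 0.30, 0.10, 0.55, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(probs, labels)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold would flag more users as likely dropouts, raising sensitivity at the cost of specificity; the 0.5 value simply mirrors the cutoff named in the Results.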
Conclusions:
Logistic regression models using only the first seven days of login count data are generally good at predicting early dropouts. These models performed well using simple, automated, and readily available login count data, whereas including self-reported socio-demographic baseline variables did not improve the prediction. The results will inform the early identification of people at risk for early dropout from digital health interventions, with the goal of offering them augmented treatments that increase retention and, ultimately, improve intervention outcomes.
Citation