Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Feb 5, 2019
Open Peer Review Period: Feb 8, 2019 - Apr 5, 2019
Date Accepted: Jul 7, 2019
Predicting dropouts from an eHealth platform for lifestyle interventions: Analysis of methods and predictors
ABSTRACT
Background:
The increasing prevalence and economic impact of chronic diseases challenge health care systems globally. Digital solutions can potentially improve the efficiency and quality of care, but these initiatives struggle with nonusage attrition. Machine learning methods have been shown to predict dropouts in other settings but have rarely been implemented in health care.
Objective:
This study seeks to gain insight into the causes of attrition among patients in an eHealth intervention for chronic lifestyle diseases and to evaluate whether attrition can be predicted and consequently prevented. We aim to build predictive models that identify patients in a digital lifestyle intervention who are at high risk of dropout by analyzing several predictor variables applied in different models, and to assess the possibilities and impact of implementing such models in an eHealth platform.
Methods:
Data from 2,684 patients using an eHealth platform were iteratively analyzed using logistic regression, decision tree, and random forest models. The dataset was split into an 80% training and cross-validation set and a 20% hold-out test set. Trends in activity patterns were analyzed to assess engagement over time. Development and implementation were performed iteratively with health coaches.
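The split described above can be illustrated with a minimal sketch. The abstract does not state whether the split was stratified, how many cross-validation folds were used, or which software was used, so the stratification, the 5 folds, the function names, and the synthetic labels below are all assumptions made for illustration only.

```python
import random
from collections import defaultdict

def stratified_holdout_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with a similar class ratio in both parts.

    Stratifying by class is an assumption; the abstract only reports 80%/20%.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

def k_fold(indices, k=5, seed=0):
    """Yield (fit_idx, validation_idx) pairs for k-fold cross-validation."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        fit = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield fit, validation

# Synthetic labels (1 = dropout, 0 = retained); NOT the study data.
labels = [1] * 300 + [0] * 700
train_idx, test_idx = stratified_holdout_split(labels)

# Cross-validation happens only inside the 80% part; the 20% hold-out
# set is left untouched until the final evaluation.
folds = list(k_fold(train_idx, k=5))
```

Keeping the hold-out indices out of every cross-validation fold is what allows the test set to give an unbiased estimate of performance after model selection.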
Results:
Patients in the test dataset were classified as dropouts with 89% precision using a random forest model and 11 predictor variables. The most significant predictors were the provider of the intervention, two weeks of inactivity, and the number of pieces of advice received from the health coach. Engagement with the platform dropped significantly in the period leading up to dropout.
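For readers less familiar with the metric, precision is the share of patients flagged as dropouts who actually dropped out, that is, TP / (TP + FP). The sketch below shows the calculation on a small hypothetical set of labels; the function name and the example values are illustrative assumptions and are not taken from the study.

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are true positives: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # no positive predictions were made
    return tp / (tp + fp)

# Hypothetical labels (1 = dropout, 0 = retained); NOT the study data.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

# Five patients are flagged; four of them truly dropped out.
score = precision(y_true, y_pred)
```

Note that precision alone says nothing about dropouts the model failed to flag; that is captured by recall, TP / (TP + FN).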
Conclusions:
Dropouts from eHealth lifestyle interventions can be predicted using various data mining methods. Such predictions can support health coaches in preventing attrition by providing them with proactive warnings. The best-performing predictive model was the random forest.
Citation
Per the author's request the PDF is not available.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.