Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jul 6, 2020
Date Accepted: Nov 5, 2020
Warning: This is an author submission that is not peer-reviewed or edited. Preprints, unless they show as "accepted", should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Development of a Prediction Model for Customer Churn with Life-log Data for Digital Healthcare Applications: A Retrospective Observational Study
ABSTRACT
Background:
In digital healthcare, predicting user churn matters not only for a company's revenue but also for improving users' health. Churn prediction has been studied extensively, but most previous studies applied time-invariant model structures and relied primarily on structured data. However, as an increasing amount of unstructured data has become available, churn prediction now needs to process daily time-series log data.
Objective:
The purpose of this study is to apply a recurrent neural network (RNN) architecture that captures time-series patterns in life-log data and text message data to predict the churn of digital healthcare users.
Methods:
This study was based on a digital healthcare application that provides food, exercise, and weight logging functions, as well as interactive messaging with human coaches. Among users in Korea enrolled between January 1, 2017, and January 1, 2019, we defined churn users according to the following criteria: (1) users who received a refund before the paid program ended; and (2) users who received a refund after 7 days of the trial period. We used a long short-term memory (LSTM) network with a masking layer to accept sequence data of different lengths. We also applied topic modeling to vectorize the text messages. To interpret the contribution of each variable to the model's predictions, we used integrated gradients, an attribution method.
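How a masking layer lets an LSTM accept sequences of different lengths can be sketched in plain Python. This is a minimal, dependency-free LSTM forward pass, assuming that shorter users' sequences are zero-padded and that an all-zero feature vector marks padding (the convention of Keras's `Masking(mask_value=0.0)` layer). The feature meanings, sizes, random weights, and the helper names `init_params`, `masked_lstm`, and `churn_probability` are hypothetical and do not reflect the authors' configuration. Padded days are skipped so the hidden state carries forward unchanged, which means a user's representation does not depend on how much padding was added.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    # Dense matrix-vector product on nested lists.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def init_params(input_size, hidden_size, seed=0):
    # Hypothetical random weights for the input (i), forget (f), output (o)
    # and candidate (g) gates: (W: hidden x input, U: hidden x hidden, bias).
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size),
                   [0.0] * hidden_size) for gate in "ifog"}


def lstm_step(x, h, c, params):
    # Standard LSTM cell update for one time step (one day of log data).
    pre = {}
    for gate, (W, U, b) in params.items():
        pre[gate] = [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h), b)]
    i = [sigmoid(z) for z in pre["i"]]
    f = [sigmoid(z) for z in pre["f"]]
    o = [sigmoid(z) for z in pre["o"]]
    g = [math.tanh(z) for z in pre["g"]]
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def masked_lstm(sequence, params, hidden_size, mask_value=0.0):
    # Steps whose features all equal mask_value are treated as padding and
    # the state is carried forward unchanged. Caveat: a real day whose
    # features are all exactly mask_value would be skipped too.
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        if all(v == mask_value for v in x):
            continue
        h, c = lstm_step(x, h, c, params)
    return h


def churn_probability(h, w_out, b_out):
    # Hypothetical logistic output layer on the final hidden state.
    return sigmoid(sum(w * v for w, v in zip(w_out, h)) + b_out)


# Usage: two logged days, then the same days zero-padded to length 5.
params = init_params(input_size=3, hidden_size=4, seed=1)
user_days = [[0.8, 0.3, 0.1], [0.6, 0.2, 0.0]]  # hypothetical scaled features
padded_days = user_days + [[0.0, 0.0, 0.0]] * 3
h_short = masked_lstm(user_days, params, hidden_size=4)
h_padded = masked_lstm(padded_days, params, hidden_size=4)
print(h_short == h_padded)  # True
print(churn_probability(h_short, [0.1] * 4, 0.0))
```

In a framework implementation the skipping happens inside the recurrent layer, but the effect is the same: only logged days update the state, so users with short and long histories can share one batch.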
Results:
A total of 1,868 eligible users were included in this study. The final churn prediction model achieved an F1-score of 0.89, and the score decreased by 0.12 (to 0.77) when data from the final week were excluded. In addition, including text data increased the F1-score by approximately 0.085 on average at every time point. Among all variables, the number of steps per day made the largest contribution (0.1085, contribution on model output), and among the topic variables, the topic concerning bad habits (e.g., drinking, overeating, and late-night eating) made the largest contribution (0.0875).
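The contribution values above are integrated-gradients (IG) attributions. IG attributes a model output to each input feature by multiplying the feature's difference from a baseline by the average gradient along the straight path from that baseline to the input. As a hedged illustration of the technique only, the sketch below computes IG for a hypothetical three-feature logistic model standing in for the trained LSTM. It assumes an all-zero baseline, 200 midpoint-rule steps, and finite-difference gradients. The source does not state the baseline, the step count, or how attributions were aggregated across users and days, and the names `integrated_gradients` and `toy_churn_model` are illustrative.

```python
import math


def numerical_grad(f, x, eps=1e-6):
    # Central finite differences, so any black-box scalar model f works.
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def integrated_gradients(f, x, baseline, steps=200):
    # IG_i = (x_i - x'_i) * mean of df/dx_i along the straight path x' -> x,
    # approximated with a midpoint Riemann sum over `steps` points.
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(numerical_grad(f, point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


# Hypothetical stand-in for the trained model: a logistic churn score over
# [daily steps, bad-habits topic weight, meals logged] (scaled features).
WEIGHTS = [-1.2, 0.8, -0.5]
BIAS = 0.1


def toy_churn_model(x):
    z = sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


x = [0.9, 0.7, 0.4]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_churn_model, x, baseline)
print([round(a, 4) for a in attributions])
# Completeness: attributions sum (approximately) to f(x) - f(baseline).
print(round(sum(attributions), 4),
      round(toy_churn_model(x) - toy_churn_model(baseline), 4))
```

With a real network the gradients would come from the framework's automatic differentiation rather than finite differences; the completeness check (attributions summing to the change in model output) is a useful sanity test either way.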
Conclusions:
The recurrent neural network model that uses user log data and message data demonstrates high performance in churn classification. In addition, the contribution analysis of variables is expected to help identify early signs of user churn and improve the compliance rate in digital healthcare.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.