Accepted for/Published in: JMIR Formative Research
Date Submitted: Mar 4, 2022
Open Peer Review Period: Mar 4, 2022 - Mar 11, 2022
Date Accepted: Apr 14, 2022
Date Submitted to PubMed: Apr 14, 2022
A Machine Learning Approach Detecting Digital Behavioural Patterns of Depression Using Non-intrusive Smartphone Data - A Complementary Path to PHQ-9 Assessment: A Prospective Observational Study
ABSTRACT
Background:
Depression is a major global cause of morbidity, an economic burden, and the greatest health challenge leading to chronic disability. Mobile monitoring of mental conditions has long been a sought-after approach to overcoming the problems associated with the screening, diagnosis and monitoring of depression and its heterogeneous presentation. The widespread availability of smartphones has made it possible to use their data to generate digital behavioural models that can be used for both clinical and remote screening and monitoring, providing a tentative and scalable answer to the pressing global need for early and effective solutions. This study adds to the field by conducting a trial using private and non-intrusive sensors that can help detect and monitor depression in a continuous, passive manner.
Objective:
This study demonstrates a novel mental behavioral profiling metric (Mental Health Similarity Score), derived from analyzing passively monitored, private and non-intrusive smartphone usage data, to identify and track depressive behavior and its progression. The analysis is performed using machine learning models trained on different levels of depression severity measured through the Patient Health Questionnaire-9 (PHQ-9).
Methods:
Smartphone data sets and self-reported PHQ-9 depression assessments were collected from 558 smartphone users on the Android operating system in an observational study over an average of 10.7 days (SD=23.7). We quantified 37 digital behavioral markers from the passive smartphone data set and explored the relationship between these markers and depression using correlation coefficients. We leveraged four separate supervised random forest machine learning (ML) classification algorithms with hyperparameter optimization, 15-fold cross-validation, bootstrapping and imbalanced data handling to predict depression and its severity using PHQ-9 scores as the ground truth. We also quantified an additional three digital markers from gyroscope sensors and explored their feasibility in improving the model’s accuracy in detecting depression.
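As an illustration of the correlation step described above, the following is a minimal sketch of a Pearson correlation coefficient between one digital behavioral marker and PHQ-9 total scores. The function name `pearson_r`, the variable names, and the sample values are hypothetical and are not taken from the study; the abstract does not state which tooling the authors used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for illustration only (not study data).
marker_values = [1.0, 2.0, 3.0, 4.0, 5.0]  # one made-up behavioral marker
phq9_totals = [2, 4, 6, 8, 10]             # made-up PHQ-9 total scores
r = pearson_r(marker_values, phq9_totals)
```

In this toy input the two sequences are perfectly linearly related, so `r` is 1 up to floating-point rounding; real marker data would give values strictly inside the range -1 to 1.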
Results:
Of the 558 participants, 254 (46%) were male, 286 (51%) were female, and 18 (3%) preferred not to say. The participants' age distribution was as follows: 474 (85%) were aged 18-25, 29 (5%) were aged 26-35, 42 (7%) were aged 36-55, 10 (2%) were aged 56-64, and 3 (<1%) were above 64 years of age. Of the 558 reported PHQ-9 assessments, 63 responses were none (not depressed; scored <5), 124 indicated mild depression (scored 5-9), 162 indicated moderate depression (scored 10-14), 131 indicated moderately severe depression (scored 15-19), and 78 indicated severe depression (scored 20-27), as identified by the PHQ-9 cut-off points. Gender imbalance was present within each of the 5 severity groups, with a male majority in the none and mild groups and a female majority in the moderate, moderately severe, and severe groups. Of the 469 participants who reported ‘No Diagnosis’ as their current diagnostic status in the demographics questionnaire, 307 (65%) scored in the moderate to severe range (PHQ-9 scores >=10). The PHQ-9 Binary Non-sensor (none vs. severe) model achieved the following metrics: precision 85-89%; recall 85-89%; F1 87%; and overall accuracy 87%. The PHQ-9 three-class (none vs. mild vs. severe) model achieved the following metrics: precision 74-86%; recall 76-83%; F1 75-84%; and overall accuracy 78%. When correlating all 9 items of the PHQ-9, a significant positive Pearson correlation was found specifically between PHQ-9 questions 2, 6, and 9 within the severe category users and the mental behavioral profiling metric (r=0.73). The PHQ-9 question-specific (questions 2, 6, and 9) model achieved the following metrics: precision 76-80%; recall 75-81%; F1 78-89%; and overall accuracy 78%. When a gyroscope sensor was added as a feature, the Pearson correlation for questions 2, 6, and 9 dropped from r=0.73 to r=0.46.
The mean activity (P<.001) and average gap activity (P<.001) features from the gyroscope sensors differed significantly between individuals in the none and severe groups. The PHQ-9 Gyroscope sensor model achieved the following metrics: precision 74-78%; recall 67-83%; F1 72-78%; and overall accuracy 76%.
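The PHQ-9 cut-off points quoted in the Results can be expressed as a small lookup, sketched below. The band boundaries and group counts are the ones stated in this abstract; the function name `phq9_severity` and the dictionary `reported_counts` are hypothetical names introduced only for illustration, not the authors' code.

```python
def phq9_severity(total):
    """Map a PHQ-9 total score (0-27) to the severity band used in the Results."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    if total < 5:
        return "none"
    if total < 10:
        return "mild"
    if total < 15:
        return "moderate"
    if total < 20:
        return "moderately severe"
    return "severe"


# Group sizes as reported in the Results section of this abstract.
reported_counts = {
    "none": 63,
    "mild": 124,
    "moderate": 162,
    "moderately severe": 131,
    "severe": 78,
}

# Consistency checks on the reported figures.
total_assessments = sum(reported_counts.values())
share_no_diagnosis_scoring_10_plus = round(100 * 307 / 469)
```

Here `total_assessments` sums the five reported groups and matches the 558 assessments, and `share_no_diagnosis_scoring_10_plus` reproduces the reported 65% of ‘No Diagnosis’ participants with PHQ-9 scores of 10 or more.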
Conclusions:
Our results demonstrate that the Mental Health Similarity Score can be used to identify and track depressive behavior and its progression with high accuracy. Therefore, the current and traditional methods of assessing depression can be coupled with digital behavioral markers to have a significant impact in mitigating depression and its far-reaching consequences.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.