Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Feb 11, 2025
Date Accepted: Dec 5, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints, unless they show as "accepted", should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Digital Phenotyping for Adolescent Mental Health: A Feasibility Study Employing Machine Learning to Predict Mental Health Risk From Active and Passive Smartphone Data
ABSTRACT
Background:
Adolescents are particularly vulnerable to mental disorders: over 75% of cases begin before the age of 25. Yet research indicates that only 18% to 34% of young people with high levels of depression or anxiety symptoms seek support. Smartphone-based digital tools offer scalable, cost-effective opportunities for early mental health intervention. Active (self-reported) and passive (sensor-based) data collected through smartphones enable digital phenotyping, providing rich insights into the behavioural and environmental factors that influence mental health. Despite these advances, integrating these data streams in non-clinical adolescent populations remains underexplored.
Objective:
This study aimed to evaluate the feasibility of integrating active and passive smartphone data to predict mental health outcomes in non-clinical adolescents using a novel machine learning framework. Specifically, we investigated the utility of the Mindcraft app in predicting risks for internalising and externalising disorders, eating disorders, insomnia and suicidal ideation, with an emphasis on improving prediction accuracy through data integration and advanced modelling techniques.
Methods:
Participants (N=103; mean age 16.1 years) were recruited from three London schools. At baseline, participants completed the Strengths and Difficulties Questionnaire (SDQ), the Eating Disorders-15 Questionnaire (ED-15), and the Sleep Condition Indicator Questionnaire (SCI), and indicated the presence or absence of suicidal ideation. They then used the Mindcraft app for 14 days, contributing active data through self-reports (such as mood, sleep, and loneliness) and passive data from smartphone sensors (such as step count, location, and ambient noise). A contrastive pretraining phase was applied to improve the stability of user-specific features, followed by supervised fine-tuning. Models were evaluated with leave-one-subject-out cross-validation, using balanced accuracy as the primary metric. For comparison, CatBoost and multilayer perceptron (MLP) models were trained without pretraining. SHAP (SHapley Additive exPlanations) values were used to interpret feature contributions.
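The evaluation protocol above can be illustrated with a minimal sketch of leave-one-subject-out cross-validation scored by balanced accuracy (the mean of per-class recall). Several details are assumptions, not the authors' setup: the record layout `(subject_id, features, label)`, the helper names, and the nearest-centroid classifier, which is a hypothetical stand-in for the paper's pretrained model, CatBoost, or MLP. The abstract does not say how fold results were aggregated, so this sketch pools held-out predictions from every fold and scores them once. Pooling is a deliberate choice here because each held-out subject usually carries a single label, and per-fold balanced accuracy would then be degenerate.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# Hypothetical record layout: (subject_id, feature_vector, binary_label).
Record = Tuple[str, List[float], int]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def leave_one_subject_out(
    records: List[Record],
) -> Iterator[Tuple[str, List[Record], List[Record]]]:
    """Yield (held_out_subject, train, test); each subject is held out exactly once."""
    for s in sorted({r[0] for r in records}):
        train = [r for r in records if r[0] != s]
        test = [r for r in records if r[0] == s]
        yield s, train, test


def nearest_centroid_fit(train: List[Record]) -> Callable[[List[float]], int]:
    """Hypothetical stand-in classifier (NOT the paper's model): nearest class centroid."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for _, x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x: List[float]) -> int:
        # Squared Euclidean distance to each class centroid.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def loso_balanced_accuracy(records: List[Record]) -> float:
    """Train on all other subjects, predict the held-out one, pool, then score once."""
    y_true: List[int] = []
    y_pred: List[int] = []
    for _, train, test in leave_one_subject_out(records):
        predict = nearest_centroid_fit(train)
        for _, x, y in test:
            y_true.append(y)
            y_pred.append(predict(x))
    return balanced_accuracy(y_true, y_pred)


if __name__ == "__main__":
    # Toy daily records for four hypothetical subjects.
    days = [
        ("s1", [0.1, 0.2], 0), ("s1", [0.2, 0.1], 0),
        ("s2", [0.9, 0.8], 1), ("s2", [0.8, 0.9], 1),
        ("s3", [0.15, 0.1], 0), ("s4", [0.85, 0.95], 1),
    ]
    print(loso_balanced_accuracy(days))
```

Holding out whole subjects, rather than individual days, keeps a participant's highly correlated daily records from appearing in both the training and test sets. Under a day-level split, scores would be optimistically biased.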
Results:
Integrating active and passive data outperformed either data source alone, with mean balanced accuracies of 0.71 for SDQ-High risk, 0.67 for insomnia, 0.77 for suicidal ideation, and 0.70 for eating disorders. The contrastive learning framework stabilised daily behavioural representations, improving predictive robustness. SHAP analysis identified clinically meaningful features, such as negative thinking and location entropy, highlighting the complementary nature of active and passive data.
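The abstract does not specify the contrastive objective, so the following is only a minimal sketch of one plausible reading of "stabilised daily behavioural representations". It implements a supervised-contrastive (SupCon-style) loss in which two days count as positives if they come from the same user. The function names, the cosine similarity on L2-normalised embeddings, and the temperature of 0.1 are all assumptions made for illustration. Minimising this loss pulls a user's days together and pushes other users' days apart.

```python
import math
from typing import List, Sequence


def l2_normalise(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def user_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    user_ids: Sequence[str],
    temperature: float = 0.1,  # assumed value, not reported in the abstract
) -> float:
    """SupCon-style loss: days from the same user are positives (hypothetical setup)."""
    z = [l2_normalise(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and user_ids[j] == user_ids[i]]
        if not positives:
            continue  # this anchor has no other day from the same user in the batch
        # Temperature-scaled cosine similarity between anchor i and every day j.
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
        others = [sims[j] for j in range(n) if j != i]
        # Numerically stable log-sum-exp over all non-anchor days.
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

For example, if embeddings `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]` are labelled with users `["a", "a", "b", "b"]`, the loss is much lower than when the same embeddings are labelled `["a", "b", "a", "b"]`. In the first labelling, each user's days already sit close together. In a pretrain-then-fine-tune pipeline, an encoder trained this way would then be fine-tuned on the supervised risk labels.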
Conclusions:
This study demonstrates the potential of integrating active and passive smartphone data with advanced machine learning techniques to predict adolescent mental health risks. By combining machine learning approaches such as contrastive learning with a scalable platform like Mindcraft, we establish a framework for identifying early mental health challenges across a range of outcomes. These results pave the way for more accessible strategies that support early detection of, and intervention for, mental health problems in adolescents.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.