Currently submitted to: JMIR Medical Informatics
Date Submitted: Jun 5, 2026
Open Peer Review Period: Jun 30, 2026 - Aug 25, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Smartphone-Derived Behavioral Summaries and Psychosocial Risk Factors for Identifying Past-Year Nonsuicidal Self-Injury in College Students: Machine Learning Model Development and External Validation Study
ABSTRACT
Background:
Nonsuicidal self-injury (NSSI) is a major mental health concern among college students. Scalable identification of students with past-year NSSI remains challenging because the outcome is uncommon in general student populations and many smartphone-related studies rely on retrospective self-report rather than device-generated behavioral summaries.
Objective:
This study aimed to develop and externally validate machine learning models for identifying students with past-year NSSI by integrating psychosocial risk factors with app-collected smartphone-derived behavioral summaries, and to evaluate whether these summaries provide scalable digital context beyond conventional psychosocial and sleep-related domains.
Methods:
We conducted a model development and external validation study among Chinese college students. The development cohort included 18,723 students from six universities in 2022, and the external validation cohort included 21,063 students from a separate university after application of identical quality-control rules. Past-year NSSI was assessed using the Ottawa Self-Injury Inventory and coded positive if any self-injurious act was reported in the past 12 months. Smartphone indicators were collected through a study app as aggregated weekly operating-system usage summaries, including total use duration and pickup/unlock counts; no message content, contact lists, keystrokes, location traces, or media files were collected. Because upstream app installation and permission denominators were unavailable, smartphone-derived findings apply only to students with valid synchronized summaries. Feature ranking, final feature selection, model tuning, model selection, and threshold selection were performed within the development training workflow only. The locked primary model for external validation was an XGBoost classifier. Performance was assessed using the area under the receiver operating characteristic curve (AUC), average precision (AP), calibration, decision curve analysis, threshold-based metrics, and exploratory subgroup analyses.
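The threshold-based metrics and the net benefit used in decision curve analysis can be illustrated with a minimal sketch. This is not the authors' code: the function name `threshold_metrics`, the rule that a score at or above the threshold counts as positive, and the toy labels and probabilities are all assumptions made for illustration. Net benefit follows the standard definition TP/N - (FP/N) x pt/(1 - pt), where pt is the threshold probability.

```python
def threshold_metrics(y_true, y_prob, threshold):
    """Confusion-matrix metrics and net benefit at one probability threshold.

    Assumption: a predicted probability >= threshold is classified positive.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
        else:
            tn += 1

    n = tp + fp + tn + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / n
    # Net benefit weighs false positives by the odds of the threshold.
    net_benefit = tp / n - (fp / n) * threshold / (1 - threshold)
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "recall": recall, "precision": precision,
        "f1": f1, "accuracy": accuracy, "net_benefit": net_benefit,
    }


# Hypothetical toy data (not study data): 1 = past-year NSSI reported.
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
toy_probs = [0.35, 0.05, 0.12, 0.08, 0.02, 0.20, 0.01, 0.04, 0.15, 0.03]

metrics = threshold_metrics(toy_labels, toy_probs, threshold=0.10)
for name, value in metrics.items():
    print(name, value)
```

Evaluating the same function over a grid of thresholds and plotting net benefit against the threshold would give a decision curve; the single threshold of 0.10 is used here only because it is the primary threshold named in the Results.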
Results:
In external validation (N=21,063; prevalence=3.04%), the final XGBoost model achieved AUC=0.896 (95% CI 0.883-0.908) and AP=0.300 (95% CI 0.266-0.339), approximately 9.9 times the no-skill AP baseline. At the primary threshold of 0.10, recall was 0.718 (95% CI 0.683-0.753), precision was 0.164 (95% CI 0.150-0.177), F1 was 0.266 (95% CI 0.247-0.285), and accuracy was 0.880 (95% CI 0.875-0.884). Calibration showed a Brier score of 0.0258, a slope of 1.09, and an intercept of -0.47, indicating overestimation of absolute risk in the lower-prevalence external cohort. Prominent predictors included anxiety, adverse childhood experiences, sleep quality, social support, academic and health-behavior variables, and depressive symptoms.
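The reported figures can be cross-checked with simple arithmetic. The sketch below assumes that the no-skill AP baseline equals the outcome prevalence, and it back-calculates approximate confusion-matrix counts from the rounded values above. Those counts are reconstructions for illustration only; they are not reported in the abstract and may differ from the true counts by a few cases.

```python
# Values as reported in the abstract (external validation cohort).
n = 21063
prevalence = 0.0304
ap = 0.300
recall = 0.718
precision = 0.164

# No-skill AP baseline is assumed to equal the prevalence.
ap_ratio = ap / prevalence

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Approximate counts reconstructed from rounded inputs (illustrative only).
positives = round(n * prevalence)            # students with past-year NSSI
tp = round(recall * positives)               # flagged and positive
predicted_positive = round(tp / precision)   # all students flagged at 0.10
fp = predicted_positive - tp                 # flagged but negative
fn = positives - tp                          # missed positives
tn = n - positives - fp                      # correctly not flagged

accuracy = (tp + tn) / n

print(f"AP / baseline: {ap_ratio:.2f}")
print(f"F1 from precision and recall: {f1:.3f}")
print(f"Approximate counts: TP={tp}, FP={fp}, FN={fn}, TN={tn}")
print(f"Accuracy from approximate counts: {accuracy:.3f}")
```

Under these assumptions, the AP ratio, F1, and accuracy all agree with the reported values to within rounding, and the reconstruction suggests that roughly five to six students would be flagged for each true case identified at the 0.10 threshold.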
Conclusions:
In this large model development and external validation study, a multidomain machine learning model achieved strong discrimination and useful precision-recall performance for identifying students with past-year NSSI. Smartphone-derived summaries provided objective behavioral context, but psychosocial and sleep-related predictors dominated performance. The model should be considered a risk-stratification aid for voluntary, low-burden outreach rather than a diagnostic or automated screening tool.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.