Predicting Workers’ Stress Using a High-Performance Algorithm Adopting Suitable Training Data Extraction Based on Working-Style Characteristics Similar to Those of the Prediction Target Person
ABSTRACT
Background:
Work characteristics, such as teleworking rate, have been studied in relation to stress; however, whether work-related data can improve a high-performance stress prediction model tailored to an individual's lifestyle has not been evaluated.
Objective:
To develop a novel, high-performance algorithm that predicts an employee's stress using data from a group of employees with similar working characteristics.
Methods:
This prospective observational study evaluated participants' responses to web-based questionnaires, together with attendance records and data collected using a wearable device. Data were collected over 12 weeks (January 17 to April 10, 2022) from 194 Shionogi Group employees. The Fitbit Charge4 wearable device collected daily sleep, activity, and heart rate data. Daily work shift data included details of working hours. Weekly questionnaire responses included the K6 questionnaire for depression and anxiety, a behavioral questionnaire, and the number of days lunch was missed. The proposed prediction model used a neighborhood group (n=20) of employees whose working-style characteristics were similar to those of the prediction target person. Data from the previous week were used to predict stress levels in the following week. Three models were compared: a single model (training data: first 7 weeks; test data: latter 5 weeks), proposed method 1 (training data: all 12 weeks of the neighborhood group plus the first 7 weeks of the prediction target; test data: latter 5 weeks of the prediction target), and proposed method 2 (training data: first 7 weeks of the neighborhood group and of the prediction target; test data: latter 5 weeks of the prediction target; threshold adaptation: latter 5 weeks of the neighborhood group). SHapley Additive exPlanations (SHAP) values were calculated for the top 10 features extracted by XGBoost to evaluate the magnitude and direction of each feature's contribution, stratified by mean teleworking rate: low, <0.2 (>4 days/week in office); middle, 0.2 to <0.6 (2-4 days/week in office); and high, ≥0.6 (<2 days/week in office).
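The neighborhood-group selection and the three training/test constructions described above can be sketched as follows. This is a minimal illustration, not the authors' code: the per-person working-style vector, the use of Euclidean distance to choose the 20 neighbors, the record layout, and assigning each lagged pair to a split by the week of its label are all hypothetical choices that the abstract does not specify. Only the teleworking cut-offs, the 7/5-week split, and the previous-week-predicts-next-week framing come from the text.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical record layout: (person_id, week 1-12, weekly feature vector, binary stress label)
Record = Tuple[str, int, List[float], int]
Pair = Tuple[List[float], int]

FIRST_WEEKS: Set[int] = set(range(1, 8))    # weeks 1-7
LATTER_WEEKS: Set[int] = set(range(8, 13))  # weeks 8-12
ALL_WEEKS: Set[int] = FIRST_WEEKS | LATTER_WEEKS


def telework_category(rate: float) -> str:
    """Bin a mean teleworking rate with the cut-offs stated in the abstract."""
    if rate < 0.2:
        return "low"
    if rate < 0.6:
        return "middle"
    return "high"


def neighborhood_group(target: str, style: Dict[str, Sequence[float]], k: int = 20) -> List[str]:
    """Return the k people whose working-style vectors are closest to the target's.

    ASSUMPTION: Euclidean distance on a hypothetical working-style vector;
    the abstract does not name the similarity measure.
    """
    others = [p for p in style if p != target]
    others.sort(key=lambda p: math.dist(style[p], style[target]))
    return others[:k]


def lagged_pairs(records: List[Record], person: str, label_weeks: Set[int]) -> List[Pair]:
    """Pair week-w features with the week-(w+1) label ("previous week predicts next week").

    ASSUMPTION: a pair belongs to a split according to the week of its label.
    """
    by_week = {w: (x, y) for p, w, x, y in records if p == person}
    return [
        (by_week[w][0], by_week[w + 1][1])
        for w in sorted(by_week)
        if w + 1 in by_week and w + 1 in label_weeks
    ]


def build_split(records: List[Record], target: str, neighbors: List[str], method: str):
    """Return (train, test, threshold_data) for one prediction target."""
    train = lagged_pairs(records, target, FIRST_WEEKS)  # target's own first 7 weeks
    test = lagged_pairs(records, target, LATTER_WEEKS)  # target's latter 5 weeks
    threshold_data: List[Pair] = []
    if method == "single":
        pass  # no neighbor data
    elif method == "proposed1":
        for n in neighbors:  # all 12 weeks of every neighbor
            train += lagged_pairs(records, n, ALL_WEEKS)
    elif method == "proposed2":
        for n in neighbors:
            train += lagged_pairs(records, n, FIRST_WEEKS)
            threshold_data += lagged_pairs(records, n, LATTER_WEEKS)  # held out for threshold adaptation
    else:
        raise ValueError(f"unknown method: {method}")
    return train, test, threshold_data
```

Under this labeling convention, week 1 has no preceding week, so each person contributes 6 training pairs from the first 7 weeks and 5 test pairs from the latter 5 weeks. Proposed method 2 keeps the neighbors' latter weeks out of training so they can serve as unseen data for setting the decision threshold.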
Results:
Data from 190 participants were analyzed; teleworking rates ranged from 0% to 79%. Proposed method 2 achieved an AUC of 0.84 (true-positive vs false-positive rate: 0.77 vs 0.26). Among participants with low teleworking rates, most extracted features were related to sleep, followed by activity and work; among participants with high teleworking rates, most features were related to activity, followed by sleep and work. SHAP analysis showed that, in participants with high teleworking rates, skipping lunch, working more or less than the scheduled hours, greater heart rate fluctuations, and shorter mean sleep duration contributed to stress; in participants with low teleworking rates, arriving at work too early or too late (before or after 9 AM), a heart rate above or below the mean, smaller heart rate fluctuations, and burning more or fewer calories than usual contributed to stress.
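The AUC and the true-positive/false-positive pair reported above can be illustrated with the sketch below. It is not the authors' evaluation code: the rank-based AUC is a standard definition, while reading TP and FP as rates at a single decision threshold, and choosing that threshold by maximizing Youden's J (TPR minus FPR) on the neighborhood group's latter 5 weeks, are assumptions; the abstract names threshold adaptation but not its criterion.

```python
from typing import Optional, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tpr_fpr(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Tuple[float, float]:
    """TPR and FPR when predicting 'stressed' for score >= threshold (labels must be 0/1)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / n_pos, fp / n_neg


def adapt_threshold(scores: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """ASSUMPTION: choose the observed score maximizing Youden's J (TPR - FPR) on held-out
    data, e.g. the neighbors' latter 5 weeks; ties keep the smallest such threshold."""
    best_t: Optional[float] = None
    best_j = float("-inf")
    for t in sorted(set(scores)):
        tpr, fpr = tpr_fpr(scores, labels, t)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t
```

In this reading, `adapt_threshold` would be fitted on the model's scores for the neighbors' held-out weeks, and `tpr_fpr` then evaluated at that threshold on the prediction target's latter 5 weeks; `auc` does not depend on any threshold.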
Conclusions:
Forming a neighborhood group with similar working styles based on teleworking rates and using it as training data improved prediction performance. Differences in the contributing features and their contribution directions across teleworking levels support the validity of the neighborhood group approach. Clinical Trial: UMIN000046394
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.