Accepted for/Published in: JMIR mHealth and uHealth
Date Submitted: Jun 1, 2018
Open Peer Review Period: Jun 2, 2018 - Jul 20, 2018
Date Accepted: Nov 14, 2018
Applying Multivariate Segmentation Methods to Human Activity Recognition from Wearable Sensors Data
ABSTRACT
Background:
Time-resolved quantification of physical activity can contribute to both personalized medicine and modern epidemiological research studies, for example, managing and/or identifying triggers of asthma exacerbations, as in the National Institutes of Health (NIH) Pediatric Research using Integrated Sensor Monitoring Systems (PRISMS) program. A growing number of reportedly accurate machine learning algorithms for human activity recognition (HAR) have been developed using data from wearable devices (e.g., smartwatch, smartphone). However, many HAR algorithms depend on fixed-size sampling windows that may adapt poorly to real-world conditions in which activity bouts are of unequal duration: a small sliding window can produce noisy predictions under stable conditions, whereas a large sliding window may miss brief bursts of intense activity.
Objective:
We aimed to create an HAR framework adapted to variable duration activity bouts by: (a) detecting the change points of activity bouts in a multivariate time-series and (b) predicting activity for each homogeneous window defined by these change points.
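To illustrate step (a), the sketch below shows a simplified, top-down variant of Greedy Gaussian Segmentation in Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the published GGS algorithm adds breakpoints greedily and then adjusts their positions, and chooses its covariance regularization by cross-validation, whereas here breakpoints are only inserted greedily and the regularization weight `lam` is fixed. The function names and parameters are hypothetical.

```python
import numpy as np

def seg_loglik(X, lam=0.1):
    """Maximized Gaussian log-likelihood of one segment (up to constants),
    with a ridge term lam*I regularizing the empirical covariance."""
    n, d = X.shape
    cov = np.cov(X, rowvar=False, bias=True) + lam * np.eye(d)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * logdet

def greedy_gaussian_segmentation(X, n_breaks, min_len=5, lam=0.1):
    """Greedily insert breakpoints that most increase the total
    segment-wise Gaussian log-likelihood of the multivariate series X."""
    breaks = [0, len(X)]
    for _ in range(n_breaks):
        best_gain, best_bp = -np.inf, None
        for a, b in zip(breaks[:-1], breaks[1:]):
            base = seg_loglik(X[a:b], lam)
            # try every admissible split point inside this segment
            for t in range(a + min_len, b - min_len):
                gain = seg_loglik(X[a:t], lam) + seg_loglik(X[t:b], lam) - base
                if gain > best_gain:
                    best_gain, best_bp = gain, t
        if best_bp is None:  # no segment long enough to split further
            break
        breaks = sorted(breaks + [best_bp])
    return breaks
```

On a series with an abrupt change in mean or covariance, the first inserted breakpoint lands at (or very near) the true change point, because splitting there removes the between-regime variance inflation from both segments' covariance estimates.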
Methods:
We applied standard fixed-width sliding windows (4-6 different sizes) or Greedy Gaussian Segmentation (GGS) to identify breakpoints in filtered triaxial accelerometer and gyroscope data. After standard feature engineering, we applied an XGBoost model to predict physical activity within each window, and then converted windowed predictions to instantaneous predictions to facilitate comparison across segmentation methods. We applied these methods to two datasets: the “HARuS” dataset, in which N=30 adults performed activities of approximately equal duration (~20 s each) while wearing a waist-worn smartphone, and the “BREATHE” dataset, in which N=14 children performed 6 activities for ~10 minutes each while wearing a smartwatch. To mimic a real-world scenario, we generated artificially unequal activity bout durations in the BREATHE data by randomly subdividing each activity bout into 10 segments and randomly concatenating the resulting 60 activity bouts. Each dataset was divided into ~90% training and ~10% holdout testing sets.
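Two mechanical steps of this design can be sketched in Python: expanding one label per window (or per GGS segment) to one label per sample so that different segmentation methods can be scored sample-by-sample, and randomly subdividing and reshuffling activity bouts to mimic unequal durations. These helpers are illustrative assumptions, not the paper's code; the feature engineering and XGBoost model are not reproduced here.

```python
import numpy as np

def to_instantaneous(breaks, window_labels, n_samples):
    """Expand one predicted label per window/segment into one label per
    sample, so segmentations with different windows are comparable."""
    y = np.empty(n_samples, dtype=object)
    for (a, b), lab in zip(zip(breaks[:-1], breaks[1:]), window_labels):
        y[a:b] = lab
    return y

def shuffle_bouts(segments, rng, n_pieces=10):
    """Mimic unequal bout durations: cut each (data, label) bout into
    n_pieces at random points, then randomly concatenate all pieces."""
    pieces = []
    for X, label in segments:
        cuts = np.sort(rng.choice(np.arange(1, len(X)),
                                  size=n_pieces - 1, replace=False))
        pieces.extend((part, label) for part in np.split(X, cuts))
    order = rng.permutation(len(pieces))
    return [pieces[i] for i in order]
```

With per-sample labels from `to_instantaneous`, an overall accuracy rate is simply the fraction of samples whose expanded predicted label matches the ground-truth label.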
Results:
In the HARuS data, GGS produced the least noisy predictions of six physical activities and had the second highest accuracy rate of 84.0% (the highest accuracy rate was 85.3% for the sliding window of size 0.8s). In the BREATHE data, GGS again produced the least noisy predictions and had the highest accuracy rate of 79.4% of predictions for six physical activities.
Conclusions:
In a scenario with variable duration activity bouts, GGS multivariate segmentation produced “smart-sized” windows that resulted in more stable predictions and a higher accuracy rate than traditional fixed-size sliding window approaches. Overall, accuracy was good in both datasets but, as expected, it was slightly lower in the more real-world study using wrist-worn smartwatches in children (BREATHE) than in the more tightly controlled study using waist-worn smartphones in adults (HARuS). We implemented GGS in an offline setting, but it could be adapted for real-time prediction with streaming data.
Citation
Per the author's request, the PDF is not available.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.