JMIR Preprints #40524: Data quality degradation on prediction models generated from continuous activity and heart rate monitoring: an exploratory analysis using simulation

Jason Hearn;
Jef Van den Eynde;
Bhargava Chinni;
Ari Cedars;
Danielle Gottlieb-Sen;
Shelby Kutty;
Cedric Manlhiot

ABSTRACT

Background:

Limited data accuracy is often cited as a reason for caution in the integration of physiological data obtained from consumer-oriented wearable devices in care management pathways. The effect of decreasing accuracy on predictive models generated from these data has not previously been investigated.

Objective:

The objective of this study is to simulate the effect of data degradation on the reliability of prediction models generated from those data, and thus to determine the extent to which lower device accuracy might limit their use in clinical settings.

Methods:

Using the Multilevel Monitoring of Activity and Sleep in Healthy People (MMASH) dataset, which includes continuous free-living step count and heart rate data from 21 healthy volunteers, we trained a random forest model to predict cardiac competence. Model performance on 75 perturbed datasets, with increasing missingness, noisiness, bias, and a combination of all three perturbations, was compared with model performance on the unperturbed dataset.
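The three perturbation types named above can be illustrated with a minimal sketch. The abstract does not specify how the perturbations were implemented, so everything here is an assumption made for illustration: the hypothetical function name `perturb`, expressing the perturbation level as a fraction between 0 and 1, dropping samples at random for missingness, scaling Gaussian noise by the signal's standard deviation, and modelling bias as a constant multiplicative offset.

```python
import numpy as np

def perturb(signal, level, kind, rng):
    """Return a degraded copy of a 1-D signal (illustrative only).

    level: perturbation level as a fraction (0.2 means 20%).
    kind:  "missing", "noise", or "bias".
    """
    x = np.asarray(signal, dtype=float).copy()
    if kind == "missing":
        # Assumption: each sample is dropped independently with probability `level`.
        x[rng.random(x.shape) < level] = np.nan
    elif kind == "noise":
        # Assumption: zero-mean Gaussian noise, scaled by the signal's own spread.
        x += rng.normal(0.0, level * np.nanstd(x), size=x.shape)
    elif kind == "bias":
        # Assumption: systematic bias as a constant relative over-estimate.
        x *= 1.0 + level
    else:
        raise ValueError(f"unknown perturbation kind: {kind!r}")
    return x

# Usage: degrade a toy heart rate trace at a 30% level for each perturbation type.
rng = np.random.default_rng(seed=0)
heart_rate = np.array([62.0, 64.0, 71.0, 90.0, 85.0, 70.0])
degraded = {kind: perturb(heart_rate, 0.3, kind, rng)
            for kind in ("missing", "noise", "bias")}
```

In a full experiment, a model would be retrained or re-evaluated on each degraded copy and its error compared with the error on the unperturbed data; that step is omitted here because the abstract gives no detail on it.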

Results:

The model trained on the unperturbed dataset achieved a mean root mean square error (RMSE) of 0.079±0.001 in predicting the cardiac competence index. For all types of perturbations, RMSE remained stable up to 20%-30% perturbation. Above this level, RMSE increased until the model was no longer predictive, which occurred at 80% for noise, 50% for missingness, and 35% for the combination of all perturbations. Introducing systematic bias in the underlying data had no effect on RMSE.
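For reference, the standard definition of RMSE is given below. The notation is an assumption, not taken from the manuscript: $n$ is the number of predictions, $y_i$ the observed cardiac competence index, and $\hat{y}_i$ the model's prediction.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

Because RMSE is expressed in the same units as the outcome, a value of 0.079 is directly comparable with the scale of the cardiac competence index itself.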

Conclusions:

In this proof-of-concept study, the performance of predictive models for cardiac competence generated from continuously acquired physiological data was relatively stable with declining quality of the source data. As such, should these findings be replicated in other contexts, lower accuracy might not be an absolute contraindication for the use of consumer-oriented wearable devices in clinical prediction models.

Citation

Please cite as:

Hearn J, Van den Eynde J, Chinni B, Cedars A, Gottlieb-Sen D, Kutty S, Manlhiot C

Data Quality Degradation on Prediction Models Generated From Continuous Activity and Heart Rate Monitoring: Exploratory Analysis Using Simulation

JMIR Cardio 2023;7:e40524

DOI: 10.2196/40524

PMID: 37133921

PMCID: 10193221

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Cardio

Date Submitted: Jun 24, 2022

Date Accepted: Nov 30, 2022
