Currently submitted to: Journal of Medical Internet Research

Date Submitted: Mar 19, 2026
Open Peer Review Period: Mar 22, 2026 - May 17, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Data Missingness in Digital Phenotyping and Its Implications for Clinical Inference and Decision Making: the TRIM framework

  • Daniel S. Barron
  • Joanna Shen
  • Nathan Huey
  • Kareem Abdelkader
  • Ndey Isatou Jobe
  • Zacharia Isaac
  • Danielle Sarno
  • Jennifer Kruz
  • Melanie Fu
  • David Silbersweig
  • Jukka-Pekka Onnela

ABSTRACT

Background:

Digital phenotyping, the use of personal digital devices to capture real-world behavioral and physiological data, holds promise for measuring patient function over time. However, missing data remains a pervasive challenge in longitudinal studies, where missingness may obscure clinically relevant signals. Existing statistical methods focus on group-level inference and offer limited guidance on how the causes of missingness should inform clinical interpretation and decision-making.

Objective:

This study aimed to (1) characterize data missingness across multiple timescales in a longitudinal digital phenotyping study of patients with chronic pain, (2) evaluate the effect of imputation on clinical inference, and (3) propose a practical framework for categorizing and responding to missing data in clinical digital phenotyping research.

Methods:

We analyzed data from 85 patients with chronic musculoskeletal pain (mean age 55.2 years, SD 15.7; 51 female, 32 male, 1 transgender) recruited from the Pain Intervention and Digital Research Program. Active data (PROMIS-29 surveys, daily pain scores) and passive data (accelerometer, GPS) were collected via the Beiwe Research Platform over 180 days. Data completeness was computed at day, hour, and minute levels. Linear mixed-effects models assessed associations between daily missingness and Forest-derived summary measures (cadence, home time, significant locations). Cumulative Link Mixed Models and linear mixed models with autoregressive error structures evaluated associations between PROMIS domain scores and digital measures, adjusting for age, sex, race, and within-subject correlation. Complete-case analyses were compared against multiple imputation (predictive mean matching, proportional odds logistic regression, and MidasTouch) using 100 imputed datasets combined via Rubin's rules.
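To illustrate the pooling step named above, the following is a minimal sketch of Rubin's rules for a single coefficient. It is not the authors' code: the function name `pool_rubin` and the toy inputs are hypothetical, and the sketch assumes each imputed dataset has already been fit and yields one point estimate and one squared standard error.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: per-imputation point estimates (hypothetical inputs).
    variances: per-imputation squared standard errors.
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # within-imputation variance
    b = variance(estimates)          # between-imputation variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance
    return q_bar, math.sqrt(total)


# Toy example with three imputations; the study reports using 100.
pooled_estimate, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled_estimate, pooled_se)
```

The between-imputation term is what allows a pooled standard error to exceed any single complete-data standard error, which is one way an association can weaken after imputation. The sketch omits the degrees-of-freedom adjustment used for confidence intervals.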

Results:

Median accelerometer completeness was 60% at the day level, 37% at the hour level, and 26% at the minute level; GPS completeness followed a similar pattern (57%, 34%, and 5%, respectively). Cadence showed no significant association with missingness (false discovery rate [FDR]-adjusted P=.32). The number of significant locations declined modestly with higher missingness (beta=-0.073 per 10 percentage points; FDR-adjusted P<.001). In complete-case analysis, higher cadence was associated with lower depression scores (95% CI -2.09 to -0.28; P=.01); this association was attenuated after multiple imputation (P=.13). Older participants remained enrolled longer (hazard ratio 0.979 per year; P=.02) but were less engaged while enrolled (odds ratio 0.962; P=.004). Race and sex were not significantly associated with engagement or retention.
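The drop in completeness from the day level to the minute level can be reproduced with a simple calculation. The sketch below assumes one plausible definition, which the abstract does not spell out: completeness at a given timescale is the fraction of bins of that width, within the observation window, that contain at least one sample. The function name `completeness` and the timestamps are hypothetical.

```python
import math


def completeness(timestamps, start, end, bin_seconds):
    """Fraction of bins in [start, end) that contain at least one sample.

    timestamps, start, and end are in seconds; this binning definition is
    an assumption for illustration, not taken from the study.
    """
    n_bins = math.ceil((end - start) / bin_seconds)
    observed = {
        int((t - start) // bin_seconds)
        for t in timestamps
        if start <= t < end
    }
    return len(observed) / n_bins


# Hypothetical two-day window with a few sparse samples.
samples = [0, 30, 60, 3600, 90000]
window_start, window_end = 0, 2 * 86400

day_level = completeness(samples, window_start, window_end, 86400)
hour_level = completeness(samples, window_start, window_end, 3600)
minute_level = completeness(samples, window_start, window_end, 60)
print(day_level, hour_level, minute_level)
```

Under this definition a single sample marks its whole day as observed while leaving almost every minute of that day empty, so completeness can only stay the same or fall as the bin width shrinks.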

Conclusions:

Data missingness in digital phenotyping varies substantially by the timescale at which it is assessed, and imputation choices can alter clinical interpretations. We propose the Triage and Response for Interpreting Missingness (TRIM) framework, which categorizes causes of missing data into Technology Failure, Clinically Relevant events, and Extraneous life events, each requiring distinct operational and analytical responses depending on the clinical context. TRIM provides a shared vocabulary for clinicians, statisticians, and engineers to ensure that missing data is meaningfully interpreted rather than merely imputed.


 Citation

Please cite as:

Barron DS, Shen J, Huey N, Abdelkader K, Jobe NI, Isaac Z, Sarno D, Kruz J, Fu M, Silbersweig D, Onnela JP

Data Missingness in Digital Phenotyping and Its Implications for Clinical Inference and Decision Making: the TRIM framework

JMIR Preprints. 19/03/2026:95468

DOI: 10.2196/preprints.95468

URL: https://preprints.jmir.org/preprint/95468

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.