Currently submitted to: JMIR Formative Research

Date Submitted: Jun 10, 2026
Open Peer Review Period: Jun 11, 2026 - Aug 6, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Development and Validation of the Cardiac Wellness Risk Index Using Grocery Purchases and Wearable Devices

  • Kapil Kumar Reddy Poreddy; 
  • Ajit Sahu; 
  • Sanjoy Mukherjee

ABSTRACT

Background:

Cardiovascular disease risk assessment traditionally relies on infrequent clinical visits and self-reported dietary data subject to recall bias. Retail grocery transaction data and wearable device metrics provide continuous, objective behavioral signals that have not been systematically evaluated for cardiovascular risk prediction. Prior efforts to integrate these data streams have relied on synthetic cohorts, limiting clinical interpretability of reported performance metrics.

Objective:

We developed and validated the Cardiac Wellness Risk Index (CWRI), a privacy-preserving deep learning model integrating retail grocery purchase data as a dietary proxy with wearable device metrics to predict 10-year cardiovascular disease risk.

Methods:

We conducted a multi-cohort retrospective study using four publicly available real-world datasets. The CWRI model was developed using two real clinical cohorts: the UCI Heart Disease dataset (C1; n=918; 4-site multinational) and the Framingham Heart Study teaching dataset (C2; n=4,189). External validation used the National Health and Nutrition Examination Survey 2017-2018 (NHANES; C3/C4; n=4,881; 7.4% CVD event rate), a nationally representative US probability sample held out entirely during model development. A cohort (C5; n=50,000) was used exclusively for federated learning and differential privacy experiments; no diagnostic performance claims derive from it. Dietary exposure was derived from retail grocery transaction data (Instacart, Dunnhumby datasets) mapped to a 0-100 dietary quality score. Physical activity data came from consumer wearable datasets (PMData, CAPTURE-24, Fitbit). We developed a deep learning ensemble combining three architectures (Multi-Layer Perceptron with attention, TabNet, and Tabular Transformer) with federated learning and differential privacy (ε=1.0, δ=10⁻⁵). The dietary proxy was validated externally against NHANES self-rated diet quality and HbA1c biomarker data. Model performance was evaluated using AUC-ROC, Net Reclassification Improvement (NRI), Integrated Discrimination Improvement (IDI), and calibration metrics. Causal effect estimation used structural causal models with backdoor adjustment and E-value sensitivity analysis applied to observational development cohort data.
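The evaluation metrics named above (AUC-ROC, NRI, IDI) can be sketched from their standard definitions. This is a minimal illustration in plain Python, not the authors' evaluation code: the function names and toy numbers are hypothetical, and the category-free (continuous) form of NRI is an assumption, since the abstract does not say whether a categorical or continuous NRI was used.

```python
from itertools import product


def auc_roc(y, p):
    """AUC as the probability that a random event outranks a random non-event.

    Ties between an event and a non-event count as half a win.
    """
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a, b in product(pos, neg))
    return wins / (len(pos) * len(neg))


def idi(y, p_old, p_new):
    """Integrated Discrimination Improvement.

    Mean change in predicted risk among events minus the mean change
    among non-events, comparing the new model with the old one.
    """
    def mean_change(label):
        changes = [n - o for yi, o, n in zip(y, p_old, p_new) if yi == label]
        return sum(changes) / len(changes)

    return mean_change(1) - mean_change(0)


def continuous_nri(y, p_old, p_new):
    """Category-free NRI (assumed variant).

    Net proportion of events whose risk moved up, plus the net proportion
    of non-events whose risk moved down.
    """
    def net_up(label):
        moves = [n - o for yi, o, n in zip(y, p_old, p_new) if yi == label]
        up = sum(1 for m in moves if m > 0)
        down = sum(1 for m in moves if m < 0)
        return (up - down) / len(moves)

    return net_up(1) - net_up(0)


# Hypothetical toy data: two events (1) and two non-events (0).
y_toy = [1, 1, 0, 0]
baseline_risk = [0.6, 0.4, 0.5, 0.2]   # e.g. an established clinical score
candidate_risk = [0.8, 0.5, 0.4, 0.1]  # e.g. a new model

print("AUC baseline :", auc_roc(y_toy, baseline_risk))
print("AUC candidate:", auc_roc(y_toy, candidate_risk))
print("IDI          :", idi(y_toy, baseline_risk, candidate_risk))
print("NRI          :", continuous_nri(y_toy, baseline_risk, candidate_risk))
```

The pairwise AUC above is quadratic in sample size, which is fine for illustration; a rank-based implementation would be preferable at the cohort sizes reported here. Confidence intervals such as those in the Results would need an additional step (for example, bootstrapping), which this sketch omits.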

Results:

CWRI achieved an AUC-ROC of 0.863 (95% CI 0.851-0.875) on the real held-out test set, significantly outperforming SCORE2 (0.798, 95% CI 0.783-0.813, P<.001) and the Framingham Risk Score (0.781, 95% CI 0.765-0.797, P<.001). NRI was 0.142 (95% CI 0.118-0.166, P<.001) and IDI was 0.089 (95% CI 0.076-0.102, P<.001). External validation of the dietary proxy on real NHANES data (n=4,881; 48.2% male; ethnically diverse) showed that it correlated with participant self-rated diet quality (r=0.195, P<10⁻³⁸) and HbA1c (r=0.059, P=.0001). The CVD event rate declined monotonically across dietary quality tertiles (low: 8.7%; medium: 7.5%; high: 6.5%), consistent with the expected protective direction. Federated learning with differential privacy achieved an AUC of 0.856 (95% CI 0.843-0.869), only 0.007 lower than centralized training. In counterfactual simulations in the observational development cohort, improving dietary quality from the 25th to the 75th percentile while increasing moderate-vigorous physical activity by 30 minutes daily was associated with a 7.2 percentage point lower 10-year CVD risk (95% CI 6.1-8.3); these estimates are hypothesis-generating and are not evidence-based clinical recommendations.
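The tertile comparison reported above can be illustrated with a short sketch: split participants into three equal-sized groups by dietary quality score, then compute the event rate in each group. The function names and toy data are hypothetical, and the rank-based split is an assumption; the abstract does not state how tertile cut points were chosen or whether survey weights were applied.

```python
def tertile_event_rates(scores, events):
    """Event rate within the low, medium, and high score tertiles.

    Participants are ranked by score and split into three groups of
    (nearly) equal size. Tied scores at a boundary are split by their
    original order, which is a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    cuts = [0, n // 3, 2 * n // 3, n]
    rates = {}
    for name, lo, hi in zip(("low", "medium", "high"), cuts, cuts[1:]):
        group = order[lo:hi]
        rates[name] = sum(events[i] for i in group) / len(group)
    return rates


def declines_monotonically(rates):
    """True when the event rate falls strictly from low to high tertile."""
    return rates["low"] > rates["medium"] > rates["high"]


# Hypothetical toy data: 0-100 dietary quality scores and 0/1 event flags.
toy_scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
toy_events = [1, 1, 0, 1, 0, 0, 0, 0, 0]

toy_rates = tertile_event_rates(toy_scores, toy_events)
print(toy_rates)
print("monotone decline:", declines_monotonically(toy_rates))
```

Applied to the rates reported in the abstract (8.7%, 7.5%, 6.5%), the monotonicity check holds, but a descriptive ordering of this kind does not by itself test for trend.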

Conclusions:

Integrating retail grocery purchase data with wearable metrics through privacy-preserving deep learning significantly improved cardiovascular risk prediction beyond established clinical scores in real-world development cohorts. External validation of the dietary proxy on nationally representative NHANES data (n=4,881) confirmed biologically consistent associations with self-rated diet quality and HbA1c. This represents the first systematic evaluation of retail transaction data as a dietary proxy for cardiovascular risk assessment using multiple independent real-world datasets. Validation in prospective longitudinal cohorts with adjudicated CVD endpoints and direct retail transaction linkage remains the essential next step before clinical implementation. Key limitations include cross-sectional data sources, self-reported CVD outcomes in NHANES, and the purchase-consumption gap.


 Citation

Please cite as:

Poreddy KKR, Sahu A, Mukherjee S

Development and Validation of the Cardiac Wellness Risk Index Using Grocery Purchases and Wearable Devices

JMIR Preprints. 10/06/2026:104293

DOI: 10.2196/preprints.104293

URL: https://preprints.jmir.org/preprint/104293

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.