Accepted for/Published in: JMIR Medical Informatics
Date Submitted: May 11, 2021
Open Peer Review Period: May 10, 2021 - Jul 5, 2021
Date Accepted: Jan 2, 2022
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Deriving Weight from Big Data: A Comparison of Body Weight Measurement Cleaning Algorithms
ABSTRACT
Background:
Patient body weight is a frequently used measure in biomedical studies, yet no standard methods exist for processing and cleaning weight data. Conflicting documentation on constructing body weight measurements presents challenges for research and program evaluation.
Objective:
We sought to describe and compare methods for extracting and cleaning weight data from electronic health record (EHR) databases to develop guidelines for standardized approaches that promote reproducibility.
Methods:
We conducted a systematic review of studies, published from 2008 to 2018, that used Veterans Health Administration (VHA) EHR weight data, and documented the algorithms they used for constructing patient weight. We applied these algorithms to a cohort of veterans with at least one Primary Care visit in 2016. The resulting weight measures were compared at the patient and site levels.
Results:
We identified 496 studies and included 62 that used weight as an outcome variable; 48% included a replicable algorithm. Algorithms ranged from cut-offs for implausible weights to complex models that use repeated measures within a patient over time. After applying the algorithms, we found differences in the number of weight values retained (86% to 99% of raw data) and a decrease in variability (SD from 68 to 54), but little difference in average weights across methods (216 to 220 lbs.). The percent of patients with at least 5% weight loss over one year ranged from 18% to 24%.
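As an illustration of the weight-loss outcome reported above, the following is a minimal sketch of how the percent of patients with at least 5% weight loss might be computed from cleaned records. The abstract does not state how baseline and follow-up weights were selected, so comparing each patient's first and last measurement is an assumption, and the function name and record layout are hypothetical.

```python
from datetime import date


def pct_with_weight_loss(records, threshold=0.05):
    """Percent of patients whose last weight is at least `threshold`
    (as a fraction) below their first weight.

    records: iterable of (patient_id, measurement_date, weight_lbs).
    Assumption: first-versus-last comparison within the supplied records;
    the source does not specify the baseline or follow-up definition.
    """
    first_last = {}
    # Process measurements in chronological order.
    for pid, when, lbs in sorted(records, key=lambda r: r[1]):
        if pid not in first_last:
            first_last[pid] = [lbs, lbs]  # [first weight, last weight so far]
        else:
            first_last[pid][1] = lbs
    if not first_last:
        return 0.0
    n_loss = sum(
        1
        for first, last in first_last.values()
        if (first - last) / first >= threshold
    )
    return 100.0 * n_loss / len(first_last)


# Hypothetical example data, not drawn from the study.
example = [
    ("A", date(2016, 1, 10), 200.0),
    ("A", date(2016, 12, 15), 188.0),  # 6% loss
    ("B", date(2016, 2, 1), 200.0),
    ("B", date(2016, 11, 20), 195.0),  # 2.5% loss
    ("C", date(2016, 3, 5), 250.0),    # single measurement, no change
]
result = pct_with_weight_loss(example)
```

Because the outcome depends on which measurements survive cleaning, running a function like this on the output of each cleaning algorithm is one way to see how the choice of algorithm shifts the estimate.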
Conclusions:
Determining the best method to assess weight using EHR data can be computationally demanding. Our results suggest that for many studies, applying simple cut-offs that require fewer computing resources and are easier to understand may be sufficient. We present guidelines for situations where more complex approaches may be warranted.
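The simple cut-off approach mentioned above can be sketched as follows. This is an illustrative example only: the bounds of 75 and 700 lbs are hypothetical placeholders, not thresholds taken from this abstract, and the helper names are invented for the example. The summary mirrors the comparison metrics reported in the Results (share of raw values retained, mean, and SD).

```python
from statistics import mean, stdev

# Hypothetical plausibility bounds in pounds; choose values suited to the cohort.
LOWER_LBS = 75.0
UPPER_LBS = 700.0


def apply_cutoffs(weights, lower=LOWER_LBS, upper=UPPER_LBS):
    """Keep only weights inside the inclusive plausibility range."""
    return [w for w in weights if lower <= w <= upper]


def summarize(raw, cleaned):
    """Compare cleaned values with the raw data.

    Requires a non-empty `raw` list and at least two cleaned values,
    since the sample SD is undefined for fewer than two points.
    """
    return {
        "pct_retained": 100.0 * len(cleaned) / len(raw),
        "mean_lbs": mean(cleaned),
        "sd_lbs": stdev(cleaned),
    }


# Hypothetical raw values, including two implausible entries.
raw_weights = [180.0, 220.0, 15.0, 1200.0, 200.0]
cleaned_weights = apply_cutoffs(raw_weights)
summary = summarize(raw_weights, cleaned_weights)
```

A filter like this needs only a single pass over the data and no per-patient history, which is why it is cheaper than models that use repeated measures within a patient.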
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.