Accepted for/Published in: JMIR Formative Research
Date Submitted: Mar 5, 2024
Date Accepted: May 6, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Concordance between survey and electronic health record data in the COVID-19 Citizen Science study: a retrospective cohort analysis
ABSTRACT
Background:
Real-world data reported by patients and extracted from electronic health records are increasingly leveraged for research, policy, and clinical decision-making. However, the extent to which these two data sources agree with each other is not always clear.
Objective:
To evaluate the concordance between variables reported by participants enrolled in an electronic cohort study and data available in their electronic health records.
Methods:
Survey data from COVID-19 Citizen Science, an electronic cohort study, were linked to electronic health record data from 7 health systems, comprising 34,908 participants. Concordance was evaluated for demographics, chronic conditions, and COVID-19 characteristics. Overall agreement, sensitivity, specificity, positive predictive value, negative predictive value, and κ statistics with 95% CIs were calculated.
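As an illustration only, the concordance measures listed above can be computed from a 2x2 cross-tabulation of one data source against the other. The sketch below is not the study's code: the function name `concordance`, the cell counts, and the choice of a simple large-sample standard error for the κ confidence interval are all assumptions, since the abstract does not state how the 95% CIs were derived.

```python
import math


def concordance(tp, fp, fn, tn, z=1.96):
    """Agreement metrics for a 2x2 table (hypothetical helper).

    The 'reference' source defines truth for sensitivity and specificity:
      tp: both sources positive      fp: test positive, reference negative
      fn: test negative, reference positive      tn: both sources negative
    Assumes every row and column total is non-zero.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # overall agreement
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    # Simple large-sample approximation of the standard error of kappa
    # (an assumption; other variance estimators exist).
    se = math.sqrt(observed * (1 - observed) / (n * (1 - expected) ** 2))
    return {
        "agreement": observed,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": kappa,
        "kappa_ci": (kappa - z * se, kappa + z * se),
    }


# Made-up counts, purely for illustration (not study data).
metrics = concordance(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(name, value)
```

Which source is treated as the reference matters: swapping the roles of survey and chart data exchanges sensitivity with positive predictive value and specificity with negative predictive value, while overall agreement and κ are unchanged.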
Results:
Of 34,017 participants with complete information, 62.3% were female, and the median age was 57 years (IQR, 42-68). Agreement (κ) was high for sex (κ = 0.99) and Black (κ = 0.94), AAPI (κ = 0.93), and White (κ = 0.87) race and ethnicity but only moderate (κ = 0.54) for smoking status. Compared with chart data, participant report of chronic conditions had lower sensitivity and higher specificity, with widely varying levels of agreement (κ). Compared with participant report of COVID-19, electronic health record data had low sensitivity (32.2%) but higher specificity (95.8%). COVID-19 vaccination was the least concordant event (κ = 0.05) but had moderate sensitivity (49.7%) and high specificity (98.2%) compared with participant reports.
Conclusions:
Results suggest that additional work is required to integrate and prioritize participant-reported data in pragmatic research.
Clinical Trial:
ClinicalTrials.gov Identifier NCT5548803
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.