JMIR Preprints #30697: Analyses of Original and Computationally-Derived Electronic Health Record Data: The National COVID Cohort Collaborative

Analyses of Original and Computationally-Derived Electronic Health Record Data: The National COVID Cohort Collaborative

Randi Foraker;
Aixia Guo;
Jason Thomas;
Noa Zamstein;
Philip R.O. Payne;
Adam Wilcox;
N3C Collaborative

ABSTRACT

Background:

Synthetic data can be used by collaborators to generate and share data in support of answering critical research questions to address the COVID-19 pandemic. Computationally-derived (“synthetic”) data can enable the creation and analysis of clinical, laboratory, and diagnostic data as if they were the original electronic health record (EHR) data.

Objective:

We aimed to compare the results of analyses using synthetic derivatives with the results of analyses using the original data downloaded from a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel), in order to assess the strengths and limitations of leveraging computationally-derived data for research purposes.

Methods:

We used the National COVID Cohort Collaborative’s (N3C) instance of MDClone, comprising EHR data from 34 N3C institutional partners. We tested three use cases: (1) exploring the distributions of key features of the COVID-positive cohort; (2) training and testing predictive models for assessing the risk of admission among these patients; and (3) determining geospatial and temporal COVID-related measures and outcomes and constructing their respective epidemic curves. We compared the results of analyses using synthetic derivatives with those using the original data by means of traditional statistics, machine learning approaches, and temporal and spatial representations of the data.

Results:

For each use case, the results of the synthetic data analyses successfully mimicked those of the original data: the distributions of the data were similar, and the predictive models demonstrated comparable performance. Although the synthetic and original data yielded nearly the same results overall, there were exceptions, including an odds ratio on either side of the null in multivariable analyses (0.97 versus 1.01) and epidemic curves constructed for zip codes with low population counts.

Conclusions:

This paper presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in collaborative research for faster insights.

Clinical Trial: N/A

Citation

Please cite as:

Foraker R, Guo A, Thomas J, Zamstein N, Payne PR, Wilcox A, N3C Collaborative

The National COVID Cohort Collaborative: Analyses of Original and Computationally Derived Electronic Health Record Data

J Med Internet Res 2021;23(10):e30697

DOI: 10.2196/30697

PMID: 34559671

PMCID: 8491642

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jun 3, 2021

Open Peer Review Period: Jun 3, 2021 - Jul 29, 2021

Date Accepted: Sep 12, 2021

Date Submitted to PubMed: Sep 24, 2021

