Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jun 3, 2021
Open Peer Review Period: Jun 3, 2021 - Jul 29, 2021
Date Accepted: Sep 12, 2021
Date Submitted to PubMed: Sep 24, 2021

The final, peer-reviewed published version of this preprint can be found here:

Foraker R, Guo A, Thomas J, Zamstein N, Payne PR, Wilcox A, N3C Collaborative

The National COVID Cohort Collaborative: Analyses of Original and Computationally Derived Electronic Health Record Data

J Med Internet Res 2021;23(10):e30697

DOI: 10.2196/30697

PMID: 34559671

PMCID: 8491642

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Analyses of Original and Computationally-Derived Electronic Health Record Data: The National COVID Cohort Collaborative

  • Randi Foraker
  • Aixia Guo
  • Jason Thomas
  • Noa Zamstein
  • Philip R.O. Payne
  • Adam Wilcox
  • N3C Collaborative

ABSTRACT

Background:

Computationally derived (“synthetic”) data can enable the creation and analysis of clinical, laboratory, and diagnostic data as if they were the original electronic health record (EHR) data. Collaborators can therefore use synthetic data to generate and share data in support of answering critical research questions to address the COVID-19 pandemic.

Objective:

We aimed to compare the results of analyses using synthetic derivatives with the results of analyses using the original data downloaded from a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel), in order to assess the strengths and limitations of leveraging computationally derived data for research purposes.

Methods:

We used the National COVID Cohort Collaborative’s (N3C) instance of MDClone, comprising EHR data from 34 N3C institutional partners. We tested three use cases: (1) exploring the distributions of key features of the COVID-positive cohort; (2) training and testing predictive models for assessing the risk of admission among these patients; and (3) determining geospatial and temporal COVID-related measures and outcomes, and constructing their respective epidemic curves. For each use case, we compared the results of analyses using synthetic derivatives with those of analyses using the original data by means of traditional statistics, machine learning approaches, and temporal and spatial representations of the data.
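As an illustration of the first use case, the following minimal sketch compares one numeric feature between an original and a synthetic cohort. It is not the authors' analysis: the abstract does not name the specific statistics used, so the standardized mean difference and the two-sample Kolmogorov-Smirnov statistic are assumptions chosen for illustration, and the function names and the simulated "age" values are hypothetical.

```python
import math
import random
import statistics


def standardized_mean_difference(original, synthetic):
    """Difference in means divided by the pooled standard deviation."""
    pooled_var = (statistics.variance(original) + statistics.variance(synthetic)) / 2
    if pooled_var == 0:
        return 0.0
    return (statistics.mean(synthetic) - statistics.mean(original)) / math.sqrt(pooled_var)


def ks_statistic(original, synthetic):
    """Largest gap between the two empirical cumulative distribution functions."""
    a, b = sorted(original), sorted(synthetic)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap


if __name__ == "__main__":
    # Simulated placeholder values only; these are NOT N3C or MDClone data.
    rng = random.Random(0)
    original_age = [rng.gauss(50, 15) for _ in range(1000)]
    synthetic_age = [rng.gauss(50, 15) for _ in range(1000)]
    print("SMD:", standardized_mean_difference(original_age, synthetic_age))
    print("KS:", ks_statistic(original_age, synthetic_age))
```

Values of both statistics near zero would suggest that the synthetic feature tracks the original closely; any threshold for "close enough" is a study-specific choice.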

Results:

For each use case, the results of the synthetic data analyses successfully mimicked those of the original data: the distributions of the data were similar, and the predictive models demonstrated comparable performance. Although the synthetic and original data yielded nearly the same results overall, there were exceptions, including an odds ratio that fell on either side of the null in multivariable analyses (0.97 versus 1.01) and epidemic curves constructed for zip codes with low population counts.
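To make the odds ratio exception concrete, the sketch below checks whether two estimates fall on opposite sides of the null value of 1 and shows how an unadjusted odds ratio with a Wald confidence interval could be computed from a 2x2 table. This is a simplified illustration under stated assumptions: the abstract reports multivariable estimates without confidence intervals, and the helper names here are hypothetical rather than part of any N3C or MDClone tooling.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald interval for a 2x2 table.

    a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


def opposite_sides_of_null(or_1, or_2, null=1.0):
    """True when one estimate is below the null and the other is above it."""
    return (or_1 - null) * (or_2 - null) < 0


def interval_includes_null(lower, upper, null=1.0):
    """True when the interval is compatible with no association."""
    return lower <= null <= upper


if __name__ == "__main__":
    # The two estimates reported in the abstract.
    print(opposite_sides_of_null(0.97, 1.01))  # True
```

Two estimates this close to 1 can differ in direction while remaining statistically compatible; whether that is the case here would depend on confidence intervals that the abstract does not report.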

Conclusions:

This paper presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in collaborative research for faster insights.

Clinical Trial: N/A


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.