Accepted for/Published in: JMIR Public Health and Surveillance

Date Submitted: Feb 14, 2025
Open Peer Review Period: Feb 14, 2025 - Apr 11, 2025
Date Accepted: Jul 4, 2025

The final, peer-reviewed published version of this preprint can be found here:

A Cloud-Based Platform for Harmonized COVID-19 Data: Design and Implementation of the Rapid Acceleration of Diagnostics (RADx) Data Hub

Martínez-Romero M, Horridge M, Mistry N, Weyhmiller A, Yu JK, Fujimoto A, Henry A, O'Connor MJ, Sier A, Suber S, Akdogan MU, Cao Y, Valliappan S, Mieczkowska JO, Krishnamurthy A, Keller MA, Musen MA, RADx Data Hub Team

A Cloud-Based Platform for Harmonized COVID-19 Data: Design and Implementation of the Rapid Acceleration of Diagnostics (RADx) Data Hub

JMIR Public Health Surveill 2025;11:e72677

DOI: 10.2196/72677

PMID: 40834404

PMCID: 12409176

A Cloud-Based Platform for Harmonized COVID-19 Data: Design and Implementation of the RADx Data Hub

  • Marcos Martínez-Romero
  • Matthew Horridge
  • Nilesh Mistry
  • Aubrie Weyhmiller
  • Jimmy K Yu
  • Alissa Fujimoto
  • Aria Henry
  • Martin J O'Connor
  • Ashley Sier
  • Stephanie Suber
  • Mete U Akdogan
  • Yan Cao
  • Somu Valliappan
  • Joanna O Mieczkowska
  • Ashok Krishnamurthy
  • Michael A Keller
  • Mark A Musen
  • RADx Data Hub Team

ABSTRACT

Background:

The COVID-19 pandemic exposed significant limitations in existing data infrastructure, particularly the lack of systems for rapidly collecting, integrating, and analyzing data to support timely and evidence-based public health responses. These shortcomings hampered efforts to conduct comprehensive analyses and make rapid, data-driven decisions in response to emerging threats. To overcome these challenges, the U.S. National Institutes of Health (NIH) launched the Rapid Acceleration of Diagnostics (RADx) initiative. A key component of this initiative is the RADx Data Hub—a centralized, cloud-based platform designed to support data sharing, harmonization, and reuse across multiple COVID-19 research programs and data sources.

Objective:

This paper presents the design, implementation, and capabilities of the RADx Data Hub, a cloud-based platform developed to support FAIR (Findable, Accessible, Interoperable, and Reusable) data practices and enable secondary analyses of COVID-19-related data contributed by a nationwide network of researchers.

Methods:

The RADx Data Hub was developed on a scalable cloud infrastructure, grounded in the FAIR data principles. The platform integrates heterogeneous data types—including clinical data, diagnostic test results, behavioral data, and social determinants of health—submitted by over 100 research organizations across 46 U.S. states and territories. The data pipeline includes automated and manual processes for de-identification, quality validation, expert curation, and harmonization. Metadata standards are enforced using ontology-driven tools such as the CEDAR Workbench and BioPortal. Data files are structured using a unified specification to support consistent representation and machine-actionable metadata.
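
As a rough illustration of the harmonization step mentioned above, the following Python sketch renames study-specific columns and recodes their values onto a shared vocabulary. The element names (`cde_age`, `cde_test_result`), the mapping tables, and the codings are hypothetical placeholders chosen for this example; they are not the actual RADx Common Data Elements, and the code does not represent the Data Hub's own pipeline.

```python
from typing import Dict, List

# Hypothetical mapping from one study's column names to shared element names.
COLUMN_MAP: Dict[str, str] = {"AgeYears": "cde_age", "PCR": "cde_test_result"}

# Hypothetical mapping from study-specific value codes to shared codes.
VALUE_MAP: Dict[str, Dict[str, str]] = {
    "cde_test_result": {"pos": "Positive", "neg": "Negative"},
}


def harmonize(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Rename mapped columns and recode mapped values; drop unmapped columns."""
    harmonized = []
    for row in rows:
        new_row: Dict[str, str] = {}
        for column, value in row.items():
            target = COLUMN_MAP.get(column)
            if target is None:
                continue  # no shared equivalent defined in this sketch
            codes = VALUE_MAP.get(target, {})
            # Recode known values; pass anything else through unchanged.
            new_row[target] = codes.get(value.strip().lower(), value)
        harmonized.append(new_row)
    return harmonized
```

In this sketch, unmapped columns are dropped rather than passed through, so the harmonized output contains only shared elements; a real pipeline would more likely retain the original file alongside the harmonized one.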

Results:

As of May 2025, the RADx Data Hub hosts 178 studies and more than 1,700 data and metadata files, spanning four RADx programs: RADx-UP, RADx-Tech, RADx-rad, and RADx-DHT. The Study Explorer and Analytics Workbench components enable researchers to discover relevant studies, inspect rich metadata, and conduct analyses within a secure cloud-based environment. Harmonized data conforming to a core set of Common Data Elements (CDEs) facilitate cross-study integration and support secondary use. The platform provides persistent identifiers (DOIs) for each study and supports access to structured metadata that adheres to the CEDAR specification, available in both JSON and YAML formats for seamless integration into computational workflows.
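
To suggest how JSON metadata might feed a computational workflow, the sketch below parses two metadata documents and reports which data elements they share, a simple first check before attempting cross-study integration. The document layout (a top-level `elements` list) and the element names are assumptions made for this example; they do not reproduce the CEDAR specification or the Data Hub's actual metadata schema.

```python
import json
from typing import List, Set


def shared_elements(metadata_docs: List[str]) -> Set[str]:
    """Return the element names present in every metadata document."""
    element_sets = []
    for doc in metadata_docs:
        record = json.loads(doc)
        # "elements" is an assumed key used only for this illustration.
        element_sets.append(set(record.get("elements", [])))
    return set.intersection(*element_sets) if element_sets else set()


# Two hypothetical study metadata documents.
study_a = '{"study": "A", "elements": ["cde_age", "cde_sex", "cde_test_result"]}'
study_b = '{"study": "B", "elements": ["cde_age", "cde_test_result", "cde_zip"]}'

common = shared_elements([study_a, study_b])
print(sorted(common))
```

Elements that appear in every study are the natural candidates for pooled analyses; elements unique to one study would need separate handling.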

Conclusions:

The RADx Data Hub successfully addresses key data integration challenges by providing a centralized, FAIR-compliant platform for public health research. Its adaptable architecture and data management practices are designed to support secondary analyses and can be repurposed for other scientific disciplines, strengthening data infrastructure and enhancing preparedness for future health crises.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.