Currently submitted to: JMIR Formative Research
Date Submitted: Jun 5, 2026
Open Peer Review Period: Jun 7, 2026 - Aug 2, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Piloting the automatic transformation of real-world data to the OMOP common data model: the MS core dataset to OMOP SwitchBox
ABSTRACT
Background:
Real-world data (RWD) gathered from multiple sclerosis (MS) registries and cohorts is essential for deepening our understanding of this complex autoimmune disease. However, the transition from RWD to real-world evidence (RWE) is hampered by substantial data heterogeneity, which arises from differing dataset definitions and uneven local compliance with global standards. This heterogeneity limits the comparability and applicability of data across studies and settings.
Objective:
This paper presents the pilot of a tool, named "SwitchBox," to facilitate and streamline the conversion of MS RWD into the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), enhancing data harmonisation and enabling easier collaboration across healthcare datasets.
Methods:
The SwitchBox is a framework for extract, transform, load (ETL) modules, each of which transforms data from a specific source format into the OMOP CDM. The MS Data Alliance (MSDA) Core Dataset (CDS) serves as the pilot ETL module. In a collaboration between academia and industry, development focused on two major parts: (1) the framework and (2) the MSDA CDS ETL module. The mapping process followed the basic recommendations for ETL development provided by EHDEN/OHDSI. These included developing a synthetic dataset based on the MSDA CDS data dictionary, containing no real patient information to ensure data privacy; creating a scan report using the OHDSI tool "White Rabbit"; and using "Rabbit in a Hat" to develop a structural mapping document, with source variables aligned to OMOP CDM tables and to concept IDs looked up via "Athena."
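The idea of a framework that hosts format-specific ETL modules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SwitchBox implementation: the registry design, the function names (`register`, `run_etl`, `transform_msda_cds`), the source field names (`sex`, `birth_year`, `disability_score`, `disability_score_date`) and the choice of target table for the score are all hypothetical. Only the OMOP gender concept IDs (8507 for male, 8532 for female) and the convention of using concept ID 0 for unmapped values are taken from the OMOP standard vocabulary.

```python
from datetime import date
from typing import Callable, Dict, List

SourceRecord = Dict[str, str]
OmopRows = Dict[str, List[dict]]

# Standard OMOP gender concepts; 0 is the OMOP convention for "no matching concept".
GENDER_CONCEPTS = {"male": 8507, "female": 8532}
UNMAPPED_CONCEPT_ID = 0

# Hypothetical registry: one ETL module per source format.
ETL_MODULES: Dict[str, Callable[[List[SourceRecord]], OmopRows]] = {}


def register(format_name: str):
    """Register a transform function as the ETL module for a source format."""
    def decorator(func):
        ETL_MODULES[format_name] = func
        return func
    return decorator


@register("msda_cds")
def transform_msda_cds(records: List[SourceRecord]) -> OmopRows:
    """Map hypothetical source fields to rows of two OMOP CDM tables."""
    rows: OmopRows = {"person": [], "observation": []}
    for person_id, rec in enumerate(records, start=1):
        sex = rec.get("sex", "").strip().lower()
        rows["person"].append({
            "person_id": person_id,
            "gender_concept_id": GENDER_CONCEPTS.get(sex, UNMAPPED_CONCEPT_ID),
            "year_of_birth": int(rec["birth_year"]),
        })
        # Hypothetical variable; a real mapping would use a concept ID
        # chosen during the structural mapping step instead of 0.
        if rec.get("disability_score"):
            rows["observation"].append({
                "observation_id": len(rows["observation"]) + 1,
                "person_id": person_id,
                "observation_concept_id": UNMAPPED_CONCEPT_ID,
                "observation_date": date.fromisoformat(rec["disability_score_date"]),
                "value_as_number": float(rec["disability_score"]),
                "observation_source_value": "disability_score",
            })
    return rows


def run_etl(format_name: str, records: List[SourceRecord]) -> OmopRows:
    """Dispatch the records to the ETL module registered for their format."""
    return ETL_MODULES[format_name](records)


if __name__ == "__main__":
    # Synthetic records only, in the spirit of the synthetic dataset described above.
    synthetic = [
        {"sex": "Female", "birth_year": "1980",
         "disability_score": "2.5", "disability_score_date": "2020-01-15"},
        {"sex": "unknown", "birth_year": "1975"},
    ]
    for table, table_rows in run_etl("msda_cds", synthetic).items():
        print(table, table_rows)
```

Keeping the format-specific logic behind a registry means a module for another source format could be added without touching the dispatch code, which is one plausible reading of the framework and module split described here.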
Results:
The SwitchBox successfully transforms data in the MSDA CDS format to the OMOP CDM. The mapping process proved to be the most challenging aspect of the project. For the 44 MSDA CDS variables, we targeted eight OMOP CDM tables, with the Observation table being the most utilised. We identified specific challenges in the structural mapping process; these were not unique to MS data and could inform ETL development for other registry-based sources. The SwitchBox pilot is distributed as a Docker container via the GitHub Container Registry.
Conclusions:
The development of the SwitchBox represents a significant advancement in the standardisation of MS RWD collection and analysis. By simplifying the conversion to the OMOP CDM, this tool mitigates barriers to data harmonisation and facilitates broader collaborations across healthcare research. The findings highlight the need to integrate user-friendly solutions to transform RWD into actionable insights, underscoring the potential for greater data utility in clinical, research, or policy-making contexts. Further research should focus on expanding the tool’s application to a broader array of data sources and disease areas.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.