Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jan 16, 2022
Date Accepted: Feb 25, 2022
Construction of a Big Data Platform in Healthcare with Multi-source, Heterogeneous Data Integration and Massive High-Dimensional Data Governance for Large Hospitals: Design, Development, and Application
ABSTRACT
Background:
With the advent of data-intensive science, the full integration of big data science and healthcare is expected to bring a cross-field revolution to the medical community in China. Big data represents not only a technology but also a resource and a method. It is regarded as an important strategic resource at both the national and the medical institutional levels, so great importance has been attached to building big data platforms in healthcare.
Objective:
This study describes the development and implementation of a big data platform in healthcare for a large hospital. The platform overcomes the difficulties of integrating, computing, storing, and governing multi-source, heterogeneous data in a standardized way while ensuring healthcare data security. It combines data from the operational systems of all departments/sections in the hospital to form a massive, high-dimensional, high-quality healthcare database. This enables the reuse of electronic medical records (EMRs) and effectively taps into the value of the data to fully support the hospital's clinical services, scientific research, and operations management.
Methods:
The project to build a big data platform at West China Hospital of Sichuan University (WCH-BDP) was launched in 2017. It has extracted, integrated, and governed data from the hospital's departments/sections dating back to January 2008. A master–slave mode was implemented to achieve real-time integration of massive multi-source, heterogeneous data, and a heterogeneous data storage and computing environment that separates storage from computation was built. A standardized healthcare data governance system and a scientific, closed-loop data security ecosystem were established. An improved data lineage–based model was used for data quality control.
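The abstract does not describe how the improved data lineage–based model works internally. As a minimal sketch, assuming lineage is recorded as a directed graph from source tables to derived datasets, a quality-check failure on one dataset can be traced upstream to candidate root causes and downstream to datasets that may need re-validation. All dataset names (`his.visits`, `dw.patient_visit`, etc.) and functions below are hypothetical illustrations, not part of WCH-BDP.

```python
from collections import deque

# Hypothetical lineage graph (NOT from WCH-BDP): each dataset maps to
# the datasets derived from it, e.g. source system -> staging -> warehouse -> mart.
LINEAGE = {
    "his.visits": ["ods.visits"],
    "lis.lab_results": ["ods.lab_results"],
    "ods.visits": ["dw.patient_visit"],
    "ods.lab_results": ["dw.patient_visit"],
    "dw.patient_visit": ["mart.research_cohort"],
    "mart.research_cohort": [],
}


def _reverse(graph):
    """Invert edges so we can walk from a derived dataset back to its sources."""
    rev = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def _reachable(graph, start):
    """Breadth-first search; returns every node reachable from start (excluding start)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def trace_quality_issue(graph, failed):
    """For a dataset that failed a quality check, return (upstream, downstream).

    upstream: candidate root-cause datasets to inspect.
    downstream: datasets whose contents may be affected and need re-validation.
    """
    upstream = _reachable(_reverse(graph), failed)
    downstream = _reachable(graph, failed)
    return sorted(upstream), sorted(downstream)


if __name__ == "__main__":
    up, down = trace_quality_issue(LINEAGE, "dw.patient_visit")
    print("inspect upstream:", up)
    print("re-validate downstream:", down)
```

Walking the graph in both directions is what makes lineage useful for quality control in this sketch: the upstream set narrows where an error was introduced, and the downstream set bounds which derived data products must be rechecked after a fix.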
Results:
After 3 years of design, development, and testing, WCH-BDP was formally put online in November 2020. It has formed a massive multi-dimensional data resource database covering more than 12.49 million patients, 75.67 million visits, and 8,475 data variables. As the hospital operates, newly generated data are entered into the platform in real time. In the more than 1 year since its launch, the platform has supported more than 20 major projects and has provided data services, storage, and computing power to many research teams. It has shifted the data support model from conventional manual extraction to self-service retrieval, which has reached 8,561 retrievals per month.
Conclusions:
The application of WCH-BDP has shown that this healthcare big data platform can successfully provide storage and computing power for multi-source, heterogeneous data. By effectively governing massive multi-dimensional data gathered from multiple sources, WCH-BDP provides highly available data assets and thus has high application value in the healthcare field. WCH-BDP has made the use of EMR data in real-world research simpler and more efficient.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.