Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jan 16, 2022
Date Accepted: Feb 25, 2022

The final, peer-reviewed published version of this preprint can be found here:

Big Data Health Care Platform With Multisource Heterogeneous Data Integration and Massive High-Dimensional Data Governance for Large Hospitals: Design, Development, and Application

Wang M, Li S, Zheng T, Li N, Shi Q, Zhuo X, Ding R, Huang Y

JMIR Med Inform 2022;10(4):e36481

DOI: 10.2196/36481

PMID: 35416792

PMCID: 9047713

Construction of a Big Data Platform in Healthcare with Multi-source, Heterogeneous Data Integration and Massive High-Dimensional Data Governance for Large Hospitals: Design, Development, and Application

  • Miye Wang; 
  • Sheyu Li; 
  • Tao Zheng; 
  • Nan Li; 
  • Qingke Shi; 
  • Xuejun Zhuo; 
  • Renxin Ding; 
  • Yong Huang

ABSTRACT

Background:

With the advent of data-intensive science, a full integration of big data science and healthcare will bring a cross-field revolution to the medical community in China. The concept of big data represents not only a technology but also a resource and a method. Big data is regarded as an important strategic resource both at the national level and at the medical institutional level, so great importance has been attached to the construction of a big data platform in healthcare.

Objective:

This study describes the development and implementation of a big data platform in healthcare for a large hospital. The platform overcomes the difficulties of integrating, computing, storing, and governing multi-source, heterogeneous data in a standardized way while ensuring healthcare data security. It can combine data from the operational systems of all departments/sections in a hospital to form a massive, high-dimensional, high-quality healthcare database. This will enable the reuse of electronic medical records (EMRs) and effectively tap into the value of data to fully support the hospital's clinical services, scientific research, and operations management.

Methods:

The project to build a big data platform at West China Hospital of Sichuan University (WCH-BDP) was launched in 2017. It has extracted, integrated, and governed data from the hospital's departments/sections going back to January 2008. A master–slave mode was implemented to integrate massive multi-source, heterogeneous data in real time, and a storage and computing environment for data with heterogeneous characteristics was built, with storage separated from computation. A standardized healthcare data governance system and a scientific, closed-loop data security ecology were established. A data lineage–based model was improved for data quality control.
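The abstract does not describe how the data lineage–based quality control model works. As a rough illustration of the general idea only, and not of the WCH-BDP implementation, the sketch below records which dataset is derived from which and then walks that graph to find everything downstream of a source that failed a quality check. The table names, the `LINEAGE` list, and the functions `build_downstream` and `affected_by` are all hypothetical.

```python
from collections import defaultdict, deque


def build_downstream(edges):
    """Map each dataset to the datasets derived directly from it."""
    downstream = defaultdict(set)
    for source, derived in edges:
        downstream[source].add(derived)
    return downstream


def affected_by(failed, edges):
    """Return every dataset that transitively depends on `failed`."""
    downstream = build_downstream(edges)
    seen = set()
    queue = deque([failed])
    # Breadth-first walk over the lineage graph, starting at the failed source.
    while queue:
        node = queue.popleft()
        for child in downstream[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage edges (source -> derived); names are illustrative only.
LINEAGE = [
    ("his.visits", "ods.visits"),
    ("lis.lab_results", "ods.lab_results"),
    ("ods.visits", "dw.patient_timeline"),
    ("ods.lab_results", "dw.patient_timeline"),
    ("dw.patient_timeline", "mart.research_cohort"),
]

if __name__ == "__main__":
    # Datasets that would need re-checking if the lab source failed validation.
    print(sorted(affected_by("lis.lab_results", LINEAGE)))
```

In a scheme like this, a failed check on one source system flags only the derived tables that actually depend on it, so unaffected parts of the platform do not need to be revalidated.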

Results:

After 3 years of design, development, and testing, WCH-BDP was formally put online in November 2020. It has formed a massive multi-dimensional data resource database including more than 12.49 million patients, 75.67 million visits, and 8,475 data variables. Along with the hospital's operations, newly generated data are entered into the platform in real time. During the year-plus since its launch, the platform has supported more than 20 major projects and provided data service, storage, and computing power support to many scientific teams. It enabled the data support model to shift from conventional manual extraction to self-service retrieval, which has reached 8,561 retrievals per month.

Conclusions:

The application of WCH-BDP has shown that this healthcare big data platform can successfully provide storage and computing power for multi-source, heterogeneous data. By effectively governing massive multi-dimensional data gathered from multiple sources, WCH-BDP provides highly available data assets and thus has high application value in the healthcare field. WCH-BDP has made the utilization of EMR data in real-world research simpler and more efficient.




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.