Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jun 7, 2020
Date Accepted: Mar 3, 2021

The final, peer-reviewed published version of this preprint can be found here:

Weight-Based Framework for Predictive Modeling of Multiple Databases With Noniterative Communication Without Data Sharing: Privacy-Protecting Analytic Method for Multi-Institutional Studies

Park JA, Sung MD, Kim HH, Park YR

JMIR Med Inform 2021;9(4):e21043

DOI: 10.2196/21043

PMID: 33818396

PMCID: 8056295

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Weight-based Framework for Predictive Modeling of Multiple Databases with Non-iterative Communication without Data Sharing: Privacy-protecting Analytic Method for Multi-institutional Studies

  • Ji Ae Park
  • Min Dong Sung
  • Ho Heon Kim
  • Yu Rang Park

ABSTRACT

Background:

Securing the representativeness of the study population is crucial in biomedical research because it can increase the generalizability of a study. In this respect, using multi-institutional data has great advantages in medicine. However, it is difficult to combine data physically because the confidential nature of biomedical data causes privacy issues. Therefore, to use multi-institution medical data for research, a methodological approach is needed to build a model without sharing data between institutions.

Objective:

The objective of our study is to build an integrated predictive model for multi-institutional data that does not require iterative communication between institutions, in order to increase the generalizability of the model while preserving privacy, without sharing patient-level data.

Methods:

The weight-based integrated model (WIM) generates a weight for each institutional model and builds an integrated model for multi-institutional data based on these weights. We performed two simulations to show the characteristics of the weights and to determine how many times weight generation must be repeated to obtain stable weights. We then conducted an experiment using real multi-institutional data to verify the developed WIM. We selected 10 hospitals (a total of 2,845 ICU stays) from the eICU Collaborative Research Database to predict ICU mortality with 11 features. To evaluate the validity of our model against a centralized model, which was built by combining all the data from the 10 hospitals, we used proportional overlap (0.5 or less indicates a significant difference at a significance level of 0.05; 2 indicates that the two CIs overlap completely). Standard and Firth logistic regression models were applied in the two simulations and the experiment.
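The integration step can be illustrated with a minimal sketch. The abstract does not say how the weights are generated or exactly how the institutional models are combined, so the code below assumes that each institution supplies a logistic regression coefficient vector and an already-computed weight, and that the integrated model is the normalized weighted average of those coefficients. The function name `integrate_models` and the example numbers are hypothetical.

```python
from typing import List, Sequence


def integrate_models(
    coefs: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Return the weighted average of per-institution coefficient vectors.

    Assumption: the weights are given (their derivation is not described
    in the abstract) and are normalized here so that they sum to 1.
    """
    if not coefs or len(coefs) != len(weights):
        raise ValueError("need one weight per institutional model")
    n_features = len(coefs[0])
    if any(len(c) != n_features for c in coefs):
        raise ValueError("all coefficient vectors must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Average each coefficient across institutions, weighted by institution.
    return [
        sum(w * c[j] for c, w in zip(coefs, weights)) / total
        for j in range(n_features)
    ]


# Hypothetical example: two institutions, two coefficients each.
integrated = integrate_models([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
print(integrated)
```

Only coefficient vectors and weights cross institutional boundaries in this sketch, which is consistent with the stated goal of not sharing patient-level data.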

Results:

The simulations showed that the weight of each institution is determined by two factors, the data size of each institution and how well each institutional model fits the overall institutional data, and that 200 weights need to be generated per institution. In the experiment, the estimated AUC and 95% CIs were 81.36% (79.37%–83.36%) and 81.95% (80.03%–83.87%) in the centralized model and WIM, respectively. The proportion of overlap of the CIs for AUC between WIM and the centralized model was approximately 1.70. The proportion of overlap of the 11 estimated ORs was over 1, except for one case.
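The reported overlap for the AUC can be checked with a short calculation. The abstract does not give the formula, so this sketch assumes that proportional overlap is the length of the overlap between the two CIs divided by the average margin of error (half-width) of the two intervals. This definition gives 2 for identical intervals, as stated in the Methods. The function name `proportion_overlap` is hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]


def proportion_overlap(ci_a: Interval, ci_b: Interval) -> float:
    """Overlap of two CIs as a proportion of their average margin of error.

    Assumed definition (not stated in the abstract): overlap length divided
    by the mean half-width. A negative value means the intervals are
    separated by a gap.
    """
    low_a, high_a = ci_a
    low_b, high_b = ci_b
    overlap = min(high_a, high_b) - max(low_a, low_b)
    mean_margin = ((high_a - low_a) / 2 + (high_b - low_b) / 2) / 2
    return overlap / mean_margin


# 95% CIs for AUC reported in the Results (in percent).
centralized = (79.37, 83.36)
wim = (80.03, 83.87)
print(round(proportion_overlap(centralized, wim), 2))  # 1.7
```

Under this assumed definition, the two reported intervals give about 1.70, which matches the value in the Results.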

Conclusions:

In the experiment using real multi-institutional data, our model showed results similar to those of the centralized model without iterative communication between institutions. In addition, WIM provided a weighted average model by integrating 10 models that were overfitted or underfitted compared to the centralized model. WIM provides an efficient distributed research algorithm in that it increases the generalizability of the model and does not require iterative communication.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.