Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Nov 6, 2018
Date Accepted: Feb 15, 2019

The final, peer-reviewed published version of this preprint can be found here:

Privacy-Preserving Analysis of Distributed Biomedical Data: Designing Efficient and Secure Multiparty Computations Using Distributed Statistical Learning Theory

Dankar FK, Madathil N, Dankar SK, Boughorbel S

JMIR Med Inform 2019;7(2):e12702

DOI: 10.2196/12702

PMID: 31033449

PMCID: 6658266

Secure Multi-party Computations for Biomedical Data using Distributed Statistical Learning

  • Fida K. Dankar
  • Nisha Madathil
  • Samar K. Dankar
  • Sabri Boughorbel

ABSTRACT

Background:

Biomedical research often requires large cohorts and therefore the sharing of biomedical data among researchers around the world, which raises many privacy, ethical, and legal concerns. Current de-identification methods are inadequate, forcing privacy experts to explore other approaches to privacy protection. Secure multiparty computation (SMC) is an attractive approach that allows multiple parties to collectively carry out calculations on their datasets without revealing their own raw data. However, it incurs heavy computation time and requires extensive communication between the parties involved.
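To make the idea of computing on data without revealing it concrete, the following is a minimal sketch of a secure sum based on additive secret sharing, one common SMC building block. It is an illustration only and is not taken from the paper: the modulus `PRIME`, the function names, and the example site counts are all assumptions, and the parties are simulated inside a single process.

```python
import secrets

# Hypothetical modulus; inputs and their total must be non-negative and below it.
PRIME = 2**61 - 1


def make_shares(value, n_parties):
    """Split an integer into n_parties random shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Simulate a secure sum in which no party sees another party's raw value."""
    n = len(private_values)
    # all_shares[i][j] is the share that party i would send to party j.
    all_shares = [make_shares(v % PRIME, n) for v in private_values]
    # Each party j publishes only the sum of the shares it received.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Adding the published partial sums recovers the total and nothing else.
    return sum(partial_sums) % PRIME


# Illustrative record counts held privately by three sites (made-up values).
site_counts = [120, 75, 310]
total = secure_sum(site_counts)
```

Each individual share is uniformly random, so a single share carries no information about the value it came from. The sketch also hints at the cost noted above: every party must send a message to every other party for each value that is summed.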

Objective:

Our goal in this paper is to develop usable and efficient SMC applications that meet the needs of potential end users, and to raise general awareness of SMC as a tool that supports data sharing without fear of privacy abuse.

Methods:

We introduce distributed statistical computing into the design of secure multiparty protocols. This allows us to conduct computations at each party's site independently and then combine these computations to form one estimator for the collective dataset, which limits communication to the final step and reduces complexity. The effectiveness of our privacy-preserving model is demonstrated through a linear regression application.
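The compute-locally-then-combine pattern can be sketched for a one-feature linear regression as follows. The abstract does not specify how the local computations are combined, so this sketch assumes one simple option: each site reduces its data to a handful of sums, and only those sums are added together. The function names and the toy data are hypothetical, and the combination step is done in the clear here, whereas a secure protocol would add the sums without exposing any single site's contribution.

```python
def local_statistics(xs, ys):
    """Run at one site: reduce the local records to five summary sums."""
    return (
        len(xs),                               # number of records
        sum(xs),                               # sum of x
        sum(ys),                               # sum of y
        sum(x * x for x in xs),                # sum of x squared
        sum(x * y for x, y in zip(xs, ys)),    # sum of x times y
    )


def combine(stats_list):
    """Add the per-site sums and solve the least-squares normal equations."""
    n, sx, sy, sxx, sxy = (sum(column) for column in zip(*stats_list))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Toy data for two sites, generated from y = 2x + 1 (made-up values).
site_a = ([1, 2, 3], [3, 5, 7])
site_b = ([4, 5], [9, 11])

slope, intercept = combine([local_statistics(*site_a), local_statistics(*site_b)])

# Reference fit on the pooled records, for comparison only.
pooled = combine([local_statistics(site_a[0] + site_b[0], site_a[1] + site_b[1])])
```

Because the five sums are additive across sites, the combined fit in this sketch matches the fit on the pooled records, and each site communicates only once, at the final step.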

Results:

Our secure linear regression algorithm was tested for accuracy and performance using real and synthetic datasets. The results show no loss of accuracy relative to nonsecure regression and very good performance (20 minutes for 100 million records).

Conclusions:

We used distributed statistical computing to securely calculate a linear regression model over multiple separate datasets. Our experiments show very good performance (in terms of the number of records it can handle). We plan to extend our method to other estimators such as logistic regression.

