Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jan 4, 2021
Date Accepted: Sep 6, 2021

The final, peer-reviewed published version of this preprint can be found here:

Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation

Sung M, Cha D, Park YR

JMIR Med Inform 2021;9(11):e26914

DOI: 10.2196/26914

PMID: 34747711

PMCID: 8663640

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Local differential privacy in medical domain to protect sensitive information: Algorithm Development and Real-World Validation

  • MinDong Sung
  • Dongchul Cha
  • Yu Rang Park

ABSTRACT

Background:

Privacy is of increasing interest in the present big data era, particularly regarding medical data. Specifically, differential privacy has emerged as the standard method for privacy-preserving data analysis and data publishing.

Objective:

We applied differential privacy to medical data with diverse parameters and checked (i) the feasibility of our algorithms with synthetic data and (ii) the balance between data privacy and utility, using machine learning techniques.

Methods:

All data were normalized to the range –1 to 1, and the bounded Laplacian method was applied so that the differential privacy algorithm could not generate out-of-bound values. To preserve the cardinality of the categorical variables, we performed post-processing via discretization. The algorithm was evaluated using both synthetic and real-world data (eICU Collaborative Research Database). We measured the difference between the original and perturbed data using the misclassification rate for categorical data and the mean squared error for continuous data. Further, we compared the performance of classification models that predict in-hospital mortality using real-world data.
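The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and enforcing the bound by redrawing out-of-range samples is an assumption, as is the noise scale (range width divided by epsilon), since the abstract states neither. A strict epsilon guarantee for a bounded mechanism may require a different calibration.

```python
import numpy as np

def normalize(x, lo, hi):
    """Linearly map values from [lo, hi] to [-1, 1]."""
    return 2.0 * (np.asarray(x, dtype=float) - lo) / (hi - lo) - 1.0

def bounded_laplace(x, epsilon, rng, lower=-1.0, upper=1.0):
    """Add Laplace noise, redrawing any sample that lands outside [lower, upper]."""
    x = np.asarray(x, dtype=float)
    scale = (upper - lower) / epsilon  # assumed calibration, not taken from the paper
    out = x + rng.laplace(0.0, scale, size=x.shape)
    bad = (out < lower) | (out > upper)
    while bad.any():
        # Resample only the out-of-bound entries until all are in range.
        out[bad] = x[bad] + rng.laplace(0.0, scale, size=int(bad.sum()))
        bad = (out < lower) | (out > upper)
    return out

def discretize(noisy, levels):
    """Snap each noisy value (1-D array) to the nearest allowed category level."""
    levels = np.asarray(levels, dtype=float)
    idx = np.abs(noisy[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

def misclassification_rate(original, perturbed):
    """Fraction of categorical entries whose level changed after perturbation."""
    return float(np.mean(np.asarray(original) != np.asarray(perturbed)))

def mean_squared_error(original, perturbed):
    """Mean squared difference between original and perturbed continuous values."""
    diff = np.asarray(original, dtype=float) - np.asarray(perturbed, dtype=float)
    return float(np.mean(diff ** 2))

# Usage on made-up values: a 4-level categorical variable and a continuous one.
rng = np.random.default_rng(0)
codes = np.array([0, 1, 2, 3, 2, 1, 0, 3])
levels = normalize(np.arange(4), 0, 3)
cat = normalize(codes, 0, 3)
cat_private = discretize(bounded_laplace(cat, 0.1, rng), levels)
cont = normalize([50.0, 60.0, 70.0, 90.0], 40, 100)
cont_private = bounded_laplace(cont, 0.1, rng)
print(misclassification_rate(cat, cat_private), mean_squared_error(cont, cont_private))
```

Because discretization snaps each noisy value back onto the original set of levels, the perturbed categorical column keeps the same cardinality as the input, which is the property the post-processing step is meant to preserve.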

Results:

The misclassification rate of categorical variables ranged between 0.49 and 0.85 when epsilon was 0.1, and it converged to 0 as epsilon increased. When epsilon was between 10^2 and 10^3, the misclassification rate rapidly dropped to 0. Similarly, the mean squared error of continuous variables decreased as epsilon increased. The performance of the model developed from perturbed data converged to that of the model developed from original data as epsilon increased. In particular, the accuracy of a random forest model developed from original data was 0.801, and the accuracy of models developed from perturbed data ranged from 0.757 to 0.81 when epsilon was 0.1 and 10,000, respectively.

Conclusions:

We applied local differential privacy to medical domain data, which are diverse and high-dimensional. Higher noise may offer enhanced privacy, but it simultaneously hinders utility. We should choose an appropriate degree of noise for data perturbation to balance privacy and utility depending on specific situations.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.