Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jan 4, 2021
Date Accepted: Sep 6, 2021
Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation
ABSTRACT
Background:
Privacy is of increasing interest in the present big data era, particularly regarding medical data. Specifically, differential privacy has emerged as the standard method for privacy-preserving data analysis and data publishing.
Objective:
We applied differential privacy to medical data with diverse parameters and checked (i) the feasibility of our algorithms with synthetic data and (ii) the balance between data privacy and utility, using machine learning techniques.
Methods:
All data were normalized to range between –1 and 1, and the bounded Laplacian method was applied to prevent the generation of out-of-bound values after applying the differential privacy algorithm. To preserve the cardinality of the categorical variables, we performed post-processing via discretization. The algorithm was evaluated using both synthetic and real-world data (eICU Collaborative Research Database). We evaluated the difference between the original and perturbed data using the misclassification rate for categorical data and the mean squared error for continuous data. Further, we compared the performances of classification models that predict in-hospital mortality using real-world data.
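The perturbation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the bounded Laplacian is realized by resampling noise until the value stays in [–1, 1] (other bounded variants truncate or renormalize the density), assumes a sensitivity of (hi – lo) for a value normalized to that range, and uses hypothetical function names (`bounded_laplace`, `perturb_categorical`).

```python
import numpy as np

def bounded_laplace(x, epsilon, lo=-1.0, hi=1.0, rng=None):
    """Perturb a value already normalized to [lo, hi] with Laplace noise,
    resampling until the noisy value stays in bounds (one bounded variant).
    Assumed sensitivity: the width of the bounded range, (hi - lo)."""
    rng = rng or np.random.default_rng()
    scale = (hi - lo) / epsilon  # smaller epsilon -> larger noise
    while True:
        noisy = x + rng.laplace(0.0, scale)
        if lo <= noisy <= hi:
            return noisy

def perturb_categorical(x, levels, epsilon, rng=None):
    """Perturb a numerically encoded categorical value, then discretize
    back to the nearest original level so cardinality is preserved
    (post-processing, as in the Methods)."""
    noisy = bounded_laplace(x, epsilon, rng=rng)
    return min(levels, key=lambda v: abs(v - noisy))
```

As epsilon grows, the noise scale shrinks and the perturbed value approaches the original, which is consistent with the misclassification rate and mean squared error converging to 0 in the Results.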
Results:
The misclassification rate of categorical variables ranged between 0.49 and 0.85 when epsilon was 0.1, and it converged to 0 as epsilon increased. When epsilon was between 10² and 10³, the misclassification rate dropped rapidly to 0. Similarly, the mean squared error of continuous variables decreased as epsilon increased. The performance of the model developed from perturbed data converged to that of the model developed from the original data as epsilon increased. In particular, the accuracy of a random forest model developed from the original data was 0.801, and it ranged from 0.757 to 0.81 as epsilon increased from 0.1 to 10,000.
Conclusions:
We applied local differential privacy to medical domain data, which are diverse and high-dimensional. Higher noise may offer enhanced privacy, but it simultaneously hinders utility. We should choose an appropriate degree of noise for data perturbation to balance privacy and utility depending on specific situations.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.