Currently submitted to: Journal of Medical Internet Research
Date Submitted: Aug 22, 2017
Open Peer Review Period: Aug 22, 2017 - Oct 17, 2017
Privacy-preserving logistic regression based on homomorphic encryption
Learning a model without accessing raw data has intrigued security and machine learning researchers for years. In an ideal setting, sensitive data would be encrypted, stored on a commercial cloud, and analyzed without ever being decrypted, thereby preserving privacy. Homomorphic encryption is a natural fit for secure data outsourcing, but supporting real-world machine learning tasks with it remains very challenging. Existing frameworks can handle only simplified cases with low-degree polynomials, such as the linear means classifier and linear discriminant analysis.
The aim of this study is to provide practical support for mainstream learning models (e.g., logistic regression).
Our approach combines three innovations: (1) a novel homomorphic encryption scheme optimized for real-number computation, (2) a least squares approximation of the logistic function that improves accuracy and efficiency (i.e., reduces computation cost), and (3) new packing and parallelization techniques.
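Because homomorphic encryption schemes evaluate only additions and multiplications, the logistic function must be replaced by a polynomial before it can be computed on ciphertexts. The following is a minimal plaintext sketch of a least squares polynomial fit to the logistic function. The interval [-8, 8], the degrees 3 and 7, the sampling grid, and the helper names `fit_sigmoid_poly` and `max_abs_error` are illustrative assumptions, not values or names taken from this abstract.

```python
import numpy as np
from numpy.polynomial import Polynomial


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def fit_sigmoid_poly(degree=3, bound=8.0, n_points=2001):
    """Least squares polynomial fit to the logistic function.

    Assumption: a dense uniform grid on [-bound, bound] stands in for the
    continuous least squares problem. Polynomial.fit rescales the inputs
    internally, which keeps the fit numerically stable.
    """
    x = np.linspace(-bound, bound, n_points)
    return Polynomial.fit(x, sigmoid(x), degree)


def max_abs_error(poly, bound=8.0, n_points=2001):
    """Largest absolute gap between the polynomial and the logistic function."""
    x = np.linspace(-bound, bound, n_points)
    return float(np.max(np.abs(poly(x) - sigmoid(x))))


if __name__ == "__main__":
    for degree in (3, 7):  # hypothetical degrees chosen for illustration
        poly = fit_sigmoid_poly(degree)
        print(degree, max_abs_error(poly))
```

A higher degree reduces the approximation error but requires more multiplications per evaluation, which is the accuracy versus cost trade-off that item (2) refers to.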
Using real-world datasets, we evaluated the performance of our model and demonstrated its feasibility in terms of speed and memory consumption. For example, it took about 114 minutes to obtain the model parameters from the homomorphically encrypted Edinburgh training dataset. In addition, the resulting model gave fairly accurate predictions on the testing dataset.
We present the first homomorphically encrypted logistic regression model, based on the critical observation that the precision loss of classification models is small enough that the decision plane remains unchanged.