Skip to content

Latest commit

 

History

History
152 lines (92 loc) · 23.4 KB

homomorphic_learning.md

File metadata and controls

152 lines (92 loc) · 23.4 KB

Homomorphic Encryption

Introduction

Sharing private data with third parties, such as cloud services or other companies, is a challenge due to privacy regulations such as GDPR and CCPA. Failure to comply with these regulations can lead to serious fines and damage business reputations. Traditional encryption method provides an efficient way to store private data on the cloud in an encrypted form. However, to perform computations on data encrypted with these methods, businesses need to decrypt the data on the cloud, which can lead to security problems.

Homomorphic encryption (HE) enables businesses to share private data with third parties to get computational service securely. With HE, the cloud service or the outsourcing company has access only to encrypted data and performs computations on it. These services then return the encrypted result to the owner who can decrypt it with the private key. The concept of HE indicates that operations can be performed on encrypted data without the need to share the secret key needed to decrypt the data with the cloud provider. If decryption is carried on the result of any operation, it will be same as if calculations were done on the raw data.

Suppose we consider data x1, x2, ..., _xf and we successfully encrypt them to be Enc(x1), Enc(x2), ..., Enc(xf). The homomorphic encryption system will allow us to efficiently compute a ciphertext function that will encrypt f(x1,x2,...,xf) for any computable function f. This was first investigated by Rivest et al. (1978)

Homomorphic Encryption

Types of Homomorphic Encryption

HE schemes are classified depending on the possible circuits they can evaluate on encrypted data, differences lie in the available gates to use, and the depths of those circuits. In other words, depending on the possible functions, f, you can compute and how many operations can be chained on a ciphertext, HE schemes can be classified into three main types:

  • Partially-HE (PHE) : This type of scheme can evaluate any circuit composed of a single type of gate, addition or multiplication, but never both. It doesn’t restrict neither the size nor depth of the circuit. This type is well suited for the applications that only need to perform either addition or multiplication on encrypted data. The RSA cryptosystem is an example of a PHE that allows an unbounded number of modular multiplications.

  • Somewhat-HE (SHE) : This type of scheme can evaluate circuits composed of addition and multiplication gates, but with the restriction on the depth. SHE is useful for evaluating low degree polynomials up to some level, however we sometimes need to evaluate circuits of arbitrary depth.

  • Fully-HE (FHE) :A concept first conceived by Rivest et al. (1978) but it remained unrealized until Craig Gentry presented a first feasible FHE scheme in 2009. This encryption scheme can evaluate circuits composed of both addition and multiplication gates. In contrast to SHE, FHE has unlimited circuit depths which makes It suitable for deep learning applications. Although many FHE schemes have been proposed during the last decade, it has been difficult to use them in practice. In the linked paper, Craig built FHE on top of SHE by using what he called bootstrapping. Although FHE being the most powerful type, in order to put such a scheme into practice, one needs to consider other factors as well like the cost of evaluation, size of ciphertext, domain of plain text (integer or real numbers), and the cost of bootstrapping. FHE has gone from theoretical breakthrough to practical deployment, dropping the initial 30 minutes required to compute the multiplication between two encrypted values down to less than 20 milliseconds. Even then, FHE multiplication is still around seven orders of magnitude slower than native CPU integer multiplication instructions.

HE Schemes

As mentioned earlier, there are various types of HE but there are various HE scemes that can also be categorized into the partial and fully homomorphic. Here are some schemes that are widely known :

  • Pallier - The Paillier cryptosystem, invented by Pascal Paillier in 1999, is a partial homomorphic encryption scheme which allows two types of computation:

    • addition of two ciphertexts
    • multiplication of a ciphertext by a plaintext number

The problem of computing n-th residue classes is believed to be computationally difficult. The decisional composite residuosity assumption is the intractability hypothesis upon which this cryptosystem is based. In short, it achieves HE with respect to addition by encrypting the message as an exponent of the public key. This way when multiplying two ciphertexts encrypted with the same key the result is a valid encryption of the sum.

  • CKKS - Cheon-Kim-Kim-Song (CKKS) scheme, enables to perform computations on vectors of complex values (thus real values as well). CKKS, a.k.a. Homomorphic Encryption for Arithmetic of Approximate Numbers (HEAAN), was proposed to offer homomorphic computation on real numbers.

The main idea is to consider the noise, a.k.a. error e , which is introduced in Ring-Learning with Errors (Ring-LWE) based FHE schemes for security purposes, as part of the message m (which we call here payload) we want to encrypt. The payload and the noise are combined to generate the plaintext (m + e) that we encrypt.

A message m, a vector of values on which we want to perform certain computation, is first encoded into a plaintext polynomial p(X) and then encrypted using a public key.

Once the message m is encrypted into c, a couple of polynomials, CKKS provides several operations that can be performed on it, such as addition, multiplication and rotation.

If we denote a function by f, which is a composition of homomorphic operations, then decrypting c’ = f(c) with the secret key will yield p’ = f(p). Therefore once we decode it, we will get m = f(m).

The central idea to implement a homomorphic encryption scheme is to have homomorphic properties on the encoder, decoder, encryptor and decryptor. This way, operations on ciphertext will be decrypted and decoded correctly and provide outputs as if operations were done directly on plaintext.

image

CKKS exploits the rich structure of integer polynomial rings for its plaintext and ciphertext spaces. Nonetheless, data comes more often in the form of vectors than in polynomials.

  • BFV - Brakerski,Fan and Vercauteren developed the scheme known as BFV encryption scheme and is a SHE (somewhat encryption scheme) and it widely researched topic in cryptography. Any SHE scheme can be a BFV family of schemes if it consists of these three algorithms :
  1. Key generation algorithm - It takes the security parameter k as an input, and outputs a public key pk and a secret key sk.
  2. Encryption algorithm - It takes the message m ∈ M, a public key pk, and a randomness r ∈ R as inputs, and outputs a ciphertext c ∈ C
  3. Decryption algorithm - It takes a ciphertext c ∈ C, a secret key sk, and an integer i ∈ Z+ as inputs, and outputs a message m ∈ M.

There are a few Homomorphic Operations that can be performed by using the BFV scheme and are as follows:

  1. Additive homomorphism - In the BFV scheme, the standard polynomial addition algorithm implements the additive homomorphism.
  2. Multiplicative Homomorphism - In the BFV scheme, the standard polynomial multiplication algorithm implements the multiplicative homomorphism. The multiplicative homomorphism is sometimes called the “plaintext multiplication” or the “scalar multiplication”.
  3. Relinearization - Relinearization is a homomorphic operation used in the BFV scheme to reduce the number of ciphertexts generated by homomorphic operations. In the BFV scheme, the relinearization homomorphism is implemented by the standard polynomial addition and multiplication algorithms.
  4. Rotation - Rotation is a homomorphic operation used in the BFV scheme to implement the power operation efficiently. It can be used to implement a large class of homomorphic operations on encrypted data. In the BFV scheme, the rotation homomorphism is implemented by the standard polynomial multiplication algorithm. The rotation is sometimes called the “power operation”.
  • GSW - GSW was developed by Craig Gentry, Amit Sahai, and Brent Waters. GSW uses LWE applied to linear algebra where the messages are encrypted as eigenvalues of matrices which have a common eigenvector. In previous schemes, the homomorphic evaluator needs to obtain the user’s “evaluation key”, which consists of a chain of encrypted secret keys. This scheme has no evaluation key. The evaluator can do homomorphic operations without knowing the user’s public key at all, except for some basic parameters.

The ciphertexts in GSW are square matrices, and homomorphic additions and multiplications are just matrix additions and multiplications, respectively. Therefore, ciphertext dimension always keeps constant and key switching is no longer necessary. Scale-invariance can also be achieved in GSW via the flatten technique; thus modulus switching is also no longer necessary. GSW is simpler and more natural than previous LWE-based FHE schemes. However, matrix multiplication still brings about a high computational cost.

  • DM - Ducas and Miccianico proposed a new FHE scheme with homomorphic NOT AND (NAND) gates, which is known as the DM scheme. Homomorphic operations in DM are just ciphertext vector additions, which are very simple operations. However, ciphertexts in DM need to be refreshed after each homomorphic operation, which becomes a bottleneck for the overall efficiency. Although GSW and DM are conceptually simpler than most other FHE schemes, both of them still suffer from efficiency bottlenecks.

  • BGV - BGV was developed by Zvika Brakerski,Craig Gentry and Vinod Vaikunathan. It uses modulus switching which is an alternate way to manage noise. BGV defines plaintext and ciphertext rings. The encryption procedure maps input plaintext elements of the plaintext ring to ciphertext elements of the ciphertext ring.

In this scheme, the encryption is done by concealing the plaintext message with an almost random mask that is computed using the public key (or the secret key in the symmetric-key operation mode). The output of encryption is typically two elements of the ciphertext space; the first of which contains the masked plaintext data whereas the second contains auxiliary information that can be used in the decryption procedure. Decryption uses the secret key and the auxiliary information in the ciphertext to remove the mask and recover the plaintext message.

Applications

Homomorphic Encryption has various advantages and can be applied to application starting from education and healthcare to smart electric grids and MLaaS (Machine Learning as a service). Basically it can be applied to anywhere where the input data privacy is the most important thing. Anyway the data usage is super complex due to the regulations, privacy concerns and the vitality of the data. Industries and sectors working with non-intrusive and privacy conserving security - detection of secturing breaches etc. Thanks to encryption, all these opertaions, parameters are kept private but often not easily reversible.

Craig Gentry mentioned in his thesis that Full Homomorphic encryption has numerous applications. For example, it enables private queries to a search engine - the user submits an encrypted query and the search engine computes a succinct encrypted answer without ever looking at the query in the clear. It also enables searching an encrypted data - a user stores encrypted files on a remote file server and can later have the server retrieve only files that, when decrypted, satisfy the Boolean constraint, even though the server cannot decrypt the files on its own. More broadly, FHE improves the efficiency of secure multi party computation.

Researchers have already identified several practical applications of FHE, some of which are discussed below:

  • Security Data Stored in the Cloud : Using homomorphic encryption, you can secure the data that you stored in the cloud while also retaining the ability to calculate and search ciphered information that you can later decrypt without compromising the integrity of the data as a whole.

  • Enabling Data Analytics in Regulated Industries : HE allows data to be encrypted and outsourced to commercial cloud environments for research and data sharing purposes while protecting user or patient data privacy. It can be used for businesses and organizations across a variety of industries including financial services, retail, information technology, and healthcare to allow people to use data without seeing its unencrypted values. Examples include predictive analysis for medical data without putting data privacy at risk, preserving customer privacy in personalized advertising, financial privacy for functions like stock price prediction algorithms, and forensic image recognition.

  • Improving Election Security and Transparency : Researchers are working on how to use homomorphic encryption to make democratic elections more secure and transparent. For example, the Paillier encryption scheme, which uses additionoperations, would be best suited for voting-related applications because it allows users to add up various values in an unbaised way while keeping their values private. This technology could not only protect data from manipulation, it could allow it to be independently verified by authorized third parties.

Existing Tools, Libraries and Research

Technological companies, (like Microsoft, Google), have initiated programs to advance homomorphic encryption to make it more universally available and user friendly. Microsoft, has created SEAL (Simply Encrypted Arithmatic Library), a set of encryption libraries that allow computations to be performed directly on encrypted data. Companies can use SEAL to create platforms to perform data analytics on information while it's still encrypted and the owners of the data never have to share their encryption keys to anyone else. Google, with its open-source cryptographic tool, Private Join and Compute, is focussed on analyzing data in its encrypted form, with only the insights derived from the analysis visible, and not the underlying data itself.

Efficient implementations of FHE are mostly written in high performance languages like C++, posing a high entry barrier to novice users. However, for most data science and machine learning applications, Python is a standard language. This can be achieved with the implementation of Python wrapper function. Pyfhel provides a Python wrapper for the Microsoft Seal library, extendable to other C++ libraries, that goes beyond merely exposing the underlying API by adding a carefully designed abstraction layer that feels at home in Python. Pyfhel offers:

  • One-click installation, including the underlying libraries.
  • A high-level Python first abstraction layer that makes working with FHE significantly easier.
  • High-level API's for low level functionalities not generally exposed.

In additon to Pyfhel, their exists a plethora of Python wrappers for FHE libraries. Most rely on automatic C++ wrapping tools like pybind11 or Boost.

  • Python-paillier open-source implementation in python of the Paillier scheme.
  • PySEAL is no-longer maintained pybind11-based wrapper. Many require the user to compile underlying libraries themselves, using the Unix-only toolchain, like more recent SEAL-Python.
  • TenSEAL which appeared several years after the initial release of Pyfhel shows the most promise. It is pybind11 based and features a one-click setup, but focussed mostly on high level Machine Learning and Tensor operations.
  • PySyft is OpenMined's open source stack that provides secure and private Data Science in Python. Syft decouples private data from model training, using techniques like Federated Learning, Differential Privacy, and Encrypted Computation. This is done with a numpy-like interface and integration with Deep Learning frameworks, so that you as a Data Scientist can maintain your current workflow while using these new privacy-enhancing techniques.
  • Other approaches (e.g., pyFHE) implement schemes directly in Python, at the cost of significantly slower operations.

A curated list of amazing homomorphic encryption libraries, software and resources can be found here.

We will be performing a comparative study between these opensource python tools and libraries and based on these results, we will create a proof of concept use case that demonstrates a basic homomorphic learning example.

Federated Learning

Similar to Homomorphic Learning, Federated Learning caters to the problem of not being able to centralize the training data due to data privacy, secrecy, regulatory compliance and heavy volume and loads of data.

IBM Federated Learning

IBM Federated Learning supports multiple machine learning models out-of-the-box, including:

  • Models written in Keras, PyTorch and TensorFlow
  • Linear classifiers/regressions (with regularizer): logistic regression, linear SVM, ridge regression and more
  • Decision Tree ID3
  • Deep Reinforcement Learning algorithms including DQN, DDPG, PPO and more
  • Naïve Bayes

IBM FL is part of Watson Studio and is available as a service on the cloud here, and you may also use it as part of your Cloud Pak for Data installation. The community edition is a stand-alone library that is suited for academic users, researchers, not for production or commercial use. We will be trying this out as well as a part of tool-evaluation study.

Limitations and Conclusions

Homomorphic encryption is a very exciting subject with a tremendous potential to disrupt the landscape of online privacy and AI evolution. The urgent need for such a solution is apparant and some of the first use cases has been implemented. Between slow computation speed or accuracy problems, FHE remains commercially infeasible for computationally-heavy applications. There is certainly much progress to be made and many more to come just around the corner.

Besides being computationally expensive Homomorphic encryption limits the set of operations one can perform on the data. For instance operations like division, inverse multiplication etc are not feasible either.

References

  1. A brief history on Homomorphic learning: A privacy-focused approach to machine learning
  2. https://paperswithcode.com/paper/fully-homomorphically-encrypted-deep-learning
  3. https://towardsdatascience.com/homomorphic-encryption-intro-part-1-overview-and-use-cases-a601adcff06c
  4. Automated Exploration of Homomorphic Encryption Scheme Input Parameters
  5. https://avalonlibrary.net/ebooks/David%20Kahn%20-%20The%20Codebreakers.pdf
  6. Multiparty Homomorphic Encryption
  7. Systematic Review on Fully Homomorphic Encryption Scheme and Its Application | SpringerLink
  8. Privacy Preserving Multi-party Machine Learning with Homomorphic Encryption
  9. [https://www.mdpi.com/2410-387X/6/3/34/htm[(https://www.mdpi.com/2410-387X/6/3/34/htm)
  10. Homomorphic Encryption and Federated Learning based Privacy-Preserving CNN Training: COVID-19 Detection Use-Case
  11. Fully Homomorphic Encryption Using Ideal Lattices
  12. Federated Learning by IBM
  13. Openminded Blog
  14. Paillier Cryptosystem
  15. Paillier Scheme
  16. CKKS blog and diagram
  17. BGV - Fully Homomorphic Encryption without Bootstrapping
  18. A More Efficient Fully Homomorphic Encryption Scheme Based on GSW and DM Schemes