-
slytHErin: An Agile Framework for Encrypted Deep Neural Network Inference
Authors:
Francesco Intoci,
Sinem Sav,
Apostolos Pyrgelis,
Jean-Philippe Bossuat,
Juan Ramon Troncoso-Pastoriza,
Jean-Pierre Hubaux
Abstract:
Homomorphic encryption (HE), which allows computations on encrypted data, is an enabling technology for confidential cloud computing. One notable example is privacy-preserving Prediction-as-a-Service (PaaS), where machine-learning predictions are computed on encrypted data. However, developing HE-based solutions for encrypted PaaS is a tedious task which requires a careful design that predominantly depends on the deployment scenario and on leveraging the characteristics of modern HE schemes. Prior works on privacy-preserving PaaS focus solely on protecting the confidentiality of the client data uploaded to a remote model provider, e.g., a cloud offering a prediction API, and assume (or take advantage of the fact) that the model is held in plaintext. Furthermore, their aim is to either minimize the latency of the service by processing one sample at a time, or to maximize the number of samples processed per second, while processing a fixed (large) number of samples. In this work, we present slytHErin, an agile framework that enables privacy-preserving PaaS beyond the application scenarios considered in prior works. Thanks to its hybrid design leveraging HE and its multiparty variant (MHE), slytHErin enables novel PaaS scenarios by encrypting the data, the model or both. Moreover, slytHErin features a flexible input data packing approach that allows processing a batch of an arbitrary number of samples, and several computation optimizations that are model-and-setting-agnostic. slytHErin is implemented in Go and it allows end-users to perform encrypted PaaS on custom deep learning models comprising fully-connected, convolutional, and pooling layers, in a few lines of code and without having to worry about the cumbersome implementation and optimization concerns inherent to HE.
Submitted 1 May, 2023;
originally announced May 2023.
-
Scalable and Privacy-Preserving Federated Principal Component Analysis
Authors:
David Froelicher,
Hyunghoon Cho,
Manaswitha Edupalli,
Joao Sa Sousa,
Jean-Philippe Bossuat,
Apostolos Pyrgelis,
Juan R. Troncoso-Pastoriza,
Bonnie Berger,
Jean-Pierre Hubaux
Abstract:
Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermediate results in a passive-adversary model with up to all-but-one colluding parties. SF-PCA jointly leverages multiparty homomorphic encryption, interactive protocols, and edge computing to efficiently interleave computations on local cleartext data with operations on collectively encrypted data. SF-PCA obtains results as accurate as non-secure centralized solutions, independently of the data distribution among the parties. It scales linearly or better with the dataset dimensions and with the number of data providers. SF-PCA is more precise than existing approaches that approximate the solution by combining local analysis results, and between 3x and 250x faster than privacy-preserving alternatives based solely on secure multiparty computation or homomorphic encryption. Our work demonstrates the practical applicability of secure and federated PCA on private distributed datasets.
Submitted 31 March, 2023;
originally announced April 2023.
-
SoK: Privacy-Preserving Collaborative Tree-based Model Learning
Authors:
Sylvain Chatel,
Apostolos Pyrgelis,
Juan Ramon Troncoso-Pastoriza,
Jean-Pierre Hubaux
Abstract:
Tree-based models are among the most efficient machine learning techniques for data mining nowadays due to their accuracy, interpretability, and simplicity. The recent orthogonal needs for more data and privacy protection call for collaborative privacy-preserving solutions. In this work, we survey the literature on distributed and privacy-preserving training of tree-based models and we systematize its knowledge based on four axes: the learning algorithm, the collaborative model, the protection mechanism, and the threat model. We use this to identify the strengths and limitations of these works and provide for the first time a framework analyzing the information leakage occurring in distributed tree-based model learning.
Submitted 18 June, 2021; v1 submitted 16 March, 2021;
originally announced March 2021.
-
Privacy-Preserving and Efficient Verification of the Outcome in Genome-Wide Association Studies
Authors:
Anisa Halimi,
Leonard Dervishi,
Erman Ayday,
Apostolos Pyrgelis,
Juan Ramon Troncoso-Pastoriza,
Jean-Pierre Hubaux,
Xiaoqian Jiang,
Jaideep Vaidya
Abstract:
Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. Workflow systems model and record provenance describing the steps performed to obtain the final results of a computation. In this work, we propose a framework that verifies the correctness of the results of statistical tests conducted by a researcher while protecting individuals' privacy in the researcher's dataset. The researcher publishes the workflow of the conducted study, its output, and associated metadata. They keep the research dataset private while providing, as part of the metadata, a partial noisy dataset (that achieves local differential privacy). To check the correctness of the workflow output, a verifier makes use of the workflow, its metadata, and results of another statistical study (using publicly available datasets) to distinguish between correct statistics and incorrect ones. We apply the proposed framework to genome-wide association studies (GWAS), in which the goal is to identify highly associated point mutations (variants) with a given phenotype. For evaluation, we use real genomic data and show that the correctness of the workflow output can be verified with high accuracy even when the aggregate statistics of a small number of variants are provided. We also quantify the privacy leakage due to the provided workflow and its associated metadata in the GWAS use-case and show that the additional privacy risk due to the provided metadata does not increase the existing privacy risk due to sharing of the research results. Thus, our results show that the workflow output (i.e., research results) can be verified with high confidence in a privacy-preserving way. We believe that this work will be a valuable step towards providing provenance in a privacy-preserving way while providing guarantees to the users about the correctness of the results.
Submitted 7 November, 2022; v1 submitted 21 January, 2021;
originally announced January 2021.
-
Revolutionizing Medical Data Sharing Using Advanced Privacy Enhancing Technologies: Technical, Legal and Ethical Synthesis
Authors:
James Scheibner,
Jean Louis Raisaro,
Juan Ramón Troncoso-Pastoriza,
Marcello Ienca,
Jacques Fellay,
Effy Vayena,
Jean-Pierre Hubaux
Abstract:
Multisite medical data sharing is critical in modern clinical practice and medical research. The challenge is to conduct data sharing that preserves individual privacy and data usability. The shortcomings of traditional privacy-enhancing technologies mean that institutions rely on bespoke data sharing contracts. These contracts increase the inefficiency of data sharing and may disincentivize important clinical treatment and medical research. This paper provides a synthesis between two novel advanced privacy enhancing technologies (PETs): Homomorphic Encryption and Secure Multiparty Computation (defined together as Multiparty Homomorphic Encryption or MHE). These PETs provide a mathematical guarantee of privacy, with MHE providing a performance advantage over separately using HE or SMC. We argue MHE fulfills legal requirements for medical data sharing under the General Data Protection Regulation (GDPR) which has set a global benchmark for data protection. Specifically, the data processed and shared using MHE can be considered anonymized data. We explain how MHE can reduce the reliance on customized contractual measures between institutions. The proposed approach can accelerate the pace of medical research whilst offering additional incentives for healthcare and research institutes to employ common data interoperability standards.
Submitted 27 October, 2020;
originally announced October 2020.
-
POSEIDON: Privacy-Preserving Federated Neural Network Learning
Authors:
Sinem Sav,
Apostolos Pyrgelis,
Juan R. Troncoso-Pastoriza,
David Froelicher,
Jean-Philippe Bossuat,
Joao Sa Sousa,
Jean-Pierre Hubaux
Abstract:
In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an $N$-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network training. It employs multiparty lattice-based cryptography to preserve the confidentiality of the training data, the model, and the evaluation data, under a passive-adversary model and collusions between up to $N-1$ parties. To efficiently execute the secure backpropagation algorithm for training neural networks, we provide a generic packing approach that enables Single Instruction, Multiple Data (SIMD) operations on encrypted data. We also introduce arbitrary linear transformations within the cryptographic bootstrapping operation, optimizing the costly cryptographic computations over the parties, and we define a constrained optimization problem for choosing the cryptographic parameters. Our experimental results show that POSEIDON achieves accuracy similar to centralized or decentralized non-private approaches and that its computation and communication overhead scales linearly with the number of parties. POSEIDON trains a 3-layer neural network on the MNIST dataset with 784 features and 60K samples distributed among 10 parties in less than 2 hours.
Submitted 8 January, 2021; v1 submitted 1 September, 2020;
originally announced September 2020.
-
Privacy and Integrity Preserving Computations with CRISP
Authors:
Sylvain Chatel,
Apostolos Pyrgelis,
Juan R. Troncoso-Pastoriza,
Jean-Pierre Hubaux
Abstract:
In the digital era, users share their personal data with service providers to obtain some utility, e.g., access to high-quality services. Yet, the induced information flows raise privacy and integrity concerns. Consequently, cautious users may want to protect their privacy by minimizing the amount of information they disclose to curious service providers. Service providers are interested in verifying the integrity of the users' data to improve their services and obtain useful knowledge for their business. In this work, we present a generic solution to the trade-off between privacy, integrity, and utility, by achieving authenticity verification of data that has been encrypted for offloading to service providers. Based on lattice-based homomorphic encryption and commitments, as well as zero-knowledge proofs, our construction enables a service provider to process and reuse third-party signed data in a privacy-friendly manner with integrity guarantees. We evaluate our solution on different use cases such as smart-metering, disease susceptibility, and location-based activity tracking, thus showing its versatility. Our solution achieves broad generality, quantum-resistance, and relaxes some assumptions of state-of-the-art solutions without affecting performance.
Submitted 12 January, 2021; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Scalable Privacy-Preserving Distributed Learning
Authors:
David Froelicher,
Juan R. Troncoso-Pastoriza,
Apostolos Pyrgelis,
Sinem Sav,
Joao Sa Sousa,
Jean-Philippe Bossuat,
Jean-Pierre Hubaux
Abstract:
In this paper, we address the problem of privacy-preserving distributed learning and the evaluation of machine-learning models by analyzing it in the widespread MapReduce abstraction that we extend with privacy constraints. We design SPINDLE (Scalable Privacy-preservINg Distributed LEarning), the first distributed and privacy-preserving system that covers the complete ML workflow by enabling the execution of a cooperative gradient-descent and the evaluation of the obtained model and by preserving data and model confidentiality in a passive-adversary model with up to N-1 colluding parties. SPINDLE uses multiparty homomorphic encryption to execute parallel high-depth computations on encrypted data without significant overhead. We instantiate SPINDLE for the training and evaluation of generalized linear models on distributed datasets and show that it is able to accurately (on par with non-secure centrally-trained models) and efficiently (due to a multi-level parallelization of the computations) train models that require a high number of iterations on large input data with thousands of features, distributed among hundreds of data providers. For instance, it trains a logistic-regression model on a dataset of one million samples with 32 features distributed among 160 data providers in less than three minutes.
Submitted 14 July, 2021; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Drynx: Decentralized, Secure, Verifiable System for Statistical Queries and Machine Learning on Distributed Datasets
Authors:
David Froelicher,
Juan R. Troncoso-Pastoriza,
Joao Sa Sousa,
Jean-Pierre Hubaux
Abstract:
Data sharing has become of primary importance in many domains such as big-data analytics, economics and medical research, but remains difficult to achieve when the data are sensitive. In fact, sharing personal information requires individuals' unconditional consent or is often simply forbidden for privacy and security reasons. In this paper, we propose Drynx, a decentralized system for privacy-conscious statistical analysis on distributed datasets. Drynx relies on a set of computing nodes to enable the computation of statistics such as standard deviation or extrema, and the training and evaluation of machine-learning models on sensitive and distributed data. To ensure data confidentiality and the privacy of the data providers, Drynx combines interactive protocols, homomorphic encryption, zero-knowledge proofs of correctness, and differential privacy. It enables an efficient and decentralized verification of the input data and of all the system's computations, thus providing auditability in a strong adversarial model in which no entity has to be individually trusted. Drynx is highly modular, dynamic and parallelizable. Our evaluation shows that it enables the training of a logistic regression model on a dataset (12 features and 600,000 records) distributed among 12 data providers in less than 2 seconds. The computations are distributed among 6 computing nodes, and Drynx enables the verification of the query execution's correctness in less than 22 seconds.
Submitted 27 February, 2020; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Multivariate Cryptosystems for Secure Processing of Multidimensional Signals
Authors:
Alberto Pedrouzo-Ulloa,
Juan Ramón Troncoso-Pastoriza,
Fernando Pérez-González
Abstract:
Multidimensional signals like 2-D and 3-D images or videos are inherently sensitive signals which require privacy-preserving solutions when processed in untrustworthy environments, but their efficient encrypted processing is particularly challenging due to their structure, dimensionality and size. This work introduces a new cryptographic hard problem denoted m-RLWE (multivariate Ring Learning with Errors) which generalizes RLWE, and proposes several relinearization-based techniques to efficiently convert signals with different structures and dimensionalities. The proposed hard problem and the developed techniques give support to lattice cryptosystems that enable encrypted processing of multidimensional signals and efficient conversion between different structures. We show an example cryptosystem and prove that it outperforms its RLWE counterpart in terms of security against basis-reduction attacks, efficiency and cipher expansion for encrypted image processing, and we exemplify some of the proposed transformation techniques in critical and ubiquitous block-based processing applications.
Submitted 3 December, 2017;
originally announced December 2017.
-
On Ring Learning with Errors over the Tensor Product of Number Fields
Authors:
Alberto Pedrouzo-Ulloa,
Juan Ramón Troncoso-Pastoriza,
Fernando Pérez-González
Abstract:
The "Ring Learning with Errors" (RLWE) problem was formulated as a variant of the "Learning with Errors" (LWE) problem, with the purpose of taking advantage of an additional algebraic structure in the underlying considered lattices; this enables improvements on the efficiency and cipher expansion on those cryptographic applications which were previously based on the LWE problem. In Eurocrypt 2010, Lyubashevsky et al. introduced this hardness problem and showed its relation to some known hardness problems over lattices with a special structure. In this work, we generalize these results and the problems presented by Lyubashevsky et al. to the more general case of multivariate rings, highlighting the main differences with respect to the security proof for the RLWE counterpart. This hardness problem is denoted as "Multivariate Ring Learning with Errors" ($m$-RLWE or multivariate RLWE) and we show its relation to hardness problems over the tensor product of ideal lattices. Additionally, the $m$-RLWE problem is more adequate than its univariate version for cryptographic applications dealing with multidimensional structures.
Submitted 1 February, 2018; v1 submitted 18 July, 2016;
originally announced July 2016.
-
Number Theoretic Transforms for Secure Signal Processing
Authors:
Alberto Pedrouzo-Ulloa,
Juan Ramón Troncoso-Pastoriza,
Fernando Pérez-González
Abstract:
Multimedia contents are inherently sensitive signals that must be protected whenever they are outsourced to an untrusted environment. This problem becomes a challenge when the untrusted environment must perform some processing on the sensitive signals; a paradigmatic example is Cloud-based signal processing services. Approaches based on Secure Signal Processing (SSP) address this challenge by proposing novel mechanisms for signal processing in the encrypted domain and interactive secure protocols to achieve the goal of protecting signals without disclosing the sensitive information they convey.
This work presents a novel and comprehensive set of approaches and primitives to efficiently process signals in an encrypted form, by using Number Theoretic Transforms (NTTs) in innovative ways. This usage of NTTs paired with appropriate signal pre- and post-coding enables a whole range of easily composable signal processing operations comprising, among others, filtering, generalized convolutions, matrix-based processing or error correcting codes. The main focus is on unattended processing, in which no interaction from the client is needed; for implementation purposes, efficient lattice-based somewhat homomorphic cryptosystems are used. We exemplify these approaches and evaluate their performance and accuracy, proving that the proposed framework opens up a wide variety of new applications for secured outsourced-processing of multimedia contents.
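As a rough illustration of why NTTs suit convolution-style processing, the following is a minimal plaintext sketch, not the paper's construction: a cyclic convolution computed by pointwise multiplication in the NTT domain. No encryption is involved, and the modulus 257, the generator 3, and all function names are assumptions chosen for this example only.

```python
# Illustrative plaintext sketch only; parameters and names are assumptions.
P = 257  # small prime; P - 1 = 256 is divisible by power-of-two lengths up to 256
G = 3    # a primitive root modulo 257


def ntt(values, invert=False):
    """Quadratic-time number theoretic transform over Z_P (clarity over speed)."""
    n = len(values)
    assert (P - 1) % n == 0, "transform length must divide P - 1"
    omega = pow(G, (P - 1) // n, P)  # primitive n-th root of unity mod P
    if invert:
        omega = pow(omega, -1, P)    # modular inverse (Python 3.8+)
    out = [
        sum(v * pow(omega, i * j, P) for j, v in enumerate(values)) % P
        for i in range(n)
    ]
    if invert:
        n_inv = pow(n, -1, P)
        out = [x * n_inv % P for x in out]
    return out


def cyclic_convolution(a, b):
    """Cyclic convolution mod P via pointwise products in the NTT domain."""
    fa, fb = ntt(a), ntt(b)
    return ntt([x * y % P for x, y in zip(fa, fb)], invert=True)


def naive_cyclic_convolution(a, b):
    """Direct definition, used here only to cross-check the NTT route."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] = (out[(i + j) % n] + a[i] * b[j]) % P
    return out


signal = [1, 2, 3, 4, 0, 0, 0, 0]  # zero-padded so the cyclic result equals the linear one
kernel = [1, 1, 0, 0, 0, 0, 0, 0]  # two-tap moving-sum filter
filtered = cyclic_convolution(signal, kernel)
```

Because the signal is zero-padded and every output value stays below the modulus, `filtered` matches the ordinary linear filter output. In a homomorphic setting the pointwise product would be carried out on ciphertexts, which this sketch does not model.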
Submitted 29 January, 2018; v1 submitted 18 July, 2016;
originally announced July 2016.