-
Alistair: Efficient On-device Budgeting for Differentially-Private Ad-Measurement Systems
Authors:
Pierre Tholoniat,
Kelly Kostopoulou,
Peter McNeely,
Prabhpreet Singh Sodhi,
Anirudh Varanasi,
Benjamin Case,
Asaf Cidon,
Roxana Geambasu,
Mathias Lécuyer
Abstract:
With the impending removal of third-party cookies from major browsers and the introduction of new privacy-preserving advertising APIs, the research community has a timely opportunity to assist industry in qualitatively improving the Web's privacy. This paper discusses our efforts, within a W3C community group, to enhance existing privacy-preserving advertising measurement APIs. We analyze designs…
▽ More
With the impending removal of third-party cookies from major browsers and the introduction of new privacy-preserving advertising APIs, the research community has a timely opportunity to assist industry in qualitatively improving the Web's privacy. This paper discusses our efforts, within a W3C community group, to enhance existing privacy-preserving advertising measurement APIs. We analyze designs from Google, Apple, Meta and Mozilla, and augment them with a more rigorous and efficient differential privacy (DP) budgeting component. Our approach, called Alistair, enforces well-defined DP guarantees and enables advertisers to conduct more private measurement queries accurately. By framing the privacy guarantee in terms of an individual form of DP, we can make DP budgeting more efficient than in current systems that use a traditional DP definition. We incorporate Alistair into Chrome and evaluate it on microbenchmarks and advertising datasets. Across all workloads, Alistair significantly outperforms baselines in enabling more advertising measurements under comparable DP protection.
△ Less
Submitted 12 July, 2024; v1 submitted 26 May, 2024;
originally announced May 2024.
-
Differentially Private Training of Mixture of Experts Models
Authors:
Pierre Tholoniat,
Huseyin A. Inan,
Janardhan Kulkarni,
Robert Sim
Abstract:
This position paper investigates the integration of Differential Privacy (DP) in the training of Mixture of Experts (MoE) models within the field of natural language processing. As Large Language Models (LLMs) scale to billions of parameters, leveraging expansive datasets, they exhibit enhanced linguistic capabilities and emergent abilities. However, this growth raises significant computational an…
▽ More
This position paper investigates the integration of Differential Privacy (DP) in the training of Mixture of Experts (MoE) models within the field of natural language processing. As Large Language Models (LLMs) scale to billions of parameters, leveraging expansive datasets, they exhibit enhanced linguistic capabilities and emergent abilities. However, this growth raises significant computational and privacy concerns. Our study addresses these issues by exploring the potential of MoE models, known for their computational efficiency, and the application of DP, a standard for privacy preservation. We present the first known attempt to train MoE models under the constraints of DP, addressing the unique challenges posed by their architecture and the complexities of DP integration. Our initial experimental studies demonstrate that MoE models can be effectively trained with DP, achieving performance that is competitive with their non-private counterparts. This initial study aims to provide valuable insights and ignite further research in the domain of privacy-preserving MoE models, softly laying the groundwork for prospective developments in this evolving field.
△ Less
Submitted 11 February, 2024;
originally announced February 2024.
-
Turbo: Effective Caching in Differentially-Private Databases
Authors:
Kelly Kostopoulou,
Pierre Tholoniat,
Asaf Cidon,
Roxana Geambasu,
Mathias Lécuyer
Abstract:
Differentially-private (DP) databases allow for privacy-preserving analytics over sensitive datasets or data streams. In these systems, user privacy is a limited resource that must be conserved with each query. We propose Turbo, a novel, state-of-the-art caching layer for linear query workloads over DP databases. Turbo builds upon private multiplicative weights (PMW), a DP mechanism that is powerf…
▽ More
Differentially-private (DP) databases allow for privacy-preserving analytics over sensitive datasets or data streams. In these systems, user privacy is a limited resource that must be conserved with each query. We propose Turbo, a novel, state-of-the-art caching layer for linear query workloads over DP databases. Turbo builds upon private multiplicative weights (PMW), a DP mechanism that is powerful in theory but ineffective in practice, and transforms it into a highly-effective caching mechanism, PMW-Bypass, that uses prior query results obtained through an external DP mechanism to train a PMW to answer arbitrary future linear queries accurately and "for free" from a privacy perspective. Our experiments on public Covid19 and CitiBike datasets show that Turbo with PMW-Bypass conserves 1.7-15.9x more budget compared to vanilla PMW and simpler cache designs, a significant improvement. Moreover, Turbo provides support for range query workloads, such as timeseries or streams, where opportunities exist to further conserve privacy budget through DP parallel composition and warm-starting of PMW state. Our work provides a theoretical foundation and general system design for effective caching in DP databases.
△ Less
Submitted 23 October, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Packing Privacy Budget Efficiently
Authors:
Pierre Tholoniat,
Kelly Kostopoulou,
Mosharaf Chowdhury,
Asaf Cidon,
Roxana Geambasu,
Mathias Lécuyer,
Junfeng Yang
Abstract:
Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of compute resource in workloads of multiple ML models training on user data. Once it is used, the DP budget is forever consumed. Therefore, it is crucial to allocate it most efficiently to trai…
▽ More
Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of compute resource in workloads of multiple ML models training on user data. Once it is used, the DP budget is forever consumed. Therefore, it is crucial to allocate it most efficiently to train as many models as possible. This paper presents the scheduler for privacy that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, hence practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPK, and evaluate it on microbenchmarks and on a new, synthetic private-ML workload we developed from the Alibaba ML cluster trace. We show that DPK: (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks compared to a state-of-the-art privacy scheduling algorithm that focused on fairness (1.3-1.7x in Alibaba, 1.0-2.6x in microbenchmarks), but (3) sacrifices some level of fairness for efficiency. Therefore, using DPK, DP ML operators should be able to train more models on the same amount of user data while offering the same privacy guarantee to their users.
△ Less
Submitted 26 December, 2022;
originally announced December 2022.
-
Holistic Verification of Blockchain Consensus
Authors:
Nathalie Bertrand,
Vincent Gramoli,
Igor Konnov,
Marijana Lazić,
Pierre Tholoniat,
Josef Widder
Abstract:
Blockchain has recently attracted the attention of the industry due, in part, to its ability to automate asset transfers. It requires distributed participants to reach a consensus on a block despite the presence of malicious (a.k.a. Byzantine) participants. Malicious participants exploit regularly weaknesses of these blockchain consensus algorithms, with sometimes devastating consequences. In fact…
▽ More
Blockchain has recently attracted the attention of the industry due, in part, to its ability to automate asset transfers. It requires distributed participants to reach a consensus on a block despite the presence of malicious (a.k.a. Byzantine) participants. Malicious participants exploit regularly weaknesses of these blockchain consensus algorithms, with sometimes devastating consequences. In fact, these weaknesses are quite common and are well illustrated by the flaws in the hand-written proofs of existing blockchain consensus protocols [63]. Paradoxically, until now, no blockchain consensus has been holistically verified using model checking.
In this paper, we remedy this paradox by model checking for the first time a blockchain consensus used in industry. We propose a holistic approach to verify the consensus algorithm of the Red Belly Blockchain [20], for any number $n$ of processes and any number $f<n/3$ of Byzantine processes. We decompose directly the algorithm pseudocode in two parts -- an inner broadcast algorithm and an outer decision algorithm -- each modelled as a threshold automaton [36], and we formalize their expected properties in linear-time temporal logic. We then automatically check the inner broadcasting algorithm, under a carefully identified fairness assumption. For the verification of the outer algorithm, we simplify the model of the inner algorithm by relying on its checked properties. Doing so, we formally verify not only the safety properties of the Red Belly Blockchain consensus but also its liveness in about 70 seconds.
△ Less
Submitted 9 June, 2022;
originally announced June 2022.
-
Privacy Budget Scheduling
Authors:
Tao Luo,
Mingen Pan,
Pierre Tholoniat,
Asaf Cidon,
Roxana Geambasu,
Mathias Lécuyer
Abstract:
Machine learning (ML) models trained on personal data have been shown to leak information about users. Differential privacy (DP) enables model training with a guaranteed bound on this leakage. Each new model trained with DP increases the bound on data leakage and can be seen as consuming part of a global privacy budget that should not be exceeded. This budget is a scarce resource that must be care…
▽ More
Machine learning (ML) models trained on personal data have been shown to leak information about users. Differential privacy (DP) enables model training with a guaranteed bound on this leakage. Each new model trained with DP increases the bound on data leakage and can be seen as consuming part of a global privacy budget that should not be exceeded. This budget is a scarce resource that must be carefully managed to maximize the number of successfully trained models.
We describe PrivateKube, an extension to the popular Kubernetes datacenter orchestrator that adds privacy as a new type of resource to be managed alongside other traditional compute resources, such as CPU, GPU, and memory. The abstractions we design for the privacy resource mirror those defined by Kubernetes for traditional resources, but there are also major differences. For example, traditional compute resources are replenishable while privacy is not: a CPU can be regained after a model finishes execution while privacy budget cannot. This distinction forces a re-design of the scheduler. We present DPF (Dominant Private Block Fairness) -- a variant of the popular Dominant Resource Fairness (DRF) algorithm -- that is geared toward the non-replenishable privacy resource but enjoys similar theoretical properties as DRF.
We evaluate PrivateKube and DPF on microbenchmarks and an ML workload on Amazon Reviews data. Compared to existing baselines, DPF allows training more models under the same global privacy guarantee. This is especially true for DPF over Rényi DP, a highly composable form of DP.
△ Less
Submitted 29 June, 2021;
originally announced June 2021.
-
Feasibility of Cross-Chain Payment with Success Guarantees
Authors:
Rob van Glabbeek,
Vincent Gramoli,
Pierre Tholoniat
Abstract:
We consider the problem of cross-chain payment whereby customers of different escrows---implemented by a bank or a blockchain smart contract---successfully transfer digital assets without trusting each other. Prior to this work, cross-chain payment problems did not require this success, or any form of progress. We demonstrate that it is possible to solve this problem when assuming synchrony, in th…
▽ More
We consider the problem of cross-chain payment whereby customers of different escrows---implemented by a bank or a blockchain smart contract---successfully transfer digital assets without trusting each other. Prior to this work, cross-chain payment problems did not require this success, or any form of progress. We demonstrate that it is possible to solve this problem when assuming synchrony, in the sense that each message is guaranteed to arrive within a known amount of time, but impossible to solve without assuming synchrony. Yet, we solve a weaker variant of this problem, where success is conditional on the patience of the participants, without assuming synchrony, and in the presence of Byzantine failures. We also discuss the relation with the recently defined cross-chain deals.
△ Less
Submitted 16 July, 2020;
originally announced July 2020.
-
ARIANN: Low-Interaction Privacy-Preserving Deep Learning via Function Secret Sharing
Authors:
Théo Ryffel,
Pierre Tholoniat,
David Pointcheval,
Francis Bach
Abstract:
We propose AriaNN, a low-interaction privacy-preserving framework for private neural network training and inference on sensitive data. Our semi-honest 2-party computation protocol (with a trusted dealer) leverages function secret sharing, a recent lightweight cryptographic protocol that allows us to achieve an efficient online phase. We design optimized primitives for the building blocks of neural…
▽ More
We propose AriaNN, a low-interaction privacy-preserving framework for private neural network training and inference on sensitive data. Our semi-honest 2-party computation protocol (with a trusted dealer) leverages function secret sharing, a recent lightweight cryptographic protocol that allows us to achieve an efficient online phase. We design optimized primitives for the building blocks of neural networks such as ReLU, MaxPool and BatchNorm. For instance, we perform private comparison for ReLU operations with a single message of the size of the input during the online phase, and with preprocessing keys close to 4X smaller than previous work. Last, we propose an extension to support n-party private federated learning. We implement our framework as an extensible system on top of PyTorch that leverages CPU and GPU hardware acceleration for cryptographic and machine learning operations. We evaluate our end-to-end system for private inference between distant servers on standard neural networks such as AlexNet, VGG16 or ResNet18, and for private training on smaller networks like LeNet. We show that computation rather than communication is the main bottleneck and that using GPUs together with reduced key size is a promising solution to overcome this barrier.
△ Less
Submitted 28 October, 2021; v1 submitted 8 June, 2020;
originally announced June 2020.
-
Cross-Chain Payment Protocols with Success Guarantees
Authors:
Rob van Glabbeek,
Vincent Gramoli,
Pierre Tholoniat
Abstract:
In this paper, we consider the problem of cross-chain payment whereby customers of different escrows -- implemented by a bank or a blockchain smart contract -- successfully transfer digital assets without trusting each other. Prior to this work, cross-chain payment problems did not require this success or any form of progress. We introduce a new specification formalism called Asynchronous Networks…
▽ More
In this paper, we consider the problem of cross-chain payment whereby customers of different escrows -- implemented by a bank or a blockchain smart contract -- successfully transfer digital assets without trusting each other. Prior to this work, cross-chain payment problems did not require this success or any form of progress. We introduce a new specification formalism called Asynchronous Networks of Timed Automata (ANTA) to formalise such protocols.
We present the first cross-chain payment protocol that ensures termination in a bounded amount of time and works correctly in the presence of clock skew. We then demonstrate that it is impossible to solve this problem without assuming synchrony, in the sense that each message is guaranteed to arrive within a known amount of time. We also offer a protocol that solves an eventually terminating variant of this cross-chain payment problem without synchrony, and even in the presence of Byzantine failures.
△ Less
Submitted 10 December, 2019;
originally announced December 2019.
-
Formal Verification of Blockchain Byzantine Fault Tolerance
Authors:
Pierre Tholoniat,
Vincent Gramoli
Abstract:
To implement a blockchain, the trend is now to integrate a non-trivial Byzantine fault tolerant consensus algorithm instead of the seminal idea of waiting to receive blocks to decide upon the longest branch. After a decade of existence, blockchains trade now large amounts of valuable assets and a simple disagreement could lead to disastrous losses. Unfortunately, Byzantine consensus solutions used…
▽ More
To implement a blockchain, the trend is now to integrate a non-trivial Byzantine fault tolerant consensus algorithm instead of the seminal idea of waiting to receive blocks to decide upon the longest branch. After a decade of existence, blockchains trade now large amounts of valuable assets and a simple disagreement could lead to disastrous losses. Unfortunately, Byzantine consensus solutions used in blockchains are at best proved correct "by hand" as we are not aware of any of them having been formally verified. In this paper, we propose two contributions: (i) we illustrate the severity of the problem by listing six vulnerabilities of blockchain consensus including two new counter-examples; (ii) we then formally verify two Byzantine fault tolerant components of Red Belly Blockchain using the ByMC model checker. First, we specify a simple broadcast primitive in 116 lines of code that is verified in 40 seconds on a 2-core Intel machine. Then, we specify a blockchain consensus algorithm in 276 lines of code that is verified in 17 minutes on a 64-core AMD machine using MPI. To conclude, we argue that it has now become both relatively simple and crucial to formally verify the correctness of blockchain consensus protocols.
△ Less
Submitted 14 October, 2019; v1 submitted 16 September, 2019;
originally announced September 2019.