
Alistair: Efficient On-device Budgeting for Differentially-Private Ad-Measurement Systems

Authors: Pierre Tholoniat, Kelly Kostopoulou, Peter McNeely, Prabhpreet Singh Sodhi, Anirudh Varanasi, Benjamin Case, Asaf Cidon, Roxana Geambasu, Mathias Lécuyer

Abstract: With the impending removal of third-party cookies from major browsers and the introduction of new privacy-preserving advertising APIs, the research community has a timely opportunity to assist industry in qualitatively improving the Web's privacy. This paper discusses our efforts, within a W3C community group, to enhance existing privacy-preserving advertising measurement APIs. We analyze designs from Google, Apple, Meta and Mozilla, and augment them with a more rigorous and efficient differential privacy (DP) budgeting component. Our approach, called Alistair, enforces well-defined DP guarantees and enables advertisers to conduct more private measurement queries accurately. By framing the privacy guarantee in terms of an individual form of DP, we can make DP budgeting more efficient than in current systems that use a traditional DP definition. We incorporate Alistair into Chrome and evaluate it on microbenchmarks and advertising datasets. Across all workloads, Alistair significantly outperforms baselines in enabling more advertising measurements under comparable DP protection.

Submitted 12 July, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

Comments: added section 3.3 Algorithm
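To make the idea of on-device budgeting under an individual form of DP more concrete, here is a minimal sketch under stated assumptions, not Alistair's actual algorithm or Chrome implementation. It assumes each device keeps one privacy filter per time epoch and charges a query only against epochs that hold data relevant to that query. The intuition is that under individual DP, an epoch with no relevant impressions cannot change this device's contribution, so it incurs no individual privacy loss. The names EpochFilter, Device and charge_query, the campaign keys, and the fixed per-epoch capacity are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EpochFilter:
    """Hypothetical per-epoch privacy filter with a fixed epsilon capacity."""
    capacity: float
    consumed: float = 0.0

    def try_consume(self, epsilon: float) -> bool:
        # Refuse the charge if it would push this epoch past its capacity.
        if self.consumed + epsilon > self.capacity:
            return False
        self.consumed += epsilon
        return True


@dataclass
class Device:
    capacity: float
    # Hypothetical layout: epoch id -> set of campaign ids seen in that epoch.
    impressions: dict = field(default_factory=dict)
    filters: dict = field(default_factory=dict)

    def charge_query(self, epochs, campaign, epsilon):
        """Return the epochs whose data may contribute to the report.

        Epochs without relevant impressions are skipped and never charged.
        Epochs whose filter cannot afford epsilon are dropped from the report.
        """
        usable = []
        for epoch in epochs:
            if campaign not in self.impressions.get(epoch, set()):
                continue  # no relevant data: zero individual privacy loss
            filt = self.filters.setdefault(epoch, EpochFilter(self.capacity))
            if filt.try_consume(epsilon):
                usable.append(epoch)
        return usable
```

For example, on a device with impressions for one campaign only in epochs 0 and 2, a query over epochs 0 to 2 charges just those two filters and never creates a filter for epoch 1. A traditional, non-individual accounting would instead charge every epoch in the query window.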

arXiv:2402.07334

Differentially Private Training of Mixture of Experts Models

Authors: Pierre Tholoniat, Huseyin A. Inan, Janardhan Kulkarni, Robert Sim

Abstract: This position paper investigates the integration of Differential Privacy (DP) in the training of Mixture of Experts (MoE) models within the field of natural language processing. As Large Language Models (LLMs) scale to billions of parameters, leveraging expansive datasets, they exhibit enhanced linguistic capabilities and emergent abilities. However, this growth raises significant computational and privacy concerns. Our study addresses these issues by exploring the potential of MoE models, known for their computational efficiency, and the application of DP, a standard for privacy preservation. We present the first known attempt to train MoE models under the constraints of DP, addressing the unique challenges posed by their architecture and the complexities of DP integration. Our initial experimental studies demonstrate that MoE models can be effectively trained with DP, achieving performance that is competitive with their non-private counterparts. This initial study aims to provide valuable insights and ignite further research in the domain of privacy-preserving MoE models, softly laying the groundwork for prospective developments in this evolving field.

Submitted 11 February, 2024; originally announced February 2024.

Comments: Preliminary work presented as a poster at the 5th AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI 24)
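The abstract does not spell out the training procedure. The standard way to train a neural network under DP is DP-SGD: clip each example's gradient to a fixed L2 norm, sum, add Gaussian noise scaled to that norm, and average. The sketch below shows one such step in plain Python on flat parameter vectors. It is a generic illustration of DP-SGD, not the paper's MoE-specific method (expert routing is not modeled), and the names dp_sgd_step and clip and their parameters are illustrative assumptions. Privacy accounting across steps is also omitted.

```python
import math
import random


def clip(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng=random):
    """One DP-SGD update: clip per example, sum, add Gaussian noise, average."""
    batch_size = len(per_example_grads)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    # The noise std is proportional to the clipping bound (the sum's sensitivity).
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]
```

Clipping bounds how much any single example can move the summed gradient, which is what lets the Gaussian noise hide each example's contribution.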

arXiv:2306.16163

doi 10.1145/3600006.3613174

Turbo: Effective Caching in Differentially-Private Databases

Authors: Kelly Kostopoulou, Pierre Tholoniat, Asaf Cidon, Roxana Geambasu, Mathias Lécuyer

Abstract: Differentially-private (DP) databases allow for privacy-preserving analytics over sensitive datasets or data streams. In these systems, user privacy is a limited resource that must be conserved with each query. We propose Turbo, a novel, state-of-the-art caching layer for linear query workloads over DP databases. Turbo builds upon private multiplicative weights (PMW), a DP mechanism that is powerful in theory but ineffective in practice, and transforms it into a highly-effective caching mechanism, PMW-Bypass, that uses prior query results obtained through an external DP mechanism to train a PMW to answer arbitrary future linear queries accurately and "for free" from a privacy perspective. Our experiments on public Covid19 and CitiBike datasets show that Turbo with PMW-Bypass conserves 1.7-15.9x more budget compared to vanilla PMW and simpler cache designs, a significant improvement. Moreover, Turbo provides support for range query workloads, such as timeseries or streams, where opportunities exist to further conserve privacy budget through DP parallel composition and warm-starting of PMW state. Our work provides a theoretical foundation and general system design for effective caching in DP databases.

Submitted 23 October, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: Extended version of a paper presented at the 29th ACM Symposium on Operating Systems Principles (SOSP '23)
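For intuition about the private multiplicative weights (PMW) mechanism that Turbo builds on, here is a heavily simplified sketch over a small histogram domain. It keeps a synthetic distribution and, for each linear query, compares the synthetic estimate against a Laplace-noised true answer. If the two are close, it answers from the synthetic histogram (a cache hit); otherwise it returns the noisy answer and applies a multiplicative-weights update toward it. The class name SimplePMW, the threshold, the learning rate and the noise scale are assumptions. The sketch deliberately omits the sparse-vector privacy accounting of real PMW and does not implement Turbo's PMW-Bypass.

```python
import math
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


class SimplePMW:
    def __init__(self, true_hist, threshold, lr, noise_scale, seed=0):
        total = sum(true_hist)
        self.true = [c / total for c in true_hist]  # normalized true histogram
        self.synth = [1 / len(true_hist)] * len(true_hist)  # start from uniform
        self.threshold = threshold
        self.lr = lr
        self.noise_scale = noise_scale
        self.rng = random.Random(seed)

    def answer(self, query):
        """Answer a linear query given as one 0/1 weight per histogram bin."""
        estimate = sum(q * x for q, x in zip(query, self.synth))
        truth = sum(q * x for q, x in zip(query, self.true))
        noisy = truth + laplace(self.noise_scale, self.rng)
        if abs(noisy - estimate) <= self.threshold:
            return estimate, True  # hit: answered from the synthetic histogram
        # Miss: shift mass toward (or away from) the bins the query selects.
        sign = 1.0 if noisy > estimate else -1.0
        self.synth = [x * math.exp(self.lr * sign * q) for q, x in zip(query, self.synth)]
        norm = sum(self.synth)
        self.synth = [x / norm for x in self.synth]
        return noisy, False
```

Each miss makes the synthetic histogram more accurate, so later queries that overlap earlier ones increasingly become hits; this is the caching behavior the abstract refers to.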

arXiv:2212.13228

Packing Privacy Budget Efficiently

Authors: Pierre Tholoniat, Kelly Kostopoulou, Mosharaf Chowdhury, Asaf Cidon, Roxana Geambasu, Mathias Lécuyer, Junfeng Yang

Abstract: Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of compute resource in workloads of multiple ML models training on user data. Once it is used, the DP budget is forever consumed. Therefore, it is crucial to allocate it most efficiently to train as many models as possible. This paper presents a privacy scheduler that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, hence practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPK, and evaluate it on microbenchmarks and on a new, synthetic private-ML workload we developed from the Alibaba ML cluster trace. We show that DPK: (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks compared to a state-of-the-art privacy scheduling algorithm that focused on fairness (1.3-1.7x in Alibaba, 1.0-2.6x in microbenchmarks), but (3) sacrifices some level of fairness for efficiency. Therefore, using DPK, DP ML operators should be able to train more models on the same amount of user data while offering the same privacy guarantee to their users.

Submitted 26 December, 2022; originally announced December 2022.
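The privacy knapsack formulation can be illustrated with a simple greedy baseline for multidimensional knapsack: each task demands some epsilon on each data block it touches, and tasks are admitted in decreasing order of weight per unit of capacity-normalized demand, as long as every block stays within its budget. This is a generic textbook-style heuristic written for illustration, not the paper's DPK algorithm; the function name greedy_privacy_knapsack and the efficiency metric are assumptions.

```python
def greedy_privacy_knapsack(tasks, capacity):
    """Greedily select tasks under per-block privacy budgets.

    tasks: dict task_name -> (weight, {block: epsilon_demand})
    capacity: dict block -> total epsilon available on that block
    Returns the selected task names in admission order.
    """
    def efficiency(name):
        weight, demands = tasks[name]
        # Each block's demand is normalized by that block's capacity, then summed.
        cost = sum(eps / capacity[b] for b, eps in demands.items())
        return float("inf") if cost == 0 else weight / cost

    remaining = dict(capacity)  # do not mutate the caller's capacities
    selected = []
    for name in sorted(tasks, key=efficiency, reverse=True):
        _, demands = tasks[name]
        if all(remaining[b] >= eps for b, eps in demands.items()):
            for b, eps in demands.items():
                remaining[b] -= eps  # privacy budget, once spent, is gone
            selected.append(name)
    return selected
```

With two blocks of capacity 1.0, one task that needs the full budget on both blocks and three small tasks that each need 0.5 on a single block, this heuristic admits the three small tasks rather than the one large task, which illustrates why efficiency-oriented packing can schedule more tasks than an allocation that ignores packing.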

arXiv:2206.04489

Holistic Verification of Blockchain Consensus

Authors: Nathalie Bertrand, Vincent Gramoli, Igor Konnov, Marijana Lazić, Pierre Tholoniat, Josef Widder

Abstract: Blockchain has recently attracted industry attention due, in part, to its ability to automate asset transfers. It requires distributed participants to reach consensus on a block despite the presence of malicious (a.k.a. Byzantine) participants. Malicious participants regularly exploit weaknesses in these blockchain consensus algorithms, sometimes with devastating consequences. In fact, these weaknesses are quite common and are well illustrated by the flaws in the hand-written proofs of existing blockchain consensus protocols [63]. Paradoxically, until now, no blockchain consensus has been holistically verified using model checking. In this paper, we remedy this paradox by model checking, for the first time, a blockchain consensus used in industry. We propose a holistic approach to verify the consensus algorithm of the Red Belly Blockchain [20], for any number $n$ of processes and any number $f<n/3$ of Byzantine processes. We directly decompose the algorithm pseudocode into two parts, an inner broadcast algorithm and an outer decision algorithm, each modelled as a threshold automaton [36], and we formalize their expected properties in linear-time temporal logic. We then automatically check the inner broadcast algorithm under a carefully identified fairness assumption. For the verification of the outer algorithm, we simplify the model of the inner algorithm by relying on its checked properties. In doing so, we formally verify not only the safety properties of the Red Belly Blockchain consensus but also its liveness, in about 70 seconds.

Submitted 9 June, 2022; originally announced June 2022.
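The bound $f<n/3$ above is the classic resilience condition for Byzantine consensus. A standard textbook argument (not taken from this paper) shows why: if each step waits for a quorum of $n-f$ processes, any two quorums overlap in at least $n-2f$ processes, and that overlap is guaranteed to contain at least one correct process only when $n-2f>f$, that is, $n>3f$. The helper names below are illustrative.

```python
def min_quorum_overlap(n, f):
    """Smallest possible intersection of two quorums of size n - f among n processes."""
    quorum = n - f
    return max(0, 2 * quorum - n)


def overlap_has_correct_process(n, f):
    """True iff any two quorums must share at least one non-Byzantine process."""
    return min_quorum_overlap(n, f) > f


def max_tolerated_faults(n):
    """Largest number of Byzantine processes f such that n > 3f."""
    return (n - 1) // 3
```

For instance, with n = 4 and f = 1, two quorums of 3 share at least 2 processes, so at least one of them is correct; with n = 3 and f = 1, the overlap can be a single process, which may be Byzantine.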

arXiv:2106.15335

Privacy Budget Scheduling

Authors: Tao Luo, Mingen Pan, Pierre Tholoniat, Asaf Cidon, Roxana Geambasu, Mathias Lécuyer

Abstract: Machine learning (ML) models trained on personal data have been shown to leak information about users. Differential privacy (DP) enables model training with a guaranteed bound on this leakage. Each new model trained with DP increases the bound on data leakage and can be seen as consuming part of a global privacy budget that should not be exceeded. This budget is a scarce resource that must be carefully managed to maximize the number of successfully trained models. We describe PrivateKube, an extension to the popular Kubernetes datacenter orchestrator that adds privacy as a new type of resource to be managed alongside other traditional compute resources, such as CPU, GPU, and memory. The abstractions we design for the privacy resource mirror those defined by Kubernetes for traditional resources, but there are also major differences. For example, traditional compute resources are replenishable while privacy is not: a CPU can be regained after a model finishes execution while privacy budget cannot. This distinction forces a re-design of the scheduler. We present DPF (Dominant Private Block Fairness), a variant of the popular Dominant Resource Fairness (DRF) algorithm that is geared toward the non-replenishable privacy resource but enjoys theoretical properties similar to those of DRF. We evaluate PrivateKube and DPF on microbenchmarks and an ML workload on Amazon Reviews data. Compared to existing baselines, DPF allows training more models under the same global privacy guarantee. This is especially true for DPF over Rényi DP, a highly composable form of DP.

Submitted 29 June, 2021; originally announced June 2021.

Comments: Extended version of a paper presented at the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI '21)
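DPF is described as a DRF variant for the non-replenishable privacy resource. The sketch below captures only the core ordering idea under simplifying assumptions: a task's dominant share is its largest demand relative to a block's global budget, pending tasks are considered in increasing order of dominant share, and a task is granted only if every block it needs has enough unlocked budget left. How budget is unlocked over time, which PrivateKube manages, is treated as an input here. The function names and data layout are hypothetical.

```python
def dominant_share(demands, global_budget):
    """Largest fraction of any block's global budget that a task requests."""
    return max((eps / global_budget[b] for b, eps in demands.items()), default=0.0)


def dpf_schedule(pending, unlocked, global_budget):
    """Grant pending tasks in order of increasing dominant share.

    pending: dict task -> {block: epsilon_demand} (granted tasks are removed)
    unlocked: dict block -> unlocked, unused budget (mutated in place)
    Returns the granted tasks in grant order.
    """
    order = sorted(pending, key=lambda t: dominant_share(pending[t], global_budget))
    granted = []
    for task in order:
        demands = pending[task]
        if all(unlocked[b] >= eps for b, eps in demands.items()):
            for b, eps in demands.items():
                unlocked[b] -= eps  # consumed for good: privacy is not replenishable
            granted.append(task)
    for task in granted:
        del pending[task]
    return granted
```

Unlike CPU or memory in DRF, the subtracted budget is never returned when a task finishes; tasks that do not fit simply stay pending until more budget is unlocked.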

arXiv:2007.08152

doi 10.1145/3350755.3400264

Feasibility of Cross-Chain Payment with Success Guarantees

Authors: Rob van Glabbeek, Vincent Gramoli, Pierre Tholoniat

Abstract: We consider the problem of cross-chain payment whereby customers of different escrows (implemented by a bank or a blockchain smart contract) successfully transfer digital assets without trusting each other. Prior to this work, cross-chain payment problems did not require this success, or any form of progress. We demonstrate that it is possible to solve this problem when assuming synchrony, in the sense that each message is guaranteed to arrive within a known amount of time, but impossible to solve without assuming synchrony. Yet, we solve a weaker variant of this problem, where success is conditional on the patience of the participants, without assuming synchrony, and in the presence of Byzantine failures. We also discuss the relation to the recently defined cross-chain deals.

Submitted 16 July, 2020; originally announced July 2020.

Comments: This is a summary of the work reported in arXiv:1912.04513

ACM Class: C.2.4; D.2.4; F.1.1; F.1.2

Journal ref: Proc. 32nd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA'20, July 2020, pp. 579-581

arXiv:2006.04593

ARIANN: Low-Interaction Privacy-Preserving Deep Learning via Function Secret Sharing

Authors: Théo Ryffel, Pierre Tholoniat, David Pointcheval, Francis Bach

Abstract: We propose AriaNN, a low-interaction privacy-preserving framework for private neural network training and inference on sensitive data. Our semi-honest 2-party computation protocol (with a trusted dealer) leverages function secret sharing, a recent lightweight cryptographic protocol that allows us to achieve an efficient online phase. We design optimized primitives for the building blocks of neural networks such as ReLU, MaxPool and BatchNorm. For instance, we perform private comparison for ReLU operations with a single message of the size of the input during the online phase, and with preprocessing keys close to 4X smaller than previous work. Last, we propose an extension to support n-party private federated learning. We implement our framework as an extensible system on top of PyTorch that leverages CPU and GPU hardware acceleration for cryptographic and machine learning operations. We evaluate our end-to-end system for private inference between distant servers on standard neural networks such as AlexNet, VGG16 or ResNet18, and for private training on smaller networks like LeNet. We show that computation rather than communication is the main bottleneck and that using GPUs together with reduced key size is a promising solution to overcome this barrier.

Submitted 28 October, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: 26 pages
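Function secret sharing (FSS) splits a function f into two keys so that each server evaluates its own key on a public input x, the two outputs add up to f(x), and neither key alone reveals f. AriaNN relies on compact FSS schemes for comparison; the sketch below instead uses the trivial FSS scheme, which additively shares the entire truth table of a comparison function over a tiny domain modulo 2^32. It is correct and each key on its own is uniformly random, but key size grows with the domain, which is exactly the cost that efficient FSS constructions avoid. The function names, domain size and modulus are assumptions for illustration.

```python
import secrets

MOD = 2 ** 32  # shares live in the ring Z_{2^32} (illustrative choice)


def gen_keys(alpha, domain_size):
    """Dealer: share f_alpha(x) = 1 if x < alpha else 0, for x in [0, domain_size)."""
    table = [1 if x < alpha else 0 for x in range(domain_size)]
    key0 = [secrets.randbelow(MOD) for _ in range(domain_size)]  # uniform mask
    key1 = [(t - r) % MOD for t, r in zip(table, key0)]
    return key0, key1


def eval_key(key, x):
    """Each party evaluates its own key locally on the public input x."""
    return key[x]


def reconstruct(y0, y1):
    """Adding the two output shares recovers f_alpha(x)."""
    return (y0 + y1) % MOD
```

Because each party's output is itself an additive share, it can feed directly into further secret-shared computation without ever revealing the comparison result in the clear.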

arXiv:1912.04513

Cross-Chain Payment Protocols with Success Guarantees

Authors: Rob van Glabbeek, Vincent Gramoli, Pierre Tholoniat

Abstract: In this paper, we consider the problem of cross-chain payment whereby customers of different escrows (implemented by a bank or a blockchain smart contract) successfully transfer digital assets without trusting each other. Prior to this work, cross-chain payment problems did not require this success or any form of progress. We introduce a new specification formalism called Asynchronous Networks of Timed Automata (ANTA) to formalise such protocols. We present the first cross-chain payment protocol that ensures termination in a bounded amount of time and works correctly in the presence of clock skew. We then demonstrate that it is impossible to solve this problem without assuming synchrony, in the sense that each message is guaranteed to arrive within a known amount of time. We also offer a protocol that solves an eventually terminating variant of this cross-chain payment problem without synchrony, and even in the presence of Byzantine failures.

Submitted 10 December, 2019; originally announced December 2019.

arXiv:1909.07453

Formal Verification of Blockchain Byzantine Fault Tolerance

Authors: Pierre Tholoniat, Vincent Gramoli

Abstract: To implement a blockchain, the trend is now to integrate a non-trivial Byzantine fault tolerant consensus algorithm instead of the seminal idea of waiting to receive blocks to decide upon the longest branch. After a decade of existence, blockchains now trade large amounts of valuable assets, and a simple disagreement could lead to disastrous losses. Unfortunately, Byzantine consensus solutions used in blockchains are at best proved correct "by hand", as we are not aware of any of them having been formally verified. In this paper, we propose two contributions: (i) we illustrate the severity of the problem by listing six vulnerabilities of blockchain consensus including two new counter-examples; (ii) we then formally verify two Byzantine fault tolerant components of Red Belly Blockchain using the ByMC model checker. First, we specify a simple broadcast primitive in 116 lines of code that is verified in 40 seconds on a 2-core Intel machine. Then, we specify a blockchain consensus algorithm in 276 lines of code that is verified in 17 minutes on a 64-core AMD machine using MPI. To conclude, we argue that it has now become both relatively simple and crucial to formally verify the correctness of blockchain consensus protocols.

Submitted 14 October, 2019; v1 submitted 16 September, 2019; originally announced September 2019.

Showing 1–10 of 10 results for author: Tholoniat, P