Search | arXiv e-print repository

Efficient and Adaptive Posterior Sampling Algorithms for Bandits

Authors: Bingshan Hu, Zhiming Huang, Tianyue H. Zhang, Mathias Lécuyer, Nidhi Hegde

Abstract: We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$, we derive a more practical bound that tightens the coefficient of the leading term %from $288 e^{64}$ to $1270$. Additionally, motivated by large-scale real-wo… ▽ More We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$, we derive a more practical bound that tightens the coefficient of the leading term %from $288 e^{64}$ to $1270$. Additionally, motivated by large-scale real-world applications that require scalability, adaptive computational resource allocation, and a balance in utility and computation, we propose two parameterized Thompson Sampling-based algorithms: Thompson Sampling with Model Aggregation (TS-MA-$α$) and Thompson Sampling with Timestamp Duelling (TS-TD-$α$), where $α\in [0,1]$ controls the trade-off between utility and computation. Both algorithms achieve $O \left(K\ln^{α+1}(T)/Δ\right)$ regret bound, where $K$ is the number of arms, $T$ is the finite learning horizon, and $Δ$ denotes the single round performance loss when pulling a sub-optimal arm. △ Less

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2110.14874 [pdf, other]

Sayer: Using Implicit Feedback to Optimize System Policies

Authors: Mathias Lécuyer, Sang Hoon Kim, Mihir Nanavati, Junchen Jiang, Siddhartha Sen, Amit Sharma, Aleksandrs Slivkins

Abstract: We observe that many system policies that make threshold decisions involving a resource (e.g., time, memory, cores) naturally reveal additional, or implicit feedback. For example, if a system waits X min for an event to occur, then it automatically learns what would have happened if it waited <X min, because time has a cumulative property. This feedback tells us about alternative decisions, and ca… ▽ More We observe that many system policies that make threshold decisions involving a resource (e.g., time, memory, cores) naturally reveal additional, or implicit feedback. For example, if a system waits X min for an event to occur, then it automatically learns what would have happened if it waited <X min, because time has a cumulative property. This feedback tells us about alternative decisions, and can be used to improve the system policy. However, leveraging implicit feedback is difficult because it tends to be one-sided or incomplete, and may depend on the outcome of the event. As a result, existing practices for using feedback, such as simply incorporating it into a data-driven model, suffer from bias. We develop a methodology, called Sayer, that leverages implicit feedback to evaluate and train new system policies. Sayer builds on two ideas from reinforcement learning -- randomized exploration and unbiased counterfactual estimators -- to leverage data collected by an existing policy to estimate the performance of new candidate policies, without actually deploying those policies. Sayer uses implicit exploration and implicit data augmentation to generate implicit feedback in an unbiased form, which is then used by an implicit counterfactual estimator to evaluate and train new policies. The key idea underlying these techniques is to assign implicit probabilities to decisions that are not actually taken but whose feedback can be inferred; these probabilities are carefully calculated to ensure statistical unbiasedness. We apply Sayer to two production scenarios in Azure, and show that it can evaluate arbitrary policies accurately, and train new policies that outperform the production policies. △ Less

Submitted 28 October, 2021; originally announced October 2021.

arXiv:2103.01379 [pdf, other]

Practical Privacy Filters and Odometers with Rényi Differential Privacy and Applications to Differentially Private Deep Learning

Authors: Mathias Lécuyer

Abstract: Differential Privacy (DP) is the leading approach to privacy preserving deep learning. As such, there are multiple efforts to provide drop-in integration of DP into popular frameworks. These efforts, which add noise to each gradient computation to make it DP, rely on composition theorems to bound the total privacy loss incurred over this sequence of DP computations. However, existing composition… ▽ More Differential Privacy (DP) is the leading approach to privacy preserving deep learning. As such, there are multiple efforts to provide drop-in integration of DP into popular frameworks. These efforts, which add noise to each gradient computation to make it DP, rely on composition theorems to bound the total privacy loss incurred over this sequence of DP computations. However, existing composition theorems present a tension between efficiency and flexibility. Most theorems require all computations in the sequence to have a predefined DP parameter, called the privacy budget. This prevents the design of training algorithms that adapt the privacy budget on the fly, or that terminate early to reduce the total privacy loss. Alternatively, the few existing composition results for adaptive privacy budgets provide complex bounds on the privacy loss, with constants too large to be practical. In this paper, we study DP composition under adaptive privacy budgets through the lens of Rényi Differential Privacy, proving a simpler composition theorem with smaller constants, making it practical enough to use in algorithm design. We demonstrate two applications of this theorem for DP deep learning: adapting the noise or batch size online to improve a model's accuracy within a fixed total privacy loss, and stop** early when fine-tuning a model to reduce total privacy loss. △ Less

Submitted 4 June, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

arXiv:1909.01502 [pdf, other]

doi 10.1145/3341301.3359639

Privacy Accounting and Quality Control in the Sage Differentially Private ML Platform

Authors: Mathias Lecuyer, Riley Spahn, Kiran Vodrahalli, Roxana Geambasu, Daniel Hsu

Abstract: Companies increasingly expose machine learning (ML) models trained over sensitive user data to untrusted domains, such as end-user devices and wide-access model stores. We present Sage, a differentially private (DP) ML platform that bounds the cumulative leakage of training data through models. Sage builds upon the rich literature on DP ML algorithms and contributes pragmatic solutions to two of t… ▽ More Companies increasingly expose machine learning (ML) models trained over sensitive user data to untrusted domains, such as end-user devices and wide-access model stores. We present Sage, a differentially private (DP) ML platform that bounds the cumulative leakage of training data through models. Sage builds upon the rich literature on DP ML algorithms and contributes pragmatic solutions to two of the most pressing systems challenges of global DP: running out of privacy budget and the privacy-utility tradeoff. To address the former, we develop block composition, a new privacy loss accounting method that leverages the growing database regime of ML workloads to keep training models endlessly on a sensitive data stream while enforcing a global DP guarantee for the stream. To address the latter, we develop privacy-adaptive training, a process that trains a model on growing amounts of data and/or with increasing privacy parameters until, with high probability, the model meets developer-configured quality criteria. They illustrate how a systems focus on characteristics of ML workloads enables pragmatic solutions that are not apparent when one focuses on individual algorithms, as most DP ML literature does. △ Less

Submitted 6 September, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

Comments: Extended version of a paper presented at the 27th ACM Symposium on Operating Systems Principles (SOSP '19)

arXiv:1802.03471 [pdf, other]

Certified Robustness to Adversarial Examples with Differential Privacy

Authors: Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Suman Jana

Abstract: Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth. Most past defenses are best effort and have been shown to be vulnerable to sophisticated attacks. Recently a set of certified defenses have been introduced, which provide guarantees of robustn… ▽ More Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth. Most past defenses are best effort and have been shown to be vulnerable to sophisticated attacks. Recently a set of certified defenses have been introduced, which provide guarantees of robustness to norm-bounded attacks, but they either do not scale to large datasets or are limited in the types of models they can support. This paper presents the first certified defense that both scales to large networks and datasets (such as Google's Inception network for ImageNet) and applies broadly to arbitrary model types. Our defense, called PixelDP, is based on a novel connection between robustness against adversarial examples and differential privacy, a cryptographically-inspired formalism, that provides a rigorous, generic, and flexible foundation for defense. △ Less

Submitted 29 May, 2019; v1 submitted 9 February, 2018; originally announced February 2018.

Showing 1–5 of 5 results for author: Lécuyer, M