Search | arXiv e-print repository

Privacy-Engineered Value Decomposition Networks for Cooperative Multi-Agent Reinforcement Learning

Authors: Parham Gohari, Matthew Hale, Ufuk Topcu

Abstract: In cooperative multi-agent reinforcement learning (Co-MARL), a team of agents must jointly optimize the team's long-term rewards to learn a designated task. Optimizing rewards as a team often requires inter-agent communication and data sharing, leading to potential privacy implications. We assume privacy considerations prohibit the agents from sharing their environment interaction data. Accordingl… ▽ More In cooperative multi-agent reinforcement learning (Co-MARL), a team of agents must jointly optimize the team's long-term rewards to learn a designated task. Optimizing rewards as a team often requires inter-agent communication and data sharing, leading to potential privacy implications. We assume privacy considerations prohibit the agents from sharing their environment interaction data. Accordingly, we propose Privacy-Engineered Value Decomposition Networks (PE-VDN), a Co-MARL algorithm that models multi-agent coordination while provably safeguarding the confidentiality of the agents' environment interaction data. We integrate three privacy-engineering techniques to redesign the data flows of the VDN algorithm, an existing Co-MARL algorithm that consolidates the agents' environment interaction data to train a central controller that models multi-agent coordination, and develop PE-VDN. In the first technique, we design a distributed computation scheme that eliminates Vanilla VDN's dependency on sharing environment interaction data. Then, we utilize a privacy-preserving multi-party computation protocol to guarantee that the data flows of the distributed computation scheme do not pose new privacy risks. Finally, we enforce differential privacy to preempt inference threats against the agents' training data, past environment interactions, when they take actions based on their neural network predictions. We implement PE-VDN in StarCraft Multi-Agent Competition (SMAC) and show that it achieves 80% of Vanilla VDN's win rate while maintaining differential privacy levels that provide meaningful privacy guarantees. The results demonstrate that PE-VDN can safeguard the confidentiality of agents' environment interaction data without sacrificing multi-agent coordination. △ Less

Submitted 12 September, 2023; originally announced November 2023.

Comments: Paper accepted at 62nd IEEE Conference on Decision and Control

arXiv:2311.01258 [pdf, other]

doi 10.1561/2600000029

Formal Methods for Autonomous Systems

Authors: Tichakorn Wongpiromsarn, Mahsa Ghasemi, Murat Cubuktepe, Georgios Bakirtzis, Steven Carr, Mustafa O. Karabag, Cyrus Neary, Parham Gohari, Ufuk Topcu

Abstract: Formal methods refer to rigorous, mathematical approaches to system development and have played a key role in establishing the correctness of safety-critical systems. The main building blocks of formal methods are models and specifications, which are analogous to behaviors and requirements in system design and give us the means to verify and synthesize system behaviors with formal guarantees. Th… ▽ More Formal methods refer to rigorous, mathematical approaches to system development and have played a key role in establishing the correctness of safety-critical systems. The main building blocks of formal methods are models and specifications, which are analogous to behaviors and requirements in system design and give us the means to verify and synthesize system behaviors with formal guarantees. This monograph provides a survey of the current state of the art on applications of formal methods in the autonomous systems domain. We consider correct-by-construction synthesis under various formulations, including closed systems, reactive, and probabilistic settings. Beyond synthesizing systems in known environments, we address the concept of uncertainty and bound the behavior of systems that employ learning using formal methods. Further, we examine the synthesis of systems with monitoring, a mitigation technique for ensuring that once a system deviates from expected behavior, it knows a way of returning to normalcy. We also show how to overcome some limitations of formal methods themselves with learning. We conclude with future directions for formal methods in reinforcement learning, uncertainty, privacy, explainability of formal methods, and regulation and certification. △ Less

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2205.12430 [pdf, ps, other]

Additive Logistic Mechanism for Privacy-Preserving Self-Supervised Learning

Authors: Yunhao Yang, Parham Gohari, Ufuk Topcu

Abstract: We study the privacy risks that are associated with training a neural network's weights with self-supervised learning algorithms. Through empirical evidence, we show that the fine-tuning stage, in which the network weights are updated with an informative and often private dataset, is vulnerable to privacy attacks. To address the vulnerabilities, we design a post-training privacy-protection algorit… ▽ More We study the privacy risks that are associated with training a neural network's weights with self-supervised learning algorithms. Through empirical evidence, we show that the fine-tuning stage, in which the network weights are updated with an informative and often private dataset, is vulnerable to privacy attacks. To address the vulnerabilities, we design a post-training privacy-protection algorithm that adds noise to the fine-tuned weights and propose a novel differential privacy mechanism that samples noise from the logistic distribution. Compared to the two conventional additive noise mechanisms, namely the Laplace and the Gaussian mechanisms, the proposed mechanism uses a bell-shaped distribution that resembles the distribution of the Gaussian mechanism, and it satisfies pure $ε$-differential privacy similar to the Laplace mechanism. We apply membership inference attacks on both unprotected and protected models to quantify the trade-off between the models' privacy and performance. We show that the proposed protection algorithm can effectively reduce the attack accuracy to roughly 50\%-equivalent to random guessing-while maintaining a performance loss below 5\%. △ Less

Submitted 24 May, 2022; originally announced May 2022.

Comments: 15 pages, 2 figures

arXiv:2110.03054 [pdf, other]

doi 10.56553/popets-2023-0005

On the Privacy Risks of Deploying Recurrent Neural Networks in Machine Learning Models

Authors: Yunhao Yang, Parham Gohari, Ufuk Topcu

Abstract: We study the privacy implications of training recurrent neural networks (RNNs) with sensitive training datasets. Considering membership inference attacks (MIAs), which aim to infer whether or not specific data records have been used in training a given machine learning model, we provide empirical evidence that a neural network's architecture impacts its vulnerability to MIAs. In particular, we dem… ▽ More We study the privacy implications of training recurrent neural networks (RNNs) with sensitive training datasets. Considering membership inference attacks (MIAs), which aim to infer whether or not specific data records have been used in training a given machine learning model, we provide empirical evidence that a neural network's architecture impacts its vulnerability to MIAs. In particular, we demonstrate that RNNs are subject to a higher attack accuracy than feed-forward neural network (FFNN) counterparts. Additionally, we study the effectiveness of two prominent mitigation methods for preempting MIAs, namely weight regularization and differential privacy. For the former, we empirically demonstrate that RNNs may only benefit from weight regularization marginally as opposed to FFNNs. For the latter, we find that enforcing differential privacy through either of the following two methods leads to a less favorable privacy-utility trade-off in RNNs than alternative FFNNs: (i) adding Gaussian noise to the gradients calculated during training as a part of the so-called DP-SGD algorithm and (ii) adding Gaussian noise to the trainable parameters as a part of a post-training mechanism that we propose. As a result, RNNs can also be less amenable to mitigation methods, bringing us to the conclusion that the privacy risks pertaining to the recurrent architecture are higher than the feed-forward counterparts. △ Less

Submitted 15 June, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: Under Double-Blind Review

arXiv:2102.09599 [pdf, other]

Privacy-Preserving Kickstarting Deep Reinforcement Learning with Privacy-Aware Learners

Authors: Parham Gohari, Bo Chen, Bo Wu, Matthew Hale, Ufuk Topcu

Abstract: Kickstarting deep reinforcement learning algorithms facilitate a teacher-student relationship among the agents and allow for a well-performing teacher to share demonstrations with a student to expedite the student's training. However, despite the known benefits, the demonstrations may contain sensitive information about the teacher's training data and existing kickstarting methods do not take any… ▽ More Kickstarting deep reinforcement learning algorithms facilitate a teacher-student relationship among the agents and allow for a well-performing teacher to share demonstrations with a student to expedite the student's training. However, despite the known benefits, the demonstrations may contain sensitive information about the teacher's training data and existing kickstarting methods do not take any measures to protect it. Therefore, we use the framework of differential privacy to develop a mechanism that securely shares the teacher's demonstrations with the student. The mechanism allows for the teacher to decide upon the accuracy of its demonstrations with respect to the privacy budget that it consumes, thereby granting the teacher full control over its data privacy. We then develop a kickstarted deep reinforcement learning algorithm for the student that is privacy-aware because we calibrate its objective with the parameters of the teacher's privacy mechanism. The privacy-aware design of the algorithm makes it possible to kickstart the student's learning despite the perturbations induced by the privacy mechanism. From numerical experiments, we highlight three empirical results: (i) the algorithm succeeds in expediting the student's learning, (ii) the student converges to a performance level that was not possible without the demonstrations, and (iii) the student maintains its enhanced performance even after the teacher stops sharing useful demonstrations due to its privacy budget constraints. △ Less

Submitted 4 June, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: Under double-blind review

arXiv:2004.07778 [pdf, ps, other]

Privacy-Preserving Policy Synthesis in Markov Decision Processes

Authors: Parham Gohari, Matthew Hale, Ufuk Topcu

Abstract: In decision-making problems, the actions of an agent may reveal sensitive information that drives its decisions. For instance, a corporation's investment decisions may reveal its sensitive knowledge about market dynamics. To prevent this type of information leakage, we introduce a policy synthesis algorithm that protects the privacy of the transition probabilities in a Markov decision process. We… ▽ More In decision-making problems, the actions of an agent may reveal sensitive information that drives its decisions. For instance, a corporation's investment decisions may reveal its sensitive knowledge about market dynamics. To prevent this type of information leakage, we introduce a policy synthesis algorithm that protects the privacy of the transition probabilities in a Markov decision process. We use differential privacy as the mathematical definition of privacy. The algorithm first perturbs the transition probabilities using a mechanism that provides differential privacy. Then, based on the privatized transition probabilities, we synthesize a policy using dynamic programming. Our main contribution is to bound the "cost of privacy," i.e., the difference between the expected total rewards with privacy and the expected total rewards without privacy. We also show that computing the cost of privacy has time complexity that is polynomial in the parameters of the problem. Moreover, we establish that the cost of privacy increases with the strength of differential privacy protections, and we quantify this increase. Finally, numerical experiments on two example environments validate the established relationship between the cost of privacy and the strength of data privacy protections. △ Less

Submitted 16 April, 2020; originally announced April 2020.

Comments: Submitted to the Conference on Decision and Control (CDC) 2020

arXiv:1910.00043 [pdf, other]

The Dirichlet Mechanism for Differential Privacy on the Unit Simplex

Authors: Parham Gohari, Bo Wu, Matthew Hale, Ufuk Topcu

Abstract: As members of a network share more information with each other and network providers, sensitive data leakage raises privacy concerns. To address this need for a class of problems, we introduce a novel mechanism that privatizes vectors belonging to the unit simplex. Such vectors can be seen in many applications, such as privatizing a decision-making policy in a Markov decision process. We use diffe… ▽ More As members of a network share more information with each other and network providers, sensitive data leakage raises privacy concerns. To address this need for a class of problems, we introduce a novel mechanism that privatizes vectors belonging to the unit simplex. Such vectors can be seen in many applications, such as privatizing a decision-making policy in a Markov decision process. We use differential privacy as the underlying mathematical framework for these developments. The introduced mechanism is a probabilistic map** that maps a vector within the unit simplex to the same domain according to a Dirichlet distribution. We find the mechanism well-suited for inputs within the unit simplex because it always returns a privatized output that is also in the unit simplex. Therefore, no further projection back onto the unit simplex is required. We verify the privacy guarantees of the mechanism for two cases, namely, identity queries and average queries. In the former case, we derive expressions for the differential privacy level of privatizing a single vector within the unit simplex. In the latter case, we study the mechanism for privatizing the average of a collection of vectors, each of which is in the unit simplex. We establish a trade-off between the strength of privacy and the variance of the mechanism output, and we introduce a parameter to balance the trade-off between them. Numerical results illustrate these developments. △ Less

Submitted 30 September, 2019; originally announced October 2019.

Comments: Submitted to ACC 2020

Showing 1–7 of 7 results for author: Gohari, P