-
Generalization Properties of Adversarial Training for $\ell_0$-Bounded Adversarial Attacks
Authors:
Payam Delgosha,
Hamed Hassani,
Ramtin Pedarsani
Abstract:
We have widely observed that neural networks are vulnerable to small additive perturbations to the input causing misclassification. In this paper, we focus on the $\ell_0$-bounded adversarial attacks, and aim to theoretically characterize the performance of adversarial training for an important class of truncated classifiers. Such classifiers are shown to have strong performance empirically, as well as theoretically in the Gaussian mixture model, in the $\ell_0$-adversarial setting. The main contribution of this paper is to prove a novel generalization bound for the binary classification setting with $\ell_0$-bounded adversarial perturbation that is distribution-independent. Deriving a generalization bound in this setting has two main challenges: (i) the truncated inner product is highly non-linear; and (ii) the maximization over the $\ell_0$ ball due to adversarial training is non-convex and highly non-smooth. To tackle these challenges, we develop new coding techniques for bounding the combinatorial dimension of the truncated hypothesis class.
Submitted 5 February, 2024;
originally announced February 2024.
-
A Universal Low Complexity Compression Algorithm for Sparse Marked Graphs
Authors:
Payam Delgosha,
Venkat Anantharam
Abstract:
Many modern applications involve accessing and processing graphical data, i.e. data that is naturally indexed by graphs. Examples come from internet graphs, social networks, genomics and proteomics, and other sources. The typically large size of such data motivates seeking efficient ways for its compression and decompression. The current compression methods are usually tailored to specific models, or do not provide theoretical guarantees. In this paper, we introduce a low-complexity lossless compression algorithm for sparse marked graphs, i.e. graphical data indexed by sparse graphs, which is capable of universally achieving the optimal compression rate in a precisely defined sense. In order to define universality, we employ the framework of local weak convergence, which allows one to make sense of a notion of stochastic processes for sparse graphs. Moreover, we investigate the performance of our algorithm through some experimental results on both synthetic and real-world data.
Submitted 13 January, 2023;
originally announced January 2023.
-
Binary Classification Under $\ell_0$ Attacks for General Noise Distribution
Authors:
Payam Delgosha,
Hamed Hassani,
Ramtin Pedarsani
Abstract:
Adversarial examples have recently drawn considerable attention in the field of machine learning due to the fact that small perturbations in the data can result in major performance degradation. This phenomenon is usually modeled by a malicious adversary that can apply perturbations to the data in a constrained fashion, such as being bounded in a certain norm. In this paper, we study this problem when the adversary is constrained by the $\ell_0$ norm; i.e., it can perturb a certain number of coordinates in the input, but has no limit on how much it can perturb those coordinates. Due to the combinatorial nature of this setting, we need to go beyond the standard techniques in robust machine learning to address this problem. We consider a binary classification scenario where $d$ noisy data samples of the true label are provided to us after adversarial perturbations. We introduce a classification method which employs a nonlinear component called truncation, and show that, in an asymptotic scenario, as long as the adversary is restricted to perturb no more than $\sqrt{d}$ data samples, we can almost achieve the optimal classification error in the absence of the adversary, i.e. we can completely neutralize the adversary's effect. Surprisingly, we observe a phase transition: using a converse argument, we show that if the adversary can perturb more than $\sqrt{d}$ coordinates, no classifier can do better than a random guess.
Submitted 9 March, 2022;
originally announced March 2022.
-
Efficient and Robust Classification for Sparse Attacks
Authors:
Mark Beliaev,
Payam Delgosha,
Hamed Hassani,
Ramtin Pedarsani
Abstract:
In the past two decades we have seen the popularity of neural networks increase in conjunction with their classification accuracy. Parallel to this, we have also witnessed how fragile the very same prediction models are: tiny perturbations to the inputs can cause misclassification errors throughout entire datasets. In this paper, we consider perturbations bounded by the $\ell_0$-norm, which have been shown to be effective attacks in the domains of image recognition, natural language processing, and malware detection. To this end, we propose a novel defense method that consists of "truncation" and "adversarial training". We then theoretically study the Gaussian mixture setting and prove the asymptotic optimality of our proposed classifier. Motivated by the insights we obtain, we extend these components to neural network classifiers. We conduct numerical experiments in the domain of computer vision using the MNIST and CIFAR datasets, demonstrating significant improvement for the robust classification error of neural networks.
Submitted 23 January, 2022;
originally announced January 2022.
-
A Universal Lossless Compression Method applicable to Sparse Graphs and Heavy-Tailed Sparse Graphs
Authors:
Payam Delgosha,
Venkat Anantharam
Abstract:
Graphical data arises naturally in several modern applications, including but not limited to internet graphs, social networks, genomics and proteomics. The typically large size of graphical data argues for the importance of designing universal compression methods for such data. In most applications, the graphical data is sparse, meaning that the number of edges in the graph scales more slowly than $n^2$, where $n$ denotes the number of vertices. Although in some applications the number of edges scales linearly with $n$, in others the number of edges is much smaller than $n^2$ but appears to scale superlinearly with $n$. We call the former sparse graphs and the latter heavy-tailed sparse graphs. In this paper we introduce a universal lossless compression method which is simultaneously applicable to both classes. We do this by employing the local weak convergence framework for sparse graphs and the sparse graphon framework for heavy-tailed sparse graphs.
Submitted 17 July, 2021;
originally announced July 2021.
-
Robust Classification Under $\ell_0$ Attack for the Gaussian Mixture Model
Authors:
Payam Delgosha,
Hamed Hassani,
Ramtin Pedarsani
Abstract:
It is well-known that machine learning models are vulnerable to small but cleverly-designed adversarial perturbations that can cause misclassification. While there has been major progress in designing attacks and defenses for various adversarial settings, many fundamental and theoretical problems are yet to be resolved. In this paper, we consider classification in the presence of $\ell_0$-bounded adversarial perturbations, a.k.a. sparse attacks. This setting is significantly different from other $\ell_p$-adversarial settings, with $p\geq 1$, as the $\ell_0$-ball is non-convex and highly non-smooth. Under the assumption that data is distributed according to the Gaussian mixture model, our goal is to characterize the optimal robust classifier and the corresponding robust classification error as well as a variety of trade-offs between robustness, accuracy, and the adversary's budget. To this end, we develop a novel classification algorithm called FilTrun that has two main modules: Filtration and Truncation. The key idea of our method is to first filter out the non-robust coordinates of the input and then apply a carefully-designed truncated inner product for classification. By analyzing the performance of FilTrun, we derive an upper bound on the optimal robust classification error. We also find a lower bound by designing a specific adversarial strategy that enables us to derive the corresponding robust classifier and its achieved error. For the case that the covariance matrix of the Gaussian mixtures is diagonal, we show that as the input's dimension gets large, the upper and lower bounds converge; i.e. we characterize the asymptotically-optimal robust classifier. Throughout, we discuss several examples that illustrate interesting behaviors such as the existence of a phase transition in the adversary's budget that determines whether the effect of adversarial perturbation can be fully neutralized.
Submitted 5 April, 2021;
originally announced April 2021.
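The abstract above names a "truncated inner product" without defining it. The following is a minimal sketch under a stated assumption: that truncation with parameter k means sorting the coordinate-wise products, dropping the k largest and the k smallest, and summing the rest. The function names (`truncated_inner_product`, `classify`) are hypothetical, the Filtration step is omitted, and this should not be read as the authors' implementation.

```python
from typing import Sequence


def truncated_inner_product(w: Sequence[float], x: Sequence[float], k: int) -> float:
    """Sum the coordinate-wise products w[i] * x[i] after discarding the
    k largest and the k smallest of them (assumed definition of truncation)."""
    d = len(w)
    if len(x) != d:
        raise ValueError("w and x must have the same length")
    if k < 0 or 2 * k >= d:
        raise ValueError("k must satisfy 0 <= k < d / 2")
    products = sorted(wi * xi for wi, xi in zip(w, x))
    # Keep only the middle d - 2k products.
    return sum(products[k:d - k])


def classify(w: Sequence[float], x: Sequence[float], k: int) -> int:
    """Hypothetical sign classifier built on the truncated inner product."""
    return 1 if truncated_inner_product(w, x, k) >= 0 else -1


if __name__ == "__main__":
    w = [1.0] * 5
    clean = [1.0] * 5
    # An l0 attack with budget 1: one coordinate moved arbitrarily far.
    attacked = [1.0, 1.0, -1000.0, 1.0, 1.0]
    print(classify(w, clean, k=1), classify(w, attacked, k=0), classify(w, attacked, k=1))
```

With k = 0 the function reduces to the ordinary inner product, so a single unbounded perturbation can flip the sign; with k at least the adversary's budget, the extreme products are discarded before summing, which is the intuition behind truncation in this setting.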
-
Universal Lossless Compression of Graphical Data
Authors:
Payam Delgosha,
Venkat Anantharam
Abstract:
Graphical data is comprised of a graph with marks on its edges and vertices. The mark indicates the value of some attribute associated with the respective edge or vertex. Examples of such data arise in social networks, molecular and systems biology, and web graphs, as well as in several other application areas. Our goal is to design schemes that can efficiently compress such graphical data without making assumptions about its stochastic properties. Namely, we wish to develop a universal compression algorithm for graphical data sources. To formalize this goal, we employ the framework of local weak convergence, also called the objective method, which provides a technique to think of a marked graph as a kind of stationary stochastic process, stationary with respect to movement between vertices of the graph. In recent work, we have generalized a notion of entropy for unmarked graphs in this framework, due to Bordenave and Caputo, to the case of marked graphs. We use this notion to evaluate the efficiency of a compression scheme. The lossless compression scheme we propose in this paper is then proved to be universally optimal in a precise technical sense. It is also capable of performing local data queries in the compressed form.
Submitted 21 September, 2019;
originally announced September 2019.
-
A Notion of Entropy for Stochastic Processes on Marked Rooted Graphs
Authors:
Payam Delgosha,
Venkat Anantharam
Abstract:
In this document, we introduce a notion of entropy for stochastic processes on marked rooted graphs. For this, we employ the framework of local weak limit theory for sparse marked graphs, also known as the objective method, due to Benjamini, Schramm, Aldous, Steele and Lyons. Our contribution is a generalization of the notion of entropy introduced by Bordenave and Caputo to graphs which carry marks on their vertices and edges.
The theory of time series is the engine driving an enormous range of applications in areas such as control theory, communications, information theory and signal processing. It is to be expected that a theory of stationary stochastic processes indexed by combinatorial structures, in particular graphs, would eventually have a similarly wide-ranging impact.
Submitted 2 August, 2019;
originally announced August 2019.
-
Deep Switch Networks for Generating Discrete Data and Language
Authors:
Payam Delgosha,
Naveen Goela
Abstract:
Multilayer switch networks are proposed as artificial generators of high-dimensional discrete data (e.g., binary vectors, categorical data, natural language, network log files, and discrete-valued time series). Unlike deconvolution networks which generate continuous-valued data and which consist of upsampling filters and reverse pooling layers, multilayer switch networks are composed of adaptive switches which model conditional distributions of discrete random variables. An interpretable, statistical framework is introduced for training these nonlinear networks based on a maximum-likelihood objective function. To learn network parameters, stochastic gradient descent is applied to the objective. This direct optimization is stable until convergence, and does not involve back-propagation over separate encoder and decoder networks, or adversarial training of dueling networks. While training remains tractable for moderately sized networks, Markov-chain Monte Carlo (MCMC) approximations of gradients are derived for deep networks which contain latent variables. The statistical framework is evaluated on synthetic data, high-dimensional binary data of handwritten digits, and web-crawled natural language data. Aspects of the model's framework such as interpretability, computational complexity, and generalization ability are discussed.
Submitted 14 March, 2019;
originally announced March 2019.
-
Distributed Compression of Graphical Data
Authors:
Payam Delgosha,
Venkat Anantharam
Abstract:
In contrast to time series, graphical data is data indexed by the vertices and edges of a graph. Modern applications such as the internet, social networks, genomics and proteomics generate graphical data, often at large scale. The large scale argues for the need to compress such data for storage and subsequent processing. Since this data might have several components available in different locations, it is also important to study distributed compression of graphical data. In this paper, we derive a rate region for this problem which is a counterpart of the Slepian-Wolf theorem. We characterize the rate region when the statistical description of the distributed graphical data can be modeled as being one of two types: as a member of a sequence of marked sparse Erdős-Rényi ensembles or as a member of a sequence of marked configuration model ensembles. Our results are in terms of a generalization of the notion of entropy introduced by Bordenave and Caputo in the study of local weak limits of sparse graphs. Furthermore, we give a generalization of this result for Erdős-Rényi and configuration model ensembles with more than two sources.
Submitted 20 August, 2021; v1 submitted 21 February, 2018;
originally announced February 2018.
-
Load Balancing in Hypergraphs
Authors:
Payam Delgosha,
Venkat Anantharam
Abstract:
Consider a simple locally finite hypergraph on a countable vertex set, where each edge represents one unit of load which should be distributed among the vertices defining the edge. An allocation of load is called balanced if load cannot be moved from a vertex to another that is carrying less load. We analyze the properties of balanced allocations of load. We extend the concept of balancedness from finite hypergraphs to their local weak limits in the sense of Benjamini and Schramm (2001) and Aldous and Steele (2004). To do this, we define a notion of unimodularity for hypergraphs which could be considered an extension of unimodularity in graphs. We give a variational formula for the balanced load distribution and, in particular, we characterize it in the special case of unimodular hypergraph Galton-Watson processes. Moreover, we prove the convergence of the maximum load under some conditions. Our work is an extension to hypergraphs of Anantharam and Salez (2016), which considered load balancing in graphs, and is aimed at more comprehensively resolving conjectures of Hajek (1990).
Submitted 1 October, 2017;
originally announced October 2017.
-
High Probability Guarantees in Repeated Games: Theory and Applications in Information Theory
Authors:
Payam Delgosha,
Amin Gohari,
Mohammad Akbarpour
Abstract:
We introduce a "high probability" framework for repeated games with incomplete information. In our non-equilibrium setting, players aim to guarantee a certain payoff with high probability, rather than in expected value. We provide a high probability counterpart of the classical result of Mertens and Zamir for zero-sum repeated games. Any payoff that can be guaranteed with high probability can be guaranteed in expectation, but the reverse is not true. Hence, unlike the average payoff case where the payoff guaranteed by each player is the negative of the payoff guaranteed by the other player, the two guaranteed payoffs would differ in the high probability framework. One motivation for this framework comes from information transmission systems, where it is customary to formulate problems in terms of asymptotically vanishing probability of error. An application of our results to a class of compound arbitrarily varying channels is given.
Submitted 28 September, 2015;
originally announced September 2015.
-
Impossibility of Local State Transformation via Hypercontractivity
Authors:
Payam Delgosha,
Salman Beigi
Abstract:
Local state transformation is the problem of transforming an arbitrary number of copies of a bipartite resource state to a bipartite target state under local operations. That is, given two bipartite states, is it possible to transform an arbitrary number of copies of one of them to one copy of the other state under local operations only? This problem is a hard one in general since we assume that the number of copies of the resource state is arbitrarily large. In this paper we prove some bounds on this problem using the hypercontractivity properties of some super-operators corresponding to bipartite states. We measure hypercontractivity in terms of both the usual super-operator norms as well as completely bounded norms.
Submitted 24 July, 2013; v1 submitted 10 July, 2013;
originally announced July 2013.
-
Information Theoretic cutting of a cake
Authors:
Payam Delgosha,
Amin Gohari
Abstract:
Cutting a cake is a metaphor for the problem of dividing a resource (cake) among several agents. The problem becomes non-trivial when the agents have different valuations for different parts of the cake (e.g. one agent may like chocolate while the other may like cream). A fair division of the cake is one that takes into account the individual valuations of agents and partitions the cake based on some fairness criterion. Fair division may be accomplished in a distributed or centralized way. Due to its natural and practical appeal, it has been a subject of study in economics. To the best of our knowledge, the role of partial information in fair division has not been studied so far from an information theoretic perspective. In this paper we study two important algorithms in fair division, namely "divide and choose" and "adjusted winner", for the case of two agents. We quantify the benefit of negotiation in the divide and choose algorithm, and its use in tricking the adjusted winner algorithm. We also analyze the role of implicit information transmission through actions for the repeated divide and choose problem by finding a trembling hand perfect equilibrium for a specific setup. Lastly, we consider a centralized algorithm for maximizing the overall welfare of the agents under the Nash collective utility function (CUF). This corresponds to a clustering problem of the type traditionally studied in data mining and machine learning. Drawing a conceptual link between this problem and the portfolio selection problem in stock markets, we prove an upper bound on the increase of the Nash CUF for a clustering refinement.
Submitted 24 January, 2016; v1 submitted 18 May, 2012;
originally announced May 2012.