-
Towards unlocking the mystery of adversarial fragility of neural networks
Authors:
Jingchao Gao,
Raghu Mudumbai,
Xiaodong Wu,
Jirong Yi,
Catherine Xu,
Hui Xie,
Weiyu Xu
Abstract:
In this paper, we study the adversarial robustness of deep neural networks for classification tasks. We look at the smallest magnitude of additive perturbation that can change the output of a classification algorithm. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural networks for classification. In particular, our theoretical results show that a neural network's adversarial robustness can degrade as the input dimension $d$ increases. Analytically, we show that neural networks' adversarial robustness can be only $1/\sqrt{d}$ of the best possible adversarial robustness. Our matrix-theoretic explanation is consistent with an earlier information-theoretic, feature-compression-based explanation for the adversarial fragility of neural networks.
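For intuition about "the smallest magnitude of additive perturbation that can change the output", the binary linear classifier $\mathrm{sign}(w^\top x + b)$ has a closed form: the minimal $\ell_2$ perturbation projects $x$ onto the decision hyperplane and has norm $|w^\top x + b|/\|w\|_2$. The sketch below uses this toy linear model only (it is not the paper's deep-network analysis), and `min_l2_flip` is a hypothetical helper name.

```python
import math

def min_l2_flip(w, b, x):
    """Smallest L2 perturbation that moves x onto the boundary of sign(w.x + b).

    Toy linear model only; any slightly larger step along delta flips the label."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    # Project x onto the hyperplane w.x + b = 0 along the normal direction w.
    delta = [-score / norm_sq * wi for wi in w]
    return delta, abs(score) / math.sqrt(norm_sq)

delta, radius = min_l2_flip([3.0, 4.0], 0.0, [1.0, 1.0])
print(delta, radius)
```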
Submitted 23 June, 2024;
originally announced June 2024.
-
Slaves to the Law of Large Numbers: An Asymptotic Equipartition Property for Perplexity in Generative Language Models
Authors:
Raghu Mudumbai,
Tyler Bell
Abstract:
We propose a new asymptotic equipartition property for the perplexity of a large piece of text generated by a language model and present theoretical arguments for this property. Perplexity, defined as an inverse likelihood function, is widely used as a performance metric for training language models. Our main result states that the logarithmic perplexity of any large text produced by a language model must asymptotically converge to the average entropy of its token distributions. This means that language models are constrained to produce outputs only from a "typical set", which, as we show, is a vanishingly small subset of all possible grammatically correct outputs. We present preliminary experimental results from an open-source language model to support our theoretical claims. This work has possible practical applications for understanding and improving "AI detection" tools, and theoretical implications for the uniqueness, predictability, and creative potential of generative models.
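A minimal numerical sketch of the stated property, assuming a hypothetical three-token "language model" whose next-token distribution depends only on the previous token (`NEXT` is invented for illustration): it samples a long text, then compares the log-perplexity (mean negative log-likelihood of the sampled tokens) with the average entropy of the distributions actually used at each step.

```python
import math
import random

# Hypothetical toy "language model": next-token distribution depends on the previous token.
NEXT = {
    0: [0.5, 0.3, 0.2],
    1: [0.9, 0.05, 0.05],
    2: [0.2, 0.2, 0.6],
}

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def log_perplexity_vs_entropy(n_tokens, seed=0):
    rng = random.Random(seed)
    prev, nll, ent = 0, 0.0, 0.0
    for _ in range(n_tokens):
        p = NEXT[prev]
        tok = rng.choices(range(3), weights=p)[0]
        nll += -math.log(p[tok])   # contribution to log-perplexity (nats)
        ent += entropy(p)          # entropy of the distribution actually used
        prev = tok
    return nll / n_tokens, ent / n_tokens

lp, h = log_perplexity_vs_entropy(20000)
print(f"log-perplexity={lp:.3f}  average entropy={h:.3f}")
```

Each step contributes a zero-mean fluctuation $-\log p_t(x_t) - H(p_t)$, so the two averages approach each other as the text grows, which is the law-of-large-numbers behavior the title refers to.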
Submitted 22 May, 2024;
originally announced May 2024.
-
Trust, but Verify: Robust Image Segmentation using Deep Learning
Authors:
Fahim Ahmed Zaman,
Xiaodong Wu,
Weiyu Xu,
Milan Sonka,
Raghuraman Mudumbai
Abstract:
We describe a method for verifying the output of a deep neural network for medical image segmentation that is robust to several classes of random as well as worst-case perturbations, i.e., adversarial attacks. This method is based on a general approach recently developed by the authors, called "Trust, but Verify", wherein an auxiliary verification network uses the segmentation as an input to predict certain masked features of the input image. A well-designed auxiliary network will produce high-quality predictions when the input segmentations are accurate, but low-quality predictions when the segmentations are incorrect. Comparing the predictions of such a network against the original image allows us to detect bad segmentations. However, to ensure that the verification method is truly robust, we need a method for checking the quality of the predictions that does not itself rely on a black-box neural network. Indeed, we show that previous methods for segmentation evaluation that do use deep neural regression networks are vulnerable to false negatives, i.e., they can inaccurately label bad segmentations as good. We describe the design of a verification network that avoids this vulnerability and present results demonstrating its robustness compared to previous methods.
Submitted 19 December, 2023; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Optimal Pooling Matrix Design for Group Testing with Dilution (Row Degree) Constraints
Authors:
Jirong Yi,
Myung Cho,
Xiaodong Wu,
Raghu Mudumbai,
Weiyu Xu
Abstract:
In this paper, we consider the problem of designing an optimal pooling matrix for group testing (for example, for COVID-19 virus testing) under the constraint that no more than $r>0$ samples can be pooled together, which we call the "dilution constraint". This problem translates to designing a matrix with entries in $\{0,1\}$ that has no more than $r$ ones in each row and has a certain performance guarantee for identifying anomalous elements. We explicitly give pooling matrix designs that satisfy the dilution constraint and have performance guarantees for identifying anomalous elements, and we prove that they are optimal in saving the largest number of tests, namely that the designed matrices have the largest width-to-height ratio among all constraint-satisfying 0-1 matrices.
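The constraint and the width-to-height objective can be made concrete with a small checker. This is a sketch under an assumption of ours: the "performance guarantee" is taken to be identifying at most one positive sample in noiseless OR-model group testing, for which nonzero, pairwise-distinct columns suffice. It is not the paper's construction, and both helper names and the example matrix are hypothetical.

```python
from fractions import Fraction

def check_pooling_matrix(M, r):
    """Return (ok, width/height) for a 0-1 pooling matrix M given as a list of rows.

    ok requires: entries in {0, 1}; at most r ones per row (the dilution
    constraint); and nonzero, pairwise-distinct columns, which is enough to
    identify at most ONE positive sample in noiseless OR-model group testing.
    The single-positive criterion is an illustrative assumption, not the paper's."""
    m, n = len(M), len(M[0])
    entries_ok = all(v in (0, 1) for row in M for v in row)
    rows_ok = all(sum(row) <= r for row in M)
    cols = [tuple(M[i][j] for i in range(m)) for j in range(n)]
    cols_ok = all(any(c) for c in cols) and len(set(cols)) == n
    return entries_ok and rows_ok and cols_ok, Fraction(n, m)

def decode_single_positive(M, results):
    """Return the index of the column matching the pooled test results, or None."""
    m = len(M)
    for j in range(len(M[0])):
        if tuple(M[i][j] for i in range(m)) == tuple(results):
            return j
    return None

# Hypothetical example: 3 tests for 4 samples, at most r = 2 samples per pool.
M = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(check_pooling_matrix(M, 2))            # validity and width-to-height ratio
print(decode_single_positive(M, [1, 1, 0]))  # sample 3 is the positive one
```

Here the width-to-height ratio $4/3$ is the number of samples screened per test; the paper's designs maximize this ratio under its own guarantee.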
Submitted 5 August, 2020;
originally announced August 2020.
-
Derivation of Information-Theoretically Optimal Adversarial Attacks with Applications to Robust Machine Learning
Authors:
Jirong Yi,
Raghu Mudumbai,
Weiyu Xu
Abstract:
We consider the theoretical problem of designing an optimal adversarial attack on a decision system that maximally degrades the achievable performance of the system, as measured by the mutual information between the degraded signal and the label of interest. This problem is motivated by the existence of adversarial examples for machine learning classifiers. By adopting an information-theoretic perspective, we seek to identify conditions under which adversarial vulnerability is unavoidable, i.e., even optimally designed classifiers will be vulnerable to small adversarial perturbations. We present derivations of the optimal adversarial attacks for discrete and continuous signals of interest, i.e., we find the optimal perturbation distributions that minimize the mutual information between the degraded signal and a signal following a continuous or discrete distribution. In addition, we show that adversarial attacks that minimize mutual information are much harder to achieve when multiple redundant copies of the input signal are available. This provides additional support for the recently proposed "feature compression" hypothesis as an explanation for the adversarial vulnerability of deep learning classifiers. We also report results from computational experiments that illustrate our theoretical results.
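The attacker's objective here is the mutual information between the label and the degraded signal. As a toy illustration only (a uniform binary label observed through a perturbation that flips it with probability `p_flip`, which is our assumption and not the paper's signal model), the sketch computes $I(X;Y)$ in bits from a joint pmf; an attacker able to push `p_flip` to 1/2 drives it to zero.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a joint pmf given as {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    # Zero-probability cells contribute nothing (0 * log 0 = 0 by convention).
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def flip_channel_joint(p_flip):
    """Uniform binary label X observed through a perturbation that flips it w.p. p_flip."""
    return {(x, y): 0.5 * (p_flip if x != y else 1 - p_flip)
            for x in (0, 1) for y in (0, 1)}

for p in (0.0, 0.1, 0.5):
    print(p, round(mutual_information(flip_channel_joint(p)), 4))
```

For this channel the result equals $1 - H_b(p)$, where $H_b$ is the binary entropy function.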
Submitted 28 July, 2020;
originally announced July 2020.
-
Low-Cost and High-Throughput Testing of COVID-19 Viruses and Antibodies via Compressed Sensing: System Concepts and Computational Experiments
Authors:
Jirong Yi,
Raghu Mudumbai,
Weiyu Xu
Abstract:
Coronavirus disease 2019 (COVID-19) is an ongoing pandemic infectious disease outbreak that has significantly harmed and threatened the health and lives of millions or even billions of people. COVID-19 has also significantly disrupted the social and economic activities of many countries. With no approved vaccine available at this moment, extensive testing of people for COVID-19 viruses is essential for disease diagnosis, confining virus spread, contact tracing, and determining the right conditions for people to return to normal economic activities. Identifying people who have antibodies for COVID-19 can also help select persons who are suitable for undertaking certain essential activities or returning to the workforce. However, the throughput of current testing technologies for COVID-19 viruses and antibodies is often quite limited, and is not sufficient for dealing with the anticipated fast-oscillating waves of COVID-19 spread affecting a significant portion of the earth's population.
In this paper, we propose to use compressed sensing (group testing can be seen as a special case of compressed sensing when applied to COVID-19 detection) to achieve high-throughput rapid testing of COVID-19 viruses and antibodies, which can potentially provide a speedup of tens of times or more compared with current testing technologies. The proposed compressed sensing system for high-throughput testing can utilize expander-graph-based compressed sensing matrices developed by us \cite{Weiyuexpander2007}.
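To show how pooled test results can be decoded at all, the sketch below uses COMP, a standard non-adaptive group-testing decoder, swapped in for illustration; it is not the paper's expander-graph compressed sensing method. The bit-indexed pooling design (`make_bit_pools`) is hypothetical and only handles at most one positive sample.

```python
def make_bit_pools(n_samples, n_bits):
    """Hypothetical pooling design: for each bit k, one pool of samples whose bit k is 1
    and one pool of samples whose bit k is 0. It identifies at most one positive."""
    return [{s for s in range(n_samples) if ((s >> k) & 1) == bit}
            for k in range(n_bits) for bit in (1, 0)]

def run_pools(pools, infected):
    """Noiseless pooled tests: a pool is positive iff it contains an infected sample."""
    return [any(s in infected for s in pool) for pool in pools]

def comp_decode(pools, results, n_samples):
    """COMP decoding: every sample in a negative pool is cleared; the rest are flagged."""
    cleared = set()
    for pool, positive in zip(pools, results):
        if not positive:
            cleared.update(pool)
    return [s for s in range(n_samples) if s not in cleared]

pools = make_bit_pools(16, 4)   # 8 pooled tests for 16 samples
print(comp_decode(pools, run_pools(pools, {11}), 16))
```

COMP never misses a true positive in the noiseless model but may flag extra samples when the design is too weak for the number of positives; the matrix design determines how many positives can be resolved.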
Submitted 12 April, 2020;
originally announced April 2020.
-
Do Deep Minds Think Alike? Selective Adversarial Attacks for Fine-Grained Manipulation of Multiple Deep Neural Networks
Authors:
Zain Khan,
Jirong Yi,
Raghu Mudumbai,
Xiaodong Wu,
Weiyu Xu
Abstract:
Recent works have demonstrated the existence of adversarial examples targeting a single machine learning system. In this paper, we ask a simple but fundamental question of "selective fooling": given multiple machine learning systems assigned to solve the same classification problem and taking the same input signal, is it possible to construct a perturbation to the input signal that manipulates the outputs of these multiple machine learning systems simultaneously in arbitrary pre-defined ways? For example, is it possible to selectively fool a set of "enemy" machine learning systems but not fool the other "friend" machine learning systems? The answer to this question depends on the extent to which these different machine learning systems "think alike". We formulate the problem of "selective fooling" as a novel optimization problem and report on a series of experiments on the MNIST dataset. Our preliminary findings from these experiments show that it is in fact very easy to selectively manipulate multiple MNIST classifiers simultaneously, even when the classifiers are identical in their architectures, training algorithms, and training datasets, except for random initialization during training. This suggests that two nominally equivalent machine learning systems do not in fact "think alike" at all, and it opens the possibility for many novel applications and a deeper understanding of the working principles of deep neural networks.
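One plausible way to pose selective fooling as an optimization problem is sketched below. It is a hypothetical toy, not the paper's formulation: two linear binary classifiers stand in for the "enemy" and "friend" networks, and subgradient descent minimizes a small perturbation-norm penalty plus hinge penalties that push the enemy's score negative while keeping the friend's score positive.

```python
def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def selective_fool(x, enemy, friend, margin=0.1, lam=0.01, lr=0.05, steps=500):
    """Find a small perturbation d that flips the enemy's sign but keeps the friend's.

    Minimizes lam*||d||^2 + max(0, margin + s_e(x+d)) + max(0, margin - s_f(x+d))
    by subgradient descent; s_e, s_f are scores of two (hypothetical) linear classifiers
    that both start out positive on x."""
    d = [0.0] * len(x)
    (we, be), (wf, bf) = enemy, friend
    for _ in range(steps):
        xd = [xi + di for xi, di in zip(x, d)]
        g = [2 * lam * di for di in d]          # gradient of the norm penalty
        if margin + score(we, be, xd) > 0:      # enemy not yet fooled: push its score down
            g = [gi + wi for gi, wi in zip(g, we)]
        if margin - score(wf, bf, xd) > 0:      # friend at risk: push its score back up
            g = [gi - wi for gi, wi in zip(g, wf)]
        d = [di - lr * gi for di, gi in zip(d, g)]
    return d

x = [1.0, 1.0]
enemy = ([1.0, 0.0], 0.0)
friend = ([1.0, 1.0], 0.0)
d = selective_fool(x, enemy, friend)
xd = [a + b for a, b in zip(x, d)]
print(d, score(*enemy, xd), score(*friend, xd))
```

With deep networks the same objective would be minimized with automatic differentiation, and the hinge terms generalize to per-classifier target labels.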
Submitted 26 March, 2020;
originally announced March 2020.
-
Trust but Verify: An Information-Theoretic Explanation for the Adversarial Fragility of Machine Learning Systems, and a General Defense against Adversarial Attacks
Authors:
Jirong Yi,
Hui Xie,
Leixin Zhou,
Xiaodong Wu,
Weiyu Xu,
Raghuraman Mudumbai
Abstract:
Deep-learning-based classification algorithms have been shown to be susceptible to adversarial attacks: minor changes to the input of classifiers can dramatically change their outputs while being imperceptible to humans. In this paper, we present a simple hypothesis about a feature compression property of artificial intelligence (AI) classifiers and present theoretical arguments to show that this hypothesis successfully accounts for the observed fragility of AI classifiers to small adversarial perturbations. Drawing on ideas from information and coding theory, we propose a general class of defenses for detecting classifier errors caused by abnormally small input perturbations. We further show theoretical guarantees for the performance of this detection method. We present experimental results with (a) a voice recognition system, and (b) a digit recognition system using the MNIST database, to demonstrate the effectiveness of the proposed defense methods. The ideas in this paper are motivated by a simple analogy between AI classifiers and the standard Shannon model of a communication system.
Submitted 25 May, 2019;
originally announced May 2019.
-
An Information-Theoretic Explanation for the Adversarial Fragility of AI Classifiers
Authors:
Hui Xie,
Jirong Yi,
Weiyu Xu,
Raghu Mudumbai
Abstract:
We present a simple hypothesis about a compression property of artificial intelligence (AI) classifiers and present theoretical arguments to show that this hypothesis successfully accounts for the observed fragility of AI classifiers to small adversarial perturbations. We also propose a new method for detecting when small input perturbations cause classifier errors, and show theoretical guarantees for the performance of this detection method. We present experimental results with a voice recognition system to demonstrate this method. The ideas in this paper are motivated by a simple analogy between AI classifiers and the standard Shannon model of a communication system.
Submitted 27 January, 2019;
originally announced January 2019.
-
Serving the Grid: an Experimental Study of Server Clusters as Real-Time Demand Response Resources
Authors:
Josiah McClurg,
Raghuraman Mudumbai
Abstract:
Demand response is a crucial technology to allow large-scale penetration of intermittent renewable energy sources in the electric grid. This paper is based on the thesis that datacenters represent especially attractive candidates for providing flexible, real-time demand response services to the grid; they are capable of finely controllable power consumption, fast power ramp rates, and large dynamic range. This paper makes two main contributions: (a) it provides detailed experimental evidence justifying this thesis, and (b) it presents a comparative investigation of three candidate software interfaces for power control within the servers. All of these results are based on a series of experiments involving real-time power measurements on a lab-scale server cluster. This cluster was specially instrumented for accurate and fast power measurements on a time scale of 100 ms or less. Our results provide preliminary evidence for the feasibility of large-scale demand response using datacenters and motivate future work on exploiting this capability.
Submitted 29 November, 2016;
originally announced November 2016.
-
PHY-layer link quality indicators for wireless networks using matched-filters
Authors:
Henry E. Baidoo-Williams,
Octav Chipara,
Raghuraman Mudumbai,
Soura Dasgupta
Abstract:
We present a novel approach to accurate real-time estimation of wireless link quality using simple matched-filtering techniques. Our approach is based on the simple observation that there is a portion of each packet transmission from any given node that does not change from one packet to another; this includes preamble sequences used to synchronize the receiver and also address information in the packet header used for medium access control and routing. Our approach can be thought of as a generalized and simplified variant of standard signal processing techniques that are commonly used for preamble detection, automatic gain control, carrier sensing and other functions in many packet wireless networks. By using a combination of energy detection and correlation techniques, we show that we can effectively detect packet transmissions in real-time with low complexity, without decoding the packets themselves, and indeed, even without detailed knowledge of the packet format. We present extensive experimental results from a software-defined radio testbed to illustrate the effectiveness of this approach for 802.15.4 (Zigbee) networks even in the presence of strong interference signals and low SNR.
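A minimal sketch of combining correlation with energy detection, under assumptions of ours: real-valued baseband samples, additive Gaussian noise, and a Barker-13 sequence standing in for the invariant preamble/header portion of each packet. The sliding correlation against the known sequence is normalized by the window energy, so its peak locates the packet without decoding it and is insensitive to the overall received signal scale. `best_match` is a hypothetical helper, not the paper's implementation.

```python
import random

# Barker-13: a known +/-1 sequence with low autocorrelation sidelobes, standing in
# (hypothetically) for the fixed preamble/header bits of a packet.
PREAMBLE = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]

def best_match(signal, template):
    """Slide the template over the signal; return (lag, peak) of the normalized correlation.

    Dividing the correlation by the window energy (energy detection) turns it into a
    cosine similarity, so the statistic does not depend on the received signal scale."""
    L = len(template)
    e_t = sum(t * t for t in template)
    best = (None, -1.0)
    for i in range(len(signal) - L + 1):
        window = signal[i:i + L]
        c = sum(s * t for s, t in zip(window, template))
        e_s = sum(s * s for s in window)
        rho = c / (e_s * e_t) ** 0.5 if e_s > 0 else 0.0
        if rho > best[1]:
            best = (i, rho)
    return best

rng = random.Random(7)
rx = [rng.gauss(0.0, 0.2) for _ in range(200)]   # receiver noise
for k, chip in enumerate(PREAMBLE):              # packet preamble arrives at sample 70
    rx[70 + k] += chip
lag, peak = best_match(rx, PREAMBLE)
print(lag, round(peak, 3))
```

In practice a packet would be declared present when the peak exceeds a threshold chosen for the desired false-alarm rate.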
Submitted 19 August, 2015; v1 submitted 19 August, 2015;
originally announced August 2015.
-
Subspace based low rank and joint sparse matrix recovery
Authors:
Sampurna Biswas,
Sunrita Poddar,
Soura Dasgupta,
Raghuraman Mudumbai,
Mathews Jacob
Abstract:
We consider the recovery of a low-rank and jointly sparse matrix from undersampled measurements of its columns. This problem is highly relevant to the recovery of dynamic MRI data with high spatio-temporal resolution, where each column of the matrix corresponds to a frame in the image time series; the matrix is highly low-rank since the frames are highly correlated. Similarly, the non-zero locations of the matrix in appropriate transform/frame domains (e.g., wavelet, gradient) are roughly the same in different frames. The superset of the support can be safely assumed to be jointly sparse. Unlike the classical multiple measurement vector (MMV) setup, which measures all the snapshots using the same matrix, we consider each snapshot to be measured using a different measurement matrix. We show that this approach reduces the total number of measurements, especially when the rank of the matrix is much smaller than its sparsity. Our experiments in the context of dynamic imaging show that this approach is very useful for realizing free-breathing cardiac MRI.
Submitted 2 June, 2015; v1 submitted 5 December, 2014;
originally announced December 2014.
-
Two step recovery of jointly sparse and low-rank matrices: theoretical guarantees
Authors:
Sampurna Biswas,
Sunrita Poddar,
Soura Dasgupta,
Raghuraman Mudumbai,
Mathews Jacob
Abstract:
We introduce a two-step algorithm with theoretical guarantees to recover a jointly sparse and low-rank matrix from undersampled measurements of its columns. The algorithm first estimates the row subspace of the matrix using a set of common measurements of the columns. In the second step, the matrix is recovered with a simple subspace-aware least-squares algorithm. The results are verified in the context of recovering CINE data from undersampled measurements; we obtain good recovery when the sampling conditions are satisfied.
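A sketch of how the second (least-squares) step can be posed, assuming real-valued data, a rank-$r$ matrix $X \in \mathbb{R}^{m \times n}$ whose columns $x_i$ are measured as $y_i = A_i x_i$, and a row-subspace basis $V \in \mathbb{R}^{n \times r}$ returned by the first step, with $v_i \in \mathbb{R}^{r}$ denoting its $i$-th row written as a column. This notation is ours, not the paper's; the identity $\operatorname{vec}(ABC) = (C^\top \otimes A)\operatorname{vec}(B)$ turns each measurement into a linear equation in the unknown factor $U$.

```latex
\begin{aligned}
X &= U V^\top \quad\Longrightarrow\quad x_i = U v_i, \\
y_i &= A_i U v_i = \left(v_i^\top \otimes A_i\right) \operatorname{vec}(U), \\
\hat{U} &= \arg\min_{U \in \mathbb{R}^{m \times r}} \sum_{i=1}^{n} \left\| y_i - \left(v_i^\top \otimes A_i\right) \operatorname{vec}(U) \right\|_2^2,
\qquad \hat{X} = \hat{U} V^\top .
\end{aligned}
```

Because only the $mr$ entries of $U$ are unknown, rather than the $mn$ entries of $X$, the stacked system can be well-posed even when each $A_i$ on its own is underdetermined.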
Submitted 2 June, 2015; v1 submitted 5 December, 2014;
originally announced December 2014.
-
Distributed Transmit Beamforming using Feedback Control
Authors:
R. Mudumbai,
J. Hespanha,
U. Madhow,
G. Barriac
Abstract:
A simple feedback control algorithm is presented for distributed beamforming in a wireless network. A network of wireless sensors that seek to cooperatively transmit a common message signal to a Base Station (BS) is considered. In this case, it is well-known that substantial energy efficiencies are possible by using distributed beamforming. The feedback algorithm is shown to achieve the carrier phase coherence required for beamforming in a scalable and distributed manner. In the proposed algorithm, each sensor independently makes a random adjustment to its carrier phase. Assuming that the BS is able to broadcast one bit of feedback each timeslot about the change in received signal to noise ratio (SNR), the sensors are able to keep the favorable phase adjustments and discard the unfavorable ones, asymptotically achieving perfect phase coherence. A novel analytical model is derived that accurately predicts the convergence rate. The analytical model is used to optimize the algorithm for fast convergence and to establish the scalability of the algorithm.
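A minimal simulation of the algorithm as the abstract describes it: in each timeslot every sensor applies an independent random phase perturbation, the BS feeds back one bit saying whether the received signal improved, and the sensors keep or discard the perturbation accordingly. The noise-free received amplitude (used in place of SNR), unit channel gains, and uniform perturbations are our simplifying assumptions, and `one_bit_beamforming` is a hypothetical name.

```python
import cmath
import math
import random

def one_bit_beamforming(n_sensors=20, iters=2000, step=0.1, seed=0):
    """Simulate one-bit-feedback phase alignment; return (initial, final) received amplitude."""
    rng = random.Random(seed)
    # Each entry lumps together a sensor's unknown channel phase and its carrier phase.
    phase = [rng.uniform(-math.pi, math.pi) for _ in range(n_sensors)]

    def amplitude(ph):
        # Noise-free received amplitude at the BS with unit channel gains (an assumption).
        return abs(sum(cmath.exp(1j * p) for p in ph))

    best = start = amplitude(phase)
    for _ in range(iters):
        # Every sensor independently perturbs its phase by a small random amount.
        trial = [p + rng.uniform(-step, step) for p in phase]
        a = amplitude(trial)
        if a > best:   # 1-bit feedback "better": sensors keep the new phases
            phase, best = trial, a
        # otherwise feedback "worse": sensors revert (phase is left unchanged)
    return start, best

start, best = one_bit_beamforming()
print(f"received amplitude: {start:.2f} -> {best:.2f} (maximum 20)")
```

The maximum possible amplitude equals the number of sensors, reached at perfect phase coherence; the perturbation size trades convergence speed against the residual phase error.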
Submitted 17 March, 2006;
originally announced March 2006.