Search | arXiv e-print repository

Quantum Circuit Optimization with AlphaTensor

Authors: Francisco J. R. Ruiz, Tuomas Laakkonen, Johannes Bausch, Matej Balog, Mohammadamin Barekatain, Francisco J. H. Heras, Alexander Novikov, Nathan Fitzpatrick, Bernardino Romera-Paredes, John van de Wetering, Alhussein Fawzi, Konstantinos Meichanetzidis, Pushmeet Kohli

Abstract: A key challenge in realizing fault-tolerant quantum computers is circuit optimization. Focusing on the most expensive gates in fault-tolerant quantum computation (namely, the T gates), we address the problem of T-count optimization, i.e., minimizing the number of T gates that are needed to implement a given circuit. To achieve this, we develop AlphaTensor-Quantum, a method based on deep reinforcement learning that exploits the relationship between optimizing T-count and tensor decomposition. Unlike existing methods for T-count optimization, AlphaTensor-Quantum can incorporate domain-specific knowledge about quantum computation and leverage gadgets, which significantly reduces the T-count of the optimized circuits. AlphaTensor-Quantum outperforms the existing methods for T-count optimization on a set of arithmetic benchmarks (even when compared without making use of gadgets). Remarkably, it discovers an efficient algorithm akin to Karatsuba's method for multiplication in finite fields. AlphaTensor-Quantum also finds the best human-designed solutions for relevant arithmetic computations used in Shor's algorithm and for quantum chemistry simulation, thus demonstrating it can save hundreds of hours of research by optimizing relevant quantum circuits in a fully automated way.

Submitted 5 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 25 pages main paper + 19 pages appendix

arXiv:2309.10123 [pdf, other]

On the generalization of the Kruskal-Szekeres coordinates: a global conformal charting of the Reissner-Nordstrom spacetime

Authors: Ali Fawzi, Dejan Stojkovic

Abstract: The Kruskal-Szekeres coordinates construction for the Schwarzschild spacetime could be viewed geometrically as a squeezing of the $t$-line associated with the asymptotic observer into a single point, at the event horizon $r=2M$. Starting from this point, we extend the Kruskal charting to spacetimes with two horizons, in particular the Reissner-Nordström manifold, $\mathcal{M}_{RN}$. We develop a new method for constructing Kruskal-like coordinates and find two algebraically distinct classes charting $\mathcal{M}_{RN}$. We pedagogically illustrate our method by constructing two compact, conformal, and global coordinate systems labeled $\mathcal{GK_{I}}$ and $\mathcal{GK_{II}}$ for each class respectively. In both coordinates, the metric differentiability can be promoted to $C^\infty$. The conformal metric factor can be explicitly written in terms of the original $t$ and $r$ coordinates for both charts.

Submitted 31 December, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: Penrose diagrams added

arXiv:2101.10405 [pdf, other]

doi 10.1016/j.cma.2021.114271

Determining rigid body motion from accelerometer data through the square-root of a negative semi-definite tensor, with applications in mild traumatic brain injury

Authors: Yang Wan, Alice Lux Fawzi, Haneesh Kesari

Abstract: Mild Traumatic Brain Injuries (mTBI) are caused by violent head motions or impacts. Most mTBI prevention strategies explicitly or implicitly rely on a "brain injury criterion". A brain injury criterion takes some descriptor of the head's motion as input and yields a prediction for that motion's potential for causing mTBI as the output. The inputs are descriptors of the head's motion that are usually synthesized from accelerometer and gyroscope data. In the context of brain injury criterion the head is modeled as a rigid body. We present an algorithm for determining the complete motion of the head using data from only four head mounted tri-axial accelerometers. In contrast to inertial measurement unit based algorithms for determining rigid body motion the presented algorithm does not depend on data from gyroscopes; which consume much more power than accelerometers. Several algorithms that also make use of data from only accelerometers already exist. However, those algorithms, except for the recently presented AO-algorithm [Rahaman MM, Fang W, Fawzi AL, Wan Y, Kesari H (2020): J Mech Phys Solids 104014], give the rigid body's acceleration field in terms of the body frame, which in general is unknown. Compared to the AO-algorithm the presented algorithm is much more insensitive to bias type errors, such as those that arise from inaccurate measurement of sensor positions and orientations.

Submitted 25 January, 2021; originally announced January 2021.

Comments: 30 pages, 9 figures

arXiv:1911.09556 [pdf, other]

An accelerometer-only algorithm for determining the acceleration field of a rigid body, with application in studying the mechanics of mild Traumatic Brain Injury

Authors: Mohammad Masiur Rahaman, Wenqiang Fang, Alice Lux Fawzi, Yang Wan, Haneesh Kesari

Abstract: We present an algorithm for determining the acceleration field of a rigid body using measurements from four tri-axial accelerometers. The acceleration field is an important quantity in bio-mechanics problems, especially in the study of mild Traumatic Brain Injury (mTBI). The in vivo strains in the brain, which are hypothesized to closely correlate with brain injury, are generally not directly accessible outside of a laboratory setting. However, they can be estimated on knowing the head's acceleration field. In contrast to other techniques, the proposed algorithm uses data exclusively from accelerometers, rather than from a combination of accelerometers and gyroscopes. For that reason, the proposed accelerometer only (AO) algorithm does not involve any numerical differentiation of data, which is known to greatly amplify measurement noise. For applications where only the magnitude of the acceleration vector is of interest, the algorithm is straightforward, computationally efficient and does not require computation of angular velocity or orientation. When both the magnitude and direction of acceleration are of interest, the proposed algorithm involves the calculation of the angular velocity and orientation as intermediate steps. In addition to helping understand the mechanics of mTBI, the AO-algorithm may find widespread use in several bio-mechanical applications, gyroscope-free inertial navigation units, ballistic platform guidance, and platform control.

Submitted 21 November, 2019; originally announced November 2019.

Comments: 35 pages, 7 figures

arXiv:1907.02610 [pdf, other]

Adversarial Robustness through Local Linearization

Authors: Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, Pushmeet Kohli

Abstract: Adversarial training is an effective methodology for training deep neural networks that are robust against adversarial, norm-bounded perturbations. However, the computational cost of adversarial training grows prohibitively as the size of the model and number of input dimensions increase. Further, training against less expensive and therefore weaker adversaries produces models that are robust against weak attacks but break down under attacks that are stronger. This is often attributed to the phenomenon of gradient obfuscation; such models have a highly non-linear loss surface in the vicinity of training examples, making it hard for gradient-based attacks to succeed even though adversarial examples still exist. In this work, we introduce a novel regularizer that encourages the loss to behave linearly in the vicinity of the training data, thereby penalizing gradient obfuscation while encouraging robustness. We show via extensive experiments on CIFAR-10 and ImageNet, that models trained with our regularizer avoid gradient obfuscation and can be trained significantly faster than adversarial training. Using this regularizer, we exceed current state of the art and achieve 47% adversarial accuracy for ImageNet with l-infinity adversarial perturbations of radius 4/255 under an untargeted, strong, white-box attack. Additionally, we match state of the art results for CIFAR-10 at 8/255.

Submitted 10 October, 2019; v1 submitted 4 July, 2019; originally announced July 2019.

arXiv:1906.01681 [pdf, other]

Learning dynamic polynomial proofs

Authors: Alhussein Fawzi, Mateusz Malinowski, Hamza Fawzi, Omar Fawzi

Abstract: Polynomial inequalities lie at the heart of many mathematical disciplines. In this paper, we consider the fundamental computational task of automatically searching for proofs of polynomial inequalities. We adopt the framework of semi-algebraic proof systems that manipulate polynomial inequalities via elementary inference rules that infer new inequalities from the premises. These proof systems are known to be very powerful, but searching for proofs remains a major difficulty. In this work, we introduce a machine learning based method to search for a dynamic proof within these proof systems. We propose a deep reinforcement learning framework that learns an embedding of the polynomials and guides the choice of inference rules, taking the inherent symmetries of the problem as an inductive bias. We compare our approach with powerful and widely-studied linear programming hierarchies based on static proof systems, and show that our method reduces the size of the linear program by several orders of magnitude while also improving performance. These results hence pave the way towards augmenting powerful and well-studied semi-algebraic proof systems with machine learning guiding strategies for enhancing the expressivity of such proof systems.

Submitted 4 June, 2019; originally announced June 2019.

arXiv:1905.13725 [pdf, other]

Are Labels Required for Improving Adversarial Robustness?

Authors: Jonathan Uesato, Jean-Baptiste Alayrac, Po-Sen Huang, Robert Stanforth, Alhussein Fawzi, Pushmeet Kohli

Abstract: Recent work has uncovered the interesting (and somewhat surprising) finding that training models to be invariant to adversarial perturbations requires substantially larger datasets than those required for standard classification. This result is a key hurdle in the deployment of robust machine learning models in many real world applications where labeled data is expensive. Our main insight is that unlabeled data can be a competitive alternative to labeled data for training adversarially robust models. Theoretically, we show that in a simple statistical setting, the sample complexity for learning an adversarially robust model from unlabeled data matches the fully supervised case up to constant factors. On standard datasets like CIFAR-10, a simple Unsupervised Adversarial Training (UAT) approach using unlabeled data improves robust accuracy by 21.7% over using 4K supervised examples alone, and captures over 95% of the improvement from the same number of labeled examples. Finally, we report an improvement of 4% over the previous state-of-the-art on CIFAR-10 against the strongest known attack by using additional unlabeled data from the uncurated 80 Million Tiny Images dataset. This demonstrates that our finding extends as well to the more realistic case where unlabeled data is also uncurated, therefore opening a new avenue for improving adversarial training.

Submitted 5 December, 2019; v1 submitted 31 May, 2019; originally announced May 2019.

Comments: Appears in the Thirty-Third Annual Conference on Neural Information Processing Systems (NeurIPS 2019)

arXiv:1812.02795 [pdf, other]

Verification of deep probabilistic models

Authors: Krishnamurthy Dvijotham, Marta Garnelo, Alhussein Fawzi, Pushmeet Kohli

Abstract: Probabilistic models are a critical part of the modern deep learning toolbox - ranging from generative models (VAEs, GANs), sequence to sequence models used in machine translation and speech processing to models over functional spaces (conditional neural processes, neural processes). Given the size and complexity of these models, safely deploying them in applications requires the development of tools to analyze their behavior rigorously and provide some guarantees that these models are consistent with a list of desirable properties or specifications. For example, a machine translation model should produce semantically equivalent outputs for innocuous changes in the input to the model. A functional regression model that is learning a distribution over monotonic functions should predict a larger value at a larger input. Verification of these properties requires a new framework that goes beyond notions of verification studied in deterministic feedforward networks, since requiring worst-case guarantees in probabilistic models is likely to produce conservative or vacuous results. We propose a novel formulation of verification for deep probabilistic models that take in conditioning inputs and sample latent variables in the course of producing an output: We require that the output of the model satisfies a linear constraint with high probability over the sampling of latent variables and for every choice of conditioning input to the model. We show that rigorous lower bounds on the probability that the constraint is satisfied can be obtained efficiently. Experiments with neural processes show that several properties of interest while modeling functional spaces can be modeled within this framework (monotonicity, convexity) and verified efficiently using our algorithms.

Submitted 6 December, 2018; originally announced December 2018.

Comments: Accepted to NeurIPS 2018 Workshop on Security in Machine Learning

arXiv:1811.09716 [pdf, other]

Robustness via curvature regularization, and vice versa

Authors: Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Jonathan Uesato, Pascal Frossard

Abstract: State-of-the-art classifiers have been shown to be largely vulnerable to adversarial perturbations. One of the most effective strategies to improve robustness is adversarial training. In this paper, we investigate the effect of adversarial training on the geometry of the classification landscape and decision boundaries. We show in particular that adversarial training leads to a significant decrease in the curvature of the loss surface with respect to inputs, leading to a drastically more "linear" behaviour of the network. Using a locally quadratic approximation, we provide theoretical evidence on the existence of a strong relation between large robustness and small curvature. To further show the importance of reduced curvature for improving the robustness, we propose a new regularizer that directly minimizes curvature of the loss surface, and leads to adversarial robustness that is on par with adversarial training. Besides being a more efficient and principled alternative to adversarial training, the proposed regularizer confirms our claims on the importance of exhibiting quasi-linear behavior in the vicinity of data points in order to achieve robustness.

Submitted 23 November, 2018; originally announced November 2018.

arXiv:1805.00980 [pdf, other]

SaaS: Speed as a Supervisor for Semi-supervised Learning

Authors: Safa Cicek, Alhussein Fawzi, Stefano Soatto

Abstract: We introduce the SaaS Algorithm for semi-supervised learning, which uses learning speed during stochastic gradient descent in a deep neural network to measure the quality of an iterative estimate of the posterior probability of unknown labels. Training speed in supervised learning correlates strongly with the percentage of correct labels, so we use it as an inference criterion for the unknown labels, without attempting to infer the model parameters at first. Despite its simplicity, SaaS achieves state-of-the-art results in semi-supervised learning benchmarks.

Submitted 2 May, 2018; originally announced May 2018.

arXiv:1802.08686 [pdf, other]

Adversarial vulnerability for any classifier

Authors: Alhussein Fawzi, Hamza Fawzi, Omar Fawzi

Abstract: Despite achieving impressive performance, state-of-the-art classifiers remain highly vulnerable to small, imperceptible, adversarial perturbations. This vulnerability has proven empirically to be very intricate to address. In this paper, we study the phenomenon of adversarial perturbations under the assumption that the data is generated with a smooth generative model. We derive fundamental upper bounds on the robustness to perturbations of any classification function, and prove the existence of adversarial perturbations that transfer well across different classifiers with small risk. Our analysis of the robustness also provides insights onto key properties of generative models, such as their smoothness and dimensionality of latent space. We conclude with numerical experimental results showing that our bounds provide informative baselines to the maximal achievable robustness on several datasets.

Submitted 30 November, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

Comments: NeurIPS 2018

arXiv:1802.07971 [pdf, other]

Robustness of classifiers to uniform $\ell_p$ and Gaussian noise

Authors: Jean-Yves Franceschi, Alhussein Fawzi, Omar Fawzi

Abstract: We study the robustness of classifiers to various kinds of random noise models. In particular, we consider noise drawn uniformly from the $\ell_p$ ball for $p \in [1, \infty]$ and Gaussian noise with an arbitrary covariance matrix. We characterize this robustness to random noise in terms of the distance to the decision boundary of the classifier. This analysis applies to linear classifiers as well as classifiers with locally approximately flat decision boundaries, a condition which is satisfied by state-of-the-art deep neural networks. The predicted robustness is verified experimentally.

Submitted 22 February, 2018; originally announced February 2018.

Journal ref: 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Apr 2018, Playa Blanca, Spain. 2018, http://www.aistats.org/

arXiv:1705.09554 [pdf, other]

Robustness of classifiers to universal perturbations: a geometric perspective

Authors: Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard, Stefano Soatto

Abstract: Deep networks have recently been shown to be vulnerable to universal perturbations: there exist very small image-agnostic perturbations that cause most natural images to be misclassified by such classifiers. In this paper, we propose the first quantitative analysis of the robustness of classifiers to universal perturbations, and draw a formal link between the robustness to universal perturbations, and the geometry of the decision boundary. Specifically, we establish theoretical bounds on the robustness of classifiers under two decision boundary models (flat and curved models). We show in particular that the robustness of deep networks to universal perturbations is driven by a key property of their curvature: there exists shared directions along which the decision boundary of deep networks is systematically positively curved. Under such conditions, we prove the existence of small universal perturbations. Our analysis further provides a novel geometric method for computing universal perturbations, in addition to explaining their properties.

Submitted 1 March, 2021; v1 submitted 26 May, 2017; originally announced May 2017.

Comments: Published at ICLR 2018

arXiv:1705.09552 [pdf, other]

Classification regions of deep neural networks

Authors: Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard, Stefano Soatto

Abstract: The goal of this paper is to analyze the geometric properties of deep neural network classifiers in the input space. We specifically study the topology of classification regions created by deep networks, as well as their associated decision boundary. Through a systematic empirical investigation, we show that state-of-the-art deep nets learn connected classification regions, and that the decision boundary in the vicinity of datapoints is flat along most directions. We further draw an essential connection between two seemingly unrelated properties of deep networks: their sensitivity to additive perturbations in the inputs, and the curvature of their decision boundary. The directions where the decision boundary is curved in fact remarkably characterize the directions to which the classifier is the most vulnerable. We finally leverage a fundamental asymmetry in the curvature of the decision boundary of deep nets, and propose a method to discriminate between original images, and images perturbed with small adversarial examples. We show the effectiveness of this purely geometric approach for detecting small adversarial perturbations in images, and for recovering the labels of perturbed images.

Submitted 26 May, 2017; originally announced May 2017.

arXiv:1610.08401 [pdf, other]

Universal adversarial perturbations

Authors: Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard

Abstract: Given a state-of-the-art deep neural network classifier, we show the existence of a universal (image-agnostic) and very small perturbation vector that causes natural images to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations, and show that state-of-the-art deep neural networks are highly vulnerable to such perturbations, albeit being quasi-imperceptible to the human eye. We further empirically analyze these universal perturbations and show, in particular, that they generalize very well across neural networks. The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers. It further outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.

Submitted 9 March, 2017; v1 submitted 26 October, 2016; originally announced October 2016.

Comments: Accepted at IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

arXiv:1608.08967 [pdf, other]

Robustness of classifiers: from adversarial to random noise

Authors: Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard

Abstract: Several recent works have shown that state-of-the-art classifiers are vulnerable to worst-case (i.e., adversarial) perturbations of the datapoints. On the other hand, it has been empirically observed that these same classifiers are relatively robust to random noise. In this paper, we propose to study a \textit{semi-random} noise regime that generalizes both the random and worst-case noise regimes. We propose the first quantitative analysis of the robustness of nonlinear classifiers in this general noise regime. We establish precise theoretical bounds on the robustness of classifiers in this general regime, which depend on the curvature of the classifier's decision boundary. Our bounds confirm and quantify the empirical observations that classifiers satisfying curvature constraints are robust to random noise. Moreover, we quantify the robustness of classifiers in terms of the subspace dimension in the semi-random noise regime, and show that our bounds remarkably interpolate between the worst-case and random noise regimes. We perform experiments and show that the derived bounds provide very accurate estimates when applied to various state-of-the-art deep neural networks and datasets. This result suggests bounds on the curvature of the classifiers' decision boundaries that we support experimentally, and more generally offers important insights onto the geometry of high dimensional classification problems.

Submitted 31 August, 2016; originally announced August 2016.

Comments: Accepted to NIPS 2016

arXiv:1511.04599 [pdf, other]

DeepFool: a simple and accurate method to fool deep neural networks

Authors: Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Pascal Frossard

Abstract: State-of-the-art deep neural networks have achieved impressive results on many image classification tasks. However, these same architectures have been shown to be unstable to small, well sought, perturbations of the images. Despite the importance of this phenomenon, no effective methods have been proposed to accurately compute the robustness of state-of-the-art deep classifiers to such perturbations on large-scale datasets. In this paper, we fill this gap and propose the DeepFool algorithm to efficiently compute perturbations that fool deep networks, and thus reliably quantify the robustness of these classifiers. Extensive experimental results show that our approach outperforms recent methods in the task of computing adversarial perturbations and making classifiers more robust.

Submitted 4 July, 2016; v1 submitted 14 November, 2015; originally announced November 2015.

Comments: In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016

arXiv:1507.06535 [pdf, other]

Manitest: Are classifiers really invariant?

Authors: Alhussein Fawzi, Pascal Frossard

Abstract: Invariance to geometric transformations is a highly desirable property of automatic classifiers in many image recognition tasks. Nevertheless, it is unclear to which extent state-of-the-art classifiers are invariant to basic transformations such as rotations and translations. This is mainly due to the lack of general methods that properly measure such an invariance. In this paper, we propose a rigorous and systematic approach for quantifying the invariance to geometric transformations of any classifier. Our key idea is to cast the problem of assessing a classifier's invariance as the computation of geodesics along the manifold of transformed images. We propose the Manitest method, built on the efficient Fast Marching algorithm to compute the invariance of classifiers. Our new method quantifies in particular the importance of data augmentation for learning invariance from data, and the increased invariance of convolutional neural networks with depth. We foresee that the proposed generic tool for measuring invariance to a large class of geometric transformations and arbitrary classifiers will have many applications for evaluating and comparing classifiers based on their invariance, and help improving the invariance of existing classifiers.

Submitted 23 July, 2015; originally announced July 2015.

Comments: BMVC 2015

arXiv:1505.04966 [pdf, other]

Multi-task additive models with shared transfer functions based on dictionary learning

Authors: Alhussein Fawzi, Mathieu Sinn, Pascal Frossard

Abstract: Additive models form a widely popular class of regression models which represent the relation between covariates and response variables as the sum of low-dimensional transfer functions. Besides flexibility and accuracy, a key benefit of these models is their interpretability: the transfer functions provide visual means for inspecting the models and identifying domain-specific relations between inputs and outputs. However, in large-scale problems involving the prediction of many related tasks, learning independently additive models results in a loss of model interpretability, and can cause overfitting when training data is scarce. We introduce a novel multi-task learning approach which provides a corpus of accurate and interpretable additive models for a large number of related forecasting tasks. Our key idea is to share transfer functions across models in order to reduce the model complexity and ease the exploration of the corpus. We establish a connection with sparse dictionary learning and propose a new efficient fitting algorithm which alternates between sparse coding and transfer function updates. The former step is solved via an extension of Orthogonal Matching Pursuit, whose properties are analyzed using a novel recovery condition which extends existing results in the literature. The latter step is addressed using a traditional dictionary update rule. Experiments on real-world data demonstrate that our approach compares favorably to baseline methods while yielding an interpretable corpus of models, revealing structure among the individual tasks and being more robust when training data is scarce. Our framework therefore extends the well-known benefits of additive models to common regression settings possibly involving thousands of tasks.

Submitted 19 May, 2015; originally announced May 2015.

arXiv:1502.02590 [pdf, other]

Analysis of classifiers' robustness to adversarial perturbations

Authors: Alhussein Fawzi, Omar Fawzi, Pascal Frossard

Abstract: The goal of this paper is to analyze an intriguing phenomenon recently discovered in deep networks, namely their instability to adversarial perturbations (Szegedy et al., 2014). We provide a theoretical framework for analyzing the robustness of classifiers to adversarial perturbations, and show fundamental upper bounds on the robustness of classifiers. Specifically, we establish a general upper bound on the robustness of classifiers to adversarial perturbations, and then illustrate the obtained upper bound on the families of linear and quadratic classifiers. In both cases, our upper bound depends on a distinguishability measure that captures the notion of difficulty of the classification task. Our results for both classes imply that in tasks involving small distinguishability, no classifier in the considered set will be robust to adversarial perturbations, even if a good accuracy is achieved. Our theoretical framework moreover suggests that the phenomenon of adversarial instability is due to the low flexibility of classifiers, compared to the difficulty of the classification task (captured by the distinguishability). Moreover, we show the existence of a clear distinction between the robustness of a classifier to random noise and its robustness to adversarial perturbations. Specifically, the former is shown to be larger than the latter by a factor that is proportional to \sqrt{d} (with d being the signal dimension) for linear classifiers. This result gives a theoretical explanation for the discrepancy between the two robustness properties in high dimensional problems, which was empirically observed in the context of neural networks. To the best of our knowledge, our results provide the first theoretical work that addresses the phenomenon of adversarial instability recently observed for deep networks. Our analysis is complemented by experimental results on controlled and real-world data.

Submitted 28 March, 2016; v1 submitted 9 February, 2015; originally announced February 2015.

arXiv:1410.0719 [pdf, other]

Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)

Authors: L. Jacques, C. De Vleeschouwer, Y. Boursier, P. Sudhakar, C. De Mol, A. Pizurica, S. Anthoine, P. Vandergheynst, P. Frossard, C. Bilen, S. Kitic, N. Bertin, R. Gribonval, N. Boumal, B. Mishra, P. -A. Absil, R. Sepulchre, S. Bundervoet, C. Schretter, A. Dooms, P. Schelkens, O. Chabiron, F. Malgouyres, J. -Y. Tourneret, N. Dobigeon, et al. (42 additional authors not shown)

Abstract: The implicit objective of the biennial "international - Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur in Belgium, from Wednesday August 27th till Friday August 29th, 2014. The workshop was conveniently located in "The Arsenal" building within walking distance of both hotels and town center. iTWIST'14 has gathered about 70 international participants and has featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low dimensional subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference.

Submitted 9 October, 2014; v1 submitted 2 October, 2014; originally announced October 2014.

Comments: 69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist14

arXiv:1402.1973 [pdf, other]

Dictionary learning for fast classification based on soft-thresholding

Authors: Alhussein Fawzi, Mike Davies, Pascal Frossard

Abstract: Classifiers based on sparse representations have recently been shown to provide excellent results in many visual recognition and classification tasks. However, the high cost of computing sparse representations at test time is a major obstacle that limits the applicability of these methods in large-scale problems, or in scenarios where computational power is restricted. We consider in this paper a simple yet efficient alternative to sparse coding for feature extraction. We study a classification scheme that applies the soft-thresholding nonlinear mapping in a dictionary, followed by a linear classifier. A novel supervised dictionary learning algorithm tailored for this low complexity classification architecture is proposed. The dictionary learning problem, which jointly learns the dictionary and linear classifier, is cast as a difference of convex (DC) program and solved efficiently with an iterative DC solver. We conduct experiments on several datasets, and show that our learning algorithm that leverages the structure of the classification problem outperforms generic learning procedures. Our simple classifier based on soft-thresholding also competes with the recent sparse coding classifiers, when the dictionary is learned appropriately. The adopted classification scheme further requires less computational time at the testing stage, compared to other classifiers. The proposed scheme shows the potential of the adequately trained soft-thresholding mapping for classification and paves the way towards the development of very efficient classification methods for vision problems.

Submitted 2 October, 2014; v1 submitted 9 February, 2014; originally announced February 2014.

arXiv:1301.6646 [pdf, ps, other]

doi 10.1137/130907872

Image registration with sparse approximations in parametric dictionaries

Authors: Alhussein Fawzi, Pascal Frossard

Abstract: We examine in this paper the problem of image registration from the new perspective where images are given by sparse approximations in parametric dictionaries of geometric functions. We propose a registration algorithm that looks for an estimate of the global transformation between sparse images by examining the set of relative geometrical transformations between the respective features. We propose a theoretical analysis of our registration algorithm and we derive performance guarantees based on two novel important properties of redundant dictionaries, namely the robust linear independence and the transformation inconsistency. We propose several illustrations and insights about the importance of these dictionary properties and show that common properties such as coherence or restricted isometry property fail to provide sufficient information in registration problems. We finally show with illustrative experiments on simple visual objects and handwritten digits images that our algorithm outperforms baseline competitor methods in terms of transformation-invariant distance computation and classification.

Submitted 4 July, 2013; v1 submitted 28 January, 2013; originally announced January 2013.

Journal ref: SIAM Journal on Imaging Sciences 2013 6:4, 2370-2403

arXiv:1109.5938 [pdf, other]

Thresholding-based reconstruction of compressed correlated signals

Authors: Alhussein Fawzi, Tamara Tosic, Pascal Frossard

Abstract: We consider the problem of recovering a set of correlated signals (e.g., images from different viewpoints) from a few linear measurements per signal. We assume that each sensor in a network acquires a compressed signal in the form of linear measurements and sends it to a joint decoder for reconstruction. We propose a novel joint reconstruction algorithm that exploits correlation among underlying signals. Our correlation model considers geometrical transformations between the supports of the different signals. The proposed joint decoder estimates the correlation and reconstructs the signals using a simple thresholding algorithm. We give both theoretical and experimental evidence to show that our method largely outperforms independent decoding in terms of support recovery and reconstruction quality.

Submitted 2 April, 2012; v1 submitted 27 September, 2011; originally announced September 2011.

Comments: 11 pages, 3 figures

Showing 1–24 of 24 results for author: Fawzi, A