Adversarial Training Should Be Cast as a Non-Zero-Sum Game

Authors: Alexander Robey, Fabian Latorre, George J. Pappas, Hamed Hassani, Volkan Cevher

Abstract: One prominent approach toward resolving the adversarial vulnerability of deep neural networks is the two-player zero-sum paradigm of adversarial training, in which predictors are trained against adversarially chosen perturbations of data. Despite the promise of this approach, algorithms based on this paradigm have not engendered sufficient levels of robustness and suffer from pathological behavior like robust overfitting. To understand this shortcoming, we first show that the commonly used surrogate-based relaxation used in adversarial training algorithms voids all guarantees on the robustness of trained classifiers. The identification of this pitfall informs a novel non-zero-sum bilevel formulation of adversarial training, wherein each player optimizes a different objective function. Our formulation yields a simple algorithmic framework that matches and in some cases outperforms state-of-the-art attacks, attains comparable levels of robustness to standard adversarial training algorithms, and does not suffer from robust overfitting.

Submitted 18 March, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.00620 [pdf, other]

OTW: Optimal Transport Warping for Time Series

Authors: Fabian Latorre, Chenghao Liu, Doyen Sahoo, Steven C. H. Hoi

Abstract: Dynamic Time Warping (DTW) has become the pragmatic choice for measuring distance between time series. However, it suffers from unavoidable quadratic time complexity when the optimal alignment matrix needs to be computed exactly. This hinders its use in deep learning architectures, where layers involving DTW computations cause severe bottlenecks. To alleviate these issues, we introduce a new metric for time series data based on the Optimal Transport (OT) framework, called Optimal Transport Warping (OTW). OTW enjoys linear time/space complexity, is differentiable and can be parallelized. OTW enjoys a moderate sensitivity to time and shape distortions, making it ideal for time series. We show the efficacy and efficiency of OTW on 1-Nearest Neighbor Classification and Hierarchical Clustering, as well as in the case of using OTW instead of DTW in Deep Learning architectures.

Submitted 1 June, 2023; originally announced June 2023.

Comments: This is an extended version of an ICASSP 2023 accepted paper https://ieeexplore.ieee.org/document/10095915

arXiv:2204.00477 [pdf]

doi 10.1016/j.icarus.2023.115434

Autonomous crater detection on asteroids using a fully-convolutional neural network

Authors: Francesco Latorre, Dario Spiller, Fabio Curti

Abstract: This paper shows the application of autonomous Crater Detection using the U-Net, a Fully-Convolutional Neural Network, on Ceres. The U-Net is trained on optical images of the Moon Global Morphology Mosaic based on data collected by the LRO and manual crater catalogues. The Moon-trained network will be tested on Dawn optical images of Ceres: this task is accomplished by means of a Transfer Learning (TL) approach. The trained model has been fine-tuned using 100, 500 and 1000 additional images of Ceres. The test performance was measured on 350 never before seen images, reaching a testing accuracy of 96.24%, 96.95% and 97.19%, respectively. This means that despite the intrinsic differences between the Moon and Ceres, TL works with encouraging results. The output of the U-Net contains predicted craters: it will be post-processed applying global thresholding for image binarization and a template matching algorithm to extract craters positions and radii in the pixel space. Post-processed craters will be counted and compared to the ground truth data in order to compute image segmentation metrics: precision, recall and F1 score. These indices will be computed, and their effect will be discussed for tasks such as automated crater cataloguing and optical navigation.

Submitted 1 April, 2022; originally announced April 2022.

arXiv:2202.05068 [pdf, other]

Controlling the Complexity and Lipschitz Constant improves polynomial nets

Authors: Zhenyu Zhu, Fabian Latorre, Grigorios G Chrysos, Volkan Cevher

Abstract: While the class of Polynomial Nets demonstrates comparable performance to neural networks (NN), it currently has neither theoretical generalization characterization nor robustness guarantees. To this end, we derive new complexity bounds for the set of Coupled CP-Decomposition (CCP) and Nested Coupled CP-decomposition (NCP) models of Polynomial Nets in terms of the $\ell_\infty$-operator-norm and the $\ell_2$-operator norm. In addition, we derive bounds on the Lipschitz constant for both models to establish a theoretical certificate for their robustness. The theoretical results enable us to propose a principled regularization scheme that we also evaluate experimentally in six datasets and show that it improves the accuracy as well as the robustness of the models to adversarial perturbations. We showcase how this regularization can be combined with adversarial training, resulting in further improvements.

Submitted 10 February, 2022; originally announced February 2022.

arXiv:2011.01748 [pdf, other]

doi 10.1109/ICASSP39728.2021.9414879

Solving Inverse Problems with Hybrid Deep Image Priors: the challenge of preventing overfitting

Authors: Zhaodong Sun, Thomas Sanchez, Fabian Latorre, Volkan Cevher

Abstract: We mainly analyze and solve the overfitting problem of deep image prior (DIP). Deep image prior can solve inverse problems such as super-resolution, inpainting and denoising. The main advantage of DIP over other deep learning approaches is that it does not need access to a large dataset. However, due to the large number of parameters of the neural network and noisy data, DIP overfits to the noise in the image as the number of iterations grows. In the thesis, we use hybrid deep image priors to avoid overfitting. The hybrid priors are to combine DIP with an explicit prior such as total variation or with an implicit prior such as a denoising algorithm. We use the alternating direction method-of-multipliers (ADMM) to incorporate the new prior and try different forms of ADMM to avoid extra computation caused by the inner loop of ADMM steps. We also study the relation between the dynamics of gradient descent, and the overfitting phenomenon. The numerical results show the hybrid priors play an important role in preventing overfitting. Besides, we try to fit the image along some directions and find this method can reduce overfitting when the noise level is large. When the noise level is small, it does not considerably reduce the overfitting problem.

Submitted 17 February, 2023; v1 submitted 3 November, 2020; originally announced November 2020.

Comments: Part of the work has been published on ICASSP 2021 with the paper title "a plug-and-play deep image prior"

arXiv:2007.01003 [pdf, other]

Efficient Proximal Mapping of the 1-path-norm of Shallow Networks

Authors: Fabian Latorre, Paul Rolland, Nadav Hallak, Volkan Cevher

Abstract: We demonstrate two new important properties of the 1-path-norm of shallow neural networks. First, despite its non-smoothness and non-convexity it allows a closed form proximal operator which can be efficiently computed, allowing the use of stochastic proximal-gradient-type methods for regularized empirical risk minimization. Second, when the activation functions is differentiable, it provides an upper bound on the Lipschitz constant of the network. Such bound is tighter than the trivial layer-wise product of Lipschitz constants, motivating its use for training networks robust to adversarial perturbations. In practical experiments we illustrate the advantages of using the proximal mapping and we compare the robustness-accuracy trade-off induced by the 1-path-norm, L1-norm and layer-wise constraints on the Lipschitz constant (Parseval networks).

Submitted 15 July, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

Comments: ICML 2020. Fabian Latorre, Paul Rolland and Nadav Hallak have contributed equally

arXiv:2004.08688 [pdf, other]

Lipschitz constant estimation of Neural Networks via sparse polynomial optimization

Authors: Fabian Latorre, Paul Rolland, Volkan Cevher

Abstract: We introduce LiPopt, a polynomial optimization framework for computing increasingly tighter upper bounds on the Lipschitz constant of neural networks. The underlying optimization problems boil down to either linear (LP) or semidefinite (SDP) programming. We show how to use the sparse connectivity of a network, to significantly reduce the complexity of computation. This is specially useful for convolutional as well as pruned neural networks. We conduct experiments on networks with random weights as well as networks trained on MNIST, showing that in the particular case of the $\ell_\infty$-Lipschitz constant, our approach yields superior estimates, compared to baselines available in the literature.

Submitted 18 April, 2020; originally announced April 2020.

Comments: Published as a conference paper in ICLR2020, originally submitted in September 25 2019 and available at https://openreview.net/forum?id=rJe4_xSFDB

arXiv:1907.03343 [pdf, other]

Fast and Provable ADMM for Learning with Generative Priors

Authors: Fabian Latorre Gómez, Armin Eftekhari, Volkan Cevher

Abstract: In this work, we propose a (linearized) Alternating Direction Method-of-Multipliers (ADMM) algorithm for minimizing a convex function subject to a nonconvex constraint. We focus on the special case where such constraint arises from the specification that a variable should lie in the range of a neural network. This is motivated by recent successful applications of Generative Adversarial Networks (GANs) in tasks like compressive sensing, denoising and robustness against adversarial examples. The derived rates for our algorithm are characterized in terms of certain geometric properties of the generator network, which we show hold for feedforward architectures, under mild assumptions. Unlike gradient descent (GD), it can efficiently handle non-smooth objectives as well as exploit efficient partial minimization procedures, thus being faster in many practical scenarios.

Submitted 7 July, 2019; originally announced July 2019.

arXiv:1212.1406 [pdf, other]

The Maxflow problem and a generalization to simplicial complexes

Authors: Fabian Latorre

Abstract: The problem of Maxflow is a widely developed subject in modern mathematics. Efficient algorithms exist to solve this problem, that is why a good generalization may permit these algorithms to be understood as a particular instance of solutions in a wider class of problems. In the last section we suggest a generalization in the context of simplicial complexes, that reduces to the problem of Maxflow in graphs, when we consider a graph as a simplicial complex of dimension 1.

Submitted 5 December, 2012; originally announced December 2012.

Showing 1–9 of 9 results for author: Latorre, F