Sparsest Univariate Learning Models Under Lipschitz Constraint

Authors: Shayan Aziznejad, Thomas Debarre, Michael Unser

Abstract: Besides the minimization of the prediction error, two of the most desirable properties of a regression scheme are stability and interpretability. Driven by these principles, we propose continuous-domain formulations for one-dimensional regression problems. In our first approach, we use the Lipschitz constant as a regularizer, which results in an implicit tuning of the overall robustness of the learned mapping. In our second approach, we control the Lipschitz constant explicitly using a user-defined upper-bound and make use of a sparsity-promoting regularizer to favor simpler (and, hence, more interpretable) solutions. The theoretical study of the latter formulation is motivated in part by its equivalence, which we prove, with the training of a Lipschitz-constrained two-layer univariate neural network with rectified linear unit (ReLU) activations and weight decay. By proving representer theorems, we show that both problems admit global minimizers that are continuous and piecewise-linear (CPWL) functions. Moreover, we propose efficient algorithms that find the sparsest solution of each problem: the CPWL mapping with the least number of linear regions. Finally, we illustrate numerically the outcome of our formulations.

Submitted 27 December, 2021; originally announced December 2021.
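
To make the link between two-layer univariate ReLU networks and CPWL functions in the abstract above more concrete, here is a minimal sketch. It is not the authors' algorithm: the representation of a network as a list of `(w, b, v)` triples and the function names `network`, `region_slopes`, and `lipschitz_constant` are assumptions made for illustration, and weight decay and training are not modeled. The sketch only shows that such a network is piecewise linear and that its Lipschitz constant is the largest absolute slope over its linear regions.

```python
def relu(t):
    """Rectified linear unit."""
    return t if t > 0 else 0.0


def network(x, units, bias=0.0):
    """Evaluate f(x) = bias + sum_k v_k * relu(w_k * x + b_k).

    `units` is a list of (w, b, v) triples (an illustrative convention).
    """
    return bias + sum(v * relu(w * x + b) for w, b, v in units)


def region_slopes(units):
    """Return the slope of f on each linear region, from left to right."""
    # Each unit with w != 0 contributes one knot at x = -b / w.
    knots = sorted({-b / w for w, b, _ in units if w != 0})
    if not knots:
        return [0.0]
    # One probe point strictly inside every region.
    probes = (
        [knots[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(knots, knots[1:])]
        + [knots[-1] + 1.0]
    )
    # On a region, the slope is the sum of v * w over the active units.
    return [sum(v * w for w, b, v in units if w * x + b > 0) for x in probes]


def lipschitz_constant(units):
    """Lipschitz constant of the CPWL function: largest absolute slope."""
    return max(abs(s) for s in region_slopes(units))
```

For example, the two units `(1.0, 0.0, 1.0)` and `(1.0, -1.0, -2.0)` place knots at 0 and 1, so the function has three linear regions.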

arXiv:2112.06209

Measuring Complexity of Learning Schemes Using Hessian-Schatten Total Variation

Authors: Shayan Aziznejad, Joaquim Campos, Michael Unser

Abstract: In this paper, we introduce the Hessian-Schatten total variation (HTV) -- a novel seminorm that quantifies the total "rugosity" of multivariate functions. Our motivation for defining HTV is to assess the complexity of supervised-learning schemes. We start by specifying the adequate matrix-valued Banach spaces that are equipped with suitable classes of mixed norms. We then show that the HTV is invariant to rotations, scalings, and translations. Additionally, its minimum value is achieved for linear mappings, which supports the common intuition that linear regression is the least complex learning model. Next, we present closed-form expressions of the HTV for two general classes of functions. The first one is the class of Sobolev functions with a certain degree of regularity, for which we show that the HTV coincides with the Hessian-Schatten seminorm that is sometimes used as a regularizer for image reconstruction. The second one is the class of continuous and piecewise-linear (CPWL) functions. In this case, we show that the HTV reflects the total change in slopes between linear regions that have a common facet. Hence, it can be viewed as a convex relaxation (l1-type) of the number of linear regions (l0-type) of CPWL mappings. Finally, we illustrate the use of our proposed seminorm.

Submitted 31 January, 2022; v1 submitted 12 December, 2021; originally announced December 2021.
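
The CPWL statement in the abstract above can be illustrated in one dimension, where neighbouring linear regions meet at a single knot. The sketch below assumes that, in this 1D setting, the "total change in slopes" reduces to the sum of absolute slope differences at the knots; the paper's multivariate expression is not reproduced here. The function names and the knot/value representation are hypothetical. The second function counts linear regions, which is the l0-type quantity that the abstract contrasts with the l1-type one.

```python
def slopes(knots, values):
    """Slopes of the linear pieces of a CPWL function given by its samples.

    `knots` must be strictly increasing; `values[i]` is f(knots[i]).
    """
    return [
        (values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
        for i in range(len(knots) - 1)
    ]


def total_slope_change(knots, values):
    """l1-type measure: sum of absolute slope changes at the interior knots."""
    s = slopes(knots, values)
    return sum(abs(b - a) for a, b in zip(s, s[1:]))


def count_linear_regions(knots, values, tol=1e-12):
    """l0-type measure: number of linear regions (knots with no slope
    change do not start a new region)."""
    s = slopes(knots, values)
    return 1 + sum(1 for a, b in zip(s, s[1:]) if abs(b - a) > tol)
```

A purely linear function has zero total slope change and a single region, which matches the abstract's remark that linear mappings attain the minimum.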

arXiv:2003.11646

Wavelet Compressibility of Compound Poisson Processes

Authors: Shayan Aziznejad, Julien Fageot

Abstract: In this paper, we precisely quantify the wavelet compressibility of compound Poisson processes. To that end, we expand the given random process over the Haar wavelet basis and we analyse its asymptotic approximation properties. By only considering the nonzero wavelet coefficients up to a given scale, what we call the greedy approximation, we exploit the extreme sparsity of the wavelet expansion that derives from the piecewise-constant nature of compound Poisson processes. More precisely, we provide lower and upper bounds for the mean squared error of greedy approximation of compound Poisson processes. We are then able to deduce that the greedy approximation error has a sub-exponential and super-polynomial asymptotic behavior. Finally, we provide numerical experiments to highlight the remarkable ability of wavelet-based dictionaries in achieving highly compressible approximations of compound Poisson processes.

Submitted 17 December, 2021; v1 submitted 25 March, 2020; originally announced March 2020.
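
The sparsity mentioned in the abstract above can be seen on a small deterministic example. The sketch below is an assumption-laden illustration, not the paper's experiment: it applies an unnormalized Haar transform (pairwise half-differences and averages) to a sampled piecewise-constant signal of dyadic length and counts the nonzero detail coefficients. The function names are hypothetical, and no random process is simulated.

```python
def haar_details(signal):
    """Unnormalized Haar analysis of a signal whose length is a power of two.

    Returns (details, mean), where details[j] holds the detail coefficients
    at level j (finest first) and mean is the final approximation value.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])  # half-differences
        approx = [(a + b) / 2 for a, b in pairs]         # averages
    return details, approx[0]


def count_nonzero(details, tol=1e-12):
    """Number of detail coefficients whose magnitude exceeds tol."""
    return sum(1 for level in details for d in level if abs(d) > tol)
```

For the signal `[1, 1, 1, 4, 4, 4, 4, 4]`, which has a single jump, only the coefficients whose support contains the jump are nonzero, so there is at most one per scale.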

arXiv:2001.06263

doi 10.1109/TSP.2020.3014611

Deep Neural Networks with Trainable Activations and Controlled Lipschitz Constant

Authors: Shayan Aziznejad, Harshit Gupta, Joaquim Campos, Michael Unser

Abstract: We introduce a variational framework to learn the activation functions of deep neural networks. Our aim is to increase the capacity of the network while controlling an upper-bound of the actual Lipschitz constant of the input-output relation. To that end, we first establish a global bound for the Lipschitz constant of neural networks. Based on the obtained bound, we then formulate a variational problem for learning activation functions. Our variational problem is infinite-dimensional and is not computationally tractable. However, we prove that there always exists a solution that has continuous and piecewise-linear (linear-spline) activations. This reduces the original problem to a finite-dimensional minimization where an l1 penalty on the parameters of the activations favors the learning of sparse nonlinearities. We numerically compare our scheme with the standard ReLU network and its variations, PReLU and LeakyReLU, and we empirically demonstrate the practical aspects of our framework.

Submitted 7 August, 2020; v1 submitted 17 January, 2020; originally announced January 2020.
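
The idea of a global Lipschitz bound for a network, mentioned in the abstract above, can be sketched with the standard product bound. This is an assumption: the exact bound established in the paper is not stated in the abstract and may differ. The sketch multiplies the induced infinity norms (maximum absolute row sums) of the weight matrices and a common Lipschitz constant for the activations, which gives an upper bound with respect to the sup norm. The function names and the nested-list matrix format are hypothetical.

```python
def inf_operator_norm(matrix):
    """Induced infinity norm of a matrix: the maximum absolute row sum."""
    return max(sum(abs(entry) for entry in row) for row in matrix)


def lipschitz_upper_bound(weights, activation_lipschitz=1.0):
    """Product bound on the Lipschitz constant of a feedforward network.

    `weights` lists the layer matrices from input to output. Every hidden
    activation is assumed to be `activation_lipschitz`-Lipschitz (1.0 for
    ReLU), and no activation is applied after the last layer.
    """
    bound = 1.0
    for index, matrix in enumerate(weights):
        bound *= inf_operator_norm(matrix)
        if index < len(weights) - 1:
            bound *= activation_lipschitz
    return bound
```

Such a bound is generally loose, which is consistent with the abstract's wording about controlling an upper bound rather than the actual Lipschitz constant.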

arXiv:1811.00836

Multi-Kernel Regression with Sparsity Constraint

Authors: Shayan Aziznejad, Michael Unser

Abstract: In this paper, we provide a Banach-space formulation of supervised learning with generalized total-variation (gTV) regularization. We identify the class of kernel functions that are admissible in this framework. Then, we propose a variation of supervised learning in a continuous-domain hybrid search space with gTV regularization. We show that the solution admits a multi-kernel expansion with adaptive positions. In this representation, the number of active kernels is upper-bounded by the number of data points while the gTV regularization imposes an $\ell_1$ penalty on the kernel coefficients. Finally, we illustrate numerically the outcome of our theory.

Submitted 17 December, 2020; v1 submitted 2 November, 2018; originally announced November 2018.

Showing 1–5 of 5 results for author: Aziznejad, S