
Phase Coexistence and Transitions between Antiferromagnetic and Ferromagnetic States in a Synthetic Antiferromagnet

Authors: Christopher E. A. Barker, Kayla Fallon, Craig Barton, Eloi Haltz, Trevor P. Almeida, Sara Villa, Colin Kirkbride, Francesco Maccherozzi, Brice Sarpi, Sarnjeet S. Dhesi, Damien McGrouther, Stephen McVitie, Thomas A. Moore, Olga Kazakova, Christopher H. Marrows

Abstract: In synthetic antiferromagnets (SAFs), antiferromagnetic order is combined with synthesis by conventional sputtering techniques to produce systems that are advantageous for spintronics applications. Here we present the preparation and study of SAF multilayers possessing both perpendicular magnetic anisotropy and the Dzyaloshinskii-Moriya interaction. The multilayers have an antiferromagnetically (AF) aligned ground state but can be forced into a full ferromagnetic (FM) alignment by applying an out-of-plane field of $\sim 100$~mT. We study the spin textures in these multilayers in their ground state as well as around the transition point between the AF and FM states, at fields of $\sim 40$~mT, by imaging the spin textures using complementary methods: photoemission electron microscopy, magnetic force microscopy, and Lorentz transmission electron microscopy. The field-driven transformation into an FM state proceeds by a nucleation and growth process: skyrmionic nuclei form first and then broaden into regions containing an FM-aligned labyrinth pattern that eventually occupies the whole film. Remarkably, this process occurs without any significant change in the net magnetic moment of the multilayer. The mix of AF- and FM-aligned regions on the micron scale in the middle of this transition is reminiscent of a first-order phase transition that exhibits phase coexistence. These results are important for guiding the design of spintronic devices using chiral magnetic textures made from SAFs.

Submitted 2 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2311.15845

On Learning the Optimal Regularization Parameter in Inverse Problems

Authors: Jonathan Chirinos Rodriguez, Ernesto De Vito, Cesare Molinari, Lorenzo Rosasco, Silvia Villa

Abstract: Selecting the best regularization parameter in inverse problems is a classical and yet challenging problem. Recently, data-driven approaches have become popular to tackle this challenge. These approaches are appealing since they require less a priori knowledge, but their theoretical analysis is limited. In this paper, we propose and study a statistical machine learning approach based on empirical risk minimization. Our main contribution is a theoretical analysis showing that, provided with enough data, this approach can reach sharp rates while being essentially adaptive to the noise and smoothness of the problem. Numerical simulations corroborate and illustrate the theoretical findings. Our results are a step towards theoretically grounding data-driven approaches to inverse problems.
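As a rough illustration of the data-driven idea (not the authors' estimator or analysis), the sketch below picks a Tikhonov parameter by empirical risk minimization over a finite grid. It assumes a known linear forward operator `A`, training pairs of ground-truth signals and noisy data, and squared reconstruction error as the loss; `tikhonov_solve`, `select_lambda`, and the synthetic data are hypothetical.

```python
import numpy as np

def tikhonov_solve(A, y, lam):
    """Tikhonov solution argmin_x ||A x - y||^2 + lam * ||x||^2 (normal equations)."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)

def select_lambda(A, X_train, Y_train, grid):
    """Pick lam from `grid` minimizing the empirical risk, here the mean squared
    reconstruction error over training pairs (x_i, y_i)."""
    risks = np.array([
        np.mean([np.sum((tikhonov_solve(A, y, lam) - x) ** 2)
                 for x, y in zip(X_train, Y_train)])
        for lam in grid
    ])
    return grid[int(np.argmin(risks))], risks

# Hypothetical synthetic data: noisy linear measurements of random signals.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20))
X_train = [rng.standard_normal(20) for _ in range(50)]
Y_train = [A @ x + 0.5 * rng.standard_normal(30) for x in X_train]
grid = np.logspace(-4, 2, 13)
lam_hat, risks = select_lambda(A, X_train, Y_train, grid)
print(f"selected lambda = {lam_hat:.3g}")
```

A grid search is the simplest way to minimize the empirical risk over a scalar parameter; the paper's contribution is the statistical analysis of such an estimator, which the sketch does not attempt.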

Submitted 21 May, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

MSC Class: 65J20; 47N10; 65K10; 62G05

arXiv:2309.16606

"AI enhances our performance, I have no doubt this one will do the same": The Placebo effect is robust to negative descriptions of AI

Authors: Agnes M. Kloft, Robin Welsch, Thomas Kosch, Steeven Villa

Abstract: Heightened AI expectations facilitate performance in human-AI interactions through placebo effects. While lowering expectations to control for placebo effects is advisable, overly negative expectations could induce nocebo effects. In a letter discrimination task, we informed participants that an AI would either increase or decrease their performance by adapting the interface, but in reality, no AI was present in any condition. A Bayesian analysis showed that participants had high expectations and performed descriptively better irrespective of the AI description when a sham-AI was present. Using cognitive modeling, we could trace this advantage back to participants gathering more information. A replication study verified that negative AI descriptions do not alter expectations, suggesting that performance expectations with AI are biased and robust to negative verbal descriptions. We discuss the impact of user expectations on AI interactions and evaluation and provide a behavioral placebo marker for human-AI interaction.

Submitted 23 January, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

arXiv:2308.09310

Variance reduction techniques for stochastic proximal point algorithms

Authors: Cheik Traoré, Vassilis Apidopoulos, Saverio Salzo, Silvia Villa

Abstract: In the context of finite-sum minimization, variance reduction techniques are widely used to improve the performance of state-of-the-art stochastic gradient methods. Their practical impact is clear, as are their theoretical properties. Stochastic proximal point algorithms have been studied as an alternative to stochastic gradient algorithms since they are more stable with respect to the choice of the stepsize, but their variance-reduced versions have not been studied as extensively as the gradient-based ones. In this work, we propose the first unified study of variance reduction techniques for stochastic proximal point algorithms. We introduce a generic stochastic proximal algorithm that can be specified to give the proximal version of SVRG, SAGA, and some of their variants for smooth and convex functions. We provide several convergence results for the iterates and the objective function values. In addition, under the Polyak-Łojasiewicz (PL) condition, we obtain linear convergence rates for the iterates and the function values. Our numerical experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts, especially regarding stability with respect to the choice of the stepsize for difficult problems.
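To make the idea concrete, here is a minimal sketch of an SVRG-flavoured stochastic proximal point iteration, assuming least-squares components $f_i(x)=\tfrac12(a_i^\top x-b_i)^2$, whose proximal map has a closed form. The update $x\leftarrow\operatorname{prox}_{\gamma f_i}\big(x+\gamma(\nabla f_i(\tilde x)-\nabla F(\tilde x))\big)$, with snapshot $\tilde x$, is one natural variance-reduced variant and may differ in detail from the generic algorithm in the paper; the function names, stepsize, and epoch length are illustrative assumptions.

```python
import numpy as np

def prox_sq(v, a, b, gamma):
    """Closed-form prox of x -> gamma * 0.5 * (a @ x - b)**2, evaluated at v."""
    return v - gamma * a * (a @ v - b) / (1.0 + gamma * (a @ a))

def svrg_prox_point(A, b, gamma=0.05, epochs=50, seed=0):
    """SVRG-style stochastic proximal point for F(x) = (1/n) sum_i f_i(x)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        x_ref = x.copy()                          # snapshot point
        full_grad = A.T @ (A @ x_ref - b) / n     # exact gradient of F at the snapshot
        for _ in range(n):
            i = rng.integers(n)
            g_ref = A[i] * (A[i] @ x_ref - b[i])  # gradient of f_i at the snapshot
            # shift by the control variate, then take an implicit (prox) step on f_i
            x = prox_sq(x + gamma * (g_ref - full_grad), A[i], b[i], gamma)
    return x

# Hypothetical consistent least-squares problem used only for the demo.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 10))
x_true = rng.standard_normal(10)
b = A @ x_true
x_hat = svrg_prox_point(A, b)
print(np.linalg.norm(x_hat - x_true))
```

The minimizer is a fixed point of this update: when $x=\tilde x=x^\star$, the shifted point is $x^\star+\gamma\nabla f_i(x^\star)$, whose prox is exactly $x^\star$.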

Submitted 30 May, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

arXiv:2305.16024

An Optimal Structured Zeroth-order Algorithm for Non-smooth Optimization

Authors: Marco Rando, Cesare Molinari, Lorenzo Rosasco, Silvia Villa

Abstract: Finite-difference methods are a class of algorithms designed to solve black-box optimization problems by approximating a gradient of the target function on a set of directions. In black-box optimization, the non-smooth setting is particularly relevant since, in practice, differentiability and smoothness assumptions cannot be verified. To cope with nonsmoothness, several authors use a smooth approximation of the target function and show that finite-difference methods approximate its gradient. Recently, it has been proved that imposing a structure on the directions improves performance. However, only the smooth setting was considered. To close this gap, we introduce and analyze O-ZD, the first structured finite-difference algorithm for non-smooth black-box optimization. Our method exploits a smooth approximation of the target function, and we prove that it approximates its gradient on a subset of random orthogonal directions. We analyze the convergence of O-ZD under different assumptions. For non-smooth convex functions, we obtain the optimal complexity. In the non-smooth non-convex setting, we characterize the number of iterations needed to bound the expected norm of the smoothed gradient. For smooth functions, our analysis recovers existing results for structured zeroth-order methods for the convex case and extends them to the non-convex setting. We conclude with numerical simulations where assumptions are satisfied, observing that our algorithm performs very well in practice.

Submitted 6 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: 37 pages, 6 figures

MSC Class: 90C56 (Primary) 49J52; 90C25; 90C26; 90C30 (Secondary)

ACM Class: G.1.6

arXiv:2305.06682

Regularization properties of dual subgradient flow

Authors: Vassilis Apidopoulos, Cesare Molinari, Lorenzo Rosasco, Silvia Villa

Abstract: Dual gradient descent combined with early stopping represents an efficient alternative to the Tikhonov variational approach when the regularizer is strongly convex. However, for many relevant applications, it is crucial to deal with regularizers which are only convex. In this setting, the dual problem is nonsmooth, and dual gradient descent cannot be used. In this paper, we study the regularization properties of a subgradient dual flow, and we show that the proposed procedure achieves the same recovery accuracy as penalization methods, while being more efficient from the computational perspective.

Submitted 11 May, 2023; originally announced May 2023.

MSC Class: 49M29; 47N10; 34G25; 90C25

arXiv:2305.03559

On the convergence of proximal gradient methods for convex simple bilevel optimization

Authors: Puya Latafat, Andreas Themelis, Silvia Villa, Panagiotis Patrinos

Abstract: This paper studies proximal gradient iterations for solving simple bilevel optimization problems where both the upper- and the lower-level cost functions are split as the sum of differentiable and (possibly nonsmooth) proximable functions. We develop a novel convergence recipe for iteration-varying stepsizes that relies on Barzilai-Borwein type local estimates for the differentiable terms. Leveraging the convergence recipe, under global Lipschitz gradient continuity, we establish convergence for a nonadaptive stepsize sequence, without requiring any strong convexity or linesearch. In the locally Lipschitz differentiable setting, we develop an adaptive linesearch method that introduces a systematic adaptive scheme enabling large and nonmonotonic stepsize sequences while being insensitive to the choice of hyperparameters and initialization. Numerical simulations are provided showcasing favorable convergence speed of our methods.
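For readers unfamiliar with Barzilai-Borwein type estimates, the sketch below computes the two local quantities usually derived from consecutive iterates and gradients, $s=x_k-x_{k-1}$ and $y=\nabla f(x_k)-\nabla f(x_{k-1})$: a curvature estimate $\langle s,y\rangle/\|s\|^2$ (the inverse of the classical BB1 step) and a local Lipschitz estimate $\|y\|/\|s\|$. How the paper combines such estimates into its stepsize rule is not reproduced here; `bb_estimates` is a hypothetical helper.

```python
import numpy as np

def bb_estimates(x, x_prev, grad, grad_prev):
    """Barzilai-Borwein type local estimates from two consecutive iterates.

    Returns (curvature, lipschitz):
      curvature = <s, y> / ||s||^2  (inverse of the BB1 step ||s||^2 / <s, y>)
      lipschitz = ||y|| / ||s||     (local estimate of the gradient's Lipschitz constant)
    """
    s = x - x_prev
    y = grad - grad_prev
    ss = s @ s
    curvature = (s @ y) / ss
    lipschitz = np.linalg.norm(y) / np.sqrt(ss)
    return curvature, lipschitz

# Quadratic f(x) = 0.5 x^T Q x, so grad f(x) = Q x; the eigenvalues of Q are 1 and 4.
Q = np.diag([1.0, 4.0])
x_prev, x = np.array([1.0, 1.0]), np.array([0.5, -0.2])
curv, L_loc = bb_estimates(x, x_prev, Q @ x, Q @ x_prev)
print(curv, L_loc)  # both lie in [1, 4] for this quadratic
```

On a quadratic both estimates are bounded by the extreme eigenvalues, and Cauchy-Schwarz gives curvature $\le$ local Lipschitz estimate; for general smooth functions they only describe the segment between the two iterates.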

Submitted 2 March, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

MSC Class: 65K05; 90C06; 90C25; 90C30

arXiv:2304.07983

Snacks: a fast large-scale kernel SVM solver

Authors: Sofiane Tanji, Andrea Della Vecchia, François Glineur, Silvia Villa

Abstract: Kernel methods provide a powerful framework for nonparametric learning. They are based on kernel functions and allow learning in a rich functional space while applying linear statistical learning tools, such as Ridge Regression or Support Vector Machines. However, standard kernel methods suffer from quadratic time and memory complexity in the number of data points and thus have limited applications in large-scale learning. In this paper, we propose Snacks, a new large-scale solver for Kernel Support Vector Machines. Specifically, Snacks relies on a Nyström approximation of the kernel matrix and an accelerated variant of the stochastic subgradient method. We demonstrate, through a detailed empirical evaluation, that it competes with other SVM solvers on a variety of benchmark datasets.
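The two ingredients named in the abstract can be illustrated with a small sketch: a Nyström feature map built from a random subset of landmark points, followed by a plain (non-accelerated) Pegasos-style stochastic subgradient method on the regularized hinge loss. This is not the Snacks implementation; the Gaussian kernel, the landmark count, the absence of a bias term, the plain Pegasos schedule, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """Gaussian kernel matrix k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq = (X ** 2).sum(1)[:, None] + (Z ** 2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def nystrom_features(X, landmarks, sigma, rcond=1e-6):
    """Feature map Phi = K_nm K_mm^{-1/2}, so that Phi @ Phi.T approximates K_nn."""
    w, V = np.linalg.eigh(gaussian_kernel(landmarks, landmarks, sigma))
    keep = w > rcond * w.max()                 # drop numerically null directions
    return gaussian_kernel(X, landmarks, sigma) @ (V[:, keep] / np.sqrt(w[keep]))

def pegasos(Phi, y, lam, iters, seed=0):
    """Stochastic subgradient descent on lam/2 ||w||^2 + mean hinge loss."""
    rng = np.random.default_rng(seed)
    w = np.zeros(Phi.shape[1])
    for t in range(1, iters + 1):
        i = rng.integers(len(y))
        eta = 1.0 / (lam * t)
        violated = y[i] * (Phi[i] @ w) < 1.0   # hinge subgradient is nonzero
        w *= 1.0 - eta * lam                   # step on the regularizer
        if violated:
            w += eta * y[i] * Phi[i]           # step on the hinge loss
    return w

# Hypothetical toy data: a disk (label +1) inside a ring (label -1).
rng = np.random.default_rng(0)
n = 200
r = np.concatenate([rng.uniform(0, 1, n), rng.uniform(2, 3, n)])
theta = rng.uniform(0, 2 * np.pi, 2 * n)
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.concatenate([np.ones(n), -np.ones(n)])
landmarks = X[rng.choice(2 * n, size=50, replace=False)]
Phi = nystrom_features(X, landmarks, sigma=1.0)
w = pegasos(Phi, y, lam=1e-2, iters=10_000)
accuracy = np.mean(np.sign(Phi @ w) == y)
print(f"training accuracy: {accuracy:.3f}")
```

The Nyström step turns the kernel problem into a linear one in at most 50 features, so each stochastic subgradient step costs time proportional to the number of landmarks rather than the number of data points.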

Submitted 17 April, 2023; originally announced April 2023.

Comments: 6 pages

arXiv:2212.12675

Iterative regularization in classification via hinge loss diagonal descent

Authors: Vassilis Apidopoulos, Tomaso Poggio, Lorenzo Rosasco, Silvia Villa

Abstract: Iterative regularization is a classic idea in regularization theory that has recently become popular in machine learning. On the one hand, it allows the design of efficient algorithms that control numerical and statistical accuracy at the same time. On the other hand, it sheds light on the learning curves observed while training neural networks. In this paper, we focus on iterative regularization in the context of classification. After contrasting this setting with that of regression and inverse problems, we develop an iterative regularization approach based on the use of the hinge loss function. More precisely, we consider a diagonal approach for a family of algorithms for which we prove convergence as well as rates of convergence. Our approach compares favorably with other alternatives, as also confirmed by numerical simulations.

Submitted 24 December, 2022; originally announced December 2022.

arXiv:2207.01341

doi 10.1016/j.jcis.2022.09.099

Microparticle Brownian Motion near an Air-Water Interface Governed by Direction-Dependent Boundary Conditions

Authors: Stefano Villa, Christophe Blanc, Abdallah Daddi-Moussa-Ider, Antonio Stocco, Maurizio Nobili

Abstract: Although the dynamics of colloids in the vicinity of a solid interface has been widely characterized in the past, experimental studies of Brownian diffusion close to an air-water interface are rare and limited to particle-interface gap distances larger than the particle size. At the still unexplored smaller distances, the dynamics is expected to be extremely sensitive to boundary conditions at the air-water interface. There, ad hoc experiments would provide a quantitative validation of predictions. Using a specially designed dual-wave interferometric setup, we measure in thermal equilibrium the 3D dynamics of 9-micrometer-diameter particles a few hundred nanometers from an air-water interface. Intriguingly, while the measured dynamics parallel to the interface approaches the predictions for slip boundary conditions, the Brownian motion normal to the interface is very close to the predictions for no-slip boundary conditions. These puzzling results are rationalized by considering current models of incompressible interfacial flow and examined further by developing an ad hoc model that considers the contribution of tiny concentrations of surface-active particles at the interface. We argue that such a condition governs the particle dynamics in a large spectrum of systems ranging from biofilm formation to flotation processes.

Submitted 22 September, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: 33 pages, 5 figures. Submitted to Journal of Colloid and Interface Science

arXiv:2206.05124

Stochastic Zeroth order Descent with Structured Directions

Authors: Marco Rando, Cesare Molinari, Silvia Villa, Lorenzo Rosasco

Abstract: We introduce and analyze Structured Stochastic Zeroth order Descent (S-SZD), a finite-difference approach that approximates a stochastic gradient on a set of $l\leq d$ orthogonal directions, where $d$ is the dimension of the ambient space. These directions are randomly chosen and may change at each step. For smooth convex functions we prove almost sure convergence of the iterates and a convergence rate on the function values of the form $O((d/l) k^{-c})$ for every $c<1/2$, which is arbitrarily close to that of Stochastic Gradient Descent (SGD) in terms of number of iterations. Our bound also shows the benefits of using $l$ multiple directions instead of one. For non-convex functions satisfying the Polyak-Łojasiewicz condition, we establish the first convergence rates for stochastic zeroth order algorithms under such an assumption. We corroborate our theoretical findings in numerical simulations where assumptions are satisfied and on the real-world problem of hyper-parameter optimization, observing that S-SZD performs very well in practice.
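A minimal sketch of a structured finite-difference gradient surrogate in the spirit of S-SZD: draw $l$ orthonormal directions (here via a QR factorization of a Gaussian matrix, whose column span is a uniformly random subspace), take forward differences along them, and rescale by $d/l$ so that, as the step $h\to0$, the surrogate is unbiased. The deterministic test function, stepsizes, and helper names are assumptions; the paper's stochastic setting, in which each function evaluation is noisy, is not modelled.

```python
import numpy as np

def orthonormal_directions(d, l, rng):
    """d x l matrix with orthonormal columns spanning a uniformly random subspace."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, l)))
    return Q

def structured_fd_grad(f, x, l, h, rng):
    """Forward-difference gradient surrogate on l random orthonormal directions,
    rescaled by d / l."""
    d = x.size
    P = orthonormal_directions(d, l, rng)
    fx = f(x)
    diffs = np.array([(f(x + h * P[:, j]) - fx) / h for j in range(l)])
    return (d / l) * (P @ diffs)

def zeroth_order_descent(f, x0, l, step, h=1e-6, iters=300, seed=0):
    """Plain descent using only function values through the surrogate above."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    for _ in range(iters):
        x = x - step * structured_fd_grad(f, x, l, h, rng)
    return x

d = 20
c = np.linspace(-1.0, 1.0, d)
f = lambda x: 0.5 * np.sum((x - c) ** 2)     # smooth convex test function
x_hat = zeroth_order_descent(f, np.zeros(d), l=5, step=0.2)
print(np.linalg.norm(x_hat - c))
```

Each iteration costs $l+1$ function evaluations; larger $l$ reduces the variance introduced by the random subspace, which is the trade-off the $d/l$ factor in the rate reflects.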

Submitted 15 December, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

arXiv:2204.10131

Fast iterative regularization by reusing data

Authors: Cristian Vega, Cesare Molinari, Lorenzo Rosasco, Silvia Villa

Abstract: Discrete inverse problems correspond to solving a system of equations in a way that is stable with respect to noise in the data. A typical approach to enforce uniqueness and select a meaningful solution is to introduce a regularizer. While for most applications the regularizer is convex, in many cases it is neither smooth nor strongly convex. In this paper, we propose and study two new iterative regularization methods, based on a primal-dual algorithm, to solve inverse problems efficiently. Our analysis, in the noise-free case, provides convergence rates for the Lagrangian and the feasibility gap. In the noisy case, it provides stability bounds and early-stopping rules with theoretical guarantees. The main novelty of our work is the exploitation of some a priori knowledge about the solution set, i.e. redundant information. More precisely, we show that the linear systems can be used more than once along the iteration. Despite the simplicity of the idea, we show that this procedure brings surprising advantages in numerical applications. We discuss various approaches to take advantage of redundant information that are at the same time consistent with our assumptions and flexible in the implementation. Finally, we illustrate our theoretical findings with numerical simulations for robust sparse recovery and image reconstruction through total variation. We confirm the efficiency of the proposed procedures by comparing the results with state-of-the-art methods.

Submitted 21 April, 2022; originally announced April 2022.

MSC Class: 90C25; 65K10; 49M29

arXiv:2203.06226

Non-invasive measurement of nuclear relative stiffness from quantitative analysis of microscopy data

Authors: Stefano Villa, Andrea Palamidessi, Emanuela Frittoli, Giorgio Scita, Roberto Cerbino, Fabio Giavazzi

Abstract: The connection between the properties of cell tissue and those of the single constituent cells remains to be elucidated. At the purely mechanical level, the degree of rigidity of different cellular components, such as the nucleus and the cytoplasm, modulates the interplay between the cell inner processes and the external environment, while simultaneously mediating the mechanical interactions between neighboring cells. Being able to quantify the correlation between single-cell and tissue properties would improve our mechanobiological understanding of cell tissues. Here we develop a methodology to quantitatively extract a set of structural and motility parameters from the analysis of time-lapse movies of nuclei belonging to jammed and flocking cell monolayers. We then study in detail the correlation between the dynamical state of the tissue and the deformation of the nuclei. We observe that the nuclear deformation rate linearly correlates with the local divergence of the velocity field, which leads to a non-invasive estimate of the elastic modulus of the nucleus relative to that of the cytoplasm. We also find that nuclei belonging to flocking monolayers, subjected to larger mechanical perturbations, are about two times stiffer than nuclei belonging to dynamically arrested monolayers, in agreement with atomic force microscopy results. Our results demonstrate a non-invasive route to the determination of nuclear relative stiffness for cells in a monolayer.
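As a hedged illustration of one quantity mentioned above, the local divergence $\nabla\cdot\mathbf v=\partial_x v_x+\partial_y v_y$ of a velocity field sampled on a regular grid (for example, from particle image velocimetry) can be approximated with finite differences. The grid layout, spacing, function name, and synthetic field are assumptions; the paper's image-analysis pipeline is not reproduced.

```python
import numpy as np

def divergence_2d(vx, vy, dx, dy):
    """Finite-difference divergence of a 2D field sampled on a regular grid.

    Arrays are indexed as [row, column] = [y, x]; np.gradient uses central
    differences in the interior and one-sided differences at the edges.
    """
    dvx_dx = np.gradient(vx, dx, axis=1)
    dvy_dy = np.gradient(vy, dy, axis=0)
    return dvx_dx + dvy_dy

# Synthetic linear field v = (3x, -y): its divergence is 3 - 1 = 2 everywhere.
X, Y = np.meshgrid(np.arange(20) * 0.5, np.arange(10) * 0.5)
div = divergence_2d(3.0 * X, -Y, dx=0.5, dy=0.5)
print(div.min(), div.max())
```

Positive divergence marks locally expanding regions of the monolayer and negative divergence locally compressing ones, which is why it is a natural quantity to compare against a deformation rate.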

Submitted 11 March, 2022; originally announced March 2022.

arXiv:2202.00420

Iterative regularization for low complexity regularizers

Authors: Cesare Molinari, Mathurin Massias, Lorenzo Rosasco, Silvia Villa

Abstract: Iterative regularization exploits the implicit bias of an optimization algorithm to regularize ill-posed problems. Constructing algorithms with such built-in regularization mechanisms is a classic challenge in inverse problems but also in modern machine learning, where it provides both a new perspective on algorithm analysis and significant speed-ups compared to explicit regularization. In this work, we propose and study the first iterative regularization procedure able to handle biases described by nonsmooth and non-strongly convex functionals, prominent in low-complexity regularization. Our approach is based on a primal-dual algorithm whose convergence and stability properties we analyze, even in the case where the original problem is infeasible. The general results are illustrated considering the special case of sparse recovery with the $\ell_1$ penalty. Our theoretical results are complemented by experiments showing the computational benefits of our approach.
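For orientation, the sketch below runs a standard primal-dual (Chambolle-Pock) iteration on the noiseless $\ell_1$ problem $\min_x \|x\|_1$ subject to $Ax=y$. In iterative regularization with noisy data, the number of iterations would play the role of the regularization parameter, with a stopping rule chosen accordingly. This is a generic textbook scheme, not necessarily the algorithm analysed in the paper; the stepsizes, iteration count, and problem sizes are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def primal_dual_l1(A, y, iters):
    """Chambolle-Pock iteration for min ||x||_1 subject to A x = y."""
    norm_A = np.linalg.norm(A, 2)
    tau = sigma = 0.99 / norm_A                # ensures tau * sigma * ||A||^2 < 1
    x = np.zeros(A.shape[1])
    z = np.zeros(A.shape[0])                   # dual variable for the constraint
    for _ in range(iters):
        x_new = soft_threshold(x - tau * (A.T @ z), tau)   # primal prox step
        z = z + sigma * (A @ (2.0 * x_new - x) - y)        # dual step on extrapolated point
        x = x_new
    return x

# Hypothetical noiseless sparse-recovery instance.
rng = np.random.default_rng(0)
m, d, s = 40, 100, 5
A = rng.standard_normal((m, d)) / np.sqrt(m)
x_true = np.zeros(d)
x_true[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
y = A @ x_true
x_hat = primal_dual_l1(A, y, iters=20_000)
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

The dual update is the proximal step for the conjugate of the indicator of $\{y\}$, which reduces to subtracting $\sigma y$; each iteration needs only one product with $A$ and one with $A^\top$.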

Submitted 1 February, 2022; originally announced February 2022.

arXiv:2201.05498

Convergence of an Asynchronous Block-Coordinate Forward-Backward Algorithm for Convex Composite Optimization

Authors: Cheik Traoré, Saverio Salzo, Silvia Villa

Abstract: In this paper, we study the convergence properties of a randomized block-coordinate descent algorithm for the minimization of a composite convex objective function, where the block-coordinates are updated asynchronously and randomly according to an arbitrary probability distribution. We prove that the iterates generated by the algorithm form a stochastic quasi-Fejér sequence and thus converge almost surely to a minimizer of the objective function. Moreover, we prove a general sublinear rate of convergence in expectation for the function values and a linear rate of convergence in expectation under an error bound condition of Tseng type.
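A minimal synchronous sketch of randomized block-coordinate forward-backward on a lasso objective $\tfrac12\|Ax-y\|^2+\lambda\|x\|_1$: at each step one block, drawn uniformly at random, takes a gradient step on the smooth term followed by the proximal (soft-thresholding) step, with blockwise stepsize $1/L_b$ where $L_b=\|A_b\|^2$. The asynchronous, delayed updates that are the subject of the paper are not simulated, and the uniform block distribution, block sizes, and names are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def random_block_fb(A, y, lam, n_blocks=4, iters=2000, seed=0):
    """Randomized block-coordinate forward-backward for 0.5||Ax - y||^2 + lam ||x||_1."""
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    blocks = np.array_split(np.arange(d), n_blocks)
    L = [np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks]  # blockwise Lipschitz constants
    x = np.zeros(d)
    r = A @ x - y                              # residual, kept in sync with x
    for _ in range(iters):
        k = rng.integers(n_blocks)             # uniform block selection (assumption)
        blk = blocks[k]
        g = A[:, blk].T @ r                    # partial gradient of the smooth term
        new = soft_threshold(x[blk] - g / L[k], lam / L[k])   # forward-backward step
        r += A[:, blk] @ (new - x[blk])        # update the residual before overwriting x
        x[blk] = new
    return x

# Hypothetical random lasso instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
y = rng.standard_normal(50)
lam = 0.1 * np.max(np.abs(A.T @ y))
x_hat = random_block_fb(A, y, lam)
objective = lambda x: 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))
print(objective(x_hat), objective(np.zeros(20)))
```

Keeping the residual $r=Ax-y$ up to date means each block update costs only products with the columns in that block, which is what makes block-coordinate methods attractive at scale.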

Submitted 12 April, 2023; v1 submitted 14 January, 2022; originally announced January 2022.

arXiv:2107.10123

Convergence rates for the Heavy-Ball continuous dynamics for non-convex optimization, under Polyak-Łojasiewicz condition

Authors: Vassilis Apidopoulos, Nicolò Ginatta, Silvia Villa

Abstract: We study convergence of the trajectories of the Heavy Ball dynamical system, with constant damping coefficient, in the framework of convex and non-convex smooth optimization. By using the Polyak-Łojasiewicz condition, we derive new linear convergence rates for the associated trajectory, in terms of objective function values, without assuming uniqueness of the minimizer.
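To visualise the dynamics, one can integrate the Heavy Ball system $\ddot x(t)+\gamma\dot x(t)+\nabla f(x(t))=0$ numerically. The sketch uses a semi-implicit Euler scheme and the function $f(x)=x^2+3\sin^2 x$, a standard example that is non-convex yet satisfies the Polyak-Łojasiewicz condition. The damping value, time step, and horizon are illustrative assumptions, and the discretization is not part of the paper's continuous-time analysis.

```python
import numpy as np

def f(x):
    return x ** 2 + 3.0 * np.sin(x) ** 2

def grad_f(x):
    return 2.0 * x + 3.0 * np.sin(2.0 * x)     # derivative of 3 sin^2 x is 3 sin 2x

def heavy_ball_flow(x0, damping=3.0, dt=1e-2, t_end=20.0):
    """Semi-implicit Euler for x'' + damping * x' + f'(x) = 0, starting at rest."""
    x, v = float(x0), 0.0
    values = [f(x)]
    for _ in range(int(round(t_end / dt))):
        v -= dt * (damping * v + grad_f(x))    # velocity update
        x += dt * v                            # position update with the new velocity
        values.append(f(x))
    return x, np.array(values)

x_end, values = heavy_ball_flow(3.0)
print(f"f(x(20)) = {values[-1]:.2e}")
```

Because $f''(x)=2+6\cos 2x$ is negative on parts of the path, the trajectory crosses non-convex regions, yet the objective values still decay to the minimum value $0$, in line with what the PL condition predicts.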

Submitted 26 January, 2022; v1 submitted 21 July, 2021; originally announced July 2021.

arXiv:2107.03941

Zeroth order optimization with orthogonal random directions

Authors: David Kozak, Cesare Molinari, Lorenzo Rosasco, Luis Tenorio, Silvia Villa

Abstract: We propose and analyze a randomized zeroth-order approach based on approximating the exact gradient by finite differences computed in a set of orthogonal random directions that changes with each iteration. A number of previously proposed methods are recovered as special cases, including spherical smoothing, coordinate descent, as well as discretized gradient descent. Our main contribution is proving convergence guarantees as well as convergence rates under different parameter choices and assumptions. In particular, we consider convex objectives, but also possibly non-convex objectives satisfying the Polyak-Łojasiewicz (PL) condition. Theoretical results are complemented and illustrated by numerical experiments.

Submitted 15 November, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

arXiv:2106.08598

Ada-BKB: Scalable Gaussian Process Optimization on Continuous Domains by Adaptive Discretization

Authors: Marco Rando, Luigi Carratino, Silvia Villa, Lorenzo Rosasco

Abstract: Gaussian process optimization is a successful class of algorithms (e.g., GP-UCB) to optimize a black-box function through sequential evaluations. However, for functions with continuous domains, Gaussian process optimization has to rely on either a fixed discretization of the space or the solution of a non-convex optimization subproblem at each evaluation. The first approach can negatively affect performance, while the second approach imposes a heavy computational burden. A third option, only recently studied theoretically, is to adaptively discretize the function domain. Even though this approach avoids the extra non-convex optimization costs, the overall computational complexity is still prohibitive. An algorithm such as GP-UCB has a runtime of $O(T^4)$, where $T$ is the number of iterations. In this paper, we introduce Ada-BKB (Adaptive Budgeted Kernelized Bandit), a no-regret Gaussian process optimization algorithm for functions on continuous domains that provably runs in $O(T^2 d_\text{eff}^2)$, where $d_\text{eff}$ is the effective dimension of the explored space, which is typically much smaller than $T$. We corroborate our theoretical findings with experiments on synthetic non-convex functions and on the real-world problem of hyper-parameter optimization, confirming the good practical performance of the proposed approach.

Submitted 11 March, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

arXiv:2006.09859

Iterative regularization for convex regularizers

Authors: Cesare Molinari, Mathurin Massias, Lorenzo Rosasco, Silvia Villa

Abstract: We study iterative regularization for linear models, when the bias is convex but not necessarily strongly convex. We characterize the stability properties of a primal-dual gradient-based approach, analyzing its convergence in the presence of worst-case deterministic noise. As a main example, we specialize and illustrate the results for the problem of robust sparse recovery. Key to our analysis is a combination of ideas from regularization theory and optimization in the presence of errors. Theoretical results are complemented by experiments showing that state-of-the-art performance can be achieved with considerable computational speed-ups.

Submitted 29 October, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

arXiv:1912.12153 [pdf, ps, other]

doi 10.1137/19M1308888

Accelerated iterative regularization via dual diagonal descent

Authors: Luca Calatroni, Guillaume Garrigos, Lorenzo Rosasco, Silvia Villa

Abstract: We propose and analyze an accelerated iterative dual diagonal descent algorithm for the solution of linear inverse problems with general regularization and data-fit functions. In particular, we develop an inertial approach for which we analyze both convergence and stability. Using tools from inexact proximal calculus, we prove early stopping results with optimal convergence rates for additive data-fit terms as well as more general cases, such as the Kullback-Leibler divergence, for which different types of proximal point approximations hold.

Submitted 27 December, 2019; originally announced December 2019.

Journal ref: SIAM Journal on Optimization, 31(1), 754-784 (2021)

arXiv:1906.07392 [pdf, ps, other]

Parallel Random Block-Coordinate Forward-Backward Algorithm: A Unified Convergence Analysis

Authors: Saverio Salzo, Silvia Villa

Abstract: We study the block-coordinate forward-backward algorithm in which the blocks are updated in a random and possibly parallel manner, according to arbitrary probabilities. The algorithm allows different stepsizes along the block-coordinates to fully exploit the smoothness properties of the objective function. In the convex case and in an infinite dimensional setting, we establish almost sure weak convergence of the iterates and the asymptotic rate o(1/n) for the mean of the function values. We derive linear rates under strong convexity and error bound conditions. Our analysis is based on an abstract convergence principle for stochastic descent algorithms which allows us to extend and simplify existing results.

Submitted 25 November, 2020; v1 submitted 18 June, 2019; originally announced June 2019.

Comments: 39 pages

MSC Class: 65K05; 90C25; 90C06; 49M27
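The random block-coordinate forward-backward update can be sketched as follows, under simplifying assumptions that are ours rather than the paper's: a finite-dimensional quadratic f(x) = ½xᵀQx − qᵀx, an ℓ1 penalty split into one-coordinate blocks, a single block updated per iteration (no parallel updates), and block step sizes 1/Q_ii taken from the block-wise Lipschitz constants. The helper names are hypothetical.

```python
import random


def soft_threshold(v, t):
    """Scalar soft-thresholding: the proximal operator of t * |.|."""
    return max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0)


def random_block_fb(Q, q, lam, probs, iters=200, seed=0):
    """Random block-coordinate forward-backward (hypothetical helper) for
    min_x 0.5*x^T Q x - q^T x + lam*||x||_1, with one coordinate per block."""
    rng = random.Random(seed)
    n = len(q)
    x = [0.0] * n
    steps = [1.0 / Q[i][i] for i in range(n)]  # block step 1/L_i with L_i = Q[i][i]
    for _ in range(iters):
        i = rng.choices(range(n), weights=probs)[0]  # block i drawn with probability probs[i]
        grad_i = sum(Q[i][j] * x[j] for j in range(n)) - q[i]  # partial gradient of f
        # forward (gradient) step on block i, then backward (proximal) step on the same block
        x[i] = soft_threshold(x[i] - steps[i] * grad_i, steps[i] * lam)
    return x


x = random_block_fb([[2.0, 1.0], [1.0, 2.0]], [3.0, 3.0], lam=1.0, probs=[0.5, 0.5])
print(x)  # both coordinates close to 2/3, the minimizer of this instance
```

For this instance the optimality condition Qx − q + λ·sign(x) = 0 with x > 0 gives Qx = [2, 2], hence x = [2/3, 2/3]. The `probs` argument stands in for the arbitrary selection probabilities mentioned in the abstract.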

arXiv:1803.00783 [pdf, other]

doi 10.23919/EUSIPCO.2018.8553267

Sparse Multiple Kernel Learning: Support Identification via Mirror Stratifiability

Authors: Guillaume Garrigos, Lorenzo Rosasco, Silvia Villa

Abstract: In statistical machine learning, kernel methods allow one to consider infinite-dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done by solving an optimization problem depending on a data fit term and a suitable regularizer. In this paper we consider feature maps which are the concatenation of a fixed, possibly large, set of simpler feature maps. The penalty is a sparsity inducing one, promoting solutions depending only on a small subset of the features. The group lasso problem is a special case of this more general setting. We show that one of the most popular optimization algorithms to solve the regularized objective function, the forward-backward splitting method, allows one to perform feature selection in a stable manner. In particular, we prove that the set of relevant features is identified by the algorithm after a finite number of iterations if a suitable qualification condition holds. The main tools used in the proofs are the notions of stratification and mirror stratifiability.

Submitted 2 March, 2018; originally announced March 2018.

Comments: Submitted to EUSIPCO 2018. 5 pages, 2 figures

arXiv:1712.00357 [pdf, other]

Thresholding gradient methods in Hilbert spaces: support identification and linear convergence

Authors: Guillaume Garrigos, Lorenzo Rosasco, Silvia Villa

Abstract: We study the $\ell^1$-regularized least squares optimization problem in a separable Hilbert space. We show that the iterative soft-thresholding algorithm (ISTA) converges linearly, without making any assumption on the linear operator involved or on the problem. The result is obtained by combining two key concepts: the notion of extended support, a finite set containing the support, and the notion of conditioning over finite dimensional sets. We prove that ISTA identifies the solution extended support after a finite number of iterations, and we derive linear convergence from the conditioning property, which is always satisfied for $\ell^1$-regularized least squares problems. Our analysis extends to the entire class of thresholding gradient algorithms, for which we provide a conceptually new proof of strong convergence, as well as convergence rates.

Submitted 1 December, 2017; originally announced December 2017.

Comments: 17 pages, 5 figures

MSC Class: 49K40; 49M29; 65J10; 65J15; 65J20; 65J22; 65K15; 90C25; 90C46
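A minimal sketch of the iterative soft-thresholding algorithm (ISTA) studied above, restricted to the finite-dimensional problem min_x ½‖Ax − b‖² + λ‖x‖₁: each iteration takes a gradient step on the least-squares term and then applies componentwise soft-thresholding. The helper names (`soft_threshold`, `ista`), the plain-list linear algebra and the fixed step size are illustrative assumptions, not taken from the paper; step ≤ 1/‖A‖² is a common safe choice.

```python
def soft_threshold(v, t):
    """Componentwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def matvec(A, x):
    """Dense matrix-vector product A @ x on nested lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Transposed product A^T @ r on nested lists."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def ista(A, b, lam, step, iters=500):
    """ISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1 (hypothetical helper)."""
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        grad = rmatvec(A, residual)  # gradient of the least-squares term
        # forward (gradient) step followed by backward (proximal) step
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, grad)], step * lam)
    return x


print(ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0, step=1.0))  # [2.0, 0.0]
```

With A equal to the identity and step 1, a single iteration already returns soft_threshold(b, λ), so the small coordinate is set exactly to zero; this is the mechanism by which thresholding produces sparse iterates.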

arXiv:1707.05422 [pdf, other]

Don't relax: early stopping for convex regularization

Authors: Simon Matet, Lorenzo Rosasco, Silvia Villa, Bang Long Vu

Abstract: We consider the problem of designing efficient regularization algorithms when regularization is encoded by a (strongly) convex functional. Unlike classical penalization methods based on a relaxation approach, we propose an iterative method where regularization is achieved via early stopping. Our results show that the proposed procedure achieves the same recovery accuracy as penalization methods, while naturally integrating computational considerations. An empirical analysis on a number of problems provides promising results with respect to the state of the art.

Submitted 17 July, 2017; originally announced July 2017.

MSC Class: 47H05; 49M29; 49M27; 90C25

arXiv:1703.09477 [pdf, ps, other]

doi 10.1007/s10107-022-01809-4

Convergence of the Forward-Backward Algorithm: Beyond the Worst Case with the Help of Geometry

Authors: Guillaume Garrigos, Lorenzo Rosasco, Silvia Villa

Abstract: We provide a comprehensive study of the convergence of the forward-backward algorithm under suitable geometric conditions, such as conditioning or Łojasiewicz properties. These geometrical notions are usually local by nature, and may fail to describe the fine geometry of objective functions relevant in inverse problems and signal processing, that have a nice behaviour on manifolds, or sets open with respect to a weak topology. Motivated by this observation, we revisit those geometric notions over arbitrary sets. In turn, this allows us to present several new results as well as collect in a unified view a variety of results scattered in the literature. Our contributions include the analysis of infinite dimensional convex minimization problems, showing the first Łojasiewicz inequality for a quadratic function associated to a compact operator, and the derivation of new linear rates for problems arising from inverse problems with low-complexity priors. Our approach allows us to establish unexpected connections between geometry and a priori conditions in inverse problems, such as source conditions, or restricted isometry properties.

Submitted 13 November, 2020; v1 submitted 28 March, 2017; originally announced March 2017.

Comments: After peer-review, the paper has been significantly modified: i) Section 3.3 has been completely rewritten, and contains a new sum rule (Theorem 3.15) ii) The end of Section 4.2 and Section 5.2 have been rewritten to include mirror-stratifiable problems iii) The Annex contains new proofs for small-but-not-trivial claims made throughout the paper iv) Theorems, Examples etc have been renumbered

Journal ref: Math. Program. 198, 937-996 (2023)

arXiv:1610.02170 [pdf, other]

doi 10.1007/s10851-017-0754-0

Iterative regularization via dual diagonal descent

Authors: Guillaume Garrigos, Lorenzo Rosasco, Silvia Villa

Abstract: In the context of linear inverse problems, we propose and study a general iterative regularization method allowing one to consider large classes of regularizers and data-fit terms. The algorithm we propose is based on a primal-dual diagonal descent method. Our analysis establishes convergence as well as stability results. Theoretical findings are complemented with numerical experiments showing state-of-the-art performance.

Submitted 13 July, 2017; v1 submitted 7 October, 2016; originally announced October 2016.

Comments: 41 pages, 13 figures. 4-pages version of the paper available at http://opt-ml.org/papers/OPT2016_paper_19.pdf

MSC Class: 90C25; 49N45; 49N15; 68U10; 90C06

arXiv:1605.00632 [pdf, other]

The Aw-Rascle-Zhang model with constraints

Authors: Stefano Villa

Abstract: The thesis deals with the Aw-Rascle-Zhang model for traffic. We have applied the model to describe the influence of a large and slow vehicle (a bus or a truck) on the traffic. The trajectory of the bus is given by an ODE. The model can also be applied to the case of a fixed constraint, like a traffic light or a toll gate. We define two different Riemann solvers: the first one conserves both the number of cars and the generalized momentum, while the second conserves only the number of cars. We characterize the invariant domains for these Riemann solvers. We study two numerical methods based on the Godunov method to capture the proposed solutions and we track the bus trajectory with a front-tracking technique. The first method is based on conservation and captures exactly the solution corresponding to the first Riemann solver. The second method is based on a non-uniform mesh. Both methods fail to capture the solutions corresponding to the second Riemann solver for general initial data. Finally, we prove the existence of solutions for the Cauchy problem for the second Riemann solver in the case of a fixed constraint, applying the wave-front tracking method.

Submitted 2 May, 2016; originally announced May 2016.

MSC Class: 90B20; 35L65

arXiv:1602.07872 [pdf, ps, other]

A first-order stochastic primal-dual algorithm with correction step

Authors: Lorenzo Rosasco, Silvia Villa, Bang Cong Vu

Abstract: We investigate the convergence properties of a stochastic primal-dual splitting algorithm for solving structured monotone inclusions involving the sum of a cocoercive operator and a composite monotone operator. The proposed method is the stochastic extension to monotone inclusions of a proximal method for saddle point problems studied by Y. Drori, S. Sabach, and M. Teboulle ("A simple algorithm for a class of nonsmooth convex-concave saddle-point problems", 2015) and by I. Loris and C. Verhoeven ("On a generalization of the iterative soft-thresholding algorithm for the case of non-separable penalty", 2011). It consists of a forward step determined by the stochastic evaluation of the cocoercive operator, a backward step in the dual variables involving the resolvent of the monotone operator, and an additional forward step using the stochastic evaluation of the cocoercive operator introduced in the first step. We prove weak almost sure convergence of the iterates by showing that the primal-dual sequence generated by the method is stochastic quasi Fejér-monotone with respect to the set of zeros of the considered primal and dual inclusions. Additional results on ergodic convergence in expectation are considered for the special case of saddle point models.

Submitted 25 February, 2016; originally announced February 2016.

MSC Class: 47H05; 49M29; 49M27; 90C25

arXiv:1510.04641 [pdf, ps, other]

Modified Fejér sequences and applications

Authors: Junhong Lin, Lorenzo Rosasco, Silvia Villa, Ding-Xuan Zhou

Abstract: In this note, we propose and study the notion of modified Fejér sequences. Within a Hilbert space setting, we show that it provides a unifying framework to prove convergence rates for objective function values of several optimization algorithms. In particular, our results apply to the forward-backward splitting algorithm, the incremental subgradient proximal algorithm, and the Douglas-Rachford splitting method, including and generalizing known results.

Submitted 15 October, 2015; originally announced October 2015.

MSC Class: 65K05; 90C25; 90C52

arXiv:1507.00852 [pdf, other]

Stochastic inertial primal-dual algorithms

Authors: Lorenzo Rosasco, Silvia Villa, Bang Cong Vu

Abstract: We propose and study a novel stochastic inertial primal-dual approach to solve composite optimization problems. These latter problems arise naturally when learning with penalized regularization schemes. Our analysis provides convergence results in a general setting that allows us to analyze a variety of special cases of interest in a unified framework. Key in our analysis is considering the framework of splitting algorithms for solving monotone inclusions in suitable product spaces and for a specific choice of preconditioning operators.

Submitted 3 July, 2015; originally announced July 2015.

MSC Class: 47H05; 49M29; 49M27; 90C25

arXiv:1507.00848 [pdf, ps, other]

A stochastic inertial forward-backward splitting algorithm for multivariate monotone inclusions

Authors: Lorenzo Rosasco, Silvia Villa, Bang Cong Vu

Abstract: We propose an inertial forward-backward splitting algorithm to compute the zero of a sum of two monotone operators allowing for stochastic errors in the computation of the operators. More precisely, we establish almost sure convergence in real Hilbert spaces of the sequence of iterates to an optimal solution. Then, based on this analysis, we introduce two new classes of stochastic inertial primal-dual splitting methods for solving structured systems of composite monotone inclusions and prove their convergence. Our results extend to the stochastic and inertial setting various types of structured monotone inclusion problems and corresponding algorithmic solutions. Application to minimization problems is discussed.

Submitted 3 July, 2015; originally announced July 2015.

MSC Class: 47H05; 49M29; 49M27; 90C25

arXiv:1504.04636 [pdf, ps, other]

Consistent Learning by Composite Proximal Thresholding

Authors: Patrick L. Combettes, Saverio Salzo, Silvia Villa

Abstract: We investigate the modeling and the numerical solution of machine learning problems with prediction functions which are linear combinations of elements of a possibly infinite-dimensional dictionary. We propose a novel flexible composite regularization model, which makes it possible to incorporate various priors on the coefficients of the prediction function, including sparsity and hard constraints. We show that the estimators obtained by minimizing the regularized empirical risk are consistent in a statistical sense, and we design an error-tolerant composite proximal thresholding algorithm for computing such estimators. New results on the asymptotic behavior of the proximal forward-backward splitting method are derived and exploited to establish the convergence properties of the proposed algorithm. In particular, our method features a $o(1/m)$ convergence rate in objective values.

Submitted 1 December, 2015; v1 submitted 17 April, 2015; originally announced April 2015.

MSC Class: 62G08; 46N30; 46E22; 60B11; 47N30

arXiv:1504.03106 [pdf, other]

Learning Multiple Visual Tasks while Discovering their Structure

Authors: Carlo Ciliberto, Lorenzo Rosasco, Silvia Villa

Abstract: Multi-task learning is a natural approach for computer vision applications that require the simultaneous solution of several distinct but related problems, e.g. object detection, classification, tracking of multiple agents, or denoising, to name a few. The key idea is that exploring task relatedness (structure) can lead to improved performance. In this paper, we propose and study a novel sparse, non-parametric approach exploiting the theory of Reproducing Kernel Hilbert Spaces for vector-valued functions. We develop a suitable regularization framework which can be formulated as a convex optimization problem, and is provably solvable using an alternating minimization approach. Empirical tests show that the proposed method compares favorably to state-of-the-art techniques and further allows one to recover interpretable structures, a problem of interest in its own right.

Submitted 13 April, 2015; originally announced April 2015.

Comments: 19 pages, 3 figures, 3 tables

arXiv:1410.6847 [pdf, ps, other]

Regularized Learning Schemes in Feature Banach Spaces

Authors: Patrick L. Combettes, Saverio Salzo, Silvia Villa

Abstract: This paper proposes a unified framework for the investigation of constrained learning theory in reflexive Banach spaces of features via regularized empirical risk minimization. The focus is placed on Tikhonov-like regularization with totally convex functions. This broad class of regularizers provides a flexible model for various priors on the features, including in particular hard constraints and powers of Banach norms. In such context, the main results establish a new general form of the representer theorem and the consistency of the corresponding learning schemes under general conditions on the loss function, the geometry of the feature space, and the modulus of total convexity of the regularizer. In addition, the proposed analysis gives new insight into basic tools such as reproducing Banach spaces, feature maps, and universality. Even when specialized to Hilbert spaces, this framework yields new results that extend the state of the art.

Submitted 19 October, 2016; v1 submitted 24 October, 2014; originally announced October 2014.

MSC Class: 62G08; 46N30; 46E22; 60B11; 47N30

arXiv:1406.6311 [pdf]

doi 10.1140/epjc/s10052-014-3026-9

The Physics of the B Factories

Authors: A. J. Bevan, B. Golob, Th. Mannel, S. Prell, B. D. Yabsley, K. Abe, H. Aihara, F. Anulli, N. Arnaud, T. Aushev, M. Beneke, J. Beringer, F. Bianchi, I. I. Bigi, M. Bona, N. Brambilla, J. Brodzicka, P. Chang, M. J. Charles, C. H. Cheng, H. -Y. Cheng, R. Chistov, P. Colangelo, J. P. Coleman, A. Drutskoy, et al. (2009 additional authors not shown)

Abstract: This work is on the Physics of the B Factories. Part A of this book contains a brief description of the SLAC and KEK B Factories as well as their detectors, BaBar and Belle, and data taking related issues. Part B discusses tools and methods used by the experiments in order to obtain results. The results themselves can be found in Part C. Please note that version 3 on the archive is the auxiliary version of the Physics of the B Factories book. This uses the notation alpha, beta, gamma for the angles of the Unitarity Triangle. The nominal version uses the notation phi_1, phi_2 and phi_3. Please cite this work as Eur. Phys. J. C74 (2014) 3026.

Submitted 31 October, 2015; v1 submitted 24 June, 2014; originally announced June 2014.

Comments: 928 pages, version 3 (arXiv:1406.6311v3) corresponds to the alpha, beta, gamma version of the book, the other versions use the phi1, phi2, phi3 notation

Report number: SLAC-PUB-15968, KEK Preprint 2014-3

Journal ref: Eur. Phys. J. C74 (2014) 3026

arXiv:1405.0042 [pdf, other]

Learning with incremental iterative regularization

Authors: Lorenzo Rosasco, Silvia Villa

Abstract: Within a statistical learning setting, we propose and study an iterative regularization algorithm for least squares defined by an incremental gradient method. In particular, we show that, if all other parameters are fixed a priori, the number of passes over the data (epochs) acts as a regularization parameter, and prove strong universal consistency, i.e. almost sure convergence of the risk, as well as sharp finite sample bounds for the iterates. Our results are a step towards understanding the effect of multiple epochs in stochastic gradient techniques in machine learning and rely on integrating statistical and optimization results.

Submitted 15 June, 2015; v1 submitted 30 April, 2014; originally announced May 2014.

Comments: 30 pages
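A minimal sketch of incremental iterative regularization for least squares as described above: the examples are visited in a fixed cyclic order with a constant step size, and the iterate at the end of each epoch is recorded, so that the number of epochs can be chosen afterwards (for instance on held-out data) as the regularization parameter. The cyclic order, the constant step, and the helper name are assumptions made for illustration.

```python
def incremental_gradient_ls(X, y, step, epochs):
    """Incremental gradient for least squares (hypothetical helper).
    Returns the iterate recorded at the end of each epoch."""
    w = [0.0] * len(X[0])
    path = []
    for _ in range(epochs):
        for xi, yi in zip(X, y):  # one full pass over the data = one epoch
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi  # residual on example i
            # gradient step on the single-example loss 0.5 * (x_i . w - y_i)^2
            w = [wj - step * err * xj for wj, xj in zip(w, xi)]
        path.append(list(w))  # candidate estimator indexed by the number of epochs
    return path


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
path = incremental_gradient_ls(X, y, step=0.3, epochs=200)
print(len(path), path[-1])  # 200 recorded iterates; the last is close to [1.0, 2.0]
```

On this noiseless, consistent toy data, later epochs simply approach the exact solution; with noisy labels, stopping after fewer epochs plays the role that a penalty parameter plays in Tikhonov regularization.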

arXiv:1403.7999 [pdf, ps, other]

A Stochastic forward-backward splitting method for solving monotone inclusions in Hilbert spaces

Authors: Lorenzo Rosasco, Silvia Villa, Bang Công Vũ

Abstract: We propose and analyze the convergence of a novel stochastic forward-backward splitting algorithm for solving monotone inclusions given by the sum of a maximal monotone operator and a single-valued maximal monotone cocoercive operator. This latter framework has a number of interesting special cases, including variational inequalities and convex minimization problems, while stochastic approaches are practically relevant to account for perturbations in the data. The algorithm we propose is a stochastic extension of the classical deterministic forward-backward method, and is obtained by considering the composition of the resolvent of the maximal monotone operator with a forward step based on a stochastic estimate of the single-valued operator. Our study provides a non-asymptotic error analysis in expectation for the strongly monotone case, as well as almost sure convergence under weaker assumptions. The approach we consider allows us to avoid averaging, a feature critical when considering methods based on sparsity, and, for minimization problems, it allows us to obtain convergence rates matching those obtained by stochastic extensions of so-called accelerated methods. Stochastic quasi-Fejér sequences are a key technical tool to prove almost sure convergence.

Submitted 20 February, 2015; v1 submitted 31 March, 2014; originally announced March 2014.

Comments: 20 pages

MSC Class: 47H05; 90C15; 65K10; 90C25

arXiv:1403.5074 [pdf, other]

Convergence of Stochastic Proximal Gradient Algorithm

Authors: Lorenzo Rosasco, Silvia Villa, Bang Công Vũ

Abstract: We prove novel convergence results for a stochastic proximal gradient algorithm suitable for solving a large class of convex optimization problems, where a convex objective function is given by the sum of a smooth and a possibly non-smooth component. We consider the convergence of the iterates and derive $O(1/n)$ non-asymptotic bounds in expectation in the strongly convex case, as well as almost sure convergence results under weaker assumptions. Our approach allows us to avoid averaging and to weaken boundedness assumptions which are often considered in theoretical studies and might not be satisfied in practice.

Submitted 16 September, 2014; v1 submitted 20 March, 2014; originally announced March 2014.

Comments: 24 pages

MSC Class: 65K10; 90C25; 90C15; 62G08
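The stochastic proximal gradient iteration analysed above can be sketched for a finite-sum least-squares loss with an ℓ1 penalty: at step k one example is sampled uniformly, a stochastic gradient step is taken on the smooth part, and the result is passed through the proximal operator of the non-smooth part, with no averaging of the iterates. The step-size rule γ_k = c/(k+1), the choice of loss and penalty, and the helper names are our assumptions, not the paper's exact setting.

```python
import random


def soft_threshold(v, t):
    """Componentwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def stochastic_prox_grad(X, y, lam, iters, seed=0, c=1.0):
    """Stochastic proximal gradient (hypothetical helper) for
    min_w (1/n) * sum_i 0.5*(x_i . w - y_i)^2 + lam*||w||_1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for k in range(iters):
        i = rng.randrange(len(X))  # sample one example uniformly at random
        gamma = c / (k + 1)  # decreasing step size (assumed schedule)
        err = sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
        # stochastic gradient step on the smooth part, then prox of the l1 part
        w = soft_threshold([wj - gamma * err * xj for wj, xj in zip(w, X[i])], gamma * lam)
    return w  # last iterate, no averaging


w = stochastic_prox_grad([[1.0], [1.0], [1.0]], [1.0, 2.0, 3.0], lam=0.5, iters=20000)
print(w)  # close to [1.5]
```

For this one-dimensional instance the minimizer is soft_threshold(mean(y), λ) = 1.5, and the last iterate reduces to a running average of the sampled values y_i − λ, so its expected squared distance to 1.5 decays like 1/k.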

arXiv:1303.5976 [pdf, ps, other]

On Learnability, Complexity and Stability

Authors: Silvia Villa, Lorenzo Rosasco, Tomaso Poggio

Abstract: We consider the fundamental question of learnability of a hypothesis class in the supervised learning setting and in the general learning setting introduced by Vladimir Vapnik. We survey classic results characterizing learnability in terms of suitable notions of complexity, as well as more recent results that establish the connection between learnability and stability of a learning algorithm.

Submitted 24 March, 2013; originally announced March 2013.

arXiv:1209.0368

Proximal methods for the latent group lasso penalty

Authors: Silvia Villa, Lorenzo Rosasco, Sofia Mosci, Alessandro Verri

Abstract: We consider a regularized least squares problem, with regularization by structured sparsity-inducing norms, which extend the usual $\ell_1$ and the group lasso penalty by allowing the subsets to overlap. Such regularizations lead to nonsmooth problems that are difficult to optimize, and in this paper we propose a suitable version of an accelerated proximal method to solve them. We prove convergence of a nested procedure, obtained by composing an accelerated proximal method with an inner algorithm for computing the proximity operator. By exploiting the geometrical properties of the penalty, we devise a new active set strategy, thanks to which the inner iteration is relatively fast, thus guaranteeing good computational performance of the overall algorithm. Our approach makes it possible to deal with high-dimensional problems without pre-processing for dimensionality reduction, leading to better computational and prediction performance with respect to state-of-the-art methods, as shown empirically on both toy and real data.

Submitted 3 September, 2012; originally announced September 2012.

Comments: 4 figures

MSC Class: 65K10; 90C25

arXiv:1208.2572

Nonparametric sparsity and regularization

Authors: Lorenzo Rosasco, Silvia Villa, Sofia Mosci, Matteo Santoro, Alessandro Verri

Abstract: In this work we are interested in the problems of supervised learning and variable selection when the input-output dependence is described by a nonlinear function depending on a few variables. Our goal is to consider a sparse nonparametric model, hence avoiding linear or additive models. The key idea is to measure the importance of each variable in the model by making use of partial derivatives. Based on this intuition we propose a new notion of nonparametric sparsity and a corresponding least squares regularization scheme. Using concepts and results from the theory of reproducing kernel Hilbert spaces and proximal methods, we show that the proposed learning algorithm corresponds to a minimization problem which can be provably solved by an iterative procedure. The consistency properties of the obtained estimator are studied both in terms of prediction and selection performance. An extensive empirical analysis shows that the proposed method performs favorably with respect to the state-of-the-art methods.

Submitted 13 August, 2012; originally announced August 2012.

Comments: 45 pages, 11 figures

arXiv:1110.4017

An extension of Mercer theorem to vector-valued measurable kernels

Authors: Ernesto De Vito, Veronica Umanità, Silvia Villa

Abstract: We extend the classical Mercer theorem to reproducing kernel Hilbert spaces whose elements are functions from a measurable space $X$ into $\mathbb C^n$. Given a finite measure $\mu$ on $X$, we represent the reproducing kernel $K$ as a convergent series in terms of the eigenfunctions of a suitable compact operator depending on $K$ and $\mu$. Our result holds under the mild assumption that $K$ is measurable and the associated Hilbert space is separable. Furthermore, we show that $X$ has a natural second countable topology with respect to which the eigenfunctions are continuous and the series representing $K$ converges uniformly to $K$ on any compact subset of $X\times X$, provided that the support of $\mu$ is $X$.

Submitted 18 October, 2011; originally announced October 2011.

MSC Class: 46E22; 47B32; 47B34; 47G10; 47A70; 68T05

arXiv:1109.6015

doi 10.1111/j.1365-2966.2011.19909.x

Stellar Clusters in M83: Formation, evolution, disruption and the influence of environment

Authors: N. Bastian, A. Adamo, M. Gieles, E. Silva Villa, H. J. G. L. M. Lamers, S. S. Larsen, L. J. Smith, I. S. Konstantopoulos, E. Zackrisson

Abstract: We study the stellar cluster population in two adjacent fields in the nearby, face-on spiral galaxy M83, using WFC3/HST imaging. The clusters are selected through visual inspection to be centrally concentrated, symmetric, and resolved on the images, which allows us to differentiate between clusters and likely unbound associations. We compare our sample with previous studies and show that the differences between the catalogues are largely due to the inclusion of large numbers of diffuse associations within previous catalogues. The luminosity function of the clusters is well approximated by a power law with index -2 over most of the observed range; however, a steepening is seen at M_V = -9.3 and -8.8 in the inner and outer fields, respectively. Additionally, we show that the cluster population is inconsistent with a pure power-law mass distribution, but instead exhibits a truncation at the high-mass end. If described as a Schechter function, the characteristic mass is 1.6 × 10^5 Msun and 0.5 × 10^5 Msun for the inner and outer fields, respectively, in agreement with previous estimates of other cluster populations in spiral galaxies. Comparing the predictions of the mass independent disruption (MID) and mass dependent disruption (MDD) scenarios with the observed distributions, we find that both models can accurately fit the data. However, for the MID case, the fraction of clusters destroyed (or mass lost) per decade in age is dependent on the environment; hence, the age/mass distributions of clusters are not universal. In the MDD case, the disruption timescale scales with galactocentric distance (being longer in the outer regions of the galaxy), in agreement with analytic and numerical predictions. Finally, we discuss the implications of our results for other extragalactic surveys, focussing on the fraction of stars that form in clusters and the need (or lack thereof) for infant mortality.

Submitted 27 September, 2011; originally announced September 2011.

Comments: 19 pages, 20 figures, MNRAS in press

arXiv:1103.0414

Convergence analysis of a proximal Gauss-Newton method

Authors: Saverio Salzo, Silvia Villa

Abstract: An extension of the Gauss-Newton algorithm is proposed to find local minimizers of penalized nonlinear least squares problems, under generalized Lipschitz assumptions. Convergence results of local type are obtained, as well as an estimate of the radius of the convergence ball. Some applications for solving constrained nonlinear equations are discussed and the numerical performance of the method is assessed on some significant test problems.

Submitted 2 March, 2011; originally announced March 2011.

MSC Class: 65J15; 90C30; 47J25

arXiv:1011.3728

PADDLE: Proximal Algorithm for Dual Dictionaries LEarning

Authors: Curzio Basso, Matteo Santoro, Alessandro Verri, Silvia Villa

Abstract: Recently, considerable research efforts have been devoted to the design of methods to learn overcomplete dictionaries for sparse coding from data. However, learned dictionaries require the solution of an optimization problem for coding new data. In order to overcome this drawback, we propose an algorithm aimed at learning both a dictionary and its dual: a linear mapping directly performing the coding. By leveraging proximal methods, our algorithm jointly minimizes the reconstruction error of the dictionary and the coding error of its dual; the sparsity of the representation is induced by an $\ell_1$-based penalty on its coefficients. The results obtained on synthetic data and real images show that the algorithm is capable of recovering the expected dictionaries. Furthermore, on a benchmark dataset, we show that the image features obtained from the dual matrix yield state-of-the-art classification performance while being much less computationally intensive.

Submitted 16 November, 2010; originally announced November 2010.

Report number: DISI-TR-2010-06

arXiv:0707.0263

Review of Bu leptonic decays

Authors: Stefano Villa

Abstract: This paper reviews the status of searches and measurements of Bu leptonic decays, concentrating on the most recent results obtained at B-factories. We will describe studies of decays of the type B+ -> ell+ nu and B+ -> ell+ nu gamma.

Submitted 2 July, 2007; originally announced July 2007.

Comments: Flavor Physics & CP Violation Conference, Bled, 2007

Report number: fpcp07_314

Journal ref: ECONFC070512:014,2007

arXiv:hep-ex/0605114

Beyond the Standard Model at Belle: B to K*l+l- and search for leptonic B decays

Authors: Stefano Villa

Abstract: We report the first measurement of the forward-backward asymmetry and the ratios of Wilson coefficients A_9/A_7 and A_10/A_7 in B to K*l+l-, obtained using 386M BBbar pairs that were collected at the Y(4S) resonance with the Belle detector at the KEKB asymmetric-energy e+ e- collider. We also summarise the results obtained by Belle in searches for purely leptonic B decays, with emphasis on their impact on models of physics beyond the Standard Model.

Submitted 29 May, 2006; originally announced May 2006.

Comments: 9 pages, 4 figures. To be published in the proceedings of the Rencontres de Physique de la Vallee d'Aoste, La Thuile, Italy, March 5-11, 2006

arXiv:hep-ex/0507036

doi 10.1103/PhysRevD.73.051107

Search for the decay B0 to gamma gamma

Authors: The Belle Collaboration, S. Villa

Abstract: The rare decay B0 -> gamma gamma is searched for in 104 fb^-1 of data, corresponding to 111 x 10^6 BBbar pairs, collected with the Belle detector at the KEKB asymmetric-energy e+ e- collider. No evidence for the signal is found, and an upper limit of 6.2 x 10^-7 at 90% confidence level is set for the corresponding branching fraction.

Submitted 24 March, 2006; v1 submitted 8 July, 2005; originally announced July 2005.

Comments: 7 pages, 3 figures, accepted for publication by Phys. Rev. D

Report number: Belle preprint 2005-38; KEK preprint 2005-90

Journal ref: Phys.Rev.D73:051107,2006

arXiv:hep-ph/0410208

doi 10.1016/j.nuclphysbps.2005.01.065

Gauge boson couplings at LEP

Authors: Stefano Villa

Abstract: A review is given of the measurements of triple and quartic couplings among the electroweak gauge bosons performed at LEP by the four experiments ALEPH, DELPHI, L3 and OPAL. Emphasis is placed on recently published results and on combinations of results performed by the LEP electroweak gauge-couplings group. All measurements presented are consistent with the Standard Model expectations.

Submitted 14 October, 2004; originally announced October 2004.

Comments: 6 pages, 7 figures. To be published in the proceedings of the BEACH04 conference, Chicago, June 27-July 3 2004

arXiv:hep-ex/0309034

Discovery Potential for SUGRA/SUSY at CMS

Authors: Stefano Villa

Abstract: The expected SUSY discovery potential of the CMS experiment at LHC is described, both in the MSSM and in the more constrained framework of mSugra, with emphasis on inclusive searches, the MSSM Higgs sector, and one example of complete reconstruction of a SUSY decay chain.

Submitted 9 September, 2003; originally announced September 2003.

Comments: 6 pages, 7 figures, based on talk presented at SUGRA20, Boston, 17-21 March 2003

Showing 1–50 of 51 results for author: Villa, S