-
Solving partial differential equations with sampled neural networks
Authors:
Chinmay Datar,
Taniya Kapoor,
Abhishek Chandra,
Qing Sun,
Iryna Burak,
Erik Lien Bolager,
Anna Veselovska,
Massimo Fornasier,
Felix Dietrich
Abstract:
Approximation of solutions to partial differential equations (PDE) is an important problem in computational science and engineering. Using neural networks as an ansatz for the solution has proven a challenge in terms of training time and approximation accuracy. In this contribution, we discuss how sampling the hidden weights and biases of the ansatz network from data-agnostic and data-dependent probability distributions allows us to progress on both challenges. In most examples, the random sampling schemes outperform iterative, gradient-based optimization of physics-informed neural networks regarding training time and accuracy by several orders of magnitude. For time-dependent PDE, we construct neural basis functions only in the spatial domain and then solve the associated ordinary differential equation with classical methods from scientific computing over a long time horizon. This alleviates one of the greatest challenges for neural PDE solvers because it does not require us to parameterize the solution in time. For second-order elliptic PDE in Barron spaces, we prove the existence of sampled networks with $L^2$ convergence to the solution. We demonstrate our approach on several time-dependent and static PDEs. We also illustrate how sampled networks can effectively solve inverse problems in this setting. Benefits compared to common numerical schemes include spectral convergence and mesh-free construction of basis functions.
Submitted 31 May, 2024;
originally announced May 2024.
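The abstract describes sampling the hidden weights and biases and then fitting only the outer layer. Below is a minimal sketch of that idea for a 1D Poisson problem. It rests on our own assumptions and is not the authors' code or their data-dependent sampling scheme: tanh activations, a data-agnostic uniform distribution for the weights $w_j$ and for the transition points $-b_j/w_j$, collocation with a Tikhonov-regularized least-squares fit, and the test problem $-u''=\pi^2\sin(\pi x)$ with $u(0)=u(1)=0$ (exact solution $\sin(\pi x)$). All function names are hypothetical.

```python
import math
import random


def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


def sampled_poisson_1d(width=50, n_colloc=100, ridge=1e-10, bc_weight=100.0, seed=0):
    """Solve -u'' = pi^2 sin(pi x) on (0, 1), u(0) = u(1) = 0, with a sampled
    tanh network u(x) = sum_j c_j tanh(w_j x + b_j); only c is fitted."""
    rng = random.Random(seed)
    # Hidden layer is sampled, never trained (assumed data-agnostic distribution):
    # w_j ~ U(-8, 8) and transition point s_j = -b_j / w_j ~ U(0, 1).
    W = [rng.uniform(-8.0, 8.0) for _ in range(width)]
    B = [-w * rng.uniform(0.0, 1.0) for w in W]

    rows, rhs = [], []
    for i in range(n_colloc):
        x = (i + 0.5) / n_colloc  # interior collocation points
        feats = []
        for w, b in zip(W, B):
            t = math.tanh(w * x + b)
            # (tanh(w x + b))'' = -2 w^2 t (1 - t^2), so -u'' picks up +2 w^2 t (1 - t^2)
            feats.append(2.0 * w * w * t * (1.0 - t * t))
        rows.append(feats)
        rhs.append(math.pi ** 2 * math.sin(math.pi * x))
    for x in (0.0, 1.0):  # weighted boundary-condition rows
        rows.append([bc_weight * math.tanh(w * x + b) for w, b in zip(W, B)])
        rhs.append(0.0)

    # Outer weights by Tikhonov-regularized least squares (normal equations).
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(width)] for i in range(width)]
    Atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(width)]
    scale = max(AtA[i][i] for i in range(width))
    for i in range(width):
        AtA[i][i] += ridge * scale
    c = solve_dense(AtA, Atb)
    return lambda x: sum(ci * math.tanh(w * x + b) for ci, w, b in zip(c, W, B))
```

Only a linear solve is needed once the hidden layer is fixed, which is the source of the training-time advantage over gradient-based optimization claimed in the abstract; the distribution used here is a placeholder for whichever sampling scheme one prefers.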
-
A PDE Framework of Consensus-Based Optimization for Objectives with Multiple Global Minimizers
Authors:
Massimo Fornasier,
Lukang Sun
Abstract:
Consensus-based optimization (CBO) is an agent-based derivative-free method for non-smooth global optimization that was introduced in 2017, leveraging a surprising interplay between stochastic exploration and the Laplace principle. In addition to its versatility and effectiveness in handling high-dimensional, non-convex, and nonsmooth optimization problems, this approach lends itself well to theoretical analysis. Indeed, its dynamics is governed by a degenerate nonlinear Fokker-Planck equation, whose large-time behavior explains the convergence of the method. Recent results provide guarantees of convergence under the restrictive assumption of a unique global minimizer for the objective function. In this work, we propose a novel and simple variation of CBO to tackle nonconvex optimization problems with multiple minimizers. Despite the simplicity of this new model, its analysis is particularly challenging because of its nonlinearity and nonlocal nature. We prove the existence of solutions of the corresponding nonlinear Fokker-Planck equation and we show exponential concentration in time to the set of minimizers made of multiple smooth, convex, and compact components. Our proofs require combining several ingredients, such as delicate geometrical arguments, new variants of a quantitative Laplace principle, ad hoc regularizations and approximations, and regularity theory for parabolic equations. Ultimately, this result suggests that the corresponding CBO algorithm, formulated as an Euler-Maruyama discretization of the underlying empirical stochastic process, tends to converge to global minimizers.
Submitted 11 March, 2024;
originally announced March 2024.
-
Approximation Theory, Computing, and Deep Learning on the Wasserstein Space
Authors:
Massimo Fornasier,
Pascal Heid,
Giacomo Enrico Sodini
Abstract:
The challenge of approximating functions in infinite-dimensional spaces from finite samples is widely regarded as formidable. In this study, we delve into the challenging problem of the numerical approximation of Sobolev-smooth functions defined on probability spaces. Our particular focus centers on the Wasserstein distance function, which serves as a relevant example. In contrast to the existing body of literature focused on approximating efficiently pointwise evaluations, we chart a new course to define functional approximants by adopting three machine learning-based approaches: 1. Solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials. 2. Employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces. 3. Addressing the problem through the saddle point formulation that characterizes the weak form of the Tikhonov functional's Euler-Lagrange equation. As a theoretical contribution, we furnish explicit and quantitative bounds on generalization errors for each of these solutions. In the proofs, we leverage the theory of metric Sobolev spaces and we combine it with techniques of optimal transport, variational calculus, and large deviation bounds. In our numerical implementation, we harness appropriately designed neural networks to serve as basis functions. These networks undergo training using diverse methodologies. This approach allows us to obtain approximating functions that can be rapidly evaluated after training. Consequently, at equal accuracy, our constructive solutions significantly improve evaluation speed, surpassing state-of-the-art methods by several orders of magnitude.
Submitted 30 April, 2024; v1 submitted 30 October, 2023;
originally announced October 2023.
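For intuition about the kind of function being approximated, here is a hedged special case. On the real line, the $W_2$ distance between two uniform empirical measures with the same number of atoms reduces to matching the sorted samples (the monotone rearrangement is optimal). The sketch assumes exactly that setting (1D, equal sizes, uniform weights). It is not one of the three approaches listed in the abstract, though exact values like these could serve as training labels for them. The function name is hypothetical.

```python
import math


def w2_empirical_1d(xs, ys):
    """W_2 distance between two uniform empirical measures on the real line
    with the same number of atoms; in 1D the optimal coupling pairs the
    k-th smallest sample of xs with the k-th smallest sample of ys."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    a, b = sorted(xs), sorted(ys)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))
```

For example, `w2_empirical_1d([0.0, 1.0], [2.0, 1.0])` pairs 0 with 1 and 1 with 2, giving a distance of 1.0. In higher dimensions no such closed form exists and an optimal transport problem must be solved, which is what makes fast learned approximants attractive.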
-
Consensus-Based Optimization with Truncated Noise
Authors:
Massimo Fornasier,
Peter Richtárik,
Konstantin Riedl,
Lukang Sun
Abstract:
Consensus-based optimization (CBO) is a versatile multi-particle metaheuristic optimization method suitable for performing nonconvex and nonsmooth global optimizations in high dimensions. It has proven effective in various applications while at the same time being amenable to a theoretical convergence analysis. In this paper, we explore a variant of CBO, which incorporates truncated noise in order to enhance the well-behavedness of the statistics of the law of the dynamics. By introducing this additional truncation in the noise term of the CBO dynamics, we achieve that, in contrast to the original version, higher moments of the law of the particle system can be effectively bounded. As a result, our proposed variant exhibits enhanced convergence performance, allowing in particular for wider flexibility in choosing the noise parameter of the method as we confirm experimentally. By analyzing the time-evolution of the Wasserstein-$2$ distance between the empirical measure of the interacting particle system and the global minimizer of the objective function, we rigorously prove convergence in expectation of the proposed CBO variant requiring only minimal assumptions on the objective function and on the initialization. Numerical evidence demonstrates the benefit of truncating the noise in CBO.
Submitted 12 February, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
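A sketch of one Euler-Maruyama step of isotropic CBO with truncated noise. It assumes the dynamics $dV = -\lambda (V - v)\,dt + \sigma \min(\|V - v\|_2, M)\,dB$, with the consensus point $v$ supplied by the caller. The function name and parameter names (`lam`, `sigma`, `M`, `dt`, `rng`) are our own choices, not taken from the paper.

```python
import math
import random


def truncated_cbo_step(X, v, lam=1.0, sigma=1.0, M=1.0, dt=0.01, rng=random):
    """One Euler-Maruyama step of CBO with truncated isotropic noise (sketch).

    Assumed form: dV = -lam (V - v) dt + sigma * min(|V - v|_2, M) dB.
    X: list of particles (lists of floats); v: consensus point (list of floats).
    """
    sqrt_dt = math.sqrt(dt)
    new_X = []
    for x in X:
        diff = [xi - vi for xi, vi in zip(x, v)]
        dist = math.sqrt(sum(d * d for d in diff))
        amp = sigma * min(dist, M)  # truncation caps the diffusion coefficient at sigma * M
        new_X.append([xi - lam * d * dt + amp * sqrt_dt * rng.gauss(0.0, 1.0)
                      for xi, d in zip(x, diff)])
    return new_X
```

Capping the coefficient at $\sigma M$ means a particle far from consensus receives only bounded noise, which is the mechanism the abstract credits for controlling the higher moments of the particle law. Without the `min`, a particle at distance 1000 would receive noise a thousand times larger.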
-
From NeurODEs to AutoencODEs: a mean-field control framework for width-varying Neural Networks
Authors:
Cristina Cipriani,
Massimo Fornasier,
Alessandro Scagliotti
Abstract:
The connection between Residual Neural Networks (ResNets) and continuous-time control systems (known as NeurODEs) has led to a mathematical analysis of neural networks which has provided interesting results of both theoretical and practical significance. However, by construction, NeurODEs have been limited to describing constant-width layers, making them unsuitable for modeling deep learning architectures with layers of variable width. In this paper, we propose a continuous-time Autoencoder, which we call AutoencODE, based on a modification of the controlled field that drives the dynamics. This adaptation enables the extension of the mean-field control framework originally devised for conventional NeurODEs. In this setting, we tackle the case of low Tikhonov regularization, resulting in potentially non-convex cost landscapes. While the results obtained for high Tikhonov regularization may not hold globally, we show that many of them can be recovered in regions where the loss function is locally convex. Inspired by our theoretical findings, we develop a training method tailored to this specific type of Autoencoders with residual connections, and we validate our approach through numerical experiments conducted on various examples.
Submitted 10 August, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
Gradient is All You Need?
Authors:
Konstantin Riedl,
Timo Klock,
Carina Geldhauser,
Massimo Fornasier
Abstract:
In this paper we provide a novel analytical perspective on the theoretical understanding of gradient-based learning algorithms by interpreting consensus-based optimization (CBO), a recently proposed multi-particle derivative-free optimization method, as a stochastic relaxation of gradient descent. Remarkably, we observe that through communication of the particles, CBO exhibits a stochastic gradient descent (SGD)-like behavior despite solely relying on evaluations of the objective function. The fundamental value of such a link between CBO and SGD lies in the fact that CBO is provably globally convergent to global minimizers for broad classes of nonsmooth and nonconvex objective functions, hence, on the one side, offering a novel explanation for the success of stochastic relaxations of gradient descent. On the other side, contrary to the conventional wisdom that zero-order methods are bound to be inefficient or to lack generalization abilities, our results unveil an intrinsic gradient descent nature of such heuristics. This viewpoint furthermore complements previous insights into the working principles of CBO, which describe the dynamics in the mean-field limit through a nonlinear nonlocal partial differential equation that allows one to alleviate complexities of the nonconvex function landscape. Our proofs leverage a completely nonsmooth analysis, which combines a novel quantitative version of the Laplace principle (log-sum-exp trick) and the minimizing movement scheme (proximal iteration). In doing so, we furnish useful and precise insights that explain how stochastic perturbations of gradient descent overcome energy barriers and reach deep levels of nonconvex functions. Instructive numerical illustrations support the provided theoretical insights.
Submitted 16 June, 2023;
originally announced June 2023.
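The Laplace principle enters CBO through the consensus point, a Gibbs-weighted average $v_\alpha = \sum_i x_i e^{-\alpha f(x_i)} / \sum_i e^{-\alpha f(x_i)}$, which approaches the best particle as $\alpha \to \infty$. Here is a minimal sketch of computing it with the log-sum-exp shift mentioned in the abstract. The function name and the list-of-lists representation of particles are assumptions.

```python
import math


def consensus_point(X, f, alpha):
    """Gibbs-weighted mean sum_i x_i w_i / sum_i w_i with w_i = exp(-alpha f(x_i)).

    Subtracting the minimum energy before exponentiating (log-sum-exp trick)
    cancels in the quotient, so the result is unchanged, but the largest
    weight becomes exactly 1 and the denominator can no longer underflow.
    """
    energies = [f(x) for x in X]
    f_min = min(energies)
    weights = [math.exp(-alpha * (e - f_min)) for e in energies]
    total = sum(weights)
    dim = len(X[0])
    return [sum(w * x[k] for w, x in zip(weights, X)) / total for k in range(dim)]
```

With energies 1000 and 1001 and `alpha=10`, the unshifted weights `exp(-10000)` and `exp(-10010)` both underflow to 0.0, and the quotient becomes 0/0. The shifted version instead returns a point very close to the lower-energy particle.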
-
Finite Sample Identification of Wide Shallow Neural Networks with Biases
Authors:
Massimo Fornasier,
Timo Klock,
Marco Mondelli,
Michael Rauchensteiner
Abstract:
Artificial neural networks are functions depending on a finite number of parameters typically encoded as weights and biases. The identification of the parameters of the network from finite samples of input-output pairs is often referred to as the \emph{teacher-student model}, and this model has represented a popular framework for understanding training and generalization. Even if the problem is NP-complete in the worst case, a rapidly growing literature -- after adding suitable distributional assumptions -- has established finite sample identification of two-layer networks with a number of neurons $m=\mathcal O(D)$, $D$ being the input dimension. For the range $D<m<D^2$ the problem becomes harder, and truly little is known for networks parametrized by biases as well. This paper fills the gap by providing constructive methods and theoretical guarantees of finite sample identification for such wider shallow networks with biases. Our approach is based on a two-step pipeline: first, we recover the direction of the weights, by exploiting second order information; next, we identify the signs by suitable algebraic evaluations, and we recover the biases by empirical risk minimization via gradient descent. Numerical results demonstrate the effectiveness of our approach.
Submitted 8 November, 2022;
originally announced November 2022.
-
Density of subalgebras of Lipschitz functions in metric Sobolev spaces and applications to Wasserstein Sobolev spaces
Authors:
Massimo Fornasier,
Giuseppe Savaré,
Giacomo Enrico Sodini
Abstract:
We prove a general criterion for the density in energy of suitable subalgebras of Lipschitz functions in the metric-Sobolev space $H^{1,p}(X,\mathsf{d},\mathfrak{m})$ associated with a positive and finite Borel measure $\mathfrak{m}$ in a separable and complete metric space $(X,\mathsf{d})$. We then provide a relevant application to the case of the algebra of cylinder functions in the Wasserstein Sobolev space $H^{1,2}(\mathcal{P}_2(\mathbb{M}),W_{2},\mathfrak{m})$ arising from a positive and finite Borel measure $\mathfrak{m}$ on the Kantorovich-Rubinstein-Wasserstein space $(\mathcal{P}_2(\mathbb{M}),W_{2})$ of probability measures in a finite dimensional Euclidean space, a complete Riemannian manifold, or a separable Hilbert space $\mathbb{M}$. We will show that such a Sobolev space is always Hilbertian, independently of the choice of the reference measure $\mathfrak{m}$ so that the resulting Cheeger energy is a Dirichlet form. We will eventually provide an explicit characterization for the corresponding notion of $\mathfrak{m}$-Wasserstein gradient, showing useful calculus rules and its consistency with the tangent bundle and the $\Gamma$-calculus inherited from the Dirichlet form.
Submitted 14 September, 2023; v1 submitted 2 September, 2022;
originally announced September 2022.
-
Convergence of Anisotropic Consensus-Based Optimization in Mean-Field Law
Authors:
Massimo Fornasier,
Timo Klock,
Konstantin Riedl
Abstract:
In this paper we study anisotropic consensus-based optimization (CBO), a multi-agent metaheuristic derivative-free optimization method capable of globally minimizing nonconvex and nonsmooth functions in high dimensions. CBO is based on stochastic swarm intelligence, and inspired by consensus dynamics and opinion formation. Compared to other metaheuristic algorithms like particle swarm optimization, CBO is of a simpler nature and therefore more amenable to theoretical analysis. By adapting a recently established proof technique, we show that anisotropic CBO converges globally with a dimension-independent rate for a rich class of objective functions under minimal assumptions on the initialization of the method. Moreover, the proof technique reveals that CBO performs a convexification of the optimization problem as the number of agents goes to infinity, thus providing an insight into the internal CBO mechanisms responsible for the success of the method. To motivate anisotropic CBO from a practical perspective, we further test the method on a complicated high-dimensional benchmark problem, which is well understood in the machine learning literature.
Submitted 23 March, 2024; v1 submitted 15 November, 2021;
originally announced November 2021.
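Here is a minimal sketch of anisotropic CBO as an Euler-Maruyama scheme. It assumes the componentwise dynamics $dX^k = -\lambda (X^k - v^k)\,dt + \sigma (X^k - v^k)\,dB^k$ for each coordinate $k$, where $v$ is the Gibbs-weighted consensus point. The parameter values, the uniform initialization on $[-3,3]^d$, the function name, and the quadratic test objective are illustrative choices only, not the settings used in the paper.

```python
import math
import random


def anisotropic_cbo(f, dim, n_particles=200, alpha=30.0, lam=1.0, sigma=0.5,
                    dt=0.01, steps=1000, seed=0):
    """Anisotropic CBO, Euler-Maruyama discretization (illustrative sketch).

    Coordinate k of every particle follows
        dX_k = -lam * (X_k - v_k) dt + sigma * (X_k - v_k) dB_k,
    so each direction gets its own noise scale instead of the full distance |X - v|.
    """
    rng = random.Random(seed)
    X = [[rng.uniform(-3.0, 3.0) for _ in range(dim)] for _ in range(n_particles)]
    sqrt_dt = math.sqrt(dt)
    v = None
    for _ in range(steps):
        # Consensus point: Gibbs-weighted mean, shifted by the min energy for stability.
        energies = [f(x) for x in X]
        f_min = min(energies)
        weights = [math.exp(-alpha * (e - f_min)) for e in energies]
        total = sum(weights)
        v = [sum(w * x[k] for w, x in zip(weights, X)) / total for k in range(dim)]
        # Drift toward v plus componentwise (anisotropic) multiplicative noise.
        X = [[x[k] - lam * (x[k] - v[k]) * dt
              + sigma * (x[k] - v[k]) * sqrt_dt * rng.gauss(0.0, 1.0)
              for k in range(dim)]
             for x in X]
    return v
```

For example, `anisotropic_cbo(lambda x: (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2, dim=2)` should return a consensus point near `(1, 1)`. Because the noise in coordinate $k$ scales with $|X^k - v^k|$ rather than with $\|X - v\|_2$, the condition for the particles to collapse involves $\sigma^2$ but not the dimension. This is the dimension independence the abstract refers to.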
-
A Measure Theoretical Approach to the Mean-field Maximum Principle for Training NeurODEs
Authors:
Benoît Bonnet,
Cristina Cipriani,
Massimo Fornasier,
Hui Huang
Abstract:
In this paper we consider a measure-theoretical formulation of the training of NeurODEs in the form of a mean-field optimal control with $L^2$-regularization of the control. We derive first order optimality conditions for the NeurODE training problem in the form of a mean-field maximum principle, and show that it admits a unique control solution, which is Lipschitz continuous in time. As a consequence of this uniqueness property, the mean-field maximum principle also provides a strong quantitative generalization error for finite sample approximations. Our derivation of the mean-field maximum principle is much simpler than the ones currently available in the literature for mean-field optimal control problems, and is based on a generalized Lagrange multiplier theorem on convex sets of spaces of measures. The latter is also new, and can be considered as a result of independent interest.
Submitted 8 April, 2022; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Anisotropic Diffusion in Consensus-based Optimization on the Sphere
Authors:
Massimo Fornasier,
Hui Huang,
Lorenzo Pareschi,
Philippe Sünnen
Abstract:
In this paper we are concerned with the global minimization of a possibly non-smooth and non-convex objective function constrained on the unit hypersphere by means of a multi-agent derivative-free method. The proposed algorithm falls into the class of the recently introduced Consensus-Based Optimization. In fact, agents move on the sphere driven by a drift towards an instantaneous consensus point, which is computed as a convex combination of agent locations, weighted by the cost function according to Laplace's principle, and it represents an approximation to a global minimizer. The dynamics is further perturbed by an anisotropic random vector field to favor exploration. The main results of this paper are about the proof of convergence of the numerical scheme to global minimizers under conditions of well-preparation of the initial datum. The proof of convergence combines a mean-field limit result with a novel asymptotic analysis, and classical convergence results of numerical methods for SDE. The main innovation with respect to previous work is the introduction of an anisotropic stochastic term, which allows us to ensure the independence of the parameters of the algorithm from the dimension and to scale the method to work in very high dimensions. We present several numerical experiments, which show that the algorithm proposed in the present paper is extremely versatile and outperforms previous formulations with isotropic stochastic noise.
Submitted 1 April, 2021;
originally announced April 2021.
-
Consensus-Based Optimization Methods Converge Globally
Authors:
Massimo Fornasier,
Timo Klock,
Konstantin Riedl
Abstract:
In this paper we study consensus-based optimization (CBO), which is a multi-agent metaheuristic derivative-free optimization method that can globally minimize nonconvex nonsmooth functions and is amenable to theoretical analysis. Based on an experimentally supported intuition that, on average, CBO performs a gradient descent of the squared Euclidean distance to the global minimizer, we devise a novel technique for proving the convergence to the global minimizer in mean-field law for a rich class of objective functions. The result unveils internal mechanisms of CBO that are responsible for the success of the method. In particular, we prove that CBO performs a convexification of a large class of optimization problems as the number of optimizing agents goes to infinity. Furthermore, we improve prior analyses by requiring mild assumptions about the initialization of the method and by covering objectives that are merely locally Lipschitz continuous. As a core component of this analysis, we establish a quantitative nonasymptotic Laplace principle, which may be of independent interest. From the result of CBO convergence in mean-field law, it becomes apparent that the hardness of any global optimization problem is necessarily encoded in the rate of the mean-field approximation, for which we provide a novel probabilistic quantitative estimate. The combination of these results allows us to obtain probabilistic global convergence guarantees of the numerical CBO method.
Submitted 23 March, 2024; v1 submitted 28 March, 2021;
originally announced March 2021.
-
Data-driven entropic spatially inhomogeneous evolutionary games
Authors:
Mauro Bonafini,
Massimo Fornasier,
Bernhard Schmitzer
Abstract:
We introduce novel multi-agent interaction models of entropic spatially inhomogeneous evolutionary undisclosed games and their quasi-static limits. These evolutions vastly generalize first and second order dynamics. Besides the well-posedness of these novel forms of multi-agent interactions, we are concerned with the learnability of individual payoff functions from observation data. We formulate the payoff learning as a variational problem, minimizing the discrepancy between the observations and the predictions by the payoff function. The inferred payoff function can then be used to simulate further evolutions, which are fully data-driven. We prove convergence of minimizing solutions obtained from a finite number of observations to a mean field limit, and the minimal value provides a quantitative error bound on the data-driven evolutions. The abstract framework is fully constructive and numerically implementable. We illustrate this on computational examples where a ground truth payoff function is known and on examples where this is not the case, including a model for pedestrian movement.
Submitted 9 March, 2022; v1 submitted 9 March, 2021;
originally announced March 2021.
-
Stable Recovery of Entangled Weights: Towards Robust Identification of Deep Neural Networks from Minimal Samples
Authors:
Christian Fiedler,
Massimo Fornasier,
Timo Klock,
Michael Rauchensteiner
Abstract:
In this paper we approach the problem of unique and stable identifiability of generic deep artificial neural networks with pyramidal shape and smooth activation functions from a finite number of input-output samples. More specifically we introduce the so-called entangled weights, which compose weights of successive layers intertwined with suitable diagonal and invertible matrices depending on the activation functions and their shifts. We prove that entangled weights are completely and stably approximated by an efficient and robust algorithm as soon as $\mathcal O(D^2 \times m)$ nonadaptive input-output samples of the network are collected, where $D$ is the input dimension and $m$ is the number of neurons of the network. Moreover, we empirically observe that the approach applies to networks with up to $\mathcal O(D \times m_L)$ neurons, where $m_L$ is the number of output neurons at layer $L$. Provided knowledge of layer assignments of entangled weights and of remaining scaling and shift parameters, which may be further heuristically obtained by least squares, the entangled weights identify the network completely and uniquely. To highlight the relevance of the theoretical result of stable recovery of entangled weights, we present numerical experiments, which demonstrate that multilayered networks with generic weights can be robustly identified and therefore uniformly approximated by the presented algorithmic pipeline. In contrast, backpropagation does not generalize stably in this setting, being always limited by a relatively large uniform error. In terms of practical impact, our study shows that we can relate input-output information uniquely and stably to network parameters, providing a form of explainability. Moreover, our method paves the way for compression of overparametrized networks and for the training of minimal complexity networks.
Submitted 18 January, 2021;
originally announced January 2021.
-
Consensus-Based Optimization on Hypersurfaces: Well-Posedness and Mean-Field Limit
Authors:
Massimo Fornasier,
Hui Huang,
Lorenzo Pareschi,
Philippe Sünnen
Abstract:
We introduce a new stochastic differential model for global optimization of nonconvex functions on compact hypersurfaces. The model is inspired by the stochastic Kuramoto-Vicsek system and belongs to the class of Consensus-Based Optimization methods. In fact, particles move on the hypersurface driven by a drift towards an instantaneous consensus point, computed as a convex combination of the particle locations weighted by the cost function according to Laplace's principle. The consensus point represents an approximation to a global minimizer. The dynamics is further perturbed by a random vector field to favor exploration, whose variance is a function of the distance of the particles to the consensus point. In particular, as soon as consensus is reached, the stochastic component vanishes. In this paper, we study the well-posedness of the model and we rigorously derive its mean-field approximation in the large-particle limit.
Submitted 7 December, 2020; v1 submitted 31 January, 2020;
originally announced January 2020.
-
Consensus-Based Optimization on the Sphere: Convergence to Global Minimizers and Machine Learning
Authors:
Massimo Fornasier,
Hui Huang,
Lorenzo Pareschi,
Philippe Sünnen
Abstract:
We investigate the implementation of a new stochastic Kuramoto-Vicsek-type model for global optimization of nonconvex functions on the sphere. This model belongs to the class of Consensus-Based Optimization. In fact, particles move on the sphere driven by a drift towards an instantaneous consensus point, which is computed as a convex combination of particle locations, weighted by the cost function according to Laplace's principle, and it represents an approximation to a global minimizer. The dynamics is further perturbed by a random vector field to favor exploration, whose variance is a function of the distance of the particles to the consensus point. In particular, as soon as consensus is reached, the stochastic component vanishes. The main results of this paper are about the proof of convergence of the numerical scheme to global minimizers under conditions of well-preparation of the initial datum. The proof combines previous results of mean-field limit with a novel asymptotic analysis, and classical convergence results of numerical methods for SDE. We present several numerical experiments, which show that the algorithm proposed in the present paper scales well with the dimension and is extremely versatile. To quantify the performance of the new approach, we show that the algorithm is able to perform essentially as well as ad hoc state-of-the-art methods in challenging problems in signal processing and machine learning, namely phase retrieval and robust subspace detection.
Submitted 28 July, 2021; v1 submitted 31 January, 2020;
originally announced January 2020.
-
Data-driven Evolutions of Critical Points
Authors:
Stefano Almi,
Massimo Fornasier,
Richard Huber
Abstract:
In this paper we are concerned with the learnability of energies from data obtained by observing time evolutions of their critical points starting at random initial equilibria. As a byproduct of our theoretical framework, we introduce the novel concept of the mean-field limit of critical point evolutions and of their energy balance as a new form of transport. We formulate energy learning as a variational problem, minimizing the discrepancy of energy competitors from fulfilling the equilibrium condition along any trajectory of critical points originating at random initial equilibria. By Gamma-convergence arguments, we prove the convergence of minimal solutions obtained from a finite number of observations to the exact energy in a suitable sense. The abstract framework is fully constructive and numerically implementable. Hence, approximating the energy from a finite number of observations of past evolutions allows one to simulate further evolutions, which are fully data-driven. As we aim at a precise quantitative analysis and at providing concrete examples of tractable solutions, we present analytic and numerical results on the reconstruction of an elastic energy for a one-dimensional model of a thin nonlinear-elastic rod.
Submitted 1 November, 2019;
originally announced November 2019.
-
Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks
Authors:
Massimo Fornasier,
Timo Klock,
Michael Rauchensteiner
Abstract:
We address the structure identification and the uniform approximation of neural networks with two fully nonlinear layers of the type $f(x)=1^T h(B^T g(A^T x))$ on $\mathbb R^d$ from a small number of query samples. We approach the problem by actively sampling finite difference approximations to Hessians of the network. Gathering several approximate Hessians allows one to reliably approximate the matrix subspace $\mathcal W$ spanned by the symmetric tensors $a_1 \otimes a_1 ,\dots,a_{m_0}\otimes a_{m_0}$ formed by weights of the first layer, together with the entangled symmetric tensors $v_1 \otimes v_1 ,\dots,v_{m_1}\otimes v_{m_1}$ formed by suitable combinations of the weights of the first and second layer as $v_\ell=A G_0 b_\ell/\|A G_0 b_\ell\|_2$, $\ell \in [m_1]$, for a diagonal matrix $G_0$ depending on the activation functions of the first layer. The identification of the rank-1 symmetric tensors within $\mathcal W$ is then performed by the solution of a robust nonlinear program. We provide guarantees of stable recovery under a posteriori verifiable conditions. We further address the correct attribution of approximate weights to the first or second layer. By using a suitably adapted gradient descent iteration, it is then possible to estimate, up to intrinsic symmetries, the shifts of the activation functions of the first layer and to compute the matrix $G_0$ exactly. Our method of identification of the weights of the network is fully constructive, with quantifiable sample complexity, and therefore contributes to reducing the black-box nature of the network training phase. We corroborate our theoretical results by extensive numerical experiments.
Submitted 30 June, 2019;
originally announced July 2019.
-
Spatially Inhomogeneous Evolutionary Games
Authors:
Luigi Ambrosio,
Massimo Fornasier,
Marco Morandotti,
Giuseppe Savaré
Abstract:
We introduce and study a mean-field model for a system of spatially distributed players interacting through an evolutionary game driven by a replicator dynamics. Strategies evolve by a replicator dynamics influenced by the position and the interaction between different players and return a feedback on the velocity field guiding their motion.
One of the main novelties of our approach concerns the description of the whole system, which can be represented by an evolving probability measure $Σ$ on an infinite dimensional state space (pairs $(x,σ)$ of position and distribution of strategies). We provide a Lagrangian and an Eulerian description of the evolution, and we prove their equivalence, together with existence, uniqueness, and stability of the solution. As a byproduct of the stability result, we also obtain convergence of the finite agents model to our mean-field formulation when the number $N$ of the players goes to infinity and the initial discrete distribution of positions and strategies converges.
To this aim we develop some basic functional analytic tools to deal with interaction dynamics and continuity equations in Banach spaces, that could be of independent interest.
Submitted 10 May, 2018;
originally announced May 2018.
-
Robust and Resource Efficient Identification of Shallow Neural Networks by Fewest Samples
Authors:
Massimo Fornasier,
Jan Vybíral,
Ingrid Daubechies
Abstract:
We address the structure identification and the uniform approximation of sums of ridge functions $f(x)=\sum_{i=1}^m g_i(a_i\cdot x)$ on ${\mathbb R}^d$, representing a general form of a shallow feed-forward neural network, from a small number of query samples. Higher-order differentiation (as used in our constructive approximations) of sums of ridge functions, or of their compositions as in deeper neural networks, yields a natural connection between neural network weight identification and tensor product decomposition identification. In the case of the shallowest feed-forward neural network, second-order differentiation and tensors of order two (i.e., matrices) suffice, as we prove in this paper. We use two sampling schemes to perform approximate differentiation: active sampling, where the sampling points are universal and are designed actively and at random, and passive sampling, where the sampling points are preselected at random from a distribution with known density. Based on multiple gathered approximations of first- and second-order differentials, our general approximation strategy is developed as a sequence of algorithms that perform individual sub-tasks. We first perform an active subspace search by approximating the span of the weight vectors $a_1,\dots,a_m$. Then we use a straightforward substitution, which reduces the dimensionality of the problem from $d$ to $m$. The core of the construction is then the stable and efficient approximation of weights expressed in terms of rank-$1$ matrices $a_i \otimes a_i$, realized by formulating their individual identification as a suitable nonlinear program. We prove that this program successfully identifies weight vectors that are close to orthonormal, and we also show how we can constructively reduce to this case by a whitening procedure, without any loss of generality.
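The link between second-order differentiation and the weights can be checked numerically. For $f(x)=\sum_i g_i(a_i\cdot x)$ the Hessian is $\nabla^2 f(x)=\sum_i g_i''(a_i\cdot x)\,a_i a_i^T$, so the ranges of finite-difference Hessians taken at a few points lie in $\mathrm{span}\{a_1,\dots,a_m\}$. The Python sketch below recovers that span from the top eigenvectors of $\sum_k H_k^2$. It is a simplified, Hessian-based stand-in for the paper's active subspace step. The choice of $g_i$ (`tanh`, `sin`), the step `h`, and the sample count are assumptions.

```python
import numpy as np

def fd_hessian(f, x, h=1e-3):
    """Central finite-difference Hessian of a scalar function f at x."""
    d = x.size
    E = np.eye(d) * h
    H = np.empty((d, d))
    for j in range(d):
        for k in range(d):
            H[j, k] = (f(x + E[j] + E[k]) - f(x + E[j] - E[k])
                       - f(x - E[j] + E[k]) + f(x - E[j] - E[k])) / (4 * h * h)
    return H

def weight_span(f, d, m, n_samples=10, seed=0):
    """Orthonormal basis (d x m) for the estimated span of the ridge directions."""
    rng = np.random.default_rng(seed)
    M = np.zeros((d, d))
    for _ in range(n_samples):
        H = fd_hessian(f, rng.uniform(-1.0, 1.0, d))
        M += H @ H            # H is symmetric, so H @ H is PSD with range in span{a_i}
    _, U = np.linalg.eigh(M)  # eigenvalues are returned in ascending order
    return U[:, -m:]          # eigenvectors of the m largest eigenvalues

# usage: a shallow network with two neurons in R^6 (hypothetical weights)
rng = np.random.default_rng(42)
A = rng.standard_normal((6, 2))
A /= np.linalg.norm(A, axis=0)          # unit-norm weight vectors a_1, a_2 as columns
f = lambda x: np.tanh(A[:, 0] @ x) + np.sin(A[:, 1] @ x)
U = weight_span(f, d=6, m=2)
```

Comparing the projector `U @ U.T` with the projector onto the columns of `A` measures how well the span was recovered.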
Submitted 6 May, 2021; v1 submitted 4 April, 2018;
originally announced April 2018.
-
Mean-field optimal control as Gamma-limit of finite agent controls
Authors:
Massimo Fornasier,
Stefano Lisini,
Carlo Orrieri,
Giuseppe Savaré
Abstract:
This paper focuses on the role of a government of a large population of interacting agents as a mean field optimal control problem derived from deterministic finite agent dynamics. The control problems are constrained by a PDE of continuity-type without diffusion, governing the dynamics of the probability distribution of the agent population. We derive existence of optimal controls in a measure-theoretical setting as natural limits of finite agent optimal controls without any assumption on the regularity of control competitors. In particular, we prove the consistency of mean-field optimal controls with corresponding underlying finite agent ones. The results follow from a $Γ$-convergence argument constructed over the mean-field limit, which stems from leveraging the superposition principle.
Submitted 13 March, 2018;
originally announced March 2018.
-
Robust Recovery of Low-Rank Matrices with Non-Orthogonal Sparse Decomposition from Incomplete Measurements
Authors:
Massimo Fornasier,
Johannes Maly,
Valeriya Naumova
Abstract:
We consider the problem of recovering an unknown effectively $(s_1,s_2)$-sparse low-rank-$R$ matrix $X$ with possibly non-orthogonal rank-$1$ decomposition from incomplete and inaccurate linear measurements of the form $y = \mathcal A (X) + η$, where $η$ is ineliminable noise. We first derive an optimization formulation for matrix recovery under the considered model and propose a novel algorithm, called Alternating Tikhonov regularization and Lasso (A-T-LA$\text{S}_{2,1}$), to solve it. The algorithm is based on a multi-penalty regularization, which is able to leverage both structures (low-rankness and sparsity) simultaneously. The algorithm is a fast first-order method and is straightforward to implement. We prove global convergence for any linear measurement model to stationary points and local convergence to global minimizers. By adapting the concept of restricted isometry property from compressed sensing to our novel model class, we prove error bounds between global minimizers and ground truth, up to noise level, from a number of subgaussian measurements scaling as $R(s_1+s_2)$, up to log-factors in the dimension, and relative-to-diameter distortion. Simulation results demonstrate both the accuracy and efficacy of the algorithm, as well as its superiority to the state-of-the-art algorithms in strong noise regimes and for matrices whose singular vectors do not possess exact (joint-)sparse support.
Submitted 28 July, 2020; v1 submitted 18 January, 2018;
originally announced January 2018.
-
A Relaxed Kačanov Iteration for the $p$-Poisson Problem
Authors:
Lars Diening,
Massimo Fornasier,
Maximilian Wank
Abstract:
In this paper we introduce and analyze an iteratively re-weighted algorithm that approximates the weak solution of the $p$-Poisson problem for $1 < p \leq 2$ by iteratively solving a sequence of linear elliptic problems. The algorithm can be interpreted as a relaxed Kačanov iteration, as it is called in the literature on the numerical solution of quasi-linear equations. The main contribution of the paper is a proof that the algorithm converges at least with an algebraic rate.
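A one-dimensional finite-difference sketch illustrates the idea. Each step freezes the weight $|u'|^{p-2}$ at the current iterate and solves the linear problem $-(w\,u')'=f$. Here the relaxation is modeled, as an assumption, by clipping $|u'|$ to a fixed interval $[\varepsilon_-,\varepsilon_+]$ (`eps_lo`, `eps_hi`) before forming the weight; the paper's truncation parameters and their update rule may differ. For $f=1$ on $(0,1)$ with zero boundary values, the exact solution satisfies $u(1/2)=\frac{p-1}{p}\,2^{-p/(p-1)}$, which equals $1/24$ for $p=3/2$.

```python
import numpy as np

def relaxed_kacanov_1d(p=1.5, n=99, eps_lo=1e-6, eps_hi=1e6, iters=60):
    """Approximate -(|u'|^(p-2) u')' = 1 on (0, 1), u(0) = u(1) = 0, on n interior nodes."""
    h = 1.0 / (n + 1)
    f = np.ones(n)                       # right-hand side f = 1
    u = np.zeros(n)                      # interior values of the iterate
    for _ in range(iters):
        full = np.concatenate(([0.0], u, [0.0]))
        grad = np.diff(full) / h         # n + 1 cell gradients
        # relaxation (assumed): clip |u'| to [eps_lo, eps_hi] before forming |u'|^(p-2)
        w = np.clip(np.abs(grad), eps_lo, eps_hi) ** (p - 2)
        # tridiagonal matrix of the weighted operator -(w u')'
        A = (np.diag(w[:-1] + w[1:]) - np.diag(w[1:-1], 1) - np.diag(w[1:-1], -1)) / h**2
        u = np.linalg.solve(A, f)        # one linear elliptic solve per iteration
    return u

u = relaxed_kacanov_1d(p=1.5)
print(u[49])   # value at x = 1/2
```

For $p=2$ the weight is identically one and a single iteration reproduces the standard finite-difference Poisson solution.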
Submitted 23 September, 2022; v1 submitted 13 February, 2017;
originally announced February 2017.
-
A Boltzmann approach to mean-field sparse feedback control
Authors:
Giacomo Albi,
Massimo Fornasier,
Dante Kalise
Abstract:
We study the synthesis of optimal control policies for large-scale multi-agent systems. The optimal control design induces a parsimonious control intervention by means of $\ell_1$, sparsity-promoting control penalizations. We study instantaneous and infinite-horizon sparse optimal feedback controllers. In order to circumvent the dimensionality issues associated with the control of large-scale agent-based models, we follow a Boltzmann approach. We generate (sub)optimal control signals for the kinetic limit of the multi-agent dynamics by sampling the optimal solution of the associated two-agent dynamics. Numerical experiments assess the performance of the proposed sparse design.
Submitted 12 November, 2016;
originally announced November 2016.
-
A Machine Learning Approach to Optimal Tikhonov Regularisation I: Affine Manifolds
Authors:
Ernesto De Vito,
Massimo Fornasier,
Valeriya Naumova
Abstract:
Despite a variety of available techniques, the proper choice of the regularization parameter for inverse problems still remains one of the biggest challenges. The main difficulty lies in constructing a rule that allows one to compute the parameter from given noisy data without relying either on a priori knowledge of the solution or on the noise level. In this paper we propose a novel method based on supervised machine learning to approximate the high-dimensional function mapping noisy data into a good approximation to the optimal Tikhonov regularization parameter. Our assumptions are that solutions of the inverse problem are statistically distributed in a concentrated manner on (lower-dimensional) linear subspaces and that the noise is sub-gaussian. One of the surprising facts is that the number of previously observed examples for the supervised learning of the optimal parameter mapping scales at most linearly with the dimension of the solution subspace. We also provide explicit error bounds on the accuracy of the approximated parameter and the corresponding regularization solution. Even though the results are more of a theoretical nature, we present a recipe for the practical implementation of the approach and provide numerical experiments confirming the theoretical results. We also outline interesting directions for future research with some preliminary results confirming their feasibility.
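In a supervised setting of this kind, each training example pairs noisy data $y$ with the regularization parameter that is optimal for the known solution. The Python sketch below shows one way such a label could be computed, by grid search over $\alpha$ for $x_\alpha=(A^TA+\alpha I)^{-1}A^Ty$. The grid, the problem sizes, and the two-dimensional subspace model in the usage lines are illustrative assumptions. The subsequent regression from $y$ to $\alpha$ is not shown.

```python
import numpy as np

def tikhonov(A, y, alpha):
    """Tikhonov solution argmin_x ||A x - y||^2 + alpha ||x||^2 via the normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)

def optimal_alpha(A, y, x_true, alphas):
    """Grid-search label: the alpha whose Tikhonov solution is closest to x_true."""
    errors = np.array([np.linalg.norm(tikhonov(A, y, a) - x_true) for a in alphas])
    k = int(np.argmin(errors))
    return alphas[k], errors[k]

# usage: a solution drawn from a 2-dimensional subspace, plus Gaussian noise
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20))
basis = np.linalg.qr(rng.standard_normal((20, 2)))[0]
x_true = basis @ rng.standard_normal(2)
y = A @ x_true + 0.5 * rng.standard_normal(30)
alphas = np.logspace(-4, 3, 50)
alpha_star, err_star = optimal_alpha(A, y, x_true, alphas)
```

Repeating this over many pairs $(x, y)$ would produce the kind of training set a learned parameter map needs.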
Submitted 12 October, 2017; v1 submitted 6 October, 2016;
originally announced October 2016.
-
Sparse Control of Multiagent Systems
Authors:
Mattia Bongini,
Massimo Fornasier
Abstract:
In recent years, numerous studies have focused on the mathematical modeling of social dynamics, with self-organization, i.e., the autonomous pattern formation, as the main driving concept. Usually, first or second order models are employed to reproduce, at least qualitatively, certain global patterns (such as bird flocking, milling schools of fish or queue formations in pedestrian flows, just to mention a few). It is, however, common experience that self-organization does not always spontaneously occur in a society. In this review chapter we aim to describe the limitations of decentralized controls in restoring certain desired configurations and to address the question of whether it is possible to externally and parsimoniously influence the dynamics to reach a given outcome. More specifically, we address the issue of finding the sparsest control strategy for finite agent-based models in order to lead the dynamics optimally towards a desired pattern.
Submitted 23 September, 2016;
originally announced September 2016.
-
Mean field control hierarchy
Authors:
Giacomo Albi,
Young-Pil Choi,
Massimo Fornasier,
Dante Kalise
Abstract:
In this paper we model the role of a government of a large population as a mean field optimal control problem. Such control problems are constrained by a PDE of continuity-type, governing the dynamics of the probability distribution of the agent population. We show the existence of mean field optimal controls in both the stochastic and the deterministic setting. We rigorously derive the first-order optimality conditions useful for the numerical computation of mean field optimal controls. We introduce a novel approximating hierarchy of sub-optimal controls based on a Boltzmann approach, whose computation requires only moderate numerical complexity compared with that of the optimal control. We provide numerical experiments for models in opinion formation comparing the behavior of the control hierarchy.
Submitted 4 August, 2016;
originally announced August 2016.
-
Inferring Interaction Rules From Observations of Evolutive Systems I: The Variational Approach
Authors:
Mattia Bongini,
Massimo Fornasier,
Markus Hansen,
Mauro Maggioni
Abstract:
In this paper we are concerned with the learnability of nonlocal interaction kernels for first order systems modeling certain social interactions, from observations of realizations of their dynamics. This paper is the first of a series on learnability of nonlocal interaction kernels and presents a variational approach to the problem. In particular, we assume here that the kernel to be learned is bounded and locally Lipschitz continuous and that the initial conditions of the systems are drawn identically and independently at random according to a given initial probability distribution. Then the minimization, over a rather arbitrary sequence of (finite dimensional) subspaces, of a least-squares functional measuring the discrepancy from observed trajectories produces uniform approximations to the kernel on compact sets. The convergence result is obtained by combining mean-field limits, transport methods, and a $Γ$-convergence argument. A crucial condition for learnability is a certain coercivity property of the least-squares functional: it dominates an $L_2$-norm discrepancy to the kernel with respect to a probability measure that depends on the given initial probability distribution through suitable push-forwards and transport maps. We illustrate the convergence result by means of several numerical experiments.
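The least-squares idea can be illustrated in a simplified, hypothetical setup. Consider a first-order system of the form $\dot x_i=\frac1N\sum_j a(|x_j-x_i|)(x_j-x_i)$, assumed here, with noise-free velocities observed at random configurations. The kernel is sought in the finite-dimensional space of functions that are piecewise constant on fixed distance bins. This differs from the paper, which fits observed trajectories and minimizes over sequences of subspaces. The bin edges `EDGES` and the kernel `true_kernel` below are assumptions.

```python
import numpy as np

EDGES = np.array([0.0, 0.5, 1.0, 3.0])   # hypothetical distance bins

def true_kernel(r):
    # piecewise constant on EDGES, so it lies in the search space
    return np.where(r < 0.5, 1.0, np.where(r < 1.0, -0.5, 0.25))

def pair_data(X):
    diff = X[None, :, :] - X[:, None, :]          # diff[i, j] = x_j - x_i
    r = np.linalg.norm(diff, axis=2)
    off = ~np.eye(len(X), dtype=bool)             # exclude self-interactions
    return diff, r, off

def velocities(X, kernel):
    diff, r, off = pair_data(X)
    W = np.where(off, kernel(r), 0.0)
    return (W[:, :, None] * diff).sum(axis=1) / len(X)

def features(X, edges):
    """Column k: flattened velocity field produced by the indicator kernel of bin k."""
    diff, r, off = pair_data(X)
    bins = np.digitize(r, edges) - 1
    cols = []
    for k in range(len(edges) - 1):
        W = ((bins == k) & off).astype(float)
        cols.append(((W[:, :, None] * diff).sum(axis=1) / len(X)).ravel())
    return np.stack(cols, axis=1)

def learn_kernel(snapshots, V_obs, edges):
    # least-squares fit of the bin coefficients to all observed velocities
    Phi = np.vstack([features(X, edges) for X in snapshots])
    v = np.concatenate([V.ravel() for V in V_obs])
    coef, *_ = np.linalg.lstsq(Phi, v, rcond=None)
    return coef

rng = np.random.default_rng(0)
snapshots = [rng.uniform(0.0, 1.0, (10, 2)) for _ in range(20)]
V_obs = [velocities(X, true_kernel) for X in snapshots]
coef = learn_kernel(snapshots, V_obs, EDGES)
```

Because the true kernel lies in the search space and the data are noise-free, the coefficients are recovered up to rounding whenever every bin is populated by some pair distance.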
Submitted 16 February, 2016; v1 submitted 31 January, 2016;
originally announced February 2016.
-
Conjugate gradient acceleration of iteratively re-weighted least squares methods
Authors:
Massimo Fornasier,
Steffen Peter,
Holger Rauhut,
Stephan Worm
Abstract:
Iteratively Re-weighted Least Squares (IRLS) is a method for solving minimization problems involving non-quadratic cost functions, possibly non-convex and non-smooth, which can nevertheless be described as the infimum over a family of quadratic functions. This transformation suggests an algorithmic scheme that solves a sequence of quadratic problems, which can be tackled efficiently by tools of numerical linear algebra. Its general scope and its usually simple implementation, transforming the initial non-convex and non-smooth minimization problem into a more familiar and easily solvable quadratic optimization problem, make it a versatile algorithm. However, despite its simplicity, versatility, and elegant analysis, the complexity of IRLS strongly depends on the way the solution of the successive quadratic optimizations is addressed. For the important special case of $\textit{compressed sensing}$ and sparse recovery problems in signal processing, we investigate theoretically and numerically how accurately one needs to solve the quadratic problems by means of the $\textit{conjugate gradient}$ (CG) method in each iteration in order to guarantee convergence. The use of the CG method may significantly speed up the numerical solution of the quadratic subproblems, in particular when fast matrix-vector multiplication (exploiting, for instance, the FFT) is available for the matrix involved. In addition, we study convergence rates. Our modified IRLS method outperforms state-of-the-art first-order methods such as Iterative Hard Thresholding (IHT) or the Fast Iterative Soft-Thresholding Algorithm (FISTA) in many situations, especially in large dimensions. Moreover, IRLS is often able to recover sparse vectors from fewer measurements than required for IHT and FISTA.
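A compact Python sketch of IRLS for $\ell_1$-minimization subject to $Ax=y$ is shown below. The inner weighted least-squares problem is solved through $(A D A^T)z=y$ with a hand-written conjugate gradient method, where $D=\mathrm{diag}(\sqrt{x_i^2+\varepsilon^2})$. The smoothing parameter follows a standard choice from the IRLS literature, $\varepsilon\leftarrow\min(\varepsilon, r(x)_{K+1}/N)$, with $r(x)_{K+1}$ the $(K+1)$-st largest entry in magnitude. The floor `eps_floor`, the CG tolerance, and the iteration counts are assumptions. The paper's accuracy criteria for the inexact CG solves are not reproduced.

```python
import numpy as np

def cg(M, b, x0, tol=1e-12, maxiter=None):
    """Conjugate gradient for a symmetric positive definite matrix M."""
    x = x0.copy()
    r = b - M @ x
    d = r.copy()
    rs = r @ r
    bnorm = np.linalg.norm(b)
    for _ in range(maxiter or 10 * len(b)):
        if np.sqrt(rs) <= tol * bnorm:
            break
        Md = M @ d
        step = rs / (d @ Md)
        x += step * d
        r -= step * Md
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x

def irls_cg(A, y, K, iters=150, eps_floor=1e-6):
    m, N = A.shape
    x, z, eps = np.zeros(N), np.zeros(m), 1.0
    for _ in range(iters):
        D = np.sqrt(x**2 + eps**2)          # inverse weights 1 / w_i
        M = (A * D) @ A.T                   # A D A^T, an m x m SPD matrix
        z = cg(M, y, z)                     # warm start from the previous solve
        x = D * (A.T @ z)                   # minimizer of sum_i w_i x_i^2 subject to A x = y
        r = np.sort(np.abs(x))[::-1]
        eps = max(min(eps, r[K] / N), eps_floor)
    return x

rng = np.random.default_rng(0)
m, N = 30, 60
A = rng.standard_normal((m, N)) / np.sqrt(m)
x_true = np.zeros(N)
x_true[rng.choice(N, 3, replace=False)] = [1.5, -2.0, 1.0]
x_hat = irls_cg(A, A @ x_true, K=3)
```

Only matrix-vector products with `M` enter the CG loop. With a fast operator for $A$, the explicit product `(A * D) @ A.T` could be replaced by a matrix-free application.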
Submitted 23 February, 2016; v1 submitted 14 September, 2015;
originally announced September 2015.
-
Linearly constrained evolutions of critical points and an application to cohesive fractures
Authors:
Marco Artina,
Filippo Cagnetti,
Massimo Fornasier,
Francesco Solombrino
Abstract:
We introduce a novel constructive approach to define the time evolution of critical points of an energy functional. Our procedure, which differs from other more established approaches based on viscosity approximations in infinite dimension, lends itself to efficient and consistent numerical implementations and allows for an existence proof under very general assumptions. We consider in particular rather nonsmooth and nonconvex energy functionals, provided the domain of the energy is finite dimensional. Nevertheless, in the infinite dimensional case study of a cohesive fracture model, we prove a consistency theorem for a discrete-to-continuum limit. We show that a quasistatic evolution can indeed be recovered as a limit of evolutions of critical points of finite dimensional discretizations of the energy, constructed according to our scheme. To illustrate the results, we provide several numerical experiments in both one and two dimensions. These agree with the crack initiation criterion, which states that a fracture appears only when the stress overcomes a certain threshold, depending on the material.
Submitted 7 July, 2016; v1 submitted 12 August, 2015;
originally announced August 2015.
-
Mean-Field Pontryagin Maximum Principle
Authors:
Mattia Bongini,
Massimo Fornasier,
Francesco Rossi,
Francesco Solombrino
Abstract:
We derive a Maximum Principle for optimal control problems with constraints given by the coupling of a system of ODEs and a PDE of Vlasov-type. Such problems arise naturally as $Γ$-limits of optimal control problems subject to ODE constraints, modeling, for instance, external interventions on crowd dynamics. We obtain these first-order optimality conditions in the form of Hamiltonian flows in the Wasserstein space of probability measures with forward-backward boundary conditions with respect to the first and second marginals, respectively. In particular, we recover the equations and their solutions by means of a constructive procedure, which can be seen as the mean-field limit of the Pontryagin Maximum Principle applied to the discrete optimal control problems, under a suitable scaling of the adjoint variables.
Submitted 9 April, 2015;
originally announced April 2015.
-
(Un)conditional consensus emergence under feedback controls
Authors:
Mattia Bongini,
Massimo Fornasier,
Dante Kalise
Abstract:
We study the problem of consensus emergence in multi-agent systems via external feedback controllers. We consider a set of agents interacting with dynamics given by a Cucker-Smale type of model, and study its consensus stabilization by means of centralized and decentralized control configurations. We present a characterization of consensus emergence for systems with different feedback structures, such as leader-based configurations, perturbed information feedback, and feedback computed upon spatially confined information. We characterize consensus emergence for this latter design as a parameter-dependent transition regime between self-regulation and centralized feedback stabilization. Numerical experiments illustrate the different features of the proposed designs.
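To make the setting concrete, the Python sketch below integrates a Cucker-Smale type model by explicit Euler: $\dot x_i=v_i$, $\dot v_i=\frac1N\sum_j a(|x_j-x_i|)(v_j-v_i)+u_i$, with $a(r)=(1+r^2)^{-\beta}$. It adds a hypothetical centralized feedback $u_i=-\gamma(v_i-\bar v)$ toward the mean velocity $\bar v$. The communication rate, the control law, and all parameter values are assumptions chosen for illustration, not the designs analyzed in the paper.

```python
import numpy as np

def cucker_smale(x, v, beta=1.0, gamma=1.0, dt=0.01, steps=1000):
    N = len(x)
    for _ in range(steps):
        dx = x[None, :, :] - x[:, None, :]             # dx[i, j] = x_j - x_i
        a = (1.0 + np.sum(dx**2, axis=2)) ** (-beta)   # communication rate a(|x_j - x_i|)
        dv = v[None, :, :] - v[:, None, :]             # dv[i, j] = v_j - v_i
        align = (a[:, :, None] * dv).sum(axis=1) / N   # self-organizing alignment term
        u = -gamma * (v - v.mean(axis=0))              # centralized feedback (assumed)
        x, v = x + dt * v, v + dt * (align + u)        # explicit Euler step
    return x, v

def spread(v):
    """Euclidean norm of the deviations from the mean velocity."""
    return np.sqrt(np.sum((v - v.mean(axis=0)) ** 2))

rng = np.random.default_rng(0)
x0 = rng.uniform(-5, 5, (20, 2))
v0 = rng.standard_normal((20, 2))
x1, v1 = cucker_smale(x0, v0)
```

Both the symmetric alignment term and this particular feedback sum to zero over the agents, so the mean velocity is conserved while the velocity spread decays.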
Submitted 21 February, 2015;
originally announced February 2015.
-
Sparse Control of Alignment Models in High Dimension
Authors:
Mattia Bongini,
Massimo Fornasier,
Oliver Junge,
Benjamin Scharf
Abstract:
For high dimensional particle systems, governed by smooth nonlinearities depending on mutual distances between particles, one can construct low-dimensional representations of the dynamical system, which allow the learning of nearly optimal control strategies in high dimension with overwhelming confidence. In this paper we present an instance of this general statement tailored to the sparse control of models of consensus emergence in high dimension, projected to lower dimensions by means of random linear maps. We show that one can steer, nearly optimally and with high probability, a high-dimensional alignment model to consensus by acting at each switching time on one agent of the system only, with a control rule chosen essentially exclusively according to information gathered from a randomly drawn low-dimensional representation of the control system.
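The low-dimensional representations rest on random linear maps that approximately preserve mutual distances, in the spirit of the Johnson-Lindenstrauss lemma. The short Python check below uses a Gaussian matrix scaled by $1/\sqrt{k}$, with sizes chosen arbitrarily. It measures the distortion of pairwise distances for agents in dimension $d$ projected to dimension $k$. The specific random matrices and target dimensions used in the paper are not reproduced here.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Project the rows of X (N x d) to R^k with a scaled Gaussian matrix."""
    d = X.shape[1]
    P = np.random.default_rng(seed).standard_normal((d, k)) / np.sqrt(k)
    return X @ P

def max_distortion(X, Y):
    """Largest relative change in pairwise distances between the rows of X and of Y."""
    i, j = np.triu_indices(len(X), k=1)
    dX = np.linalg.norm(X[i] - X[j], axis=1)
    dY = np.linalg.norm(Y[i] - Y[j], axis=1)
    return np.max(np.abs(dY / dX - 1.0))

X = np.random.default_rng(1).standard_normal((20, 1000))   # 20 agents in R^1000
Y = random_projection(X, k=200)
```

Since the dynamics in question depend on mutual distances, small distortion means that the projected system approximately reproduces the interaction structure.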
Submitted 27 August, 2014;
originally announced August 2014.
-
Mean-Field Sparse Optimal Control
Authors:
Massimo Fornasier,
Benedetto Piccoli,
Francesco Rossi
Abstract:
We introduce the rigorous limit process connecting finite dimensional sparse optimal control problems with ODE constraints, modeling parsimonious interventions on the dynamics of a moving population divided into leaders and followers, to an infinite dimensional optimal control problem with a constraint given by a system of ODEs for the leaders coupled with a PDE of Vlasov-type governing the dynamics of the probability distribution of the followers. In classical mean-field theory, one studies the behavior of a large number of small individuals freely interacting with each other by simplifying the effect of all the other individuals on any given individual by a single averaged effect. In this paper we address instead the situation where the leaders are also influenced by an external policy maker, and we propagate its effect as the number $N$ of followers goes to infinity. The technical derivation of the sparse mean-field optimal control is realized by the simultaneous development of the mean-field limit of the equations governing the followers' dynamics together with the $Γ$-limit of the finite dimensional sparse optimal control problems.
Submitted 10 March, 2014; v1 submitted 23 February, 2014;
originally announced February 2014.
-
Asymptotic Behavior of Gradient Flows Driven by Nonlocal Power Repulsion and Attraction Potentials in One Dimension
Authors:
Marco Di Francesco,
Massimo Fornasier,
Jan-Christian Hütter,
Daniel Matthes
Abstract:
We study the long time behavior of the Wasserstein gradient flow for an energy functional consisting of two components: particles are attracted to a fixed profile $ω$ by means of an interaction kernel $ψ_a(z)=|z|^{q_a}$, and they repel each other by means of another kernel $ψ_r(z)=|z|^{q_r}$. We focus on the case of one space dimension and assume that $1\le q_r\le q_a\le 2$.
Our main result is that the flow converges to an equilibrium if either $q_r<q_a$ or $1\le q_r=q_a\le4/3$, and if the solution has the same (conserved) mass as the reference state $ω$. In the cases $q_r=1$ and $q_r=2$, we are able to discuss the behavior for different masses as well, and we explicitly identify the equilibrium state, which is independent of the initial condition. Our proofs heavily use the inverse distribution function of the solution.
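The abstract notes that the proofs rely on the inverse distribution function. As a minimal, hypothetical illustration (not code from the paper), the quadratic Wasserstein distance between two probability measures on the real line satisfies $W_2(\mu,\nu)^2=\int_0^1 |F_\mu^{-1}(s)-F_\nu^{-1}(s)|^2\,ds$, where $F^{-1}$ denotes the (pseudo-)inverse distribution function. For empirical measures with the same number of equally weighted atoms this reduces to matching sorted samples. The name `wasserstein2_1d` is an assumption.

```python
import numpy as np

def wasserstein2_1d(samples_mu, samples_nu):
    """W_2 distance between two uniform empirical measures on R with equally many atoms.

    In one dimension the optimal transport plan is monotone, so comparing the
    inverse distribution functions amounts to pairing the sorted samples.
    """
    x = np.sort(np.asarray(samples_mu, dtype=float))  # quantiles of mu
    y = np.sort(np.asarray(samples_nu, dtype=float))  # quantiles of nu
    if x.shape != y.shape:
        raise ValueError("this sketch assumes the same number of atoms")
    return float(np.sqrt(np.mean((x - y) ** 2)))
```

For example, translating every sample by a constant $c$ moves the measure by exactly $|c|$ in $W_2$, since the sorted samples shift rigidly.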
Submitted 10 January, 2014;
originally announced January 2014.
-
Quasi-Linear Compressed Sensing
Authors:
Martin Ehler,
Massimo Fornasier,
Juliane Sigl
Abstract:
Inspired by significant real-life applications, in particular, sparse phase retrieval and sparse pulsation frequency detection in Asteroseismology, we investigate a general framework for compressed sensing, where the measurements are quasi-linear. We formulate natural generalizations of the well-known Restricted Isometry Property (RIP) towards nonlinear measurements, which allow us to prove both unique identifiability of sparse signals as well as the convergence of recovery algorithms to compute them efficiently. We show that for certain randomized quasi-linear measurements, including Lipschitz perturbations of classical RIP matrices and phase retrieval from random projections, the proposed restricted isometry properties hold with high probability. We analyze a generalized Orthogonal Least Squares (OLS) under the assumption that magnitudes of signal entries to be recovered decay fast. Greed is good again, as we show that this algorithm performs efficiently in phase retrieval and asteroseismology. For situations where the decay assumption on the signal does not necessarily hold, we propose two alternative algorithms, which are natural generalizations of the well-known iterative hard and soft-thresholding. While these algorithms are rarely successful for the mentioned applications, we show their strong recovery guarantees for quasi-linear measurements which are Lipschitz perturbations of RIP matrices.
Submitted 7 November, 2013;
originally announced November 2013.
-
Consistency of Probability Measure Quantization by Means of Power Repulsion-Attraction Potentials
Authors:
Massimo Fornasier,
Jan-Christian Hütter
Abstract:
This paper is concerned with the study of the consistency of a variational method for probability measure quantization, deterministically realized by means of a minimizing principle, balancing power repulsion and attraction potentials. The proof of consistency is based on the construction of a target energy functional whose unique minimizer is actually the given probability measure ω to be quantized. Then we show that the discrete functionals, defining the discrete quantizers as their minimizers, actually Γ-converge to the target energy with respect to the narrow topology on the space of probability measures. A key ingredient is the reformulation of the target functional by means of a Fourier representation, which extends the characterization of conditionally positive semi-definite functions from points in generic position to probability measures. As a byproduct of the Fourier representation, we also obtain compactness of sublevels of the target energy in terms of uniform moment bounds, which already found applications in the asymptotic analysis of corresponding gradient flows. To model situations where the given probability is affected by noise, we additionally consider a modified energy, with the addition of a regularizing total variation term and we investigate again its point mass approximations in terms of Γ-convergence. We show that such a discrete measure representation of the total variation can be interpreted as an additional nonlinear potential, repulsive at a short range, attractive at a medium range, and at a long range not having effect, promoting a uniform distribution of the point masses.
Submitted 3 October, 2013;
originally announced October 2013.
-
Damping Noise-Folding and Enhanced Support Recovery in Compressed Sensing
Authors:
Marco Artina,
Massimo Fornasier,
Steffen Peter
Abstract:
The practice of compressed sensing suffers significantly in terms of the efficiency/accuracy trade-off when acquiring noisy signals prior to measurement. It is rather common to find results treating the noise affecting the measurements, thereby avoiding the so-called $\textit{noise-folding}$ phenomenon, which is related to the noise in the signal, possibly amplified by the measurement procedure. In this paper, we present two new decoding procedures, combining $\ell_1$-minimization followed by either a regularized selective least $p$-powers or an iterative hard thresholding, which not only are able to reduce this component of the original noise, but also have enhanced properties in terms of support identification with respect to the sole $\ell_1$-minimization or iteratively re-weighted $\ell_1$-minimization. We prove such features, providing relatively simple and precise theoretical guarantees. We additionally confirm and support the theoretical results by extensive numerical simulations, which provide statistics on the robustness of the new decoding procedures compared with more classical $\ell_1$-minimization and iteratively re-weighted $\ell_1$-minimization.
Submitted 24 November, 2014; v1 submitted 22 July, 2013;
originally announced July 2013.
-
Mean-Field Optimal Control
Authors:
Massimo Fornasier,
Francesco Solombrino
Abstract:
We introduce the concept of {\it mean-field optimal control} which is the rigorous limit process connecting finite dimensional optimal control problems with ODE constraints modeling multi-agent interactions to an infinite dimensional optimal control problem with a constraint given by a PDE of Vlasov-type, governing the dynamics of the probability distribution of interacting agents. While in the classical mean-field theory one studies the behavior of a large number of small individuals {\it freely interacting} with each other, by simplifying the effect of all the other individuals on any given individual by a single averaged effect, we address the situation where the individuals are actually influenced also by an external {\it policy maker}, and we propagate its effect for the number $N$ of individuals going to infinity. On the one hand, from a modeling point of view, we take into account also that the policy maker is constrained to act according to optimal strategies promoting its most parsimonious interaction with the group of individuals. This will be realized by considering cost functionals including $L^1$-norm terms penalizing a broadly distributed control of the group, while promoting its sparsity. On the other hand, from the analysis point of view, and for the sake of generality, we consider broader classes of convex control penalizations. In order to develop this new concept of limit rigorously, we need to carefully combine the classical concept of mean-field limit, connecting the finite dimensional system of ODE describing the dynamics of each individual of the group to the PDE describing the dynamics of the respective probability distribution, with the well-known concept of $Γ$-convergence to show that optimal strategies for the finite dimensional problems converge to optimal strategies of the infinite dimensional problem.
Submitted 25 June, 2013;
originally announced June 2013.
-
Sparse Stabilization and Control of Alignment Models
Authors:
Marco Caponigro,
Massimo Fornasier,
Benedetto Piccoli,
Emmanuel Trélat
Abstract:
From a mathematical point of view self-organization can be described as patterns to which certain dynamical systems modeling social dynamics tend spontaneously to be attracted. In this paper we explore situations beyond self-organization, in particular how to externally control such dynamical systems in order to eventually enforce pattern formation also in those situations where this desired phenomenon does not result from spontaneous convergence. Our focus is on dynamical systems of Cucker-Smale type, modeling consensus emergence, and we question the existence of stabilization and optimal control strategies which require the minimal amount of external intervention for nevertheless inducing consensus in a group of interacting agents. We provide a variational criterion to explicitly design feedback controls that are componentwise sparse, i.e. with at most one nonzero component at every instant of time. Controls sharing this sparsity feature are very realistic and convenient for practical purposes. Moreover, the maximally sparse ones are instantaneously optimal in terms of the decay rate of a suitably designed Lyapunov functional, measuring the distance from consensus. As a consequence we provide a mathematical justification for the general principle according to which "sparse is better" in the sense that a policy maker, who is not allowed to predict future developments, should always consider it more favorable to intervene with stronger action on the fewest possible instantaneous optimal leaders rather than trying to control more agents with minor strength in order to achieve group consensus. We then establish local and global sparse controllability properties to consensus and, finally, we analyze the sparsity of solutions of the finite time optimal control problem where the minimization criterion is a combination of the distance from consensus and of the $\ell_1$-norm of the control.
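The sparse feedback idea can be sketched as follows, assuming Cucker-Smale dynamics $\dot x_i=v_i$, $\dot v_i=\frac1N\sum_j a(|x_j-x_i|)(v_j-v_i)+u_i$ with the hypothetical kernel $a(r)=(1+r^2)^{-\beta}$, and a control that acts only on the agent whose velocity deviates most from the mean velocity, pushing it toward the mean with fixed strength `theta`. This is a simplified forward-Euler illustration of the "fewest possible leaders, stronger action" principle; the paper's exact feedback law, kernel, and control bound may differ, and `simulate_sparse_cs` and its parameters are hypothetical.

```python
import numpy as np

def simulate_sparse_cs(x, v, theta=1.0, beta=1.0, dt=0.01, steps=4000):
    """Euler sketch of Cucker-Smale alignment with a componentwise-sparse feedback."""
    x, v = x.astype(float).copy(), v.astype(float).copy()
    history, active = [], []
    for _ in range(steps):
        v_perp = v - v.mean(axis=0)                          # deviation from mean velocity
        history.append(np.mean(np.sum(v_perp ** 2, axis=1)))  # V = (1/N) sum_i |v_i - v_bar|^2
        diff_x = x[None, :, :] - x[:, None, :]               # entry [i, j] is x_j - x_i
        a = 1.0 / (1.0 + np.sum(diff_x ** 2, axis=2)) ** beta  # kernel a(|x_j - x_i|)
        diff_v = v[None, :, :] - v[:, None, :]               # entry [i, j] is v_j - v_i
        align = np.mean(a[:, :, None] * diff_v, axis=1)      # (1/N) sum_j a_ij (v_j - v_i)
        # Sparse control: only the agent farthest from consensus is steered.
        u = np.zeros_like(v)
        norms = np.linalg.norm(v_perp, axis=1)
        i = int(np.argmax(norms))
        if norms[i] > 0:
            u[i] = -theta * v_perp[i] / norms[i]
        active.append(int(np.count_nonzero(np.linalg.norm(u, axis=1))))
        x = x + dt * v
        v = v + dt * (align + u)
    return np.array(history), np.array(active)
```

The returned `history` tracks the functional $V$; since the controlled agent has the largest deviation, the control alone makes $\sqrt{V}$ decrease at rate at least `theta`/N in continuous time, and the discrete run settles near a small residual set by the time step. The `active` array confirms that at most one agent is controlled per step.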
Submitted 21 March, 2014; v1 submitted 21 October, 2012;
originally announced October 2012.
-
Linearly constrained nonsmooth and nonconvex minimization
Authors:
Marco Artina,
Massimo Fornasier,
Francesco Solombrino
Abstract:
Motivated by variational models in continuum mechanics, we introduce a novel algorithm to perform nonsmooth and nonconvex minimizations with linear constraints in Euclidean spaces. We show how this algorithm is actually a natural generalization of the well-known non-stationary augmented Lagrangian method for convex optimization. The relevant features of this approach are its applicability to a large variety of nonsmooth and nonconvex objective functions, its guaranteed convergence to critical points of the objective energy independently of the choice of the initial value, and its simplicity of implementation. In fact, the algorithm results in a nested double loop iteration. In the inner loop an augmented Lagrangian algorithm performs an adaptive finite number of iterations on a fixed quadratic and strictly convex perturbation of the objective energy, depending on a parameter which is adapted by the external loop. To show the versatility of this new algorithm, we exemplify how it can be used for computing critical points in inverse free-discontinuity variational models, such as the Mumford-Shah functional, and, by doing so, we also derive and analyze new iterative thresholding algorithms.
Submitted 15 July, 2013; v1 submitted 29 January, 2012;
originally announced January 2012.
-
Consistency of Variational Continuous-Domain Quantization via Kinetic Theory
Authors:
Massimo Fornasier,
Jan Haskovec,
Gabriele Steidl
Abstract:
We study the kinetic mean-field limits of the discrete systems of interacting particles used for halftoning of images in the sense of continuous-domain quantization. Under mild assumptions on the regularity of the interacting kernels we provide a rigorous derivation of the mean-field kinetic equation. Moreover, we study the energy of the system, show that it is a Lyapunov functional and prove that in the long time limit the solution tends to an equilibrium given by a local minimum of the energy. In a special case we prove that the equilibrium is unique and is identical to the prescribed image profile. This proves the consistency of the particle halftoning method when the number of particles tends to infinity.
Submitted 5 December, 2011;
originally announced December 2011.
-
Particle systems and kinetic equations modeling interacting agents in high dimension
Authors:
Massimo Fornasier,
Jan Haskovec,
Jan Vybiral
Abstract:
In this paper we explore how concepts of high-dimensional data compression via random projections onto lower-dimensional spaces can be applied for tractable simulation of certain dynamical systems modeling complex interactions. In such systems, one has to deal with a large number of agents (typically millions) in spaces of parameters describing each agent of high dimension (thousands or more). Even with today's powerful computers, numerical simulations of such systems are prohibitively expensive. We propose an approach for the simulation of dynamical systems governed by functions of adjacency matrices in high dimension, by random projections via Johnson-Lindenstrauss embeddings, and recovery by compressed sensing techniques. We show how these concepts can be generalized to work for associated kinetic equations, by addressing the phenomenon of the delayed curse of dimension, known in information-based complexity for optimal numerical integration problems in high dimensions.
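To illustrate the random-projection ingredient (a generic Johnson-Lindenstrauss embedding, not the paper's full simulation scheme), a Gaussian matrix scaled by $1/\sqrt{k}$ maps $d$-dimensional points to $k$ dimensions while approximately preserving pairwise squared distances. The helper names and problem sizes are illustrative assumptions.

```python
import numpy as np

def jl_project(points, k, seed=0):
    """Map rows of `points` (n x d) to R^k with a scaled Gaussian random matrix."""
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    M = rng.normal(size=(d, k)) / np.sqrt(k)  # E|M^T w|^2 = |w|^2 for every fixed w
    return points @ M

def pairwise_sq_dists(P):
    """Matrix of squared Euclidean distances between the rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sum(diff ** 2, axis=2)
```

For a fixed pair of points the distortion ratio concentrates around 1 with standard deviation of order $\sqrt{2/k}$, which is why $k$ can scale with $\log n$ rather than with $d$.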
Submitted 7 November, 2011; v1 submitted 13 April, 2011;
originally announced April 2011.
-
Low-rank matrix recovery via iteratively reweighted least squares minimization
Authors:
Massimo Fornasier,
Holger Rauhut,
Rachel Ward
Abstract:
We present and analyze an efficient implementation of an iteratively reweighted least squares algorithm for recovering a matrix from a small number of linear measurements. The algorithm is designed for the simultaneous promotion of both a minimal nuclear norm and an approximately low-rank solution. Under the assumption that the linear measurements fulfill a suitable generalization of the Null Space Property known in the context of compressed sensing, the algorithm is guaranteed to recover iteratively any matrix with an error of the order of the best rank-k approximation. In certain relevant cases, for instance for the matrix completion problem, our version of this algorithm can take advantage of the Woodbury matrix identity, which allows one to expedite the solution of the least squares problems required at each iteration. We present numerical experiments that confirm the robustness of the algorithm for the solution of matrix completion problems, and demonstrate its competitiveness with respect to other techniques proposed recently in the literature.
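The Woodbury identity $(A+UCV)^{-1}=A^{-1}-A^{-1}U(C^{-1}+VA^{-1}U)^{-1}VA^{-1}$ replaces an $n\times n$ solve by a $k\times k$ one when $A$ is cheap to invert and the update has low rank $k$. The sketch below assumes a diagonal $A$; how the algorithm in the abstract structures its least squares problems is not reproduced here, and `woodbury_solve` is a hypothetical helper.

```python
import numpy as np

def woodbury_solve(d, U, C, V, b):
    """Solve (diag(d) + U C V) x = b via the Woodbury identity.

    Only the k x k 'capacitance' matrix C^{-1} + V A^{-1} U is factorized,
    where A = diag(d) is inverted entrywise.
    """
    Ainv_b = b / d                         # A^{-1} b
    Ainv_U = U / d[:, None]                # A^{-1} U
    small = np.linalg.inv(C) + V @ Ainv_U  # k x k system
    return Ainv_b - Ainv_U @ np.linalg.solve(small, V @ Ainv_b)
```

With $n$ unknowns and rank-$k$ update, the cost is dominated by $O(nk^2)$ operations instead of the $O(n^3)$ of a dense solve.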
Submitted 15 July, 2011; v1 submitted 12 October, 2010;
originally announced October 2010.
-
Learning Functions of Few Arbitrary Linear Parameters in High Dimensions
Authors:
Massimo Fornasier,
Karin Schnass,
Jan Vybiral
Abstract:
Let us assume that $f$ is a continuous function defined on the unit ball of $\mathbb R^d$, of the form $f(x) = g (A x)$, where $A$ is a $k \times d$ matrix and $g$ is a function of $k$ variables for $k \ll d$. We are given a budget $m \in \mathbb N$ of possible point evaluations $f(x_i)$, $i=1,...,m$, of $f$, which we are allowed to query in order to construct a uniform approximating function. Under certain smoothness and variation assumptions on the function $g$, and an {\it arbitrary} choice of the matrix $A$, we present in this paper
1. a sampling choice of the points $\{x_i\}$ drawn at random for each function approximation;
2. algorithms (Algorithm 1 and Algorithm 2) for computing the approximating function, whose complexity is at most polynomial in the dimension $d$ and in the number $m$ of points.
Due to the arbitrariness of $A$, the choice of the sampling points will be according to suitable random distributions and our results hold with overwhelming probability. Our approach uses tools taken from the {\it compressed sensing} framework, recent Chernoff bounds for sums of positive-semidefinite matrices, and classical stability bounds for invariant subspaces of singular value decompositions.
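The structural fact behind this setting is that $\nabla f(x)=A^T\nabla g(Ax)$, so every gradient of $f$ lies in the $k$-dimensional row space of $A$. The sketch below recovers that subspace from finite-difference gradients and an SVD. It is a simplified stand-in, not Algorithm 1 or Algorithm 2 of the paper: it uses all $d$ coordinate directions at each point instead of a compressed-sensing approach with few random directional derivatives, and the function name and parameters are hypothetical.

```python
import numpy as np

def estimate_ridge_subspace(f, d, k, n_points=10, h=1e-5, seed=0):
    """Estimate an orthonormal basis of span(A^T) for f(x) = g(Ax)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_points, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True) * 2  # points of norm 1/2, inside the unit ball
    E = np.eye(d)
    # Central differences: column j of G approximates grad f at the j-th point.
    G = np.array([[(f(x + h * e) - f(x - h * e)) / (2 * h) for e in E] for x in X]).T
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :k]  # top-k left singular vectors
```

Once the subspace is known, $g$ only needs to be approximated in $k$ variables, which is the source of the dimension reduction.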
Submitted 17 January, 2012; v1 submitted 18 August, 2010;
originally announced August 2010.
-
A Convergent Overlapping Domain Decomposition Method for Total Variation Minimization
Authors:
Massimo Fornasier,
Andreas Langer,
Carola-Bibiane Schönlieb
Abstract:
This paper is concerned with the analysis of convergent sequential and parallel overlapping domain decomposition methods for the minimization of functionals formed by a discrepancy term with respect to data and a total variation constraint. To our knowledge, this is the first successful attempt of addressing such a strategy for the nonlinear, nonadditive, and nonsmooth problem of total variation minimization. We provide several numerical experiments, showing the successful application of the algorithm for the restoration of 1D signals and 2D images in interpolation/inpainting problems respectively, and in a compressed sensing problem, for recovering piecewise constant medical-type images from partial Fourier ensembles.
Submitted 14 May, 2009;
originally announced May 2009.
-
Domain decomposition methods for compressed sensing
Authors:
Massimo Fornasier,
Andreas Langer,
Carola-Bibiane Schönlieb
Abstract:
We present several domain decomposition algorithms for sequential and parallel minimization of functionals formed by a discrepancy term with respect to data and total variation constraints. The convergence properties of the algorithms are analyzed. We provide several numerical experiments, showing the successful application of the algorithms for the restoration of 1D and 2D signals in interpolation/inpainting problems respectively, and in a compressed sensing problem, for recovering piecewise constant medical-type images from partial Fourier ensembles.
Submitted 1 February, 2009;
originally announced February 2009.
-
Iterative thresholding meets free discontinuity problems
Authors:
Massimo Fornasier,
Rachel Ward
Abstract:
Free-discontinuity problems describe situations where the solution of interest is defined by a function and a lower dimensional set consisting of the discontinuities of the function. Hence, the derivative of the solution is assumed to be a `small' function almost everywhere except on sets where it concentrates as a singular measure. This is the case, for instance, in crack detection from fracture mechanics or in certain digital image segmentation problems. If we discretize such situations for numerical purposes, the free-discontinuity problem in the discrete setting can be re-formulated as that of finding a derivative vector with small components at all but a few entries that exceed a certain threshold. This problem is similar to those encountered in the field of `sparse recovery', where vectors with a small number of dominating components in absolute value are recovered from a few given linear measurements via the minimization of related energy functionals. Several iterative thresholding algorithms that intertwine gradient-type iterations with thresholding steps have been designed to recover sparse solutions in this setting. It is natural to wonder if and/or how such algorithms can be used towards solving discrete free-discontinuity problems. The current paper explores this connection, and, by establishing an iterative thresholding algorithm for discrete free-discontinuity problems, provides new insights on properties of minimizing solutions thereof.
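For context, the generic iterative thresholding scheme from sparse recovery alternates a gradient step on $\frac12\|Ax-y\|^2$ with a thresholding step. The sketch below shows soft thresholding (for an $\ell_1$ penalty) and hard thresholding (for an $\ell_0$ penalty) with step $\mu=1/\|A\|_2^2$. It is not the paper's algorithm for discrete free-discontinuity problems, and the function names and parameters are hypothetical.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal map of tau * ||.||_1 (componentwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def hard_threshold(z, tau):
    """Keep entries with |z_i| > tau, set the rest to zero."""
    return np.where(np.abs(z) > tau, z, 0.0)

def iterative_thresholding(A, y, lam, n_iter=300, mode="soft"):
    """x <- T(x + mu A^T (y - A x)); returns the iterate and the objective trace."""
    mu = 1.0 / np.linalg.norm(A, 2) ** 2   # step size 1 / ||A||_2^2
    x = np.zeros(A.shape[1])
    trace = []
    for _ in range(n_iter):
        z = x + mu * (A.T @ (y - A @ x))   # gradient step on 0.5 * ||Ax - y||^2
        if mode == "soft":
            x = soft_threshold(z, mu * lam)
            penalty = np.sum(np.abs(x))
        else:
            # Prox of mu * lam * ||x||_0: keep z_i iff z_i^2 / 2 > mu * lam.
            x = hard_threshold(z, np.sqrt(2.0 * mu * lam))
            penalty = np.count_nonzero(x)
        res = A @ x - y
        trace.append(0.5 * res @ res + lam * penalty)
    return x, np.array(trace)
```

Because $1/\mu$ is at least the Lipschitz constant of the gradient, each step minimizes a majorizer of the objective, so the recorded trace is nonincreasing in both modes.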
Submitted 28 April, 2009; v1 submitted 16 January, 2009;
originally announced January 2009.
-
Iteratively re-weighted least squares minimization for sparse recovery
Authors:
Ingrid Daubechies,
Ronald DeVore,
Massimo Fornasier,
C. Sinan Gunturk
Abstract:
We analyze an Iteratively Re-weighted Least Squares (IRLS) algorithm for promoting $\ell_1$-minimization in sparse and compressible vector recovery. We prove its convergence and we estimate its local rate. We show how the algorithm can be modified in order to promote $\ell_t$-minimization for $t<1$, and how this modification produces superlinear rates of convergence.
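A minimal sketch of an IRLS iteration of this kind, assuming the weights $w_i=(x_i^2+\varepsilon^2)^{-1/2}$, the weighted least squares update $x=DA^T(ADA^T)^{-1}y$ with $D=\mathrm{diag}(1/w_i)$, and the update $\varepsilon\leftarrow\min(\varepsilon, r(x)_{K+1}/N)$, where $r(x)_{K+1}$ is the $(K+1)$-th largest entry of $x$ in absolute value. The precise scheme and its assumptions should be taken from the paper; the function name, the sparsity parameter `K`, the iteration count, and the stopping tolerance are hypothetical choices.

```python
import numpy as np

def irls_l1(A, y, K, n_iter=500, tol=1e-10):
    """IRLS sketch for l1-type sparse recovery from y = A x (A is m x N, m < N)."""
    m, N = A.shape
    # Start from the minimum-norm solution, i.e. all weights equal to one.
    x = np.linalg.lstsq(A, y, rcond=None)[0]
    eps = 1.0
    for _ in range(n_iter):
        d = np.sqrt(x ** 2 + eps ** 2)        # diagonal of D = diag(1 / w)
        M = (A * d) @ A.T                      # A D A^T, an m x m system
        x = d * (A.T @ np.linalg.solve(M, y))  # minimizer of sum w_i z_i^2 s.t. A z = y
        r = np.sort(np.abs(x))[::-1]           # nonincreasing rearrangement
        eps = min(eps, r[K] / N)               # (K+1)-th largest entry, 0-based index K
        if eps < tol:
            break
    return x
```

Each iteration only requires an $m\times m$ solve, and $\varepsilon$ shrinks as the entries outside the $K$ largest ones decay, which drives the iterates toward a $K$-sparse solution when one exists.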
Submitted 3 July, 2008;
originally announced July 2008.
-
Subspace correction methods for total variation and $\ell_1$-minimization
Authors:
Massimo Fornasier,
Carola-Bibiane Schönlieb
Abstract:
This paper is concerned with the numerical minimization of energy functionals in Hilbert spaces involving convex constraints coinciding with a semi-norm for a subspace. The optimization is realized by alternating minimizations of the functional on a sequence of orthogonal subspaces. On each subspace an iterative proximity-map algorithm is implemented via \emph{oblique thresholding}, which is the main new tool introduced in this work. We provide convergence conditions for the algorithm in order to compute minimizers of the target energy. Analogous results are derived for a parallel variant of the algorithm. Applications are presented in domain decomposition methods for singular elliptic PDEs arising in total variation minimization and in accelerated sparse recovery algorithms based on $\ell_1$-minimization. We include numerical examples which show efficient solutions to classical problems in signal and image processing.
Submitted 13 December, 2007;
originally announced December 2007.