Search | arXiv e-print repository

Path-metrics, pruning, and generalization

Authors: Antoine Gonon, Nicolas Brisebarre, Elisa Riccietti, Rémi Gribonval

Abstract: Analyzing the behavior of ReLU neural networks often hinges on understanding the relationships between their parameters and the functions they implement. This paper proves a new bound on function distances in terms of the so-called path-metrics of the parameters. Since this bound is intrinsically invariant with respect to the rescaling symmetries of the networks, it sharpens previously known bounds. It is also, to the best of our knowledge, the first bound of its kind that is broadly applicable to modern networks such as ResNets, VGGs, U-nets, and many more. In contexts such as network pruning and quantization, the proposed path-metrics can be efficiently computed using only two forward passes. Besides its intrinsic theoretical interest, the bound yields not only novel theoretical generalization bounds, but also a promising proof of concept for rescaling-invariant pruning.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2403.13385

A multilevel framework for accelerating uSARA in radio-interferometric imaging

Authors: Guillaume Lauga, Audrey Repetti, Elisa Riccietti, Nelly Pustelnik, Paulo Gonçalves, Yves Wiaux

Abstract: This paper presents a multilevel algorithm specifically designed for radio-interferometric imaging in astronomy. The proposed algorithm is used to solve the uSARA (unconstrained Sparsity Averaging Reweighting Analysis) formulation of this image restoration problem. Multilevel algorithms rely on a hierarchy of approximations of the objective function to accelerate its optimization. In contrast to the usual multilevel approaches where this hierarchy is derived in the parameter space, here we construct the hierarchy of approximations in the observation space. The proposed approach is compared to a reweighted forward-backward procedure, which is the backbone iteration scheme for solving the uSARA problem.

Submitted 20 March, 2024; originally announced March 2024.
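The abstract names a reweighted forward-backward procedure as the backbone solver for uSARA. As a rough illustration only, the sketch below runs forward-backward (proximal gradient) iterations on a weighted $\ell_1$-regularized least-squares problem and updates the weights between outer passes with the generic rule $w_i = \delta/(\delta + |x_i|)$. Assumptions: the sparsity prior is a weighted $\ell_1$ norm on $x$ itself (uSARA uses a sparsifying dictionary and its own reweighting constants, which are not reproduced), and the function name and the parameters `lam`, `delta`, `n_outer`, `n_inner` are hypothetical.

```python
import numpy as np

def soft_threshold(v, thresh):
    """Proximal operator of x -> sum_i thresh_i * |x_i| (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def reweighted_forward_backward(A, b, lam=0.1, delta=1.0, n_outer=5, n_inner=200):
    """Approximately minimise 0.5*||Ax - b||^2 + lam * sum_i w_i |x_i|.

    Hypothetical generic scheme: the weights w are re-estimated after each outer pass.
    """
    n = A.shape[1]
    x = np.zeros(n)
    w = np.ones(n)                              # first outer pass = plain (unweighted) l1
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L, L = Lipschitz constant of the gradient
    for _ in range(n_outer):
        for _ in range(n_inner):
            grad = A.T @ (A @ x - b)                              # forward (gradient) step
            x = soft_threshold(x - step * grad, step * lam * w)   # backward (proximal) step
        w = delta / (delta + np.abs(x))         # large coefficients are penalised less
    return x

x = reweighted_forward_backward(np.eye(3), np.array([3.0, 0.01, -2.0]), lam=0.1)
```

With $A = I$ the small entry is thresholded to zero, while reweighting shrinks the large entries less than a single unweighted pass would.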

arXiv:2310.01225

A path-norm toolkit for modern networks: consequences, promises and challenges

Authors: Antoine Gonon, Nicolas Brisebarre, Elisa Riccietti, Rémi Gribonval

Abstract: This work introduces the first toolkit around path-norms that fully encompasses general DAG ReLU networks with biases, skip connections and any operation based on the extraction of order statistics: max pooling, GroupSort, etc. This toolkit notably allows us to establish generalization bounds for modern neural networks that are not only the most widely applicable path-norm based ones, but also recover or beat the sharpest known bounds of this type. These extended path-norms further enjoy the usual benefits of path-norms: ease of computation, invariance under the symmetries of the network, and improved sharpness on layered fully-connected networks compared to the product of operator norms, another commonly used complexity measure. The versatility of the toolkit and its ease of implementation allow us to challenge the concrete promises of path-norm-based generalization bounds, by numerically evaluating the sharpest known bounds for ResNets on ImageNet.

Submitted 13 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.
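The "ease of computation" of path-norms can be seen in the simplest case through a standard identity: for a layered, fully connected ReLU network without biases, the $\ell_1$ path-norm (the sum, over all input-to-output paths, of the product of absolute weights along the path) equals the sum of the outputs obtained by feeding the all-ones vector through the network with every weight replaced by its absolute value. The sketch below checks this against brute-force path enumeration and illustrates invariance under neuron rescaling. It covers only that bias-free layered case, not the DAG networks with biases, skip connections and max pooling handled by the toolkit; the function names are hypothetical.

```python
import itertools
import numpy as np

def l1_path_norm(weights):
    """l1 path-norm of a bias-free layered ReLU net with weights [W_1, ..., W_L], W_l of shape (out, in).

    One forward pass of the all-ones vector through |W_1|, ..., |W_L|. ReLU is omitted because
    every intermediate value is non-negative, so it would act as the identity.
    """
    v = np.ones(weights[0].shape[1])
    for W in weights:
        v = np.abs(W) @ v
    return float(v.sum())

def l1_path_norm_bruteforce(weights):
    """Same quantity by explicitly enumerating every input-to-output path (exponential cost)."""
    dims = [weights[0].shape[1]] + [W.shape[0] for W in weights]
    total = 0.0
    for path in itertools.product(*(range(d) for d in dims)):
        prod = 1.0
        for layer, W in enumerate(weights):
            prod *= abs(W[path[layer + 1], path[layer]])
        total += prod
    return total

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
print(l1_path_norm([W1, W2]), l1_path_norm_bruteforce([W1, W2]))  # equal up to rounding

# Rescaling symmetry: multiply hidden neuron 1's incoming weights by 5 and divide its
# outgoing weights by 5. The network function is unchanged (ReLU is positively
# homogeneous), and so is the path-norm.
W1s, W2s = W1.copy(), W2.copy()
W1s[1, :] *= 5.0
W2s[:, 1] /= 5.0
print(l1_path_norm([W1s, W2s]))
```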

arXiv:2307.00820

Butterfly factorization by algorithmic identification of rank-one blocks

Authors: Léon Zheng, Gilles Puy, Elisa Riccietti, Patrick Pérez, Rémi Gribonval

Abstract: Many matrices associated with fast transforms possess a certain low-rank property characterized by the existence of several block partitionings of the matrix, where each block is of low rank. Provided that these partitionings are known, there exist algorithms, called butterfly factorization algorithms, that approximate the matrix by a product of sparse factors, thus enabling a rapid evaluation of the associated linear operator. This paper proposes a new method to identify algebraically these block partitionings for a matrix admitting a butterfly factorization, without any analytical assumption on its entries.

Submitted 3 July, 2023; originally announced July 2023.

Comments: in French. XXIXème Colloque Francophone de Traitement du Signal et des Images (29th French-language Symposium on Signal and Image Processing), Aug 2023, Grenoble, France

arXiv:2306.02666

Does a sparse ReLU network training problem always admit an optimum?

Authors: Quoc-Tung Le, Elisa Riccietti, Rémi Gribonval

Abstract: Given a training set, a loss function, and a neural network architecture, it is often taken for granted that optimal network parameters exist, and a common practice is to apply available optimization algorithms to search for them. In this work, we show that the existence of an optimal solution is not always guaranteed, especially in the context of {\em sparse} ReLU neural networks. In particular, we first show that optimization problems involving deep networks with certain sparsity patterns do not always have optimal parameters, and that optimization algorithms may then diverge. Via a new topological relation between sparse ReLU neural networks and their linear counterparts, we derive, using existing tools from real algebraic geometry, an algorithm to verify that a given sparsity pattern suffers from this issue. Then, the existence of a global optimum is proved for every concrete optimization problem involving a shallow sparse ReLU neural network of output dimension one. Overall, the analysis is based on the investigation of two topological properties of the space of functions implementable as sparse ReLU neural networks: a best approximation property, and a closedness property, both in the uniform norm. This is studied both for (finite) domains corresponding to practical training on finite training sets, and for more general domains such as the unit cube. This allows us to provide conditions for the guaranteed existence of an optimum given a sparsity pattern.
The results apply not only to several sparsity patterns proposed in recent works on network pruning/sparsification, but also to classical dense neural networks, including architectures not covered by existing results.

Submitted 5 December, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023 - Thirty-seventh Conference on Neural Information Processing Systems, Dec 2023, New Orleans (Louisiana), United States

arXiv:2305.14477

A Block-Coordinate Approach of Multi-level Optimization with an Application to Physics-Informed Neural Networks

Authors: Serge Gratton, Valentin Mercier, Elisa Riccietti, Philippe L. Toint

Abstract: Multi-level methods are widely used for the solution of large-scale problems, because of their computational advantages and exploitation of the complementarity between the involved sub-problems. After a re-interpretation of multi-level methods from a block-coordinate point of view, we propose a multi-level algorithm for the solution of nonlinear optimization problems and analyze its evaluation complexity. We apply it to the solution of partial differential equations using physics-informed neural networks (PINNs) and show on a few test problems that the approach results in better solutions and significant computational savings.

Submitted 25 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2304.13329

IML FISTA: A Multilevel Framework for Inexact and Inertial Forward-Backward. Application to Image Restoration

Authors: Guillaume Lauga, Elisa Riccietti, Nelly Pustelnik, Paulo Gonçalves

Abstract: This paper presents a multilevel framework for inertial and inexact proximal algorithms, that encompasses multilevel versions of classical algorithms such as forward-backward and FISTA. The methods are supported by strong theoretical guarantees: we prove both the rate of convergence and the convergence of the iterates to a minimum in the convex case, an important result for ill-posed problems. We propose a particular instance of IML (Inexact MultiLevel) FISTA, based on the use of the Moreau envelope to build efficient and useful coarse corrections, fully adapted to solve problems in image restoration. Such a construction is derived for a broad class of composite optimization problems with proximable functions. We evaluate our approach on several image reconstruction problems and we show that it considerably accelerates the convergence of the corresponding one-level (i.e. standard) version of the methods, for large-scale images.

Submitted 2 April, 2024; v1 submitted 26 April, 2023; originally announced April 2023.
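The coarse corrections above are built from the Moreau envelope. For reference, the Moreau envelope of a proper, convex, lower semicontinuous $g$ with parameter $\gamma > 0$ is $\mathrm{env}_{\gamma g}(x) = \min_y g(y) + \|x - y\|^2/(2\gamma)$; it is differentiable with gradient $(x - \mathrm{prox}_{\gamma g}(x))/\gamma$, so it is cheap whenever the proximal operator is. The sketch below evaluates it for $g = \lambda\|\cdot\|_1$, where it reduces to the Huber function. It only illustrates the envelope itself; how the paper assembles coarse models from it is not reproduced, and the function names are hypothetical.

```python
import numpy as np

def prox_l1(x, gamma, lam):
    """prox of gamma * lam * ||.||_1: elementwise soft-thresholding at gamma * lam."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma * lam, 0.0)

def moreau_env_l1(x, gamma, lam):
    """Value and gradient of the Moreau envelope of lam * ||.||_1 with parameter gamma."""
    p = prox_l1(x, gamma, lam)
    value = lam * np.sum(np.abs(p)) + np.sum((x - p) ** 2) / (2.0 * gamma)
    grad = (x - p) / gamma              # Lipschitz continuous with constant 1/gamma
    return value, grad

value, grad = moreau_env_l1(np.array([0.05, 2.0]), gamma=0.1, lam=1.0)
```

Here the first coordinate lies in the quadratic zone ($|x| \le \gamma\lambda$, contributing $x^2/(2\gamma)$) and the second in the linear zone (contributing $\lambda|x| - \gamma\lambda^2/2$).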

arXiv:2210.15940

Multilevel fista for image restoration

Authors: Guillaume Lauga, Elisa Riccietti, Nelly Pustelnik, Paulo Gonçalves

Abstract: This paper presents a multilevel FISTA algorithm, based on the use of the Moreau envelope to build the correction brought by the coarse models, which is easy to compute when the explicit form of the proximal operator of the considered functions is known. This approach is supported by strong theoretical guarantees: we prove both the rate of convergence and the convergence of the iterates to a minimum in the convex case, an important result for ill-posed problems. We evaluate our approach on image restoration problems and we show that it outperforms classical FISTA for large-scale images.

Submitted 28 October, 2022; originally announced October 2022.
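For comparison, the classical one-level FISTA iteration that the multilevel method is measured against can be sketched as follows for the standard test problem $\min_x \frac12\|Ax-b\|^2 + \lambda\|x\|_1$. This is the textbook scheme of Beck and Teboulle with a constant step $1/L$, not the multilevel variant; the problem instance and the function name are assumptions for illustration.

```python
import numpy as np

def fista_lasso(A, b, lam, n_iter=500):
    """Classical FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1 with constant step 1/L."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        v = y - A.T @ (A @ y - b) / L                               # gradient step at y
        x_new = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)   # prox of (lam/L)*||.||_1
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)               # inertial extrapolation
        x, t = x_new, t_new
    return x

x = fista_lasso(np.eye(3), np.array([3.0, 0.01, -2.0]), lam=0.1)
```

With $A = I$ the minimizer is simply the soft-thresholded data, which makes the result easy to check by hand.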

arXiv:2208.00789

Self-supervised learning with rotation-invariant kernels

Authors: Léon Zheng, Gilles Puy, Elisa Riccietti, Patrick Pérez, Rémi Gribonval

Abstract: We introduce a regularization loss based on kernel mean embeddings with rotation-invariant kernels on the hypersphere (also known as dot-product kernels) for self-supervised learning of image representations. Besides being fully competitive with the state of the art, our method significantly reduces time and memory complexity for self-supervised training, making it implementable for very large embedding dimensions on existing devices and more easily adjustable than previous methods to settings with limited resources. Our work follows the major paradigm where the model learns to be invariant to some predefined image transformations (cropping, blurring, color jittering, etc.), while avoiding a degenerate solution by regularizing the embedding distribution. Our particular contribution is to propose a loss family promoting the embedding distribution to be close to the uniform distribution on the hypersphere, with respect to the maximum mean discrepancy pseudometric. We demonstrate that this family encompasses several regularizers of former methods, including uniformity-based and information-maximization methods, which are variants of our flexible regularization loss with different kernels. Beyond its practical consequences for state-of-the-art self-supervised learning with limited resources, the proposed generic regularization approach opens perspectives to leverage more widely the literature on kernel methods in order to improve self-supervised learning methods.

Submitted 8 March, 2023; v1 submitted 28 July, 2022; originally announced August 2022.

Journal ref: The Eleventh International Conference on Learning Representations, May 2023, Kigali, Rwanda
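A short calculation shows why a rotation-invariant kernel yields a simple loss: if $k(x,y)$ depends only on $x^\top y$ and $\sigma$ is the uniform distribution on the sphere, then $\mathbb{E}_{u\sim\sigma}\, k(z,u)$ is the same constant for every unit vector $z$. The squared MMD between the empirical embedding distribution and $\sigma$ is therefore the mean pairwise kernel value $\frac{1}{n^2}\sum_{i,j} k(z_i,z_j)$ minus a constant. The sketch below computes that mean for the Gaussian kernel restricted to the sphere, $k(x,y)=\exp(-t\|x-y\|^2)=\exp(-2t(1-x^\top y))$. The kernel choice, the temperature `t`, and the function name are assumptions; the paper's loss family and estimators may differ.

```python
import numpy as np

def mean_pairwise_kernel(z, t=2.0):
    """Mean of exp(-t * ||z_i - z_j||^2) over all pairs (i, j), after projecting rows onto the unit sphere.

    Up to an additive constant, this equals the squared MMD between the empirical distribution
    of the embeddings and the uniform distribution on the sphere (V-statistic, i = j included).
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    gram = z @ z.T                                   # pairwise dot products
    return float(np.exp(-2.0 * t * (1.0 - gram)).mean())

collapsed = np.ones((8, 3))                  # all embeddings identical: the degenerate solution
spread = np.array([[1.0, 0.0], [-1.0, 0.0]])  # two antipodal embeddings
print(mean_pairwise_kernel(collapsed), mean_pairwise_kernel(spread, t=1.0))
```

The collapsed batch attains the largest possible value (close to 1), so minimizing this quantity pushes embeddings apart, which is the regularizing effect described above.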

arXiv:2205.11874

Approximation speed of quantized vs. unquantized ReLU neural networks and beyond

Authors: Antoine Gonon, Nicolas Brisebarre, Rémi Gribonval, Elisa Riccietti

Abstract: We deal with two complementary questions about approximation properties of ReLU networks. First, we study how the uniform quantization of ReLU networks with real-valued weights impacts their approximation properties. We establish an upper bound on the minimal number of bits per coordinate needed for uniformly quantized ReLU networks to keep the same polynomial asymptotic approximation speeds as unquantized ones. We also characterize the error of nearest-neighbour uniform quantization of ReLU networks. This is achieved using a new lower bound on the Lipschitz constant of the map that associates the parameters of ReLU networks to their realization, and an upper bound generalizing classical results. Second, we investigate when ReLU networks can be expected, or not, to have better approximation properties than other classical approximation families. Indeed, several approximation families share the following common limitation: their polynomial asymptotic approximation speed of any set is bounded from above by the encoding speed of this set. We introduce a new abstract property of approximation families, called infinite-encodability, which implies this upper bound. Many classical approximation families, defined with dictionaries or ReLU networks, are shown to be infinite-encodable. This unifies and generalizes several situations where this upper bound is known.

Submitted 7 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.
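As a point of reference for the quantization results, nearest-neighbour uniform quantization with step $\eta$ rounds each real weight to the closest multiple of $\eta$, so each coordinate moves by at most $\eta/2$. The sketch below uses the hypothetical parametrization $\eta = 2^{-\mathrm{bits}}$ purely for illustration; it does not reproduce how the paper relates the number of bits per coordinate to approximation speed, and the function name is an assumption.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Round every entry of w to the nearest point of the grid eta * Z, with eta = 2**-bits.

    Ties are resolved by np.round's round-half-to-even rule.
    """
    eta = 2.0 ** (-bits)
    return eta * np.round(w / eta)

w = np.array([0.30, -1.27, 0.5])
q = quantize_uniform(w, bits=2)          # grid step 0.25
print(q, np.max(np.abs(q - w)))          # coordinate-wise error at most 0.25 / 2
```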

arXiv:2112.00386

Spurious Valleys, NP-hardness, and Tractability of Sparse Matrix Factorization With Fixed Support

Authors: Quoc-Tung Le, Elisa Riccietti, Rémi Gribonval

Abstract: The problem of approximating a dense matrix by a product of sparse factors is a fundamental problem for many signal processing and machine learning tasks. It can be decomposed into two subproblems: finding the position of the non-zero coefficients in the sparse factors, and determining their values. While the first step is usually seen as the most challenging one due to its combinatorial nature, this paper focuses on the second step, referred to as sparse matrix approximation with fixed support. First, we show its NP-hardness, while also presenting a nontrivial family of supports making the problem practically tractable with a dedicated algorithm. Then, we investigate the landscape of its natural optimization formulation, proving the absence of spurious local valleys and spurious local minima, whose presence could prevent local optimization methods from achieving global optimality. The advantages of the proposed algorithm over state-of-the-art first-order optimization methods are discussed.

Submitted 22 November, 2022; v1 submitted 1 December, 2021; originally announced December 2021.
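To make the fixed-support setting concrete: once the positions of the nonzeros are fixed by boolean masks, the problem becomes $\min \|A - XY^\top\|_F$ over $X, Y$ supported on those masks. The sketch below is a plain alternating least-squares baseline, shown only to illustrate the problem; it is not the dedicated algorithm of the paper and, as a generic local method, carries no global-optimality guarantee. Because the Frobenius objective decouples over rows once the other factor is fixed, each half-step solves small independent least-squares problems exactly, so the residual never increases. The function name and mask convention are assumptions.

```python
import numpy as np

def als_fixed_support(A, mask_X, mask_Y, n_iter=50, seed=0):
    """Alternating least squares for min ||A - X @ Y.T||_F with supp(X) in mask_X, supp(Y) in mask_Y.

    A: (m, n); mask_X: (m, r) boolean; mask_Y: (n, r) boolean.
    Returns X, Y and the Frobenius residual after each full sweep.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(mask_X.shape) * mask_X
    Y = rng.standard_normal(mask_Y.shape) * mask_Y
    errors = []
    for _ in range(n_iter):
        # Row i of X: A[i, :] ~ sum over allowed k of X[i, k] * Y[:, k]
        for i in range(A.shape[0]):
            S = np.flatnonzero(mask_X[i])
            X[i] = 0.0
            if S.size:
                X[i, S] = np.linalg.lstsq(Y[:, S], A[i], rcond=None)[0]
        # Row j of Y: A[:, j] ~ sum over allowed k of Y[j, k] * X[:, k]
        for j in range(A.shape[1]):
            T = np.flatnonzero(mask_Y[j])
            Y[j] = 0.0
            if T.size:
                Y[j, T] = np.linalg.lstsq(X[:, T], A[:, j], rcond=None)[0]
        errors.append(float(np.linalg.norm(A - X @ Y.T)))
    return X, Y, errors
```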

arXiv:2110.01235

Identifiability in Two-Layer Sparse Matrix Factorization

Authors: Léon Zheng, Elisa Riccietti, Rémi Gribonval

Abstract: Sparse matrix factorization is the problem of approximating a matrix $\mathbf{Z}$ by a product of $J$ sparse factors $\mathbf{X}^{(J)} \mathbf{X}^{(J-1)} \ldots \mathbf{X}^{(1)}$. This paper focuses on identifiability issues that appear in this problem, with a view to better understanding under which sparsity constraints the problem is well-posed. We give conditions under which the problem of factorizing a matrix into \emph{two} sparse factors admits a unique solution, up to unavoidable permutation and scaling equivalences. Our general framework considers an arbitrary family of prescribed sparsity patterns, allowing us to capture more structured notions of sparsity than simply the count of nonzero entries. These conditions are shown to be related to essential uniqueness of exact matrix decomposition into a sum of rank-one matrices, with structured sparsity constraints. In particular, in the case of fixed-support sparse matrix factorization, we give a general sufficient condition for identifiability based on rank-one matrix completability, and we derive from it a completion algorithm that can verify if this sufficient condition is satisfied, and recover the entries in the two sparse factors if this is the case. A companion paper further exploits these conditions to derive identifiability properties and theoretically sound factorization methods for multi-layer sparse matrix factorization with support constraints associated with some well-known fast transforms such as the Hadamard or the Discrete Fourier Transforms.

Submitted 17 November, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2110.01230
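The "unavoidable permutation and scaling equivalences" can be made explicit. For any permutation matrix $P$ and invertible diagonal matrix $D$, $\mathbf{X}^{(2)}\mathbf{X}^{(1)} = (\mathbf{X}^{(2)}PD)(D^{-1}P^\top\mathbf{X}^{(1)})$, and the new factors have the sparsity patterns of the old ones up to a permutation of columns (respectively rows). Identifiability can therefore only hold up to this equivalence. The quick numerical check below uses random factors and sizes chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
r = 4
# Two sparse factors (roughly half the entries zeroed out); sizes are arbitrary.
X2 = rng.standard_normal((5, r)) * (rng.random((5, r)) < 0.5)
X1 = rng.standard_normal((r, 6)) * (rng.random((r, 6)) < 0.5)

P = np.eye(r)[rng.permutation(r)]          # permutation matrix
D = np.diag(rng.uniform(0.5, 2.0, r))      # invertible diagonal scaling
D_inv = np.diag(1.0 / np.diag(D))

X2_alt = X2 @ P @ D                        # columns of X2 permuted and rescaled
X1_alt = D_inv @ P.T @ X1                  # rows of X1 permuted and rescaled

print(np.allclose(X2 @ X1, X2_alt @ X1_alt))            # same product
print(np.count_nonzero(X2), np.count_nonzero(X2_alt))   # same number of nonzeros
```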

arXiv:2110.01230

Efficient Identification of Butterfly Sparse Matrix Factorizations

Authors: Léon Zheng, Elisa Riccietti, Rémi Gribonval

Abstract: Fast transforms correspond to factorizations of the form $\mathbf{Z} = \mathbf{X}^{(1)} \ldots \mathbf{X}^{(J)}$, where each factor $\mathbf{X}^{(\ell)}$ is sparse and possibly structured. This paper investigates essential uniqueness of such factorizations, i.e., uniqueness up to unavoidable scaling ambiguities. Our main contribution is to prove that any $N \times N$ matrix having the so-called butterfly structure admits an essentially unique factorization into $J$ butterfly factors (where $N = 2^{J}$), and that the factors can be recovered by a hierarchical factorization method, which consists in recursively factorizing the considered matrix into two factors. This hierarchical identifiability property relies on a simple identifiability condition in the two-layer and fixed-support setting. This approach contrasts with existing ones that fit the product of butterfly factors to a given matrix via gradient descent. The proposed method can be applied in particular to retrieve the factorization of the Hadamard or the discrete Fourier transform matrices of size $N=2^J$. Computing such factorizations costs $\mathcal{O}(N^{2})$, which is of the order of dense matrix-vector multiplication, while the obtained factorizations enable fast $\mathcal{O}(N \log N)$ matrix-vector multiplications and have the potential to be applied to compress deep neural networks.

Submitted 7 November, 2022; v1 submitted 4 October, 2021; originally announced October 2021.

Journal ref: SIAM Journal on Mathematics of Data Science, Society for Industrial and Applied Mathematics, In press
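For the Hadamard case, the butterfly factors can be written down explicitly, which shows where the $\mathcal{O}(N\log N)$ cost comes from. With $H_2 = \begin{pmatrix}1&1\\1&-1\end{pmatrix}$ and $N = 2^J$, the Sylvester–Hadamard matrix $H_N = H_2^{\otimes J}$ equals $\prod_{\ell=1}^{J} (I_{2^{\ell-1}} \otimes H_2 \otimes I_{2^{J-\ell}})$ by the mixed-product property of the Kronecker product, and each factor has exactly two nonzeros per row. The sketch constructs these factors directly from the known formula; it does not implement the paper's hierarchical identification method, which recovers such factors from the matrix alone. Function names are hypothetical.

```python
from functools import reduce
import numpy as np

H2 = np.array([[1.0, 1.0], [1.0, -1.0]])

def sylvester_hadamard(J):
    """H_N = H_2 kron ... kron H_2 (J times), N = 2**J."""
    return reduce(np.kron, [H2] * J)

def hadamard_butterfly_factors(J):
    """Factors I_{2^(l-1)} kron H_2 kron I_{2^(J-l)} for l = 1..J; each has 2 nonzeros per row."""
    return [np.kron(np.kron(np.eye(2 ** (l - 1)), H2), np.eye(2 ** (J - l)))
            for l in range(1, J + 1)]

def apply_via_factors(J, x):
    """Compute H_N @ x as F_1 @ (F_2 @ (... (F_J @ x))).

    Dense storage is used here for readability; with a sparse format each product costs O(N),
    giving O(N log N) overall.
    """
    for F in reversed(hadamard_butterfly_factors(J)):
        x = F @ x
    return x
```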

arXiv:1912.13427

An inexact non stationary Tikhonov procedure for large-scale nonlinear ill-posed problems

Authors: Stefania Bellavia, Marco Donatelli, Elisa Riccietti

Abstract: In this work we consider the stable numerical solution of large-scale ill-posed nonlinear least squares problems with nonzero residual. We propose a non-stationary Tikhonov method with inexact step computation, specially designed for large-scale problems. At each iteration the method requires the solution of an elliptical trust-region subproblem to compute the step. This task is carried out employing a Lanczos approach, by which an approximate solution is computed. The trust-region radius is chosen to ensure that the resulting Tikhonov regularization parameter satisfies a prescribed condition on the model, which is proved to endow the method with regularizing properties. The proposed approach is tested on a parameter identification problem and on an image registration problem, and it is shown to provide important computational savings with respect to its exact counterpart.

Submitted 1 January, 2020; v1 submitted 31 December, 2019; originally announced December 2019.
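The link between the trust-region radius and the Tikhonov parameter rests on a standard fact: the regularized Gauss-Newton (Levenberg-Marquardt) step $s(\lambda) = -(J^\top J + \lambda I)^{-1} J^\top r$ solves the trust-region subproblem of radius $\|s(\lambda)\|$, and $\|s(\lambda)\|$ decreases as $\lambda$ grows, so choosing the radius implicitly chooses the regularization parameter. The sketch below computes $s(\lambda)$ with a dense solve and the Euclidean norm on a small random instance. The paper instead uses an elliptical (weighted-norm) subproblem and a Lanczos-based inexact solver for large problems; the Jacobian `J` and residual `r` here are made-up data.

```python
import numpy as np

def regularized_step(J, r, lam):
    """s(lam) = argmin_s ||J s + r||^2 + lam * ||s||^2 = -(J^T J + lam I)^{-1} J^T r."""
    n = J.shape[1]
    return -np.linalg.solve(J.T @ J + lam * np.eye(n), J.T @ r)

rng = np.random.default_rng(0)
J = rng.standard_normal((20, 5))   # stand-in Jacobian
r = rng.standard_normal(20)        # stand-in residual F(x_k) - y
for lam in [1e-3, 1e-1, 1e1, 1e3]:
    # The step length shrinks as lam grows: a smaller trust region means more regularization.
    print(lam, np.linalg.norm(regularized_step(J, r, lam)))
```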

arXiv:1911.00026

On the iterative solution of systems of the form $A^T A x=A^Tb+c$

Authors: Henri Calandra, Serge Gratton, Elisa Riccietti, Xavier Vasseur

Abstract: Given a full column rank matrix $A \in \mathbb{R}^{m\times n}$ ($m\geq n$), we consider a special class of linear systems of the form $A^\top Ax=A^\top b+c$ with $x, c \in \mathbb{R}^{n}$ and $b \in \mathbb{R}^{m}$. The occurrence of $c$ in the right-hand side of the equation prevents the direct application of standard methods for least squares problems. Hence, we investigate alternative solution methods that, as in the case of normal equations, take advantage of the peculiar structure of the system to avoid unstable computations, such as forming $A^\top A$ explicitly. We propose two iterative methods that are based on specific reformulations of the problem and we provide explicit closed formulas for the structured condition number related to each problem. These formulas allow us to compute a more accurate estimate of the forward error than the standard one used for generic linear systems, which does not take into account the structure of the perturbations. The relevance of our estimates is shown on a set of synthetic test problems. Numerical experiments highlight both the increased robustness and accuracy of the proposed methods compared to the standard conjugate gradient method. It is also found that the new methods are comparable to standard direct methods in terms of solution accuracy.

Submitted 31 October, 2019; originally announced November 2019.
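One way to see how forming $A^\top A$ can be avoided: with the residual $r = b - Ax$, the system $A^\top A x = A^\top b + c$ is equivalent to the augmented system $\begin{pmatrix} I & A \\ A^\top & 0\end{pmatrix}\begin{pmatrix} r \\ x\end{pmatrix} = \begin{pmatrix} b \\ -c\end{pmatrix}$, since the second block row reads $A^\top(b - Ax) = -c$. This matrix is nonsingular exactly when $A$ has full column rank. The sketch solves the augmented system with a dense solver on random data and checks the result in normal-equations form. It illustrates one reformulation only and is not claimed to be either of the paper's two iterative methods; the function name is hypothetical.

```python
import numpy as np

def solve_augmented(A, b, c):
    """Solve A^T A x = A^T b + c via [[I, A], [A^T, 0]] [r; x] = [b; -c]; A^T A is never formed."""
    m, n = A.shape
    K = np.block([[np.eye(m), A], [A.T, np.zeros((n, n))]])
    sol = np.linalg.solve(K, np.concatenate([b, -c]))
    return sol[m:]                      # last n entries are x; the first m are r = b - A x

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))         # full column rank with probability one
b, c = rng.standard_normal(8), rng.standard_normal(3)
x = solve_augmented(A, b, c)
print(np.allclose(A.T @ A @ x, A.T @ b + c))   # verification only, in normal-equations form
```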

arXiv:1909.08099

Worst-case Complexity Bounds of Directional Direct-search Methods for Multiobjective Optimization

Authors: A. L. Custódio, Y. Diouane, R. Garmanjani, E. Riccietti

Abstract: Direct Multisearch is a well-established class of algorithms, suited for multiobjective derivative-free optimization. In this work, we analyze the worst-case complexity of this class of methods in its most general formulation for unconstrained optimization. Considering nonconvex smooth functions, we show that to drive a given criticality measure below a specific positive threshold, Direct Multisearch takes at most a number of iterations proportional to the square of the inverse of the threshold, raised to the number of components of the objective function. This number is also proportional to the size of the set of linked sequences between the first unsuccessful iteration and the iteration immediately before the one where the criticality condition is satisfied. We then focus on a particular instance of Direct Multisearch, which considers a stricter criterion for accepting new nondominated points. In this case, we can establish a better worst-case complexity bound, simply proportional to the square of the inverse of the threshold, for driving the same criticality measure below the considered threshold.

Submitted 3 November, 2020; v1 submitted 17 September, 2019; originally announced September 2019.
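Direct Multisearch maintains a list of nondominated points, and the variant analysed above changes the criterion for accepting new ones. For reference, the basic Pareto-dominance test and list update (all objectives minimized) can be sketched as below. The stricter acceptance rule studied in the paper is not reproduced, and the function names and the tuple representation of objective vectors are assumptions.

```python
def dominates(fa, fb):
    """True if objective vector fa Pareto-dominates fb (minimisation): <= everywhere, < somewhere."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

def update_nondominated(front, candidate):
    """Return (new_front, accepted).

    The candidate is rejected if some point of the front dominates or equals it; otherwise it is
    added and every point it dominates is removed.
    """
    if any(dominates(f, candidate) or f == candidate for f in front):
        return front, False
    return [f for f in front if not dominates(candidate, f)] + [candidate], True

front, accepted = update_nondominated([(1, 3), (3, 1)], (2, 2))   # (2, 2) is a new trade-off
```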

arXiv:1904.04692

On high-order multilevel optimization strategies

Authors: Henri Calandra, Serge Gratton, Elisa Riccietti, Xavier Vasseur

Abstract: We propose a new family of multilevel methods for unconstrained minimization. The resulting strategies are multilevel extensions of high-order optimization methods based on q-order Taylor models (with q >= 1) that have been recently proposed in the literature. The use of high-order models, while decreasing the worst-case complexity bound, makes these methods computationally more expensive. Hence, to counteract this effect, we propose a multilevel strategy that exploits a hierarchy of problems of decreasing dimension, still approximating the original one, to reduce the global cost of the step computation. A theoretical analysis of the family of methods is proposed. Specifically, local and global convergence results are proved and a complexity bound to reach first-order stationary points is also derived. A multilevel version of the well-known adaptive method based on cubic regularization (ARC, corresponding to q = 2 in our setting) has been implemented. Numerical experiments clearly highlight the relevance of the new multilevel approach leading to considerable computational savings in terms of floating point operations compared to the classical one-level strategy.

Submitted 9 April, 2019; originally announced April 2019.

arXiv:1904.04685

On the approximation of the solution of partial differential equations by artificial neural networks trained by a multilevel Levenberg-Marquardt method

Authors: Henri Calandra, Serge Gratton, Elisa Riccietti, Xavier Vasseur

Abstract: This paper is concerned with the approximation of the solution of partial differential equations by means of artificial neural networks. Here a feedforward neural network is used to approximate the solution of the partial differential equation. The learning problem is formulated as a least squares problem, choosing the residual of the partial differential equation as a loss function, whereas a multilevel Levenberg-Marquardt method is employed as a training method. This setting allows us to get further insight into the potential of multilevel methods. Indeed, when the least squares problem arises from the training of artificial neural networks, the variables subject to optimization are not related by any geometrical constraints and the standard interpolation and restriction operators cannot be employed any longer. A heuristic, inspired by algebraic multigrid methods, is then proposed to construct the multilevel transfer operators. Numerical experiments show encouraging results related to the efficiency of the new multilevel optimization method for the training of artificial neural networks, compared to the standard corresponding one-level procedure.

Submitted 9 April, 2019; originally announced April 2019.

arXiv:1504.03442

On an adaptive regularization for ill-posed nonlinear systems and its trust-region implementation

Authors: Stefania Bellavia, Benedetta Morini, Elisa Riccietti

Abstract: In this paper we address the stable numerical solution of nonlinear ill-posed systems by a trust-region method. We show that an appropriate choice of the trust-region radius gives rise to a procedure that has the potential to approach a solution of the unperturbed system. This regularizing property is shown theoretically and validated numerically.

Submitted 14 April, 2015; originally announced April 2015.

Comments: arXiv admin note: text overlap with arXiv:1410.2780

Showing 1–19 of 19 results for author: Riccietti, E