-
The Challenges of the Nonlinear Regime for Physics-Informed Neural Networks
Authors:
Andrea Bonfanti,
Giuseppe Bruno,
Cristina Cipriani
Abstract:
The Neural Tangent Kernel (NTK) viewpoint is widely employed to analyze the training dynamics of overparameterized Physics-Informed Neural Networks (PINNs). However, unlike the case of linear Partial Differential Equations (PDEs), we show how the NTK perspective falls short in the nonlinear scenario. Specifically, we establish that the NTK yields a random matrix at initialization that is not constant during training, contrary to conventional belief. Another significant difference from the linear regime is that, even in the idealistic infinite-width limit, the Hessian does not vanish and hence it cannot be disregarded during training. This motivates the adoption of second-order optimization methods. We explore the convergence guarantees of such methods in both linear and nonlinear cases, addressing challenges such as spectral bias and slow convergence. Every theoretical result is supported by numerical examples with both linear and nonlinear PDEs, and we highlight the benefits of second-order methods in benchmark test cases.
Submitted 26 June, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
A minimax optimal control approach for robust neural ODEs
Authors:
Cristina Cipriani,
Alessandro Scagliotti,
Tobias Wöhrer
Abstract:
In this paper, we address the adversarial training of neural ODEs from a robust control perspective. This is an alternative to the classical training via empirical risk minimization, and it is widely used to enforce reliable outcomes for input perturbations. Neural ODEs allow the interpretation of deep neural networks as discretizations of control systems, unlocking powerful tools from control theory for the development and the understanding of machine learning. In this specific case, we formulate the adversarial training with perturbed data as a minimax optimal control problem, for which we derive first order optimality conditions in the form of Pontryagin's Maximum Principle. We provide a novel interpretation of robust training leading to an alternative weighted technique, which we test on a low-dimensional classification task.
Submitted 30 March, 2024; v1 submitted 26 October, 2023;
originally announced October 2023.
-
From NeurODEs to AutoencODEs: a mean-field control framework for width-varying Neural Networks
Authors:
Cristina Cipriani,
Massimo Fornasier,
Alessandro Scagliotti
Abstract:
The connection between Residual Neural Networks (ResNets) and continuous-time control systems (known as NeurODEs) has led to a mathematical analysis of neural networks which has provided interesting results of both theoretical and practical significance. However, by construction, NeurODEs have been limited to describing constant-width layers, making them unsuitable for modeling deep learning architectures with layers of variable width. In this paper, we propose a continuous-time Autoencoder, which we call AutoencODE, based on a modification of the controlled field that drives the dynamics. This adaptation enables the extension of the mean-field control framework originally devised for conventional NeurODEs. In this setting, we tackle the case of low Tikhonov regularization, resulting in potentially non-convex cost landscapes. While the results obtained for high Tikhonov regularization may not hold globally, we show that many of them can be recovered in regions where the loss function is locally convex. Inspired by our theoretical findings, we develop a training method tailored to this specific type of Autoencoders with residual connections, and we validate our approach through numerical experiments conducted on various examples.
Submitted 10 August, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
A Measure Theoretical Approach to the Mean-field Maximum Principle for Training NeurODEs
Authors:
Benoît Bonnet,
Cristina Cipriani,
Massimo Fornasier,
Hui Huang
Abstract:
In this paper we consider a measure-theoretical formulation of the training of NeurODEs in the form of a mean-field optimal control with $L^2$-regularization of the control. We derive first order optimality conditions for the NeurODE training problem in the form of a mean-field maximum principle, and show that it admits a unique control solution, which is Lipschitz continuous in time. As a consequence of this uniqueness property, the mean-field maximum principle also provides a strong quantitative generalization error for finite sample approximations. Our derivation of the mean-field maximum principle is much simpler than the ones currently available in the literature for mean-field optimal control problems, and is based on a generalized Lagrange multiplier theorem on convex sets of spaces of measures. The latter is also new, and can be considered as a result of independent interest.
Submitted 8 April, 2022; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Zero-Inertia Limit: from Particle Swarm Optimization to Consensus Based Optimization
Authors:
Cristina Cipriani,
Hui Huang,
Jinniao Qiu
Abstract:
Recently, a continuous description of particle swarm optimization (PSO) based on a system of stochastic differential equations was proposed by Grassi and Pareschi in arXiv:2012.05613, where the authors formally showed the link between PSO and consensus-based optimization (CBO) through the zero-inertia limit. This paper solves the theoretical open problem posed in arXiv:2012.05613 by providing a rigorous derivation of CBO from PSO in the limit of zero inertia, together with a quantified convergence rate. The proofs follow a probabilistic approach, investigating the weak convergence of the corresponding stochastic differential equations (SDEs) of McKean type in the continuous path space, and the results are illustrated with numerical examples.
Submitted 6 April, 2022; v1 submitted 14 April, 2021;
originally announced April 2021.