-
Piecewise Polynomial Regression of Tame Functions via Integer Programming
Authors:
Gilles Bareilles,
Johannes Aspman,
Jiri Nemecek,
Jakub Marecek
Abstract:
Tame functions are a class of nonsmooth, nonconvex functions, which feature in a wide range of applications: functions encountered in the training of deep neural networks with all common activations, value functions of mixed-integer programs, or wave functions of small molecules. We consider approximating tame functions with piecewise polynomial functions. We bound the quality of approximation of…
▽ More
Tame functions are a class of nonsmooth, nonconvex functions, which feature in a wide range of applications: functions encountered in the training of deep neural networks with all common activations, value functions of mixed-integer programs, or wave functions of small molecules. We consider approximating tame functions with piecewise polynomial functions. We bound the quality of approximation of a tame function by a piecewise polynomial function with a given number of segments on any full-dimensional cube. We also present the first mixed-integer programming formulation of piecewise polynomial regression. Together, these can be used to estimate tame functions. We demonstrate promising computational results.
△ Less
Submitted 4 June, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
-
Taming Binarized Neural Networks and Mixed-Integer Programs
Authors:
Johannes Aspman,
Georgios Korpas,
Jakub Marecek
Abstract:
There has been a great deal of recent interest in binarized neural networks, especially because of their explainability. At the same time, automatic differentiation algorithms such as backpropagation fail for binarized neural networks, which limits their applicability. By reformulating the problem of training binarized neural networks as a subadditive dual of a mixed-integer program, we show that…
▽ More
There has been a great deal of recent interest in binarized neural networks, especially because of their explainability. At the same time, automatic differentiation algorithms such as backpropagation fail for binarized neural networks, which limits their applicability. By reformulating the problem of training binarized neural networks as a subadditive dual of a mixed-integer program, we show that binarized neural networks admit a tame representation. This, in turn, makes it possible to use the framework of Bolte et al. for implicit differentiation, which offers the possibility for practical implementation of backpropagation in the context of binarized neural networks.
This approach could also be used for a broader class of mixed-integer programs, beyond the training of binarized neural networks, as encountered in symbolic approaches to AI and beyond.
△ Less
Submitted 20 December, 2023; v1 submitted 5 October, 2023;
originally announced October 2023.
-
Riemannian Stochastic Approximation for Minimizing Tame Nonsmooth Objective Functions
Authors:
Johannes Aspman,
Vyacheslav Kungurtsev,
Reza Roohi Seraji
Abstract:
In many learning applications, the parameters in a model are structurally constrained in a way that can be modeled as them lying on a Riemannian manifold. Riemannian optimization, wherein procedures to enforce an iterative minimizing sequence to be constrained to the manifold, is used to train such models. At the same time, tame geometry has become a significant topological description of nonsmoot…
▽ More
In many learning applications, the parameters in a model are structurally constrained in a way that can be modeled as them lying on a Riemannian manifold. Riemannian optimization, wherein procedures to enforce an iterative minimizing sequence to be constrained to the manifold, is used to train such models. At the same time, tame geometry has become a significant topological description of nonsmooth functions that appear in the landscapes of training neural networks and other important models with structural compositions of continuous nonlinear functions with nonsmooth maps. In this paper, we study the properties of such stratifiable functions on a manifold and the behavior of retracted stochastic gradient descent, with diminishing stepsizes, for minimizing such functions.
△ Less
Submitted 8 February, 2023; v1 submitted 1 February, 2023;
originally announced February 2023.