Mean-field methods and algorithmic perspectives for high-dimensional machine learning

Abstract: The main difficulty that arises in the analysis of most machine learning algorithms is to handle, analytically and numerically, a large number of interacting random variables. In this Ph.D. manuscript, we revisit an approach based on the tools of statistical physics of disordered systems. Developed through a rich literature, these tools were designed precisely to infer the macroscopic behavior of a large number of particles from their microscopic interactions. At the heart of this work, we strongly capitalize on the deep connection between the replica method and message passing algorithms in order to shed light on the phase diagrams of various theoretical models, with an emphasis on the potential differences between statistical and algorithmic thresholds. We essentially focus on synthetic tasks and data generated in the teacher-student paradigm. In particular, we apply these mean-field methods to the Bayes-optimal analysis of committee machines, to the worst-case analysis of Rademacher generalization bounds for perceptrons, and to empirical risk minimization in the context of generalized linear models. Finally, we develop a framework to analyze estimation models with structured prior information, produced for instance by generative models based on deep neural networks with random weights.

Submitted 10 March, 2021; originally announced March 2021.

Comments: Ph.D. manuscript

arXiv:2006.06560 [pdf, other]

Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

Authors: Benjamin Aubin, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

Abstract: We consider a commonly studied supervised classification task on a synthetic dataset whose labels are generated by feeding a one-layer neural network with random iid inputs. We study the generalization performance of standard classifiers in the high-dimensional regime where $\alpha=n/d$ is kept finite in the limit of a high dimension $d$ and number of samples $n$. Our contribution is three-fold: First, we prove a formula for the generalization error achieved by $\ell_2$ regularized classifiers that minimize a convex loss. This formula was first obtained by the heuristic replica method of statistical physics. Second, focusing on commonly used loss functions and optimizing the $\ell_2$ regularization strength, we observe that while ridge regression performance is poor, logistic and hinge regression are surprisingly able to approach the Bayes-optimal generalization error extremely closely. As $\alpha\to \infty$ they lead to Bayes-optimal rates, a fact that does not follow from predictions of margin-based generalization error bounds. Third, we design an optimal loss and regularizer that provably leads to Bayes-optimal generalization error.

Submitted 7 November, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: 11 pages + 45 pages Supplementary Material / 5 figures, v2 revised and accepted at NeurIPS

Journal ref: Advances in Neural Information Processing Systems, v33, pages 12199--12210, 2020

arXiv:2004.01571 [pdf, other]

Tree-AMP: Compositional Inference with Tree Approximate Message Passing

Authors: Antoine Baker, Benjamin Aubin, Florent Krzakala, Lenka Zdeborová

Abstract: We introduce Tree-AMP, standing for Tree Approximate Message Passing, a Python package for compositional inference in high-dimensional tree-structured models. The package provides a unifying framework to study several approximate message passing algorithms previously derived for a variety of machine learning tasks such as generalized linear models, inference in multi-layer networks, matrix factorization, and reconstruction using non-separable penalties. For some models, the asymptotic performance of the algorithm can be theoretically predicted by the state evolution, and the measurements entropy estimated by the free entropy formalism. The implementation is modular by design: each module, which implements a factor, can be composed at will with other modules to solve complex inference tasks. The user only needs to declare the factor graph of the model: the inference algorithm, state evolution and entropy estimation are fully automated.

Submitted 11 December, 2021; v1 submitted 3 April, 2020; originally announced April 2020.

Comments: Source code available at https://github.com/sphinxteam/tramp and documentation at https://sphinxteam.github.io/tramp.docs

Journal ref: Journal of Machine Learning Research 24 (2023) 1-89

arXiv:1912.02729 [pdf, ps, other]

Rademacher complexity and spin glasses: A link between the replica and statistical theories of learning

Authors: Alia Abbara, Benjamin Aubin, Florent Krzakala, Lenka Zdeborová

Abstract: Statistical learning theory provides bounds on the generalization gap, using in particular the Vapnik-Chervonenkis dimension and the Rademacher complexity. An alternative approach, mainly studied in the statistical physics literature, is the study of generalization in simple synthetic-data models. Here we discuss the connections between these approaches and focus on the link between the Rademacher complexity in statistical learning and the theories of generalization for typical-case synthetic models from statistical physics, involving quantities known as Gardner capacity and ground state energy. We show that in these models the Rademacher complexity is closely related to the ground state energy computed by replica theories. Using this connection, one may reinterpret many results of the literature as rigorous Rademacher bounds in a variety of models in the high-dimensional statistics limit. Somewhat surprisingly, we also show that statistical learning theory provides predictions for the behavior of the ground-state energies in some full replica symmetry breaking models.

Submitted 15 June, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

Comments: 15 + 10 pages, v2 revised and accepted at MSML

Journal ref: Proceedings of The First Mathematical and Scientific Machine Learning Conference, PMLR 107:27-54, 2020

arXiv:1912.02008 [pdf, other]

Exact asymptotics for phase retrieval and compressed sensing with random generative priors

Authors: Benjamin Aubin, Bruno Loureiro, Antoine Baker, Florent Krzakala, Lenka Zdeborová

Abstract: We consider the problem of compressed sensing and of (real-valued) phase retrieval with a random measurement matrix. We derive sharp asymptotics for the information-theoretically optimal performance and for the best known polynomial algorithm for an ensemble of generative priors consisting of fully connected deep neural networks with random weight matrices and arbitrary activations. We compare the performance to sparse separable priors and conclude that generative priors might be advantageous in terms of algorithmic performance. In particular, while sparsity does not allow compressive phase retrieval to be performed efficiently close to its information-theoretic limit, it is found that under the random generative prior compressed phase retrieval becomes tractable.

Submitted 12 June, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

Comments: 13+3 pages, 7 figures, v2 revised and accepted at MSML

Journal ref: Proceedings of The First Mathematical and Scientific Machine Learning Conference, PMLR 107:55-73, 2020

arXiv:1901.00314 [pdf, ps, other]

doi 10.1088/1751-8121/ab227a

Storage capacity in symmetric binary perceptrons

Authors: Benjamin Aubin, Will Perkins, Lenka Zdeborová

Abstract: We study the problem of determining the capacity of the binary perceptron for two variants of the problem where the corresponding constraint is symmetric. We call these variants the rectangle-binary-perceptron (RBP) and the $u$-function-binary-perceptron (UBP). We show that, unlike for the usual step-function-binary-perceptron, the critical capacity in these symmetric cases is given by the annealed computation in a large region of parameter space (for all rectangular constraints and for narrow enough $u$-function constraints, $K<K^*$). We prove this fact (under two natural assumptions) using the first and second moment methods. We further use the second moment method to conjecture that solutions of the symmetric binary perceptrons are organized in a so-called frozen-1RSB structure, without using the replica method. We then use the replica method to estimate the capacity threshold for the UBP case when the $u$-function is wide ($K>K^*$). We conclude that full-step replica symmetry breaking would have to be evaluated in order to obtain the exact capacity in this case.

Submitted 31 March, 2019; v1 submitted 2 January, 2019; originally announced January 2019.

Comments: 18 + 7 pages

Journal ref: J. Phys. A: Math. Theor. 52 294003 (2019)

arXiv:1806.05451 [pdf, other]

doi 10.1088/1742-5468/ab43d2

The committee machine: Computational to statistical gaps in learning a two-layers neural network

Authors: Benjamin Aubin, Antoine Maillard, Jean Barbier, Florent Krzakala, Nicolas Macris, Lenka Zdeborová

Abstract: Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layer neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that allows optimal learning to be performed in polynomial time for a large set of parameters. We find that there are regimes in which a low generalization error is information-theoretically achievable while the AMP algorithm fails to deliver it, strongly suggesting that no efficient algorithm exists for those cases, and unveiling a large computational gap.

Submitted 29 February, 2024; v1 submitted 14 June, 2018; originally announced June 2018.

Comments: 18 pages + supplementary material, 3 figures. (v2: update to match the published version ; v3: clarification of the caption of Fig. 3)

Journal ref: J. Stat. Mech. (2019) 124023 & NeurIPS 2018

Showing 1–7 of 7 results for author: Aubin, B