arXiv:2405.20993 [pdf, other]

Information limits and Thouless-Anderson-Palmer equations for spiked matrix models with structured noise

Authors: Jean Barbier, Francesco Camilli, Marco Mondelli, Yizhou Xu

Abstract: We consider a prototypical problem of Bayesian inference for a structured spiked model: a low-rank signal is corrupted by additive noise. While both information-theoretic and algorithmic limits are well understood when the noise is a Gaussian Wigner matrix, the more realistic case of structured noise still proves to be challenging. To capture the structure while maintaining mathematical tractability, a line of work has focused on rotationally invariant noise. However, existing studies either provide sub-optimal algorithms or are limited to special cases of noise ensembles. In this paper, using tools from statistical physics (replica method) and random matrix theory (generalized spherical integrals) we establish the first characterization of the information-theoretic limits for a noise matrix drawn from a general trace ensemble. Remarkably, our analysis unveils the asymptotic equivalence between the rotationally invariant model and a surrogate Gaussian one. Finally, we show how to saturate the predicted statistical limits using an efficient algorithm inspired by the theory of adaptive Thouless-Anderson-Palmer (TAP) equations.

Submitted 8 July, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

MSC Class: 62F15; 82B44

arXiv:2403.07189 [pdf, ps, other]

A multiscale cavity method for sublinear-rank symmetric matrix factorization

Authors: Jean Barbier, Justin Ko, Anas A. Rahman

Abstract: We consider a statistical model for symmetric matrix factorization with additive Gaussian noise in the high-dimensional regime where the rank $M$ of the signal matrix to infer scales with its size $N$ as $M = o(N^{1/10})$. Allowing for a $N$-dependent rank offers new challenges and requires new methods. Working in the Bayesian-optimal setting, we show that whenever the signal has i.i.d. entries the limiting mutual information between signal and data is given by a variational formula involving a rank-one replica symmetric potential. In other words, from the information-theoretic perspective, the case of a (slowly) growing rank is the same as when $M = 1$ (namely, the standard spiked Wigner model). The proof is primarily based on a novel multiscale cavity method allowing for growing rank along with some information-theoretic identities on worst noise for the Gaussian vector channel. We believe that the cavity method developed here will play a role in the analysis of a broader class of inference and spin models where the degrees of freedom are large arrays instead of vectors.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2307.05635 [pdf, ps, other]

Fundamental limits of overparametrized shallow neural networks for supervised learning

Authors: Francesco Camilli, Daria Tieplova, Jean Barbier

Abstract: We carry out an information-theoretical analysis of a two-layer neural network trained from input-output pairs generated by a teacher network with matching architecture, in overparametrized regimes. Our results come in the form of bounds relating i) the mutual information between training data and network weights, or ii) the Bayes-optimal generalization error, to the same quantities but for a simpler (generalized) linear model for which explicit expressions are rigorously known. Our bounds, which are expressed in terms of the number of training samples, input dimension and number of hidden units, thus yield fundamental performance limits for any neural network (and actually any learning procedure) trained from limited data generated according to our two-layer teacher neural network model. The proof relies on rigorous tools from spin glasses and is guided by ``Gaussian equivalence principles'' lying at the core of numerous recent analyses of neural networks. With respect to the existing literature, which is either non-rigorous or restricted to the case of the learning of the readout weights only, our results are information-theoretic (i.e. are not specific to any learning algorithm) and, importantly, cover a setting where all the network parameters are trained.

Submitted 11 July, 2023; originally announced July 2023.

Comments: 30 pages, 1 figure

MSC Class: 68Txx; 68T07

arXiv:2306.01412 [pdf, ps, other]

Matrix Inference in Growing Rank Regimes

Authors: Farzad Pourkamali, Jean Barbier, Nicolas Macris

Abstract: The inference of a large symmetric signal-matrix $\mathbf{S} \in \mathbb{R}^{N\times N}$ corrupted by additive Gaussian noise, is considered for two regimes of growth of the rank $M$ as a function of $N$. For sub-linear ranks $M=Θ(N^α)$ with $α\in(0,1)$ the mutual information and minimum mean-square error (MMSE) are derived for two classes of signal-matrices: (a) $\mathbf{S}=\mathbf{X}\mathbf{X}^\intercal$ with entries of $\mathbf{X}\in\mathbb{R}^{N\times M}$ independent identically distributed; (b) $\mathbf{S}$ sampled from a rotationally invariant distribution. Surprisingly, the formulas match the rank-one case. Two efficient algorithms are explored and conjectured to saturate the MMSE when no statistical-to-computational gap is present: (1) Decimation Approximate Message Passing; (2) a spectral algorithm based on a Rotation Invariant Estimator. For linear ranks $M=Θ(N)$ the mutual information is rigorously derived for signal-matrices from a rotationally invariant distribution. Close connections with scalar inference in free probability are uncovered, which allow to deduce a simple formula for the MMSE as an integral involving the limiting spectral measure of the data matrix only. An interesting issue is whether the known information theoretic phase transitions for rank-one, and hence also sub-linear-rank, still persist in linear-rank. Our analysis suggests that only a smoothed-out trace of the transitions persists. Furthermore, the change of behavior between low and truly high-rank regimes only happens at the linear scale $α=1$.

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2303.08406 [pdf, other]

Capacity-Achieving Sparse Regression Codes via Vector Approximate Message Passing

Authors: Yizhou Xu, YuHao Liu, ShanSuo Liang, Tingyi Wu, Bo Bai, Jean Barbier, TianQi Hou

Abstract: Sparse regression codes (SPARCs) are a promising coding scheme that can approach the Shannon limit over Additive White Gaussian Noise (AWGN) channels. Previous works have proven the capacity-achieving property of SPARCs with Gaussian design matrices. We generalize these results to right orthogonally invariant ensembles that allow for more structured design matrices. With the Vector Approximate Message Passing (VAMP) decoder, we rigorously demonstrate the exponentially decaying error probability for design matrices that satisfy a certain criterion with the exponentially decaying power allocation. For other spectra, we design a new power allocation scheme to show that the information theoretical threshold is achievable.

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2302.03306 [pdf, other]

Mismatched estimation of non-symmetric rank-one matrices corrupted by structured noise

Authors: Teng Fu, YuHao Liu, Jean Barbier, Marco Mondelli, ShanSuo Liang, TianQi Hou

Abstract: We study the performance of a Bayesian statistician who estimates a rank-one signal corrupted by non-symmetric rotationally invariant noise with a generic distribution of singular values. As the signal-to-noise ratio and the noise structure are unknown, a Gaussian setup is incorrectly assumed. We derive the exact analytic expression for the error of the mismatched Bayes estimator and also provide the analysis of an approximate message passing (AMP) algorithm. The first result exploits the asymptotic behavior of spherical integrals for rectangular matrices and of low-rank matrix perturbations; the second one relies on the design and analysis of an auxiliary AMP. The numerical experiments show that there is a performance gap between the AMP and Bayes estimators, which is due to the incorrect estimation of the signal norm.

Submitted 8 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

arXiv:2210.01237 [pdf, other]

Bayes-optimal limits in structured PCA, and how to reach them

Authors: Jean Barbier, Francesco Camilli, Marco Mondelli, Manuel Saenz

Abstract: How do statistical dependencies in measurement noise influence high-dimensional inference? To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise. We go beyond the usual independence assumption on the noise entries, by drawing the noise from a low-order polynomial orthogonal matrix ensemble. The resulting noise correlations make the setting relevant for applications but analytically challenging. We provide the first characterization of the Bayes-optimal limits of inference in this model. If the spike is rotation-invariant, we show that standard spectral PCA is optimal. However, for more general priors, both PCA and the existing approximate message passing algorithm (AMP) fall short of achieving the information-theoretic limits, which we compute using the replica method from statistical mechanics. We thus propose a novel AMP, inspired by the theory of Adaptive Thouless-Anderson-Palmer equations, which saturates the theoretical limit. This AMP comes with a rigorous state evolution analysis tracking its performance. Although we focus on specific noise distributions, our methodology can be generalized to a wide class of trace matrix ensembles at the cost of more involved expressions. Finally, despite the seemingly strong assumption of rotation-invariant noise, our theory empirically predicts algorithmic performance on real data, pointing at remarkable universality properties.

Submitted 2 June, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

arXiv:2205.10009 [pdf, other]

The price of ignorance: how much does it cost to forget noise structure in low-rank matrix estimation?

Authors: Jean Barbier, TianQi Hou, Marco Mondelli, Manuel Sáenz

Abstract: We consider the problem of estimating a rank-1 signal corrupted by structured rotationally invariant noise, and address the following question: how well do inference algorithms perform when the noise statistics is unknown and hence Gaussian noise is assumed? While the matched Bayes-optimal setting with unstructured noise is well understood, the analysis of this mismatched problem is only at its premises. In this paper, we make a step towards understanding the effect of the strong source of mismatch which is the noise statistics. Our main technical contribution is the rigorous analysis of a Bayes estimator and of an approximate message passing (AMP) algorithm, both of which incorrectly assume a Gaussian setup. The first result exploits the theory of spherical integrals and of low-rank matrix perturbations; the idea behind the second one is to design and analyze an artificial AMP which, by taking advantage of the flexibility in the denoisers, is able to "correct" the mismatch. Armed with these sharp asymptotic characterizations, we unveil a rich and often unexpected phenomenology. For example, despite AMP is in principle designed to efficiently compute the Bayes estimator, the former is outperformed by the latter in terms of mean-square error. We show that this performance gap is due to an incorrect estimation of the signal norm. In fact, when the SNR is large enough, the overlaps of the AMP and the Bayes estimator coincide, and they even match those of optimal estimators taking into account the structure of the noise.

Submitted 20 May, 2022; originally announced May 2022.

arXiv:2205.08980 [pdf, other]

Sparse superposition codes with rotational invariant coding matrices for memoryless channels

Authors: YuHao Liu, Teng Fu, Jean Barbier, TianQi Hou

Abstract: We recently showed in [1] the superiority of certain structured coding matrices ensembles (such as partial row-orthogonal) for sparse superposition codes when compared with purely random matrices with i.i.d. entries, both information-theoretically and under practical vector approximate message-passing decoding. Here we generalize this result to binary input channels under generalized vector approximate message-passing decoding [2]. We focus on specific binary output channels for concreteness but our analysis based on the replica symmetric method from statistical physics applies to any memoryless channel. We confirm that the "spectral criterion" introduced in [1], a coding-matrix design principle which allows the code to be capacity-achieving in the "large section size" asymptotic limit, extends to generic memoryless channels. Moreover, we also show that the vanishing error floor property [3] of this coding scheme is universal for arbitrary spectrum of the coding matrix.

Submitted 10 July, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: Submitted to the IEEE Information Theory Workshop (ITW 2022)

arXiv:2205.00750 [pdf, other]

The mighty force: statistical inference and high-dimensional statistics

Authors: Erik Aurell, Jean Barbier, Aurelien Decelle, Roberto Mulet

Abstract: This is a review to appear as a contribution to the edited volume "Spin Glass Theory & Far Beyond - Replica Symmetry Breaking after 40 Years", World Scientific. It showcases a selection of contributions from the spin glass community at large to high-dimensional statistics, by focusing on three important graph-based models and methodologies having deeply impacted the field: inference of graphs (a.k.a. direct coupling analysis), inference from graphs (the community detection problem), and the dynamic cavity method, which in particular allows for inference from graphs encoding causal relations.

Submitted 2 May, 2022; originally announced May 2022.

Comments: To appear as a contribution to the edited volume "Spin Glass Theory & Far Beyond - Replica Symmetry Breaking after 40 Years", World Scientific

arXiv:2203.00438 [pdf, ps, other]

An Analytical Approach to Compute the Exact Preimage of Feed-Forward Neural Networks

Authors: Théo Nancy, Vassili Maillet, Johann Barbier

Abstract: Neural networks are a convenient way to automatically fit functions that are too complex to be described by hand. The downside of this approach is that it leads to build a black-box without understanding what happened inside. Finding the preimage would help to better understand how and why such neural networks had given such outputs. Because most of the neural networks are noninjective function, it is often impossible to compute it entirely only by a numerical way. The point of this study is to give a method to compute the exact preimage of any Feed-Forward Neural Network with linear or piecewise linear activation functions for hidden layers. In contrast to other methods, this one is not returning a unique solution for a unique output but returns analytically the entire and exact preimage.

Submitted 25 August, 2022; v1 submitted 28 February, 2022; originally announced March 2022.

arXiv:2202.04541 [pdf, other]

Sparse superposition codes under VAMP decoding with generic rotational invariant coding matrices

Authors: TianQi Hou, YuHao Liu, Teng Fu, Jean Barbier

Abstract: Sparse superposition codes were originally proposed as a capacity-achieving communication scheme over the gaussian channel, whose coding matrices were made of i.i.d. gaussian entries. We extend this coding scheme to more generic ensembles of rotational invariant coding matrices with arbitrary spectrum, which include the gaussian ensemble as a special case. We further introduce and analyse a decoder based on vector approximate message-passing (VAMP). Our main findings, based on both a standard replica symmetric potential theory and state evolution analysis, are the superiority of certain structured ensembles of coding matrices (such as partial row-orthogonal) when compared to i.i.d. matrices, as well as a spectrum-independent upper bound on VAMP's threshold. Most importantly, we derive a simple "spectral criterion" for the scheme to be at the same time capacity-achieving while having the best possible algorithmic threshold, in the "large section size" asymptotic limit. Our results therefore provide practical design principles for the coding matrices in this promising communication scheme.

Submitted 26 May, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

Comments: Submitted to the 2022 IEEE International Symposium on Information Theory (ISIT)

MSC Class: 94-08

arXiv:2112.02066 [pdf, ps, other]

Marginals of a spherical spin glass model with correlated disorder

Authors: Jean Barbier, Manuel Sáenz

Abstract: In this paper we prove the weak convergence, in a high-temperature phase, of the finite marginals of the Gibbs measure associated to a symmetric spherical spin glass model with correlated couplings towards an explicit asymptotic decoupled measure. We also provide upper bounds for the rate of convergence in terms of the one of the energy per variable. Furthermore, we establish a concentration inequality for bounded functions under a higher temperature condition. These results are exemplified by analysing the asymptotic behaviour of the empirical mean of coordinate-wise functions of samples from the Gibbs measure of the model.

Submitted 3 December, 2021; originally announced December 2021.

arXiv:2109.06610 [pdf, other]

doi 10.1103/PhysRevE.106.024136

Statistical limits of dictionary learning: random matrix theory and the spectral replica method

Authors: Jean Barbier, Nicolas Macris

Abstract: We consider increasingly complex models of matrix denoising and dictionary learning in the Bayes-optimal setting, in the challenging regime where the matrices to infer have a rank growing linearly with the system size. This is in contrast with most existing literature concerned with the low-rank (i.e., constant-rank) regime. We first consider a class of rotationally invariant matrix denoising problems whose mutual information and minimum mean-square error are computable using techniques from random matrix theory. Next, we analyze the more challenging models of dictionary learning. To do so we introduce a novel combination of the replica method from statistical mechanics together with random matrix theory, coined spectral replica method. This allows us to derive variational formulas for the mutual information between hidden representations and the noisy data of the dictionary learning problem, as well as for the overlaps quantifying the optimal reconstruction error. The proposed method reduces the number of degrees of freedom from $Θ(N^2)$ matrix entries to $Θ(N)$ eigenvalues (or singular values), and yields Coulomb gas representations of the mutual information which are reminiscent of matrix models in physics. The main ingredients are a combination of large deviation results for random matrices together with a new replica symmetric decoupling ansatz at the level of the probability distributions of eigenvalues (or singular values) of certain overlap matrices and the use of Harish-Chandra-Itzykson-Zuber spherical integrals.

Submitted 26 February, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

arXiv:2107.06936 [pdf, ps, other]

Performance of Bayesian linear regression in a model with mismatch

Authors: Jean Barbier, Wei-Kuo Chen, Dmitry Panchenko, Manuel Sáenz

Abstract: In this paper we analyze, for a model of linear regression with gaussian covariates, the performance of a Bayesian estimator given by the mean of a log-concave posterior distribution with gaussian prior, in the high-dimensional limit where the number of samples and the covariates' dimension are large and proportional. Although the high-dimensional analysis of Bayesian estimators has been previously studied for Bayesian-optimal linear regression where the correct posterior is used for inference, much less is known when there is a mismatch. Here we consider a model in which the responses are corrupted by gaussian noise and are known to be generated as linear combinations of the covariates, but the distributions of the ground-truth regression coefficients and of the noise are unknown. This regression task can be rephrased as a statistical mechanics model known as the Gardner spin glass, an analogy which we exploit. Using a leave-one-out approach we characterize the mean-square error for the regression coefficients. We also derive the log-normalizing constant of the posterior. Similar models have been studied by Shcherbina and Tirozzi and by Talagrand, but our arguments are much more straightforward. An interesting consequence of our analysis is that in the quadratic loss case, the performance of the Bayesian estimator is independent of a global "temperature" hyperparameter and matches the ridge estimator: sampling and optimizing are equally good.

Submitted 10 November, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

arXiv:2010.14863 [pdf, other]

High-dimensional inference: a statistical mechanics perspective

Authors: Jean Barbier

Abstract: Statistical inference is the science of drawing conclusions about some system from data. In modern signal processing and machine learning, inference is done in very high dimension: very many unknown characteristics about the system have to be deduced from a lot of high-dimensional noisy data. This "high-dimensional regime" is reminiscent of statistical mechanics, which aims at describing the macroscopic behavior of a complex system based on the knowledge of its microscopic interactions. It is by now clear that there are many connections between inference and statistical physics. This article aims at emphasizing some of the deep links connecting these apparently separated disciplines through the description of paradigmatic models of high-dimensional inference in the language of statistical mechanics. This article has been published in the issue on artificial intelligence of Ithaca, an Italian popularization-of-science journal. The selected topics and references are highly biased and not intended to be exhaustive in any ways. Its purpose is to serve as introduction to statistical mechanics of inference through a very specific angle that corresponds to my own tastes and limited knowledge.

Submitted 28 October, 2020; originally announced October 2020.

arXiv:2009.12939 [pdf, ps, other]

doi 10.1093/imaiai/iaab027

Strong replica symmetry for high-dimensional disordered log-concave Gibbs measures

Authors: Jean Barbier, Dmitry Panchenko, Manuel Sáenz

Abstract: We consider a generic class of log-concave, possibly random, (Gibbs) measures. We prove the concentration of an infinite family of order parameters called multioverlaps. Because they completely parametrise the quenched Gibbs measure of the system, this implies a simple representation of the asymptotic Gibbs measures, as well as the decoupling of the variables in a strong sense. These results may prove themselves useful in several contexts. In particular in machine learning and high-dimensional inference, log-concave measures appear in convex empirical risk minimisation, maximum a-posteriori inference or M-estimation. We believe that they may be applicable in establishing some type of "replica symmetric formulas" for the free energy, inference or generalisation error in such settings.

Submitted 22 February, 2022; v1 submitted 27 September, 2020; originally announced September 2020.

Journal ref: Inf. Inference, 11, no. 3 (2022) 1079-1108

arXiv:2006.11313 [pdf, other]

Information theoretic limits of learning a sparse rule

Authors: Clément Luneau, Jean Barbier, Nicolas Macris

Abstract: We consider generalized linear models in regimes where the number of nonzero components of the signal and accessible data points are sublinear with respect to the size of the signal. We prove a variational formula for the asymptotic mutual information per sample when the system size grows to infinity. This result allows us to derive an expression for the minimum mean-square error (MMSE) of the Bayesian estimator when the signal entries have a discrete distribution with finite support. We find that, for such signals and suitable vanishing scalings of the sparsity and sampling rate, the MMSE is nonincreasing piecewise constant. In specific instances the MMSE even displays an all-or-nothing phase transition, that is, the MMSE sharply jumps from its maximum value to zero at a critical sampling rate. The all-or-nothing phenomenon has previously been shown to occur in high-dimensional linear regression. Our analysis goes beyond the linear case and applies to learning the weights of a perceptron with general activation function in a teacher-student scenario. In particular, we discuss an all-or-nothing phenomenon for the generalization error with a sublinear set of training examples.

Submitted 27 October, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

Comments: 56 pages, 4 figures, accepted to the 34th Conference on Neural Information Processing Systems (NeurIPS 2020). Extended version that includes the supplementary material

arXiv:2006.07971 [pdf, other]

All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation

Authors: Jean Barbier, Nicolas Macris, Cynthia Rush

Abstract: We determine statistical and computational limits for estimation of a rank-one matrix (the spike) corrupted by an additive Gaussian noise matrix, in a sparse limit, where the underlying hidden vector (that constructs the rank-one matrix) has a number of non-zero components that scales sub-linearly with the total dimension of the vector, and the signal-to-noise ratio tends to infinity at an appropriate speed. We prove explicit low-dimensional variational formulas for the asymptotic mutual information between the spike and the observed noisy matrix and analyze the approximate message passing algorithm in the sparse regime. For Bernoulli and Bernoulli-Rademacher distributed vectors, and when the sparsity and signal strength satisfy an appropriate scaling relation, we find all-or-nothing phase transitions for the asymptotic minimum and algorithmic mean-square errors. These jump from their maximum possible value to zero, at well-defined signal-to-noise thresholds whose asymptotic values we determine exactly. In the asymptotic regime the statistical-to-algorithmic gap diverges, indicating that sparse recovery is hard for approximate message passing.

Submitted 30 October, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: Part of this work (in particular the proof of Theorem 1) is already present in reference arXiv:1911.05030

arXiv:2005.08017 [pdf, ps, other]

Information-theoretic limits of a multiview low-rank symmetric spiked matrix model

Authors: Jean Barbier, Galen Reeves

Abstract: We consider a generalization of an important class of high-dimensional inference problems, namely spiked symmetric matrix models, often used as probabilistic models for principal component analysis. Such paradigmatic models have recently attracted a lot of attention from a number of communities due to their phenomenological richness with statistical-to-computational gaps, while remaining tractable. We rigorously establish the information-theoretic limits through the proof of single-letter formulas for the mutual information and minimum mean-square error. On a technical side we improve the recently introduced adaptive interpolation method, so that it can be used to study low-rank models (i.e., estimation problems of "tall matrices") in full generality, an important step towards the rigorous analysis of more complicated inference and learning models.

Submitted 16 May, 2020; originally announced May 2020.

Comments: Presented at the 2020 International Symposium on Information Theory (ISIT)

arXiv:2005.03115 [pdf, other]

Strong replica symmetry in high-dimensional optimal Bayesian inference

Authors: Jean Barbier, Dmitry Panchenko

Abstract: We consider generic optimal Bayesian inference, namely, models of signal reconstruction where the posterior distribution and all hyperparameters are known. Under a standard assumption on the concentration of the free energy, we show how replica symmetry in the strong sense of concentration of all multioverlaps can be established as a consequence of the Franz-de Sanctis identities; the identities themselves in the current setting are obtained via a novel perturbation coming from exponentially distributed "side-observations" of the signal. Concentration of multioverlaps means that asymptotically the posterior distribution has a particularly simple structure encoded by a random probability measure (or, in the case of binary signal, a non-random probability measure). We believe that such strong control of the model should be key in the study of inference problems with underlying sparse graphical structure (error correcting codes, block models, etc) and, in particular, in the rigorous derivation of replica symmetric formulas for the free energy and mutual information in this context.

Submitted 22 February, 2022; v1 submitted 6 May, 2020; originally announced May 2020.

Journal ref: Communications in Mathematical Physics 393, no. 3 (2022) 1199-1239

arXiv:2004.06975 [pdf, ps, other]

doi 10.1109/ISIT44484.2020.9174104

High-dimensional rank-one nonsymmetric matrix decomposition: the spherical case

Authors: Clément Luneau, Nicolas Macris, Jean Barbier

Abstract: We consider the problem of estimating a rank-one nonsymmetric matrix under additive white Gaussian noise. The matrix to estimate can be written as the outer product of two vectors and we look at the special case in which both vectors are uniformly distributed on spheres. We prove a replica-symmetric formula for the average mutual information between these vectors and the observations in the high-dimensional regime. This goes beyond previous results which considered vectors with independent and identically distributed elements. The method used can be extended to rank-one tensor problems.

Submitted 15 April, 2020; originally announced April 2020.

Comments: Will appear in 2020 IEEE International Symposium on Information Theory (ISIT). Long version with appendices, 26 pages

arXiv:1911.05030 [pdf, other]

0-1 phase transitions in sparse spiked matrix estimation

Authors: Jean Barbier, Nicolas Macris

Abstract: We consider statistical models of estimation of a rank-one matrix (the spike) corrupted by an additive Gaussian noise matrix in the sparse limit. In this limit the underlying hidden vector (that constructs the rank-one matrix) has a number of non-zero components that scales sub-linearly with the total dimension of the vector, and the signal strength tends to infinity at an appropriate speed. We prove explicit low-dimensional variational formulas for the asymptotic mutual information between the spike and the observed noisy matrix in suitable sparse limits. For Bernoulli and Bernoulli-Rademacher distributed vectors, and when the sparsity and signal strength satisfy an appropriate scaling relation, these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square error. A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression (compressive sensing).

Submitted 12 November, 2019; originally announced November 2019.

arXiv:1910.00285 [pdf, other]

doi 10.1088/1751-8121/ab8416

Blind calibration for compressed sensing: State evolution and an online algorithm

Authors: Marylou Gabrié, Jean Barbier, Florent Krzakala, Lenka Zdeborová

Abstract: Compressed sensing allows one to acquire compressible signals with a small number of measurements. In applications, a hardware implementation often requires a calibration as the sensing process is not perfectly known. Blind calibration, that is, performing calibration and compressed sensing at the same time, is thus particularly appealing. A potential approach was suggested by Schülke and collaborators in Schülke et al. 2013 and 2015, using approximate message passing (AMP) for blind calibration (cal-AMP). Here, the algorithm is extended from the already proposed offline case to the online case, where the calibration is refined step by step as new measured samples are received. Furthermore, we show that the performance of both the offline and the online algorithms can be theoretically studied via the State Evolution (SE) formalism. Through numerical simulations, the efficiency of cal-AMP and the consistency of the theoretical predictions are confirmed.

Submitted 23 March, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

Journal ref: J. Phys. A: Math. Theor. 53 334004 (2020)

arXiv:1907.07103 [pdf, ps, other]

Concentration of the matrix-valued minimum mean-square error in optimal Bayesian inference

Authors: Jean Barbier

Abstract: We consider Bayesian inference of signals with vector-valued entries. Extending concentration techniques from the mathematical physics of spin glasses, we show that the matrix-valued minimum mean-square error concentrates when the size of the problem increases. Such results are often crucial for proving single-letter formulas for the mutual information when they exist. Our proof is valid in the optimal Bayesian inference setting, meaning that it relies on the assumption that the model and all its hyper-parameters are known. Examples of inference and learning problems covered by our results are spiked matrix and tensor models, the committee machine neural network with few hidden neurons in the teacher-student scenario, or multi-layer generalized linear models.

Submitted 15 July, 2019; originally announced July 2019.

Comments: arXiv admin note: text overlap with arXiv:1904.02808

arXiv:1904.04565 [pdf, ps, other]

doi 10.1093/imaiai/iaaa022

Mutual information for low-rank even-order symmetric tensor estimation

Authors: Clément Luneau, Jean Barbier, Nicolas Macris

Abstract: We consider a statistical model for finite-rank symmetric tensor factorization and prove a single-letter variational expression for its asymptotic mutual information when the tensor is of even order. The proof applies the adaptive interpolation method originally invented for rank-one factorization. Here we show how to extend the adaptive interpolation to finite-rank and even-order tensors. This requires new nontrivial ideas with respect to the current analysis in the literature. We also underline where the proof falls short when dealing with odd-order tensors.

Submitted 23 September, 2020; v1 submitted 9 April, 2019; originally announced April 2019.

Comments: Preprint of an article accepted for publication in Information and Inference: A Journal of the IMA

arXiv:1904.02808 [pdf, other]

Overlap matrix concentration in optimal Bayesian inference

Authors: Jean Barbier

Abstract: We consider models of Bayesian inference of signals with vectorial components of finite dimensionality. We show that, under a proper perturbation, these models are replica symmetric in the sense that the overlap matrix concentrates. The overlap matrix is the order parameter in these models and is directly related to error metrics such as minimum mean-square errors. Our proof is valid in the optimal Bayesian inference setting. This means that it relies on the assumption that the model and all its hyper-parameters are known so that the posterior distribution can be written exactly. Examples of important problems in high-dimensional inference and learning to which our results apply are low-rank tensor factorization, the committee machine neural network with a finite number of hidden neurons in the teacher-student scenario, or multi-layer versions of the generalized linear model.

Submitted 24 January, 2020; v1 submitted 4 April, 2019; originally announced April 2019.

arXiv:1902.07273 [pdf, other]

Mutual Information for the Stochastic Block Model by the Adaptive Interpolation Method

Authors: Jean Barbier, Chun Lam Chan, Nicolas Macris

Abstract: We rigorously derive a single-letter variational expression for the mutual information of the asymmetric two-groups stochastic block model in the dense graph regime. Existing proofs in the literature are indirect, as they involve mapping the model to a rank-one matrix estimation problem whose mutual information is then determined by a combination of methods (e.g., interpolation, cavity, algorithmic, spatial coupling). In this contribution we provide a self-contained direct method using only the recently introduced adaptive interpolation method.

Submitted 16 July, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

arXiv:1901.06521 [pdf, other]

doi 10.1007/s10955-019-02470-6

Concentration of multi-overlaps for random ferromagnetic spin models

Authors: Jean Barbier, Chun Lam Chan, Nicolas Macris

Abstract: We consider ferromagnetic spin models on dilute random graphs and prove that, with suitable one-body infinitesimal perturbations added to the Hamiltonian, the multi-overlaps concentrate for all temperatures, both with respect to the thermal Gibbs average and the quenched randomness. Results of this nature have been known only for the lowest order overlaps, at high temperature or on the Nishimori line. Here we treat all multi-overlaps by a non-trivial application of Griffiths-Kelly-Sherman correlation inequalities. Our results apply in particular to the pure and mixed p-spin ferromagnets on random dilute Erdős-Rényi hypergraphs. On physical grounds one expects that multi-overlap concentration directly implies the correctness of the cavity (or replica symmetric) formula for the pressure. The proof of this formula for the general p-spin ferromagnet on a random dilute hypergraph remains an open problem.

Submitted 19 January, 2019; originally announced January 2019.

arXiv:1901.06516 [pdf, ps, other]

doi 10.1088/1751-8121/ab2735

The adaptive interpolation method for proving replica formulas. Applications to the Curie-Weiss and Wigner spike models

Authors: Jean Barbier, Nicolas Macris

Abstract: In this contribution we give a pedagogic introduction to the newly introduced adaptive interpolation method to prove in a simple and unified way replica formulas for Bayesian optimal inference problems. Many aspects of this method can already be explained at the level of the simple Curie-Weiss spin system. This provides a new method of solution for this model which does not appear to be known. We then generalize this analysis to a paradigmatic inference problem, namely rank-one matrix estimation, also referred to as the Wigner spike model in statistics. We give many pointers to the recent literature where the method has been successfully applied.

Submitted 7 March, 2020; v1 submitted 19 January, 2019; originally announced January 2019.

arXiv:1812.02537 [pdf, other]

Rank-one matrix estimation: analysis of algorithmic and information theoretic limits by the spatial coupling method

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Lenka Zdeborová

Abstract: Factorizing low-rank matrices is a problem with many applications in machine learning and statistics, ranging from sparse PCA to community detection and sub-matrix localization. For probabilistic models in the Bayes optimal setting, general expressions for the mutual information have been proposed using powerful heuristic statistical physics computations via the replica and cavity methods, and proven in a few specific cases by a variety of methods. Here, we use the spatial coupling methodology developed in the framework of error correcting codes, to rigorously derive the mutual information for the symmetric rank-one case. We characterize the detectability phase transitions in a large set of estimation problems, where we show that there exists a gap between what currently known polynomial algorithms (in particular spectral methods and approximate message-passing) can do and what is expected information theoretically. Moreover, we show that the computational gap vanishes for the proposed spatially coupled model, a promising feature with many possible applications. Our proof technique has an interest on its own and exploits three essential ingredients: the interpolation method first introduced in statistical physics, the analysis of approximate message-passing algorithms first introduced in compressive sensing, and the theory of threshold saturation for spatially coupled systems first developed in coding theory. Our approach is very generic and can be applied to many other open problems in statistical estimation where heuristic statistical physics predictions are available.

Submitted 6 December, 2018; originally announced December 2018.

Comments: Submitted to Journal of Machine Learning Research (JMLR)

arXiv:1806.05451 [pdf, other]

doi 10.1088/1742-5468/ab43d2

The committee machine: Computational to statistical gaps in learning a two-layers neural network

Authors: Benjamin Aubin, Antoine Maillard, Jean Barbier, Florent Krzakala, Nicolas Macris, Lenka Zdeborová

Abstract: Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layers neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that allows one to perform optimal learning in polynomial time for a large set of parameters. We find that there are regimes in which a low generalization error is information-theoretically achievable while the AMP algorithm fails to deliver it, strongly suggesting that no efficient algorithm exists for those cases, and unveiling a large computational gap.

Submitted 29 February, 2024; v1 submitted 14 June, 2018; originally announced June 2018.

Comments: 18 pages + supplementary material, 3 figures. (v2: update to match the published version ; v3: clarification of the caption of Fig. 3)

Journal ref: J. Stat. Mech. (2019) 124023. & NeurIPS 2018

arXiv:1806.05121 [pdf, other]

Adaptive Path Interpolation for Sparse Systems: Application to a Simple Censored Block Model

Authors: Jean Barbier, Chun Lam Chan, Nicolas Macris

Abstract: Recently a new adaptive path interpolation method has been developed as a simple and versatile scheme to calculate exactly the asymptotic mutual information of Bayesian inference problems defined on dense factor graphs. These include random linear and generalized estimation, sparse superposition codes, or low-rank matrix and tensor estimation. For all these systems, the adaptive interpolation method directly proves that the replica symmetric prediction is exact, in a simple and unified manner. When the underlying factor graph of the inference problem is sparse the replica prediction is considerably more complicated, and rigorous results are often lacking or obtained by rather complicated methods. In this work we show how to extend the adaptive path interpolation method to sparse systems. We concentrate on a Censored Block Model, where hidden variables are measured through a binary erasure channel, for which we fully prove the replica prediction.

Submitted 18 July, 2019; v1 submitted 13 June, 2018; originally announced June 2018.

arXiv:1805.09785 [pdf, other]

doi 10.1088/1742-5468/ab3430

Entropy and mutual information in models of deep neural networks

Authors: Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, Nicolas Macris, Florent Krzakala, Lenka Zdeborová

Abstract: We examine a class of deep learning models with a tractable method to compute information-theoretic quantities. Our contributions are three-fold: (i) We show how entropies and mutual informations can be derived from heuristic statistical physics methods, under the assumption that weight matrices are independent and orthogonally-invariant. (ii) We extend particular cases in which this result is known to be rigorously exact by providing a proof for two-layers networks with Gaussian random weights, using the recently introduced adaptive interpolation method. (iii) We propose an experiment framework with generative models of synthetic datasets, on which we train deep neural networks with a weight constraint designed so that the assumption in (i) is verified during learning. We study the behavior of entropies and mutual informations throughout learning and conclude that, in the proposed setting, the relationship between compression and generalization remains elusive.

Submitted 29 October, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

Journal ref: J. Stat. Mech. (2019) 124014. & NeurIPS 2018

arXiv:1802.08963 [pdf, other]

doi 10.1109/ISIT.2018.8437522

The Mutual Information in Random Linear Estimation Beyond i.i.d. Matrices

Authors: Jean Barbier, Nicolas Macris, Antoine Maillard, Florent Krzakala

Abstract: There has been definite progress recently in proving the variational single-letter formula given by the heuristic replica method for various estimation problems. In particular, the replica formula for the mutual information in the case of noisy linear estimation with random i.i.d. matrices, a problem with applications ranging from compressed sensing to statistics, has been proven rigorously. In this contribution we go beyond the restrictive i.i.d. matrix assumption and discuss the formula proposed by Takeda, Uda, Kabashima and later by Tulino, Verdu, Caire and Shamai who used the replica method. Using the recently introduced adaptive interpolation method and random matrix theory, we prove this formula for a relevant large sub-class of rotationally invariant matrices.

Submitted 15 November, 2018; v1 submitted 25 February, 2018; originally announced February 2018.

Comments: Presented at the 2018 IEEE International Symposium on Information Theory (ISIT)

Journal ref: 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, 2018, pp. 1390-1394

arXiv:1709.10368 [pdf, ps, other]

The Layered Structure of Tensor Estimation and its Mutual Information

Authors: Jean Barbier, Nicolas Macris, Léo Miolane

Abstract: We consider rank-one non-symmetric tensor estimation and derive simple formulas for the mutual information. We start with the order 2 problem, namely matrix factorization. We treat it completely in a simpler fashion than previous proofs using a new type of interpolation method developed in [1]. We then show how to harness the structure in "layers" of tensor estimation in order to obtain a formula for the mutual information for the order 3 problem from the knowledge of the formula for the order 2 problem, still using the same kind of interpolation. Our proof technique straightforwardly generalizes and allows one to rigorously obtain the mutual information at any order in a recursive way.

Submitted 27 November, 2018; v1 submitted 29 September, 2017; originally announced September 2017.

Comments: 55th Annual Allerton Conference on Communication, Control, and Computing, 2017

arXiv:1708.03395 [pdf, other]

doi 10.1073/pnas.1802705116

Optimal Errors and Phase Transitions in High-Dimensional Generalized Linear Models

Authors: Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, Lenka Zdeborová

Abstract: Generalized linear models (GLMs) arise in high-dimensional machine learning, statistics, communications and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes or benchmark models in neural networks. We evaluate the mutual information (or "free entropy") from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Non-rigorous predictions for the optimal errors existed for special cases of GLMs, e.g. for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance, and locate the associated sharp phase transitions separating learnable and non-learnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multi-purpose algorithms. This paper is divided in two parts that can be read independently: The first part (main part) presents the model and main results, discusses some applications and sketches the main ideas of the proof. The second part (supplementary information) is much more detailed and provides more examples as well as all the proofs.

Submitted 1 November, 2018; v1 submitted 10 August, 2017; originally announced August 2017.

Comments: 101 pages, 5 figures

Journal ref: Proceedings of the National Academy of Sciences 116.12 (2019): 5451-5460

arXiv:1707.04203 [pdf, other]

Universal Sparse Superposition Codes with Spatial Coupling and GAMP Decoding

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris

Abstract: Sparse superposition codes, or sparse regression codes, constitute a new class of codes which was first introduced for communication over the additive white Gaussian noise (AWGN) channel. It has been shown that such codes are capacity-achieving over the AWGN channel under optimal maximum-likelihood decoding as well as under various efficient iterative decoding schemes equipped with power allocation or spatially coupled constructions. Here, we generalize the analysis of these codes to a much broader setting that includes all memoryless channels. We show, for a large class of memoryless channels, that spatial coupling allows an efficient decoder, based on the generalized approximate message-passing (GAMP) algorithm, to reach the potential (or Bayes optimal) threshold of the underlying (or uncoupled) code ensemble. Moreover, we argue that spatially coupled sparse superposition codes universally achieve capacity under GAMP decoding by showing, through analytical computations, that the error floor vanishes and the potential threshold tends to capacity as one of the code parameters goes to infinity. Furthermore, we provide a closed-form formula for the algorithmic threshold of the underlying code ensemble in terms of a Fisher information. Relating an algorithmic threshold to a Fisher information has theoretical as well as practical importance. Our proof relies on the state evolution analysis and uses the potential method developed in the theory of low-density parity-check (LDPC) codes and compressed sensing.

Submitted 8 November, 2018; v1 submitted 13 July, 2017; originally announced July 2017.

Comments: Submitted to the IEEE transactions on information theory

arXiv:1705.02780 [pdf, other]

The adaptive interpolation method: A simple scheme to prove replica formulas in Bayesian inference

Authors: Jean Barbier, Nicolas Macris

Abstract: In recent years important progress has been achieved towards proving the validity of the replica predictions for the (asymptotic) mutual information (or "free energy") in Bayesian inference problems. The proof techniques that have emerged appear to be quite general, even though they have been worked out on a case-by-case basis. Unfortunately, a common point between all these schemes is their relatively high level of technicality. We present a new proof scheme that is quite straightforward with respect to the previous ones. We call it the adaptive interpolation method because it can be seen as an extension of the interpolation method developed by Guerra and Toninelli in the context of spin glasses, with an interpolation path that is adaptive. In order to illustrate our method we show how to prove the replica formula for three non-trivial inference problems. The first one is symmetric rank-one matrix estimation (or factorisation), which is the simplest problem considered here and the one for which the method is presented in full detail. Then we generalize to symmetric tensor estimation and random linear estimation. We believe that the present method has a much wider range of applicability and also sheds new light on the reasons for the validity of replica formulas in Bayesian inference.

Submitted 27 October, 2018; v1 submitted 8 May, 2017; originally announced May 2017.

Comments: Published in "Probability Theory and Related Fields"

arXiv:1704.04158 [pdf, other]

I-MMSE relations in random linear estimation and a sub-extensive interpolation method

Authors: Jean Barbier, Nicolas Macris

Abstract: Consider random linear estimation with Gaussian measurement matrices and noise. One can compute infinitesimal variations of the mutual information under infinitesimal variations of the signal-to-noise ratio or of the measurement rate. We discuss how each variation is related to the minimum mean-square error and deduce that the two variations are directly connected through a very simple identity. The main technical ingredient is a new interpolation method called the "sub-extensive interpolation method". We use it to provide a new proof of an I-MMSE relation recently found by Reeves and Pfister [1] when the measurement rate is varied. Our proof makes it clear that this relation is intimately related to another I-MMSE relation also recently proved in [2]. One can directly verify that the identity relating the two types of variation of mutual information is indeed consistent with the one-letter replica symmetric formula for the mutual information, first derived by Tanaka [3] for binary signals, and recently proved in more generality in [1,2,4,5] (by independent methods). However, our proof is independent of any knowledge of Tanaka's formula.

Submitted 13 April, 2017; originally announced April 2017.

Comments: Presented at the International Symposium on Information Theory (ISIT) 2017, Aachen, Germany

arXiv:1701.05823 [pdf, other]

doi 10.1109/TIT.2020.2990880

Mutual Information and Optimality of Approximate Message-Passing in Random Linear Estimation

Authors: Jean Barbier, Nicolas Macris, Mohamad Dia, Florent Krzakala

Abstract: We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections. A few examples where this problem is relevant are compressed sensing, sparse superposition codes, and code division multiple access. There have been a number of works considering the mutual information for this problem using the replica method from statistical physics. Here we put these considerations on a firm rigorous basis. First, we show, using a Guerra-Toninelli type interpolation, that the replica formula yields an upper bound to the exact mutual information. Secondly, for many relevant practical cases, we present a converse lower bound via a method that uses spatial coupling, state evolution analysis and the I-MMSE theorem. This yields a single letter formula for the mutual information and the minimal-mean-square error for random Gaussian linear estimation of all discrete bounded signals. In addition, we prove that the low complexity approximate message-passing algorithm is optimal outside of the so-called hard phase, in the sense that it asymptotically reaches the minimal-mean-square error. In this work spatial coupling is used primarily as a proof technique. However, our results also prove two important features of spatially coupled noisy linear random Gaussian estimation. First, there is no algorithmically hard phase. This means that for such systems approximate message-passing always reaches the minimal-mean-square error. Secondly, in a proper limit the mutual information associated to such systems is the same as the one of uncoupled linear random Gaussian estimation.

Submitted 28 August, 2020; v1 submitted 20 January, 2017; originally announced January 2017.

Journal ref: IEEE Transactions on Information Theory, vol. 66, no. 7, pp. 4270-4303, July 2020

arXiv:1701.03590 [pdf, other]

doi 10.1109/ISIT.2017.8006798

Generalized Approximate Message-Passing Decoder for Universal Sparse Superposition Codes

Authors: Erdem Biyik, Jean Barbier, Mohamad Dia

Abstract: Sparse superposition (SS) codes were originally proposed as a capacity-achieving communication scheme over the additive white Gaussian noise channel (AWGNC) [1]. Very recently, it was discovered that these codes are universal, in the sense that they achieve capacity over any memoryless channel under generalized approximate message-passing (GAMP) decoding [2], although this decoder has never been stated for SS codes. In this contribution we introduce the GAMP decoder for SS codes, we confirm empirically the universality of this communication scheme through its study on various channels and we provide the main analysis tools: state evolution and potential. We also compare the performance of GAMP with the Bayes-optimal MMSE decoder. We empirically illustrate that despite the presence of a phase transition preventing GAMP from reaching the optimal performance, spatial coupling boosts the performance, which eventually tends to capacity in a proper limit. We also prove that, in contrast with the AWGNC case, SS codes for binary input channels have a vanishing error floor in the limit of large codewords. Moreover, the performance of Hadamard-based encoders is assessed for practical implementations.

Submitted 13 January, 2017; originally announced January 2017.

arXiv:1607.02335 [pdf, other]

doi 10.1109/ALLERTON.2016.7852290

The Mutual Information in Random Linear Estimation

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala

Abstract: We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections, a problem relevant in compressed sensing, sparse superposition codes or code division multiple access, to cite just a few. There have been a number of works considering the mutual information for this problem using the heuristic replica method from statistical physics. Here we put these considerations on a firm rigorous basis. First, we show, using a Guerra-type interpolation, that the replica formula yields an upper bound to the exact mutual information. Secondly, for many relevant practical cases, we present a converse lower bound via a method that uses spatial coupling, state evolution analysis and the I-MMSE theorem. This yields, in particular, a single letter formula for the mutual information and the minimal-mean-square error for random Gaussian linear estimation of all discrete bounded signals.

Submitted 6 September, 2016; v1 submitted 8 July, 2016; originally announced July 2016.

Comments: Presented at the 54th Annual Allerton Conference on Communication, Control, and Computing, 2016

Journal ref: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Pages: 625 - 632

arXiv:1606.04142 [pdf, other]

Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Thibault Lesieur, Lenka Zdeborova

Abstract: Factorizing low-rank matrices has many applications in machine learning and statistics. For probabilistic models in the Bayes optimal setting, a general expression for the mutual information has been proposed using heuristic statistical physics computations, and proven in a few specific cases. Here, we show how to rigorously prove the conjectured formula for the symmetric rank-one case. This allows us to express the minimal mean-square-error and to characterize the detectability phase transitions in a large set of estimation problems ranging from community detection to sparse PCA. We also show that for a large set of parameters, an iterative algorithm called approximate message-passing is Bayes optimal. There exists, however, a gap between what currently known polynomial algorithms can do and what is expected information theoretically. Additionally, the proof technique has an interest of its own and exploits three essential ingredients: the interpolation method introduced in statistical physics by Guerra, the analysis of the approximate message-passing algorithm and the theory of spatial coupling and threshold saturation in coding. Our approach is generic and applicable to other open problems in statistical estimation where heuristic statistical physics predictions are available.

Submitted 13 June, 2016; originally announced June 2016.

Journal ref: Advances in Neural Information Processing Systems 29 (NIPS 2016) pp 424-432

arXiv:1603.04591 [pdf, other]

Threshold Saturation of Spatially Coupled Sparse Superposition Codes for All Memoryless Channels

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris

Abstract: We recently proved threshold saturation for spatially coupled sparse superposition codes on the additive white Gaussian noise channel. Here we generalize our analysis to a much broader setting. We show for any memoryless channel that spatial coupling allows generalized approximate message-passing (GAMP) decoding to reach the potential (or Bayes optimal) threshold of the code ensemble. Moreover, in the large input alphabet size limit: i) the GAMP algorithmic threshold of the underlying (or uncoupled) code ensemble is simply expressed as a Fisher information; ii) the potential threshold tends to Shannon's capacity. Although we focus on coding for the sake of coherence with our previous results, the framework and methods are very general and hold for a wide class of generalized estimation problems with random linear mixing.

Submitted 15 March, 2016; originally announced March 2016.

Comments: Submitted to the Information Theory Workshop (ITW) 2016, Cambridge, United Kingdom

arXiv:1603.01817 [pdf, other]

Proof of Threshold Saturation for Spatially Coupled Sparse Superposition Codes

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris

Abstract: Recently, a new class of codes, called sparse superposition or sparse regression codes, has been proposed for communication over the AWGN channel. It has been proven that they achieve capacity using power allocation and various forms of iterative decoding. Empirical evidence has also strongly suggested that the codes achieve capacity when spatial coupling and approximate message passing decoding are used, without the need for power allocation. In this note we prove that state evolution (which tracks message passing) indeed saturates the potential threshold of the underlying code ensemble, which approaches in a proper limit the optimal threshold. Our proof uses ideas developed in the theory of low-density parity-check codes and compressive sensing.

Submitted 6 March, 2016; originally announced March 2016.

Comments: Submitted to the International Symposium on Information Theory (ISIT) 2016, Barcelona, Spain

arXiv:1511.05860 [pdf, other]

doi 10.1088/1742-6596/699/1/012013

Scampi: a robust approximate message-passing framework for compressive imaging

Authors: Jean Barbier, Eric W. Tramel, Florent Krzakala

Abstract: Reconstruction of images from noisy linear measurements is a core problem in image processing, for which convex optimization methods based on total variation (TV) minimization have been the long-standing state-of-the-art. We present an alternative probabilistic reconstruction procedure based on approximate message-passing, Scampi, which operates in the compressive regime, where the inverse imaging problem is underdetermined. While the proposed method is related to the recently proposed GrAMPA algorithm of Borgerding, Schniter, and Rangan, we further develop the probabilistic approach to compressive imaging by introducing an expectation-maximization learning of model parameters, making Scampi robust to model uncertainties. Additionally, our numerical experiments indicate that Scampi can provide reconstruction performance superior to both GrAMPA and convex approaches to TV reconstruction. Finally, through exhaustive best-case experiments, we show that in many cases the maximal performance of both Scampi and convex TV can be quite close, even though the approaches are a priori distinct. The theoretical reasons for this correspondence remain an open question. Nevertheless, the proposed algorithm remains more practical, as it requires far less parameter tuning to perform optimally.

Submitted 21 November, 2015; v1 submitted 18 November, 2015; originally announced November 2015.

Comments: Presented at the 2015 International Meeting on High-Dimensional Data Driven Science, Kyoto, Japan

Journal ref: 2016 J. Phys.: Conf. Ser. 699 012013

arXiv:1511.01650 [pdf, other]

Statistical physics and approximate message-passing algorithms for sparse linear estimation problems in signal processing and coding theory

Authors: Jean Barbier

Abstract: This thesis is interested in the application of statistical physics methods and inference to sparse linear estimation problems. The main tools are the graphical models and approximate message-passing algorithm together with the cavity method. We will also use the replica method of statistical physics of disordered systems, which allows one to associate to the studied problems a cost function referred to as the potential or free entropy in physics. It allows one to predict the different phases of typical complexity of the problem as a function of external parameters such as the noise level or the number of measurements one has about the signal: the inference can be typically easy, hard or impossible. We will see that the hard phase corresponds to a regime of coexistence of the actual solution together with another unwanted solution of the message passing equations. In this phase, it represents a metastable state which is not the true equilibrium solution. This phenomenon can be linked to supercooled water blocked in the liquid state below its freezing critical temperature. We will use a method that allows one to overcome the metastability by mimicking the strategy adopted by nature itself for supercooled water: nucleation and spatial coupling. In supercooled water, a weak localized perturbation is enough to create a crystal nucleus that will propagate throughout the medium thanks to the physical couplings between nearby atoms. The same process will help the algorithm to find the signal, thanks to the introduction of a nucleus containing local information about the signal. It will then spread as a "reconstruction wave" similar to the crystal in the water. After an introduction to statistical inference and sparse linear estimation, we will introduce the necessary tools. Then we will move to applications of these notions to signal processing and coding theory problems.

Submitted 5 November, 2015; originally announced November 2015.

Comments: PhD thesis defended on September 18th, 2015 at the Ecole Normale Supérieure of Paris, in front of the jury composed of Prof. Laurent DAUDET, examiner, Prof. Silvio FRANZ, examiner, Prof. Florent KRZAKALA, advisor, Prof. Marc LELARGE, examiner, Prof. Nicolas MACRIS, reviewer, Prof. Marc MÉZARD, examiner, Prof. Federico RICCI-TERSENGHI, examiner, Prof. David SAAD, reviewer

arXiv:1503.08040 [pdf, other]

doi 10.1109/TIT.2017.2713833

Approximate message-passing decoder and capacity-achieving sparse superposition codes

Authors: Jean Barbier, Florent Krzakala

Abstract: We study the approximate message-passing decoder for sparse superposition coding on the additive white Gaussian noise channel and extend our preliminary work [1]. We use heuristic statistical-physics-based tools such as the cavity and the replica methods for the statistical analysis of the scheme. While superposition codes asymptotically reach the Shannon capacity, we show that our iterative decoder is limited by a phase transition similar to the one that happens in low-density parity-check codes. We consider two solutions to this problem, both of which allow one to reach the Shannon capacity: i) a power allocation strategy and ii) the use of spatial coupling, a novelty for these codes that appears to be promising. We present in particular simulations suggesting that spatial coupling is more robust and allows for better reconstruction at finite code lengths. Finally, we show empirically that the use of a fast Hadamard-based operator allows for an efficient reconstruction, both in terms of computational time and memory, and the ability to deal with very large messages.

Submitted 29 September, 2016; v1 submitted 27 March, 2015; originally announced March 2015.

Comments: 40 pages, 18 figures

Journal ref: IEEE Transactions on Information Theory, Volume: 63, Issue: 8 (Aug. 2017)

arXiv:1409.7465 [pdf, other]

Error correcting codes and spatial coupling

Authors: Rafah El-Khatib, Jean Barbier, Ayaka Sakata, Rüdiger Urbanke

Abstract: These are notes from the lecture of Rüdiger Urbanke given at the autumn school "Statistical Physics, Optimization, Inference, and Message-Passing Algorithms", which took place in Les Houches, France from Monday September 30th, 2013, till Friday October 11th, 2013. The school was organized by Florent Krzakala from UPMC and ENS Paris, Federico Ricci-Tersenghi from La Sapienza Roma, Lenka Zdeborová from CEA Saclay and CNRS, and Riccardo Zecchina from Politecnico Torino. The first three sections cover the basics of polar codes and low density parity check codes. In the last three sections, we see how spatial coupling helps belief propagation decoding.

Submitted 25 September, 2014; originally announced September 2014.

Comments: Chapter of "Statistical Physics, Optimization, Inference, and Message-Passing Algorithms", Eds.: F. Krzakala, F. Ricci-Tersenghi, L. Zdeborová, R. Zecchina, E. W. Tramel, L. F. Cugliandolo (Oxford University Press, to appear)

Showing 1–50 of 56 results for author: Barbier, J