The PRODSAT phase of random quantum satisfiability

Authors: Joon Lee, Nicolas Macris, Jean Bernoulli Ravelomanana, Perrine Vantalon

Abstract: The $k$-QSAT problem is a quantum analog of the famous $k$-SAT constraint satisfaction problem. We must determine the zero energy ground states of a Hamiltonian of $N$ qubits consisting of a sum of $M$ random $k$-local rank-one projectors. It is known that product states of zero energy exist with high probability if and only if the underlying factor graph has a clause-covering dimer configuration. This means that the threshold of the PRODSAT phase is a purely geometric quantity equal to the dimer covering threshold. We revisit and fully prove this result through a combination of complex analysis and algebraic methods based on Buchberger's algorithm for complex polynomial equations with random coefficients. We also discuss numerical experiments investigating the presence of entanglement in the PRODSAT phase in the sense that product states do not span the whole zero energy ground state space.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2403.04615 [pdf, ps, other]

Rectangular Rotational Invariant Estimator for High-Rank Matrix Estimation

Authors: Farzad Pourkamali, Nicolas Macris

Abstract: We consider estimating a matrix from noisy observations coming from an arbitrary additive bi-rotational invariant perturbation. We propose an estimator which is optimal among the class of rectangular rotational invariant estimators and can be applied irrespective of the prior on the signal. For the particular case of Gaussian noise, we prove the optimality of the proposed estimator, and we find an explicit expression for the MMSE in terms of the limiting singular value distribution of the observation matrix. Moreover, we prove a formula linking the asymptotic mutual information and the limit of a log-spherical integral of rectangular matrices. We also provide numerical checks for our results for general bi-rotational invariant noise, as well as Gaussian noise, which match our theoretical predictions.

Submitted 7 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2304.12264

arXiv:2402.07626 [pdf, other]

Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features

Authors: Rodrigo Veiga, Anastasia Remizova, Nicolas Macris

Abstract: We investigate the test risk of continuous-time stochastic gradient flow dynamics in learning theory. Using a path integral formulation we provide, in the regime of a small learning rate, a general formula for computing the difference between test risk curves of pure gradient and stochastic gradient flows. We apply the general theory to a simple model of weak features, which displays the double descent phenomenon, and explicitly compute the corrections brought about by the added stochastic term in the dynamics, as a function of time and model parameters. The analytical results are compared to simulations of discrete-time stochastic gradient descent and show good agreement.

Submitted 10 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: Accepted to ICML 2024

arXiv:2306.04592 [pdf, other]

Bayesian Extensive-Rank Matrix Factorization with Rotational Invariant Priors

Authors: Farzad Pourkamali, Nicolas Macris

Abstract: We consider a statistical model for matrix factorization in a regime where the rank of the two hidden matrix factors grows linearly with their dimension and their product is corrupted by additive noise. Despite various approaches, statistical and algorithmic limits of such problems have remained elusive. We study a Bayesian setting with the assumptions that (a) one of the matrix factors is symmetric, (b) both factors as well as the additive noise have rotational invariant priors, (c) the priors are known to the statistician. We derive analytical formulas for Rotation Invariant Estimators to reconstruct the two matrix factors, and conjecture that these are optimal in the large-dimension limit, in the sense that they minimize the average mean-square-error. We provide numerical checks which confirm the optimality conjecture when compared against Oracle Estimators, which are optimal by definition but involve the ground-truth. Our derivation relies on a combination of tools, namely random matrix theory transforms, spherical integral formulas, and the replica method from statistical mechanics.

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.01412 [pdf, ps, other]

Matrix Inference in Growing Rank Regimes

Authors: Farzad Pourkamali, Jean Barbier, Nicolas Macris

Abstract: The inference of a large symmetric signal-matrix $\mathbf{S} \in \mathbb{R}^{N\times N}$ corrupted by additive Gaussian noise is considered for two regimes of growth of the rank $M$ as a function of $N$. For sub-linear ranks $M=\Theta(N^\alpha)$ with $\alpha\in(0,1)$ the mutual information and minimum mean-square error (MMSE) are derived for two classes of signal-matrices: (a) $\mathbf{S}=\mathbf{X}\mathbf{X}^\intercal$ with entries of $\mathbf{X}\in\mathbb{R}^{N\times M}$ independent identically distributed; (b) $\mathbf{S}$ sampled from a rotationally invariant distribution. Surprisingly, the formulas match the rank-one case. Two efficient algorithms are explored and conjectured to saturate the MMSE when no statistical-to-computational gap is present: (1) Decimation Approximate Message Passing; (2) a spectral algorithm based on a Rotation Invariant Estimator. For linear ranks $M=\Theta(N)$ the mutual information is rigorously derived for signal-matrices from a rotationally invariant distribution. Close connections with scalar inference in free probability are uncovered, which allow one to deduce a simple formula for the MMSE as an integral involving the limiting spectral measure of the data matrix only. An interesting issue is whether the known information theoretic phase transitions for rank-one, and hence also sub-linear-rank, still persist in linear-rank. Our analysis suggests that only a smoothed-out trace of the transitions persists. Furthermore, the change of behavior between low and truly high-rank regimes only happens at the linear scale $\alpha=1$.

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2304.12264 [pdf, ps, other]

Rectangular Rotational Invariant Estimator for General Additive Noise Matrices

Authors: Farzad Pourkamali, Nicolas Macris

Abstract: We propose a rectangular rotational invariant estimator to recover a real matrix from noisy matrix observations coming from an arbitrary additive rotational invariant perturbation, in the large dimension limit. Using the Bayes-optimality of this estimator, we derive the asymptotic minimum mean squared error (MMSE). For the particular case of Gaussian noise, we find an explicit expression for the MMSE in terms of the limiting singular value distribution of the observation matrix. Moreover, we prove a formula linking the asymptotic mutual information and the limit of a log-spherical integral of rectangular matrices. We also provide numerical checks for our results, which match our theoretical predictions and known Bayesian inference results.

Submitted 24 April, 2023; originally announced April 2023.

arXiv:2303.09474 [pdf, other]

Gradient flow on extensive-rank positive semi-definite matrix denoising

Authors: Antoine Bodin, Nicolas Macris

Abstract: In this work, we present a new approach to analyze the gradient flow for a positive semi-definite matrix denoising problem in an extensive-rank and high-dimensional regime. We use recent linear pencil techniques of random matrix theory to derive fixed point equations which track the complete time evolution of the matrix-mean-square-error of the problem. The predictions of the resulting fixed point equations are validated by numerical experiments. In this short note we briefly illustrate a few predictions of our formalism by way of examples, and in particular we uncover continuous phase transitions in the extensive-rank and high-dimensional regime, which connect to the classical phase transitions of the low-rank problem in the appropriate limit. The formalism has much wider applicability than shown in this communication.

Submitted 16 March, 2023; originally announced March 2023.

arXiv:2212.06757 [pdf, other]

Gradient flow in the gaussian covariate model: exact solution of learning curves and multiple descent structures

Authors: Antoine Bodin, Nicolas Macris

Abstract: A recent line of work has shown remarkable behaviors of the generalization error curves in simple learning models. Even the least-squares regression has shown atypical features such as the model-wise double descent, and further works have observed triple or multiple descents. Other important characteristics are the epoch-wise descent structures which emerge during training. The observations of model-wise and epoch-wise descents have been analytically derived in limited theoretical settings (such as the random feature model) and are otherwise experimental. In this work, we provide a full and unified analysis of the whole time-evolution of the generalization curve, in the asymptotic large-dimensional regime and under gradient-flow, within a wider theoretical setting stemming from a Gaussian covariate model. In particular, we cover most cases already disparately observed in the literature, and also provide examples of the existence of multiple descent structures as a function of a model parameter or time. Furthermore, we show that our theoretical predictions adequately match the learning curves obtained by gradient descent over realistic datasets. Technically we compute averages of rational expressions involving random matrices using recent developments in random matrix theory based on "linear pencils". Another contribution, which is also of independent interest in random matrix theory, is a new derivation of related fixed point equations (and an extension thereof) using Dyson Brownian motions.

Submitted 18 December, 2022; v1 submitted 13 December, 2022; originally announced December 2022.

arXiv:2112.11356 [pdf, ps, other]

On the convergence to the non-equilibrium steady state of a Langevin dynamics with widely separated time scales and different temperatures

Authors: Diego Alberici, Nicolas Macris, Emanuele Mingione

Abstract: We study the solution of the two-temperature Fokker-Planck equation and rigorously analyse its convergence towards an explicit non-equilibrium stationary measure for long time and two widely separated time scales. The exponential rates of convergence are estimated assuming the validity of logarithmic Sobolev inequalities for the conditional and marginal distributions of the limit measure. We show that these estimates are sharp in the exactly solvable case of a quadratic potential. We discuss a few examples where the logarithmic Sobolev inequalities are satisfied through simple, though not optimal, criteria. In particular we consider a spin-glass model with slowly varying external magnetic fields whose non-equilibrium measure corresponds to Guerra's hierarchical construction appearing in Talagrand's proof of the Parisi formula.

Submitted 10 March, 2023; v1 submitted 21 December, 2021; originally announced December 2021.

Journal ref: Annales Henri Poincaré (2024)

arXiv:2110.11805 [pdf, other]

Model, sample, and epoch-wise descents: exact solution of gradient flow in the random feature model

Authors: Antoine Bodin, Nicolas Macris

Abstract: Recent evidence has shown the existence of a so-called double-descent and even triple-descent behavior for the generalization error of deep-learning models. This important phenomenon commonly appears in implemented neural network architectures, and also seems to emerge in epoch-wise curves during the training process. A recent line of research has highlighted that random matrix tools can be used to obtain precise analytical asymptotics of the generalization (and training) errors of the random feature model. In this contribution, we analyze the whole temporal behavior of the generalization and training errors under gradient flow for the random feature model. We show that in the asymptotic limit of large system size the full time-evolution path of both errors can be calculated analytically. This allows us to observe how the double and triple descents develop over time, if and when early stopping is an option, and also observe time-wise descent structures. Our techniques are based on Cauchy complex integral representations of the errors together with recent random matrix methods based on linear pencils.

Submitted 22 October, 2021; originally announced October 2021.

arXiv:2109.06610 [pdf, other]

doi 10.1103/PhysRevE.106.024136

Statistical limits of dictionary learning: random matrix theory and the spectral replica method

Authors: Jean Barbier, Nicolas Macris

Abstract: We consider increasingly complex models of matrix denoising and dictionary learning in the Bayes-optimal setting, in the challenging regime where the matrices to infer have a rank growing linearly with the system size. This is in contrast with most existing literature concerned with the low-rank (i.e., constant-rank) regime. We first consider a class of rotationally invariant matrix denoising problems whose mutual information and minimum mean-square error are computable using techniques from random matrix theory. Next, we analyze the more challenging models of dictionary learning. To do so we introduce a novel combination of the replica method from statistical mechanics together with random matrix theory, coined the spectral replica method. This allows us to derive variational formulas for the mutual information between hidden representations and the noisy data of the dictionary learning problem, as well as for the overlaps quantifying the optimal reconstruction error. The proposed method reduces the number of degrees of freedom from $\Theta(N^2)$ matrix entries to $\Theta(N)$ eigenvalues (or singular values), and yields Coulomb gas representations of the mutual information which are reminiscent of matrix models in physics. The main ingredients are a combination of large deviation results for random matrices together with a new replica symmetric decoupling ansatz at the level of the probability distributions of eigenvalues (or singular values) of certain overlap matrices and the use of Harish-Chandra-Itzykson-Zuber spherical integrals.

Submitted 26 February, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

arXiv:2107.08927 [pdf, other]

Mismatched Estimation of rank-one symmetric matrices under Gaussian noise

Authors: Farzad Pourkamali, Nicolas Macris

Abstract: We consider the estimation of an $n$-dimensional vector $\mathbf{s}$ from the noisy element-wise measurements of $\mathbf{s}\mathbf{s}^T$, a generic problem that arises in statistics and machine learning. We study a mismatched Bayesian inference setting, where some of the parameters are not known to the statistician. We derive the full exact analytic expression of the asymptotic mean squared error (MSE) in the large system size limit for the particular case of Gaussian priors and additive noise. From our formulas, we see that estimation is still possible in the mismatched case, and also that the minimum MSE (MMSE) can be achieved if the statistician chooses suitable parameters. Our technique relies on the asymptotics of the spherical integrals and can be applied as long as the statistician chooses a rotationally invariant prior.

Submitted 13 September, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

arXiv:2105.12257 [pdf, other]

Rank-one matrix estimation: analytic time evolution of gradient descent dynamics

Authors: Antoine Bodin, Nicolas Macris

Abstract: We consider a rank-one symmetric matrix corrupted by additive noise. The rank-one matrix is formed by an $n$-component unknown vector on the sphere of radius $\sqrt{n}$, and we consider the problem of estimating this vector from the corrupted matrix in the high dimensional limit of $n$ large, by gradient descent for a quadratic cost function on the sphere. Explicit formulas for the whole time evolution of the overlap between the estimator and unknown vector, as well as the cost, are rigorously derived. In the long time limit we recover the well known spectral phase transition, as a function of the signal-to-noise ratio. The explicit formulas also allow us to point out interesting transient features of the time evolution. Our analysis technique is based on recent progress in random matrix theory and uses local versions of the semi-circle law.

Submitted 25 May, 2021; originally announced May 2021.

arXiv:2012.07747 [pdf, other]

doi 10.1007/s10915-022-02044-x

Solving non-linear Kolmogorov equations in large dimensions by using deep learning: a numerical comparison of discretization schemes

Authors: Nicolas Macris, Raffaele Marino

Abstract: Non-linear partial differential Kolmogorov equations are successfully used to describe a wide range of time dependent phenomena, in natural sciences, engineering or even finance. For example, in physical systems, the Allen-Cahn equation describes pattern formation associated to phase transitions. In finance, instead, the Black-Scholes equation describes the evolution of the price of derivative investment instruments. Such modern applications often require solving these equations in high-dimensional regimes in which classical approaches are ineffective. Recently, an interesting new approach based on deep learning has been introduced by E, Han, and Jentzen [1][2]. The main idea is to construct a deep network which is trained from the samples of discrete stochastic differential equations underlying Kolmogorov's equation. The network is able to approximate, numerically at least, the solutions of the Kolmogorov equation with polynomial complexity in whole spatial domains. In this contribution we study variants of the deep networks by using different discretization schemes of the stochastic differential equation. We compare the performance of the associated networks, on benchmarked examples, and show that, for some discretization schemes, improvements in the accuracy are possible without affecting the observed computational complexity.

Submitted 11 September, 2022; v1 submitted 9 December, 2020; originally announced December 2020.

Journal ref: J Sci Comput 94, 8 (2023)

arXiv:2006.14989 [pdf, other]

Tensor estimation with structured priors

Authors: Clément Luneau, Nicolas Macris

Abstract: We consider rank-one symmetric tensor estimation when the tensor is corrupted by Gaussian noise and the spike forming the tensor is a structured signal coming from a generalized linear model. The latter is a mathematically tractable model of a non-trivial hidden lower-dimensional latent structure in a signal. We work in a large dimensional regime with fixed ratio of signal-to-latent space dimensions. Remarkably, in this asymptotic regime, the mutual information between the spike and the observations can be expressed as a finite-dimensional variational problem, and it is possible to deduce the minimum-mean-square-error from its solution. We discuss, on examples, properties of the phase transitions as a function of the signal-to-noise ratio. Typically, the critical signal-to-noise ratio decreases with increasing signal-to-latent space dimensions. We discuss the limit of vanishing ratio of signal-to-latent space dimensions and determine the limiting tensor estimation problem. We also point out similarities and differences with the case of matrices.

Submitted 26 June, 2020; originally announced June 2020.

arXiv:2006.11313 [pdf, other]

Information theoretic limits of learning a sparse rule

Authors: Clément Luneau, Jean Barbier, Nicolas Macris

Abstract: We consider generalized linear models in regimes where the number of nonzero components of the signal and accessible data points are sublinear with respect to the size of the signal. We prove a variational formula for the asymptotic mutual information per sample when the system size grows to infinity. This result allows us to derive an expression for the minimum mean-square error (MMSE) of the Bayesian estimator when the signal entries have a discrete distribution with finite support. We find that, for such signals and suitable vanishing scalings of the sparsity and sampling rate, the MMSE is nonincreasing piecewise constant. In specific instances the MMSE even displays an all-or-nothing phase transition, that is, the MMSE sharply jumps from its maximum value to zero at a critical sampling rate. The all-or-nothing phenomenon has previously been shown to occur in high-dimensional linear regression. Our analysis goes beyond the linear case and applies to learning the weights of a perceptron with general activation function in a teacher-student scenario. In particular, we discuss an all-or-nothing phenomenon for the generalization error with a sublinear set of training examples.

Submitted 27 October, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

Comments: 56 pages, 4 figures, accepted to the 34th Conference on Neural Information Processing Systems (NeurIPS 2020). Extended version that includes the supplementary material

arXiv:2006.07971 [pdf, other]

All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation

Authors: Jean Barbier, Nicolas Macris, Cynthia Rush

Abstract: We determine statistical and computational limits for estimation of a rank-one matrix (the spike) corrupted by an additive Gaussian noise matrix, in a sparse limit, where the underlying hidden vector (that constructs the rank-one matrix) has a number of non-zero components that scales sub-linearly with the total dimension of the vector, and the signal-to-noise ratio tends to infinity at an appropriate speed. We prove explicit low-dimensional variational formulas for the asymptotic mutual information between the spike and the observed noisy matrix and analyze the approximate message passing algorithm in the sparse regime. For Bernoulli and Bernoulli-Rademacher distributed vectors, and when the sparsity and signal strength satisfy an appropriate scaling relation, we find all-or-nothing phase transitions for the asymptotic minimum and algorithmic mean-square errors. These jump from their maximum possible value to zero, at well-defined signal-to-noise thresholds whose asymptotic values we determine exactly. In the asymptotic regime the statistical-to-algorithmic gap diverges, indicating that sparse recovery is hard for approximate message passing.

Submitted 30 October, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: Part of this work (in particular the proof of Theorem 1) is already present in reference arXiv:1911.05030

arXiv:2004.06975 [pdf, ps, other]

doi 10.1109/ISIT44484.2020.9174104

High-dimensional rank-one nonsymmetric matrix decomposition: the spherical case

Authors: Clément Luneau, Nicolas Macris, Jean Barbier

Abstract: We consider the problem of estimating a rank-one nonsymmetric matrix under additive white Gaussian noise. The matrix to estimate can be written as the outer product of two vectors and we look at the special case in which both vectors are uniformly distributed on spheres. We prove a replica-symmetric formula for the average mutual information between these vectors and the observations in the high-dimensional regime. This goes beyond previous results which considered vectors with independent and identically distributed elements. The method used can be extended to rank-one tensor problems.

Submitted 15 April, 2020; originally announced April 2020.

Comments: Will appear in 2020 IEEE International Symposium on Information Theory (ISIT). Long version with appendices, 26 pages

arXiv:1912.06105 [pdf, other]

Bell Diagonal and Werner state generation: entanglement, non-locality, steering and discord on the IBM quantum computer

Authors: Elias Riedel Gårding, Nicolas Schwaller, Su Yeon Chang, Samuel Bosch, Willy Robert Laborde, Javier Naya Hernandez, Chun Lam Chan, Frédéric Gessler, Xinyu Si, Marc-André Dupertuis, Nicolas Macris

Abstract: We propose the first correct special-purpose quantum circuits for preparation of Bell-diagonal states (BDS), and implement them on the IBM Quantum computer, characterizing and testing complex aspects of their quantum correlations in the full parameter space. Among the circuits proposed, one involves only two quantum bits but requires adapted quantum tomography routines handling classical bits in parallel. The entire class of Bell-diagonal states is generated, and a number of characteristic indicators, namely entanglement of formation, CHSH non-locality, steering and discord, are experimentally evaluated over the full parameter space and compared with theory. As a by-product of this work we also find a remarkable general inequality between "quantum discord" and "asymmetric relative entropy of discord": the former never exceeds the latter. We also prove that for all BDS the two coincide.

Submitted 16 May, 2021; v1 submitted 12 December, 2019; originally announced December 2019.

Comments: 20 pages, 23 figures

arXiv:1911.05030 [pdf, other]

0-1 phase transitions in sparse spiked matrix estimation

Authors: Jean Barbier, Nicolas Macris

Abstract: We consider statistical models of estimation of a rank-one matrix (the spike) corrupted by an additive Gaussian noise matrix in the sparse limit. In this limit the underlying hidden vector (that constructs the rank-one matrix) has a number of non-zero components that scales sub-linearly with the total dimension of the vector, and the signal strength tends to infinity at an appropriate speed. We prove explicit low-dimensional variational formulas for the asymptotic mutual information between the spike and the observed noisy matrix in suitable sparse limits. For Bernoulli and Bernoulli-Rademacher distributed vectors, and when the sparsity and signal strength satisfy an appropriate scaling relation, these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error. A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression (compressive sensing).

Submitted 12 November, 2019; originally announced November 2019.

arXiv:1904.04565 [pdf, ps, other]

doi 10.1093/imaiai/iaaa022

Mutual information for low-rank even-order symmetric tensor estimation

Authors: Clément Luneau, Jean Barbier, Nicolas Macris

Abstract: We consider a statistical model for finite-rank symmetric tensor factorization and prove a single-letter variational expression for its asymptotic mutual information when the tensor is of even order. The proof applies the adaptive interpolation method originally invented for rank-one factorization. Here we show how to extend the adaptive interpolation to finite-rank and even-order tensors. This requires new nontrivial ideas with respect to the current analysis in the literature. We also underline where the proof falls short when dealing with odd-order tensors.

Submitted 23 September, 2020; v1 submitted 9 April, 2019; originally announced April 2019.

Comments: Preprint of an article accepted for publication in Information and Inference: A Journal of the IMA

arXiv:1902.07273 [pdf, other]

Mutual Information for the Stochastic Block Model by the Adaptive Interpolation Method

Authors: Jean Barbier, Chun Lam Chan, Nicolas Macris

Abstract: We rigorously derive a single-letter variational expression for the mutual information of the asymmetric two-groups stochastic block model in the dense graph regime. Existing proofs in the literature are indirect, as they involve mapping the model to a rank-one matrix estimation problem whose mutual information is then determined by a combination of methods (e.g., interpolation, cavity, algorithmic, spatial coupling). In this contribution we provide a self-contained direct method using only the recently introduced adaptive interpolation method.

Submitted 16 July, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

arXiv:1901.06521 [pdf, other]

doi 10.1007/s10955-019-02470-6

Concentration of multi-overlaps for random ferromagnetic spin models

Authors: Jean Barbier, Chun Lam Chan, Nicolas Macris

Abstract: We consider ferromagnetic spin models on dilute random graphs and prove that, with suitable one-body infinitesimal perturbations added to the Hamiltonian, the multi-overlaps concentrate for all temperatures, both with respect to the thermal Gibbs average and the quenched randomness. Results of this nature have been known only for the lowest order overlaps, at high temperature or on the Nishimori line. Here we treat all multi-overlaps by a non-trivial application of Griffiths-Kelly-Sherman correlation inequalities. Our results apply in particular to the pure and mixed p-spin ferromagnets on random dilute Erdős-Rényi hypergraphs. On physical grounds one expects that multi-overlap concentration directly implies the correctness of the cavity (or replica symmetric) formula for the pressure. The proof of this formula for the general p-spin ferromagnet on a random dilute hypergraph remains an open problem.

Submitted 19 January, 2019; originally announced January 2019.

arXiv:1901.06516 [pdf, ps, other]

doi 10.1088/1751-8121/ab2735

The adaptive interpolation method for proving replica formulas. Applications to the Curie-Weiss and Wigner spike models

Authors: Jean Barbier, Nicolas Macris

Abstract: In this contribution we give a pedagogic introduction to the newly introduced adaptive interpolation method to prove in a simple and unified way replica formulas for Bayesian optimal inference problems. Many aspects of this method can already be explained at the level of the simple Curie-Weiss spin system. This provides a new method of solution for this model which does not appear to be known. We then generalize this analysis to a paradigmatic inference problem, namely rank-one matrix estimation, also referred to as the Wigner spike model in statistics. We give many pointers to the recent literature where the method has been successfully applied.

Submitted 7 March, 2020; v1 submitted 19 January, 2019; originally announced January 2019.

arXiv:1812.02537 [pdf, other]

Rank-one matrix estimation: analysis of algorithmic and information theoretic limits by the spatial coupling method

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Lenka Zdeborová

Abstract: Factorizing low-rank matrices is a problem with many applications in machine learning and statistics, ranging from sparse PCA to community detection and sub-matrix localization. For probabilistic models in the Bayes optimal setting, general expressions for the mutual information have been proposed using powerful heuristic statistical physics computations via the replica and cavity methods, and proven in few specific cases by a variety of methods. Here, we use the spatial coupling methodology developed in the framework of error correcting codes, to rigorously derive the mutual information for the symmetric rank-one case. We characterize the detectability phase transitions in a large set of estimation problems, where we show that there exists a gap between what currently known polynomial algorithms (in particular spectral methods and approximate message-passing) can do and what is expected information theoretically. Moreover, we show that the computational gap vanishes for the proposed spatially coupled model, a promising feature with many possible applications. Our proof technique has an interest on its own and exploits three essential ingredients: the interpolation method first introduced in statistical physics, the analysis of approximate message-passing algorithms first introduced in compressive sensing, and the theory of threshold saturation for spatially coupled systems first developed in coding theory. Our approach is very generic and can be applied to many other open problems in statistical estimation where heuristic statistical physics predictions are available.

Submitted 6 December, 2018; originally announced December 2018.

Comments: Submitted to Journal of Machine Learning Research (JMLR)

arXiv:1807.05572 [pdf, other]

doi 10.1002/qute.201900015

Efficient quantum algorithms for $GHZ$ and $W$ states, and implementation on the IBM quantum computer

Authors: Diogo Cruz, Romain Fournier, Fabien Gremion, Alix Jeannerot, Kenichi Komagata, Tara Tosic, Jarla Thiesbrummel, Chun Lam Chan, Nicolas Macris, Marc-André Dupertuis, Clément Javerzac-Galy

Abstract: We propose efficient algorithms with logarithmic step complexities for the generation of entangled $GHZ_N$ and $W_N$ states useful for quantum networks, and we demonstrate an implementation on the IBM quantum computer up to $N=16$. Improved quality is then investigated using full quantum tomography for low-$N$ GHZ and W states. This is completed by parity oscillations and histogram distance for large $N$ GHZ and W states respectively. We are able to robustly build states with about twice the number of quantum bits previously achieved. Finally we attempt quantum error correction on GHZ using recent schemes proposed in the literature, but with the present amount of decoherence they prove detrimental.

Submitted 15 July, 2018; originally announced July 2018.

Journal ref: Adv. Quantum Technol. 1900015 (2019)

arXiv:1806.05451 [pdf, other]

doi 10.1088/1742-5468/ab43d2

The committee machine: Computational to statistical gaps in learning a two-layers neural network

Authors: Benjamin Aubin, Antoine Maillard, Jean Barbier, Florent Krzakala, Nicolas Macris, Lenka Zdeborová

Abstract: Heuristic tools from statistical physics have been used in the past to locate the phase transitions and compute the optimal learning and generalization errors in the teacher-student scenario in multi-layer neural networks. In this contribution, we provide a rigorous justification of these approaches for a two-layers neural network model called the committee machine. We also introduce a version of the approximate message passing (AMP) algorithm for the committee machine that allows one to perform optimal learning in polynomial time for a large set of parameters. We find that there are regimes in which a low generalization error is information-theoretically achievable while the AMP algorithm fails to deliver it, strongly suggesting that no efficient algorithm exists for those cases, and unveiling a large computational gap.

Submitted 29 February, 2024; v1 submitted 14 June, 2018; originally announced June 2018.

Comments: 18 pages + supplementary material, 3 figures. (v2: update to match the published version ; v3: clarification of the caption of Fig. 3)

Journal ref: J. Stat. Mech. (2019) 124023. & NeurIPS 2018

arXiv:1806.05121 [pdf, other]

Adaptive Path Interpolation for Sparse Systems: Application to a Simple Censored Block Model

Authors: Jean Barbier, Chun Lam Chan, Nicolas Macris

Abstract: Recently a new adaptive path interpolation method has been developed as a simple and versatile scheme to calculate exactly the asymptotic mutual information of Bayesian inference problems defined on dense factor graphs. These include random linear and generalized estimation, sparse superposition codes, or low-rank matrix and tensor estimation. For all these systems, the adaptive interpolation method directly proves that the replica symmetric prediction is exact, in a simple and unified manner. When the underlying factor graph of the inference problem is sparse the replica prediction is considerably more complicated, and rigorous results are often lacking or obtained by rather complicated methods. In this work we show how to extend the adaptive path interpolation method to sparse systems. We concentrate on a Censored Block Model, where hidden variables are measured through a binary erasure channel, for which we fully prove the replica prediction.

Submitted 18 July, 2019; v1 submitted 13 June, 2018; originally announced June 2018.

arXiv:1805.09785 [pdf, other]

doi 10.1088/1742-5468/ab3430

Entropy and mutual information in models of deep neural networks

Authors: Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, Nicolas Macris, Florent Krzakala, Lenka Zdeborová

Abstract: We examine a class of deep learning models with a tractable method to compute information-theoretic quantities. Our contributions are three-fold: (i) We show how entropies and mutual informations can be derived from heuristic statistical physics methods, under the assumption that weight matrices are independent and orthogonally-invariant. (ii) We extend particular cases in which this result is known to be rigorously exact by providing a proof for two-layers networks with Gaussian random weights, using the recently introduced adaptive interpolation method. (iii) We propose an experiment framework with generative models of synthetic datasets, on which we train deep neural networks with a weight constraint designed so that the assumption in (i) is verified during learning. We study the behavior of entropies and mutual informations throughout learning and conclude that, in the proposed setting, the relationship between compression and generalization remains elusive.

Submitted 29 October, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

Journal ref: J. Stat. Mech. (2019) 124014. & NeurIPS 2018

arXiv:1802.08963 [pdf, other]

doi 10.1109/ISIT.2018.8437522

The Mutual Information in Random Linear Estimation Beyond i.i.d. Matrices

Authors: Jean Barbier, Nicolas Macris, Antoine Maillard, Florent Krzakala

Abstract: There has been definite progress recently in proving the variational single-letter formula given by the heuristic replica method for various estimation problems. In particular, the replica formula for the mutual information in the case of noisy linear estimation with random i.i.d. matrices, a problem with applications ranging from compressed sensing to statistics, has been proven rigorously. In this contribution we go beyond the restrictive i.i.d. matrix assumption and discuss the formula proposed by Takeda, Uda, Kabashima and later by Tulino, Verdu, Caire and Shamai who used the replica method. Using the recently introduced adaptive interpolation method and random matrix theory, we prove this formula for a relevant large sub-class of rotationally invariant matrices.

Submitted 15 November, 2018; v1 submitted 25 February, 2018; originally announced February 2018.

Comments: Presented at the 2018 IEEE International Symposium on Information Theory (ISIT)

Journal ref: 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, 2018, pp. 1390-1394

arXiv:1709.10368 [pdf, ps, other]

The Layered Structure of Tensor Estimation and its Mutual Information

Authors: Jean Barbier, Nicolas Macris, Léo Miolane

Abstract: We consider rank-one non-symmetric tensor estimation and derive simple formulas for the mutual information. We start with the order 2 problem, namely matrix factorization. We treat it completely in a simpler fashion than previous proofs using a new type of interpolation method developed in [1]. We then show how to harness the structure in "layers" of tensor estimation in order to obtain a formula for the mutual information for the order 3 problem from the knowledge of the formula for the order 2 problem, still using the same kind of interpolation. Our proof technique straightforwardly generalizes and allows one to rigorously obtain the mutual information at any order in a recursive way.

Submitted 27 November, 2018; v1 submitted 29 September, 2017; originally announced September 2017.

Comments: 55th Annual Allerton Conference on Communication, Control, and Computing, 2017

arXiv:1708.03395 [pdf, other]

doi 10.1073/pnas.1802705116

Optimal Errors and Phase Transitions in High-Dimensional Generalized Linear Models

Authors: Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, Lenka Zdeborová

Abstract: Generalized linear models (GLMs) arise in high-dimensional machine learning, statistics, communications and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes or benchmark models in neural networks. We evaluate the mutual information (or "free entropy") from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Non-rigorous predictions for the optimal errors existed for special cases of GLMs, e.g. for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance, and locate the associated sharp phase transitions separating learnable and non-learnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multi-purpose algorithms. This paper is divided into two parts that can be read independently: The first part (main part) presents the model and main results, discusses some applications and sketches the main ideas of the proof. The second part (supplementary information) is much more detailed and provides more examples as well as all the proofs.

Submitted 1 November, 2018; v1 submitted 10 August, 2017; originally announced August 2017.

Comments: 101 pages, 5 figures

Journal ref: Proceedings of the National Academy of Sciences 116. 12 (2019): 5451-5460

arXiv:1707.04203 [pdf, other]

Universal Sparse Superposition Codes with Spatial Coupling and GAMP Decoding

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris

Abstract: Sparse superposition codes, or sparse regression codes, constitute a new class of codes which was first introduced for communication over the additive white Gaussian noise (AWGN) channel. It has been shown that such codes are capacity-achieving over the AWGN channel under optimal maximum-likelihood decoding as well as under various efficient iterative decoding schemes equipped with power allocation or spatially coupled constructions. Here, we generalize the analysis of these codes to a much broader setting that includes all memoryless channels. We show, for a large class of memoryless channels, that spatial coupling allows an efficient decoder, based on the generalized approximate message-passing (GAMP) algorithm, to reach the potential (or Bayes optimal) threshold of the underlying (or uncoupled) code ensemble. Moreover, we argue that spatially coupled sparse superposition codes universally achieve capacity under GAMP decoding by showing, through analytical computations, that the error floor vanishes and the potential threshold tends to capacity as one of the code parameters goes to infinity. Furthermore, we provide a closed-form formula for the algorithmic threshold of the underlying code ensemble in terms of a Fisher information. Relating an algorithmic threshold to a Fisher information has theoretical as well as practical importance. Our proof relies on the state evolution analysis and uses the potential method developed in the theory of low-density parity-check (LDPC) codes and compressed sensing.

Submitted 8 November, 2018; v1 submitted 13 July, 2017; originally announced July 2017.

Comments: Submitted to the IEEE transactions on information theory

arXiv:1705.02780 [pdf, other]

The adaptive interpolation method: A simple scheme to prove replica formulas in Bayesian inference

Authors: Jean Barbier, Nicolas Macris

Abstract: In recent years important progress has been achieved towards proving the validity of the replica predictions for the (asymptotic) mutual information (or "free energy") in Bayesian inference problems. The proof techniques that have emerged appear to be quite general, even though they have been worked out on a case-by-case basis. Unfortunately, a common point between all these schemes is their relatively high level of technicality. We present a new proof scheme that is quite straightforward with respect to the previous ones. We call it the adaptive interpolation method because it can be seen as an extension of the interpolation method developed by Guerra and Toninelli in the context of spin glasses, with an interpolation path that is adaptive. In order to illustrate our method we show how to prove the replica formula for three non-trivial inference problems. The first one is symmetric rank-one matrix estimation (or factorisation), which is the simplest problem considered here and the one for which the method is presented in full detail. Then we generalize to symmetric tensor estimation and random linear estimation. We believe that the present method has a much wider range of applicability and also sheds new insights on the reasons for the validity of replica formulas in Bayesian inference.

Submitted 27 October, 2018; v1 submitted 8 May, 2017; originally announced May 2017.

Comments: Published in "Probability Theory and Related Fields"

arXiv:1704.04158 [pdf, other]

I-MMSE relations in random linear estimation and a sub-extensive interpolation method

Authors: Jean Barbier, Nicolas Macris

Abstract: Consider random linear estimation with Gaussian measurement matrices and noise. One can compute infinitesimal variations of the mutual information under infinitesimal variations of the signal-to-noise ratio or of the measurement rate. We discuss how each variation is related to the minimum mean-square error and deduce that the two variations are directly connected through a very simple identity. The main technical ingredient is a new interpolation method called "sub-extensive interpolation method". We use it to provide a new proof of an I-MMSE relation recently found by Reeves and Pfister [1] when the measurement rate is varied. Our proof makes it clear that this relation is intimately related to another I-MMSE relation also recently proved in [2]. One can directly verify that the identity relating the two types of variation of mutual information is indeed consistent with the one letter replica symmetric formula for the mutual information, first derived by Tanaka [3] for binary signals, and recently proved in more generality in [1,2,4,5] (by independent methods). However our proof is independent of any knowledge of Tanaka's formula.

Submitted 13 April, 2017; originally announced April 2017.

Comments: Presented at the International Symposium on Information Theory (ISIT) 2017, Aachen, Germany

arXiv:1701.05823 [pdf, other]

doi 10.1109/TIT.2020.2990880

Mutual Information and Optimality of Approximate Message-Passing in Random Linear Estimation

Authors: Jean Barbier, Nicolas Macris, Mohamad Dia, Florent Krzakala

Abstract: We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections. A few examples where this problem is relevant are compressed sensing, sparse superposition codes, and code division multiple access. There have been a number of works considering the mutual information for this problem using the replica method from statistical physics. Here we put these considerations on a firm rigorous basis. First, we show, using a Guerra-Toninelli type interpolation, that the replica formula yields an upper bound to the exact mutual information. Secondly, for many relevant practical cases, we present a converse lower bound via a method that uses spatial coupling, state evolution analysis and the I-MMSE theorem. This yields a single letter formula for the mutual information and the minimal-mean-square error for random Gaussian linear estimation of all discrete bounded signals. In addition, we prove that the low complexity approximate message-passing algorithm is optimal outside of the so-called hard phase, in the sense that it asymptotically reaches the minimal-mean-square error. In this work spatial coupling is used primarily as a proof technique. However our results also prove two important features of spatially coupled noisy linear random Gaussian estimation. First there is no algorithmically hard phase. This means that for such systems approximate message-passing always reaches the minimal-mean-square error. Secondly, in a proper limit the mutual information associated to such systems is the same as the one of uncoupled linear random Gaussian estimation.

Submitted 28 August, 2020; v1 submitted 20 January, 2017; originally announced January 2017.

Journal ref: IEEE Transactions on Information Theory, vol. 66, no. 7, pp. 4270-4303, July 2020

arXiv:1701.04651 [pdf, other]

Displacement Convexity in Spatially Coupled Scalar Recursions

Authors: Rafah El-Khatib, Nicolas Macris, Tom Richardson, Ruediger Urbanke

Abstract: We introduce a technique for the analysis of general spatially coupled systems that are governed by scalar recursions. Such systems can be expressed in variational form in terms of a potential functional. We show, under mild conditions, that the potential functional is \emph{displacement convex} and that the minimizers are given by the fixed points of the recursions. Furthermore, we give the conditions on the system such that the minimizing fixed point is unique up to translation along the spatial direction. The conditions match those in \cite{KRU12} for the existence of spatial fixed points. \emph{Displacement convexity} applies to a wide range of spatially coupled recursions appearing in coding theory, compressive sensing, random constraint satisfaction problems, as well as statistical mechanical models. We illustrate it with applications to Low-Density Parity-Check and generalized LDPC codes used for transmission on the binary erasure channel, or general binary memoryless symmetric channels within the Gaussian reciprocal channel approximation, as well as compressive sensing.

Submitted 17 January, 2017; originally announced January 2017.

Comments: 33 pages, 9 figures

arXiv:1701.04318 [pdf, other]

The Velocity of the Propagating Wave for Spatially Coupled Systems with Applications to LDPC Codes

Authors: Rafah El-Khatib, Nicolas Macris

Abstract: We consider the dynamics of message passing for spatially coupled codes and, in particular, the set of density evolution equations that tracks the profile of decoding errors along the spatial direction of coupling. It is known that, for suitable boundary conditions and after a transient phase, the error profile exhibits a "solitonic behavior". Namely, a uniquely-shaped wavelike solution develops that propagates with constant velocity. Under this assumption we derive an analytical formula for the velocity in the framework of a continuum limit of the spatially coupled system. The general formalism is developed for spatially coupled low-density parity-check codes on general binary memoryless symmetric channels, which form the main system of interest in this work. We apply the formula to special channels and illustrate that it matches the direct numerical evaluation of the velocity for a wide range of noise values. A possible application of the velocity formula to the evaluation of finite size scaling law parameters is also discussed. We conduct a similar analysis for general scalar systems and illustrate the findings with applications to compressive sensing and generalized low-density parity-check codes on the binary erasure or binary symmetric channels.

Submitted 16 January, 2017; originally announced January 2017.

Comments: 33 pages, 12 figures

arXiv:1701.03767 [pdf, ps, other]

Analysis of Coupled Scalar Systems by Displacement Convexity

Authors: Rafah El-Khatib, Nicolas Macris, Tom Richardson, Rüdiger Urbanke

Abstract: Potential functionals have been introduced recently as an important tool for the analysis of coupled scalar systems (e.g. density evolution equations). In this contribution, we investigate interesting properties of this potential. Using the tool of displacement convexity, we show that, under mild assumptions on the system, the potential functional is displacement convex. Furthermore, we give the conditions on the system such that the potential is strictly displacement convex, in which case the minimizer is unique.

Submitted 13 January, 2017; originally announced January 2017.

Comments: 5 pages, 1 figure, submitted to the IEEE International Symposium on Information Theory (ISIT) 2014

arXiv:1701.03764 [pdf, other]

The Velocity of the Decoding Wave for Spatially Coupled Codes on BMS Channels

Authors: Rafah El-Khatib, Nicolas Macris

Abstract: We consider the dynamics of belief propagation decoding of spatially coupled Low-Density Parity-Check codes. It has been conjectured that after a short transient phase, the profile of "error probabilities" along the spatial direction of a spatially coupled code develops a uniquely-shaped wave-like solution that propagates with constant velocity v. Under this assumption, and for transmission over general Binary Memoryless Symmetric channels, we derive a formula for v. We also propose approximations that are simpler to compute and support our findings using numerical data.

Submitted 13 January, 2017; originally announced January 2017.

Comments: 5 pages, 3 figures, submitted to the IEEE International Symposium on Information Theory (ISIT) 2016

arXiv:1701.03759 [pdf, other]

The Velocity of the Propagating Wave for General Coupled Scalar Systems

Authors: Rafah El-Khatib, Nicolas Macris

Abstract: We consider spatially coupled systems governed by a set of scalar density evolution equations. Such equations track the behavior of message-passing algorithms used, for example, in coding, sparse sensing, or constraint-satisfaction problems. Assuming that the "profile" describing the average state of the algorithm exhibits a solitonic wave-like behavior after initial transient iterations, we derive a formula for the propagation velocity of the wave. We illustrate the formula with two applications, namely Generalized LDPC codes and compressive sensing.

Submitted 13 January, 2017; originally announced January 2017.

Comments: 5 pages, 5 figures, submitted to the Information Theory Workshop (ITW) 2016 in Cambridge, UK

arXiv:1607.02335 [pdf, other]

doi 10.1109/ALLERTON.2016.7852290

The Mutual Information in Random Linear Estimation

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala

Abstract: We consider the estimation of a signal from the knowledge of its noisy linear random Gaussian projections, a problem relevant in compressed sensing, sparse superposition codes or code division multiple access, to cite just a few. There have been a number of works considering the mutual information for this problem using the heuristic replica method from statistical physics. Here we put these considerations on a firm rigorous basis. First, we show, using a Guerra-type interpolation, that the replica formula yields an upper bound to the exact mutual information. Secondly, for many relevant practical cases, we present a converse lower bound via a method that uses spatial coupling, state evolution analysis and the I-MMSE theorem. This yields, in particular, a single letter formula for the mutual information and the minimal-mean-square error for random Gaussian linear estimation of all discrete bounded signals.

Submitted 6 September, 2016; v1 submitted 8 July, 2016; originally announced July 2016.

Comments: Presented at the 54th Annual Allerton Conference on Communication, Control, and Computing, 2016

Journal ref: 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Pages: 625 - 632

arXiv:1606.04142 [pdf, other]

Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Thibault Lesieur, Lenka Zdeborova

Abstract: Factorizing low-rank matrices has many applications in machine learning and statistics. For probabilistic models in the Bayes optimal setting, a general expression for the mutual information has been proposed using heuristic statistical physics computations, and proven in a few specific cases. Here, we show how to rigorously prove the conjectured formula for the symmetric rank-one case. This allows one to express the minimal mean-square-error and to characterize the detectability phase transitions in a large set of estimation problems ranging from community detection to sparse PCA. We also show that for a large set of parameters, an iterative algorithm called approximate message-passing is Bayes optimal. There exists, however, a gap between what currently known polynomial algorithms can do and what is expected information theoretically. Additionally, the proof technique has an interest of its own and exploits three essential ingredients: the interpolation method introduced in statistical physics by Guerra, the analysis of the approximate message-passing algorithm and the theory of spatial coupling and threshold saturation in coding. Our approach is generic and applicable to other open problems in statistical estimation where heuristic statistical physics predictions are available.

Submitted 13 June, 2016; originally announced June 2016.

Journal ref: Advances in Neural Information Processing Systems 29 (NIPS 2016) pp 424-432

arXiv:1603.04591 [pdf, other]

Threshold Saturation of Spatially Coupled Sparse Superposition Codes for All Memoryless Channels

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris

Abstract: We recently proved threshold saturation for spatially coupled sparse superposition codes on the additive white Gaussian noise channel. Here we generalize our analysis to a much broader setting. We show for any memoryless channel that spatial coupling allows generalized approximate message-passing (GAMP) decoding to reach the potential (or Bayes optimal) threshold of the code ensemble. Moreover, in the large input alphabet size limit: i) the GAMP algorithmic threshold of the underlying (or uncoupled) code ensemble is simply expressed as a Fisher information; ii) the potential threshold tends to Shannon's capacity. Although we focus on coding for the sake of coherence with our previous results, the framework and methods are very general and hold for a wide class of generalized estimation problems with random linear mixing.

Submitted 15 March, 2016; originally announced March 2016.

Comments: Submitted to the Information Theory Workshop (ITW) 2016, Cambridge, United Kingdom

arXiv:1603.01817 [pdf, other]

Proof of Threshold Saturation for Spatially Coupled Sparse Superposition Codes

Authors: Jean Barbier, Mohamad Dia, Nicolas Macris

Abstract: Recently, a new class of codes, called sparse superposition or sparse regression codes, has been proposed for communication over the AWGN channel. It has been proven that they achieve capacity using power allocation and various forms of iterative decoding. Empirical evidence has also strongly suggested that the codes achieve capacity when spatial coupling and approximate message passing decoding are used, without the need for power allocation. In this note we prove that state evolution (which tracks message passing) indeed saturates the potential threshold of the underlying code ensemble, which approaches the optimal threshold in a proper limit. Our proof uses ideas developed in the theory of low-density parity-check codes and compressive sensing.

Submitted 6 March, 2016; originally announced March 2016.

Comments: Submitted to the International Symposium on Information Theory (ISIT) 2016, Barcelona, Spain

arXiv:1310.1294 [pdf, other]

The Bethe Free Energy Allows to Compute the Conditional Entropy of Graphical Code Instances. A Proof from the Polymer Expansion

Authors: Nicolas Macris, Marc Vuffray

Abstract: The main objective of this paper is to explore the precise relationship between the Bethe free energy (or entropy) and the Shannon conditional entropy of graphical error correcting codes. The main result shows that the Bethe free energy associated with a low-density parity-check code used over a binary symmetric channel in a large noise regime is, with high probability, asymptotically exact as the block length grows. To arrive at this result we develop new techniques for rather general graphical models based on the loop sum as a starting point and the polymer expansion from statistical mechanics. The true free energy is computed as a series expansion containing the Bethe free energy as its zero-th order term plus a series of corrections. It is easily seen that convergence criteria for such expansions are satisfied for general high-temperature models. We apply these general results to ensembles of low-density generator-matrix and parity-check codes. While the application to generator-matrix codes follows standard "high temperature" methods, the case of parity-check codes requires non-trivial new ideas because the hard constraints correspond to a zero-temperature regime. Nevertheless one can combine the polymer expansion with expander and counting arguments to show that the difference between the true and Bethe free energies vanishes with high probability in the large block

Submitted 11 June, 2015; v1 submitted 4 October, 2013; originally announced October 2013.

arXiv:1309.7543 [pdf, ps, other]

doi 10.1109/TIT.2014.2360692

Threshold Saturation for Spatially-Coupled LDPC and LDGM Codes on BMS Channels

Authors: Santhosh Kumar, Andrew J. Young, Nicolas Macris, Henry D. Pfister

Abstract: Spatially-coupled low-density parity-check (LDPC) codes, which were first introduced as LDPC convolutional codes, have been shown to exhibit excellent performance under low-complexity belief-propagation decoding. This phenomenon is now termed threshold saturation via spatial coupling. Spatially-coupled codes have been successfully applied in numerous areas. In particular, it was proven that spatially-coupled regular LDPC codes universally achieve capacity over the class of binary memoryless symmetric (BMS) channels under belief-propagation decoding. Recently, potential functions have been used to simplify threshold saturation proofs for scalar and vector recursions. In this paper, potential functions are used to prove threshold saturation for irregular LDPC and low-density generator-matrix (LDGM) codes on BMS channels, extending the simplified proof technique to BMS channels. The corresponding potential functions are closely related to the average Bethe free entropy of the ensembles in the large-system limit. These functions also appear in statistical physics when the replica method is used to analyze optimal decoding.

Submitted 11 October, 2014; v1 submitted 29 September, 2013; originally announced September 2013.

Comments: (v1) This article supersedes arXiv:1301.6111 (v2) Accepted to the IEEE Transactions on Information Theory

Journal ref: IEEE Transactions on Information Theory, Vol. 60, No. 12, pp. 7389-7415, Dec. 2014

arXiv:1307.5210 [pdf, other]

Approaching the Rate-Distortion Limit with Spatial Coupling, Belief propagation and Decimation

Authors: Vahid Aref, Nicolas Macris, Marc Vuffray

Abstract: We investigate an encoding scheme for lossy compression of a binary symmetric source based on simple spatially coupled Low-Density Generator-Matrix codes. The degree of the check nodes is regular and the one of code-bits is Poisson distributed with an average depending on the compression rate. The performance of a low complexity Belief Propagation Guided Decimation algorithm is excellent. The algorithmic rate-distortion curve approaches the optimal curve of the ensemble as the width of the coupling window grows. Moreover, as the check degree grows both curves approach the ultimate Shannon rate-distortion limit. The Belief Propagation Guided Decimation encoder is based on the posterior measure of a binary symmetric test-channel. This measure can be interpreted as a random Gibbs measure at a "temperature" directly related to the "noise level of the test-channel". We investigate the links between the algorithmic performance of the Belief Propagation Guided Decimation encoder and the phase diagram of this Gibbs measure. The phase diagram is investigated thanks to the cavity method of spin glass theory which predicts a number of phase transition thresholds. In particular the dynamical and condensation "phase transition temperatures" (equivalently test-channel noise thresholds) are computed. We observe that: (i) the dynamical temperature of the spatially coupled construction saturates towards the condensation temperature; (ii) for large degrees the condensation temperature approaches the temperature (i.e. noise level) related to the information theoretic Shannon test-channel noise parameter of rate-distortion theory. This provides heuristic insight into the excellent performance of the Belief Propagation Guided Decimation algorithm. The paper contains an introduction to the cavity method.

Submitted 11 June, 2015; v1 submitted 19 July, 2013; originally announced July 2013.

arXiv:1304.6026 [pdf, ps, other]

Displacement Convexity, A Useful Framework for the Study of Spatially Coupled Codes

Authors: Rafah El-Khatib, Nicolas Macris, Ruediger Urbanke

Abstract: Spatial coupling has recently emerged as a powerful paradigm to construct graphical models that work well under low-complexity message-passing algorithms. Although much progress has been made on the analysis of spatially coupled models under message passing, there is still room for improvement, both in terms of simplifying existing proofs as well as in terms of proving additional properties. We introduce one further tool for the analysis, namely the concept of displacement convexity. This concept plays a crucial role in the theory of optimal transport and, quite remarkably, it is also well suited for the analysis of spatially coupled systems. In cases where the concept applies, displacement convexity allows functionals of distributions which are not convex in the usual sense to be represented in an alternative form, so that they are convex with respect to the new parametrization. As a proof of concept we consider spatially coupled $(l,r)$-regular Gallager ensembles when transmission takes place over the binary erasure channel. We show that the potential function of the coupled system is displacement convex. Due to possible translational degrees of freedom convexity by itself falls short of establishing the uniqueness of the minimizing profile. For the spatially coupled $(l,r)$-regular system strict displacement convexity holds when a global translation degree of freedom is removed. Implications for the uniqueness of the minimizer and for solutions of the density evolution equation are discussed.

Submitted 28 September, 2013; v1 submitted 22 April, 2013; originally announced April 2013.

Comments: Extension of paper submitted to ITW 2013

arXiv:1303.0540 [pdf, ps, other]

The Space of Solutions of Coupled XORSAT Formulae

Authors: S. Hamed Hassani, Nicolas Macris, Rudiger Urbanke

Abstract: The XOR-satisfiability (XORSAT) problem deals with a system of $n$ Boolean variables and $m$ clauses. Each clause is a linear Boolean equation (XOR) of a subset of the variables. A $K$-clause is a clause involving $K$ distinct variables. In the random $K$-XORSAT problem a formula is created by choosing $m$ $K$-clauses uniformly at random from the set of all possible clauses on $n$ variables. The set of solutions of a random formula exhibits various geometrical transitions as the ratio $\frac{m}{n}$ varies. We consider a {\em coupled} $K$-XORSAT ensemble, consisting of a chain of random XORSAT models that are spatially coupled across a finite window along the chain direction. We observe that the threshold saturation phenomenon takes place for this ensemble and we characterize various properties of the space of solutions of such coupled formulae.

Submitted 3 March, 2013; originally announced March 2013.

Comments: Submitted to ISIT 2013

Showing 1–50 of 77 results for author: Macris, N