-
Mean-field Potts and random-cluster dynamics from high-entropy initializations
Authors:
Antonio Blanca,
Reza Gheissari,
Xusheng Zhang
Abstract:
A common obstruction to efficient sampling from high-dimensional distributions is the multimodality of the target distribution because Markov chains may get trapped far from stationarity. Still, one hopes that this is only a barrier to the mixing of Markov chains from worst-case initializations and can be overcome by choosing high-entropy initializations, e.g., a product or weakly correlated distribution. Ideally, from such initializations, the dynamics would escape from the saddle points separating modes quickly and spread its mass between the dominant modes.
In this paper, we study convergence from high-entropy initializations for the random-cluster and Potts models on the complete graph -- two extensively studied high-dimensional landscapes that pose many complexities like discontinuous phase transitions and asymmetric metastable modes. We study the Chayes--Machta and Swendsen--Wang dynamics for the mean-field random-cluster model and the Glauber dynamics for the Potts model. We sharply characterize the set of product measure initializations from which these Markov chains mix rapidly, even though their mixing times from worst-case initializations are exponentially slow. Our proofs require careful approximations of projections of high-dimensional Markov chains (which are not themselves Markovian) by tractable 1-dimensional random processes, followed by analysis of the latter's escape from saddle points separating stable modes.
Submitted 19 April, 2024;
originally announced April 2024.
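As a rough illustration of the Potts Glauber dynamics from a product initialization described in the abstract above, here is a minimal sketch. It is not the authors' code. It assumes the mean-field Gibbs weight is proportional to exp((beta/n) * number of monochromatic pairs) (normalizations of beta differ between papers), and the function name `potts_glauber_mean_field` and its parameters are hypothetical.

```python
import math
import random


def potts_glauber_mean_field(n, q, beta, init_probs, steps, seed=0):
    """Run heat-bath Glauber dynamics for the q-state Potts model on the
    complete graph with n vertices, started from a product measure.

    Assumption: Gibbs weight proportional to exp((beta / n) * #monochromatic pairs).
    Returns the final vector of color proportions.
    """
    rng = random.Random(seed)
    colors = list(range(q))

    # Product initialization: each vertex draws its color independently.
    spins = rng.choices(colors, weights=init_probs, k=n)
    counts = [0] * q
    for s in spins:
        counts[s] += 1

    for _ in range(steps):
        v = rng.randrange(n)
        counts[spins[v]] -= 1  # remove v so counts describe the other vertices
        # Conditional law of v's color given the rest of the configuration.
        weights = [math.exp(beta * counts[c] / n) for c in colors]
        new_color = rng.choices(colors, weights=weights, k=1)[0]
        spins[v] = new_color
        counts[new_color] += 1

    return [c / n for c in counts]


if __name__ == "__main__":
    # Uniform product start with 3 colors; prints the final color proportions.
    print(potts_glauber_mean_field(n=300, q=3, beta=2.0,
                                   init_probs=[1 / 3] * 3, steps=30000))
```

Only the vector of color proportions is tracked in the output, since on the complete graph the update rule depends on the configuration only through these counts.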
-
Finding planted cliques using Markov chain Monte Carlo
Authors:
Reza Gheissari,
Aukosh Jagannath,
Yiming Xu
Abstract:
The planted clique problem is a paradigmatic model of statistical-to-computational gaps: the planted clique is information-theoretically detectable if its size $k\ge 2\log_2 n$ but polynomial-time algorithms only exist for the recovery task when $k= Ω(\sqrt{n})$. By now, there are many simple and fast algorithms that succeed as soon as $k = Ω(\sqrt{n})$. Glaringly, however, no MCMC approach to the problem had been shown to work, including the Metropolis process on cliques studied by Jerrum since 1992. In fact, Chen, Mossel, and Zadik recently showed that any Metropolis process whose state space is the set of cliques fails to find any sub-linear sized planted clique in polynomial time if initialized naturally from the empty set. Here, we redeem MCMC performance for the planted clique problem by relaxing the state space to all vertex subsets and adding a corresponding energy penalty for missing edges. With that, we prove that energy-minimizing Markov chains (gradient descent and a low-temperature relaxation of it) succeed at recovering planted cliques of size $k = Ω(\sqrt{n})$ if initialized from the full graph. Importantly, initialized from the empty set, the relaxation still does not help the gradient descent find sub-linear planted cliques. We also demonstrate robustness of these Markov chain approaches under a natural contamination model.
Submitted 13 November, 2023;
originally announced November 2023.
-
Fast relaxation of the random field Ising dynamics
Authors:
Ahmed El Alaoui,
Ronen Eldan,
Reza Gheissari,
Arianna Piana
Abstract:
We study the convergence properties of Glauber dynamics for the random field Ising model (RFIM) with ferromagnetic interactions on finite domains of $\mathbb{Z}^d$, $d \ge 2$. Of particular interest is the Griffiths phase where correlations decay exponentially fast in expectation over the quenched disorder, but there exist arbitrarily large islands of weak fields where low-temperature behavior is observed. Our results are twofold:
1. Under weak spatial mixing (boundary-to-bulk exponential decay of correlations) in expectation, we show that the dynamics satisfy a weak Poincaré inequality -- equivalent to large-set expansion -- implying algebraic relaxation to equilibrium over timescales polynomial in the volume $N$ of the domain, and polynomial time mixing from a warm start. From this we construct a polynomial-time approximate sampling algorithm based on running Glauber dynamics over an increasing sequence of approximations of the domain.
2. Under strong spatial mixing (exponential decay of correlations even near boundary pinnings) in expectation, we prove a full Poincaré inequality, implying exponential relaxation to equilibrium and $N^{o(1)}$-mixing time. By way of example, both weak and strong spatial mixing hold at any temperature, provided the external fields are strong enough.
Our proofs combine a stochastic localization technique which has the effect of increasing the variance of the field, with a field-dependent coarse graining which controls the resulting sub-critical percolation process of sites with weak fields.
Submitted 10 November, 2023;
originally announced November 2023.
-
High-dimensional SGD aligns with emerging outlier eigenspaces
Authors:
Gerard Ben Arous,
Reza Gheissari,
Jiaoyang Huang,
Aukosh Jagannath
Abstract:
We rigorously study the joint evolution of training dynamics via stochastic gradient descent (SGD) and the spectra of empirical Hessian and gradient matrices. We prove that in two canonical classification tasks for multi-class high-dimensional mixtures and either 1 or 2-layer neural networks, the SGD trajectory rapidly aligns with emerging low-rank outlier eigenspaces of the Hessian and gradient matrices. Moreover, in multi-layer settings this alignment occurs per layer, with the final layer's outlier eigenspace evolving over the course of training, and exhibiting rank deficiency when the SGD converges to sub-optimal classifiers. This establishes some of the rich predictions that have arisen from extensive numerical studies in the last decade about the spectra of Hessian and information matrices over the course of training in overparametrized networks.
Submitted 4 October, 2023;
originally announced October 2023.
-
On the tractability of sampling from the Potts model at low temperatures via Swendsen--Wang dynamics
Authors:
Antonio Blanca,
Reza Gheissari
Abstract:
Sampling from the $q$-state ferromagnetic Potts model is a fundamental question in statistical physics, probability theory, and theoretical computer science. On general graphs, this problem is computationally hard, and this hardness holds at arbitrarily low temperatures. At the same time, in recent years, there has been significant progress showing the existence of low-temperature sampling algorithms in various specific families of graphs. Our aim in this paper is to understand the minimal structural properties of general graphs that enable polynomial-time sampling from the $q$-state ferromagnetic Potts model at low temperatures. We study this problem from the perspective of the widely-used Swendsen--Wang dynamics and the closely related random-cluster dynamics.
Our results demonstrate that the key graph property behind fast or slow convergence time for these dynamics is whether the independent edge-percolation on the graph admits a strongly supercritical phase. By this, we mean that at large $p<1$, it has a unique giant component of linear size, and the complement of that giant component is comprised of only small components. Specifically, we prove that such a condition implies fast mixing of the Swendsen--Wang and random-cluster dynamics on two general families of bounded-degree graphs: (a) graphs of at most stretched-exponential volume growth and (b) locally treelike graphs. In the other direction, we show that, even among graphs in those families, these Markov chains can converge exponentially slowly at arbitrarily low temperatures if the edge-percolation condition does not hold. In the process, we develop new tools for the analysis of non-local Markov chains, including a framework to bound the speed of disagreement propagation in the presence of long-range correlations, and an understanding of spatial mixing properties on trees with random boundary conditions.
Submitted 6 April, 2023;
originally announced April 2023.
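For readers unfamiliar with the Swendsen--Wang dynamics studied in the entry above, one step of the standard chain can be sketched as follows. This is a textbook-style sketch rather than anything taken from the paper. It assumes the Potts weight exp(beta * number of monochromatic edges), so that monochromatic edges are kept with probability 1 - exp(-beta), and the name `swendsen_wang_step` is hypothetical.

```python
import math
import random


def swendsen_wang_step(n, edges, spins, q, beta, rng):
    """One Swendsen--Wang update on a graph with vertices 0..n-1.

    Assumption: Potts weight exp(beta * #monochromatic edges), so each
    monochromatic edge is kept independently with p = 1 - exp(-beta).
    Returns a new list of spins.
    """
    p = 1.0 - math.exp(-beta)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Step 1: percolate on the monochromatic edges.
    for u, v in edges:
        if spins[u] == spins[v] and rng.random() < p:
            parent[find(u)] = find(v)

    # Step 2: give each connected component a fresh uniform color.
    component_color = {}
    new_spins = []
    for v in range(n):
        root = find(v)
        if root not in component_color:
            component_color[root] = rng.randrange(q)
        new_spins.append(component_color[root])
    return new_spins


if __name__ == "__main__":
    rng = random.Random(0)
    cycle = [(i, (i + 1) % 8) for i in range(8)]
    spins = [rng.randrange(3) for _ in range(8)]
    for _ in range(5):
        spins = swendsen_wang_step(8, cycle, spins, q=3, beta=1.0, rng=rng)
    print(spins)
```

The recoloring of whole components in a single step is what makes the chain non-local, which is the feature the abstract refers to when it mentions tools for non-local Markov chains.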
-
High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
Authors:
Gerard Ben Arous,
Reza Gheissari,
Aukosh Jagannath
Abstract:
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in the high-dimensional regime. We prove limit theorems for the trajectories of summary statistics (i.e., finite-dimensional functions) of SGD as the dimension goes to infinity. Our approach allows one to choose the summary statistics that are tracked, the initialization, and the step-size. It yields both ballistic (ODE) and diffusive (SDE) limits, with the limit depending dramatically on the former choices. We show a critical scaling regime for the step-size, below which the effective ballistic dynamics matches gradient flow for the population loss, but at which a new correction term appears that changes the phase diagram. About the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate. We demonstrate our approach on popular examples including estimation for spiked matrix and tensor models and classification via two-layer networks for binary and XOR-type Gaussian mixture models. These examples exhibit surprising phenomena including multimodal timescales to convergence as well as convergence to sub-optimal solutions with probability bounded away from zero from random (e.g., Gaussian) initializations. At the same time, we demonstrate the benefit of overparametrization by showing that the latter probability goes to zero as the second layer width grows.
Submitted 17 August, 2023; v1 submitted 8 June, 2022;
originally announced June 2022.
-
Sampling from Potts on random graphs of unbounded degree via random-cluster dynamics
Authors:
Antonio Blanca,
Reza Gheissari
Abstract:
We consider the problem of sampling from the ferromagnetic Potts and random-cluster models on a general family of random graphs via the Glauber dynamics for the random-cluster model. The random-cluster model is parametrized by an edge probability $p \in (0,1)$ and a cluster weight $q > 0$. We establish that for every $q\ge 1$, the random-cluster Glauber dynamics mixes in optimal $Θ(n\log n)$ steps on $n$-vertex random graphs having a prescribed degree sequence with bounded average branching $γ$ throughout the full high-temperature uniqueness regime $p<p_u(q,γ)$.
The family of random graph models we consider includes the Erdős--Rényi random graph $G(n,γ/n)$, and so we provide the first polynomial-time sampling algorithm for the ferromagnetic Potts model on Erdős--Rényi random graphs for the full tree uniqueness regime. We accompany our results with mixing time lower bounds (exponential in the largest degree) for the Potts Glauber dynamics, in the same settings where our $Θ(n \log n)$ bounds for the random-cluster Glauber dynamics apply. This reveals a novel and significant computational advantage of random-cluster based algorithms for sampling from the Potts model at high temperatures.
Submitted 24 February, 2023; v1 submitted 21 July, 2021;
originally announced July 2021.
-
Random-cluster dynamics on random regular graphs in tree uniqueness
Authors:
Antonio Blanca,
Reza Gheissari
Abstract:
We establish rapid mixing of the random-cluster Glauber dynamics on random $Δ$-regular graphs for all $q\ge 1$ and $p<p_u(q,Δ)$, where the threshold $p_u(q,Δ)$ corresponds to a uniqueness/non-uniqueness phase transition for the random-cluster model on the (infinite) $Δ$-regular tree. It is expected that this threshold is sharp, and for $q>2$ the Glauber dynamics on random $Δ$-regular graphs undergoes an exponential slowdown at $p_u(q,Δ)$.
More precisely, we show that for every $q\ge 1$, $Δ\ge 3$, and $p<p_u(q,Δ)$, with probability $1-o(1)$ over the choice of a random $Δ$-regular graph on $n$ vertices, the Glauber dynamics for the random-cluster model has $Θ(n \log n)$ mixing time. As a corollary, we deduce fast mixing of the Swendsen--Wang dynamics for the Potts model on random $Δ$-regular graphs for every $q\ge 2$, in the tree uniqueness region. Our proof relies on a sharp bound on the "shattering time", i.e., the number of steps required to break up any configuration into $O(\log n)$ sized clusters. This is established by analyzing a delicate and novel iterative scheme to simultaneously reveal the underlying random graph with clusters of the Glauber dynamics configuration on it, at a given time.
Submitted 10 April, 2021; v1 submitted 5 August, 2020;
originally announced August 2020.
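The random-cluster Glauber dynamics in the entry above updates one edge at a time. A minimal sketch of the standard heat-bath rule follows, assuming the usual random-cluster weight p^{|A|} (1-p)^{|E \ A|} q^{c(A)} with c(A) the number of connected components. The helper names `endpoints_connected` and `random_cluster_glauber_step` are hypothetical, and the connectivity check is a plain graph search rather than anything optimized.

```python
import random


def endpoints_connected(n, open_edges, u, v):
    """Return True if u and v are joined by a path of open edges."""
    adjacency = {i: [] for i in range(n)}
    for a, b in open_edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen = {u}
    stack = [u]
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in adjacency[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def random_cluster_glauber_step(n, edges, open_edges, p, q, rng):
    """One heat-bath update of a uniformly chosen edge.

    open_edges is a set of edges taken from the list `edges`.
    Returns the new set of open edges.
    """
    e = edges[rng.randrange(len(edges))]
    rest = open_edges - {e}
    u, v = e
    if endpoints_connected(n, rest, u, v):
        # Opening e does not change the number of components.
        prob_open = p
    else:
        # e would be a cut-edge: opening it merges two components,
        # which costs a factor of q in the weight.
        prob_open = p / (p + q * (1.0 - p))
    return rest | {e} if rng.random() < prob_open else rest


if __name__ == "__main__":
    rng = random.Random(0)
    triangle = [(0, 1), (1, 2), (0, 2)]
    config = set(triangle)
    for _ in range(100):
        config = random_cluster_glauber_step(3, triangle, config, 0.5, 2.0, rng)
    print(sorted(config))
```

For q = 1 both cases give probability p, so the chain reduces to resampling independent edge percolation, which is a quick sanity check on the formula.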
-
Online stochastic gradient descent on non-convex losses from high-dimensional inference
Authors:
Gerard Ben Arous,
Reza Gheissari,
Aukosh Jagannath
Abstract:
Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively optimizing a loss function. This loss function is random and often non-convex. We study the performance of the simplest version of SGD, namely online SGD, from a random start in the setting where the parameter space is high-dimensional.
We develop nearly sharp thresholds for the number of samples needed for consistent estimation as one varies the dimension. Our thresholds depend only on an intrinsic property of the population loss which we call the information exponent. In particular, our results do not assume uniform control on the loss itself, such as convexity or uniform derivative bounds. The thresholds we obtain are polynomial in the dimension and the precise exponent depends explicitly on the information exponent. As a consequence of our results, we find that except for the simplest tasks, almost all of the data is used simply in the initial search phase to obtain non-trivial correlation with the ground truth. Upon attaining non-trivial correlation, the descent is rapid and exhibits law of large numbers type behavior.
We illustrate our approach by applying it to a wide set of inference tasks such as phase retrieval, and parameter estimation for generalized linear models, online PCA, and spiked tensor models, as well as to supervised learning for single-layer networks with general activation functions.
Submitted 10 May, 2021; v1 submitted 23 March, 2020;
originally announced March 2020.
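To make the notion of online SGD in the entry above concrete, here is a small sketch in which every step consumes one fresh sample and then discards it. It is only an illustration under stated assumptions: the task is a noiseless linear model with squared loss (a simple special case of a generalized linear model), the update is a plain Euclidean step with no constraint or normalization, and the function name `online_sgd_linear` and its parameters are hypothetical rather than taken from the paper.

```python
import math
import random


def online_sgd_linear(dim, steps, step_size, seed=0):
    """Online SGD for y = <a, truth> with loss 0.5 * (<a, x> - y)^2.

    Each iteration draws a new Gaussian sample a, so the number of steps
    equals the number of samples used. Returns the cosine similarity
    between the final iterate and the hidden unit vector `truth`.
    """
    rng = random.Random(seed)

    # Hidden parameter: a random unit vector (assumed for illustration).
    truth = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(t * t for t in truth))
    truth = [t / norm for t in truth]

    # Random start, uninformed about the truth.
    x = [rng.gauss(0.0, 1.0) / math.sqrt(dim) for _ in range(dim)]

    for _ in range(steps):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = sum(ai * ti for ai, ti in zip(a, truth))
        residual = sum(ai * xi for ai, xi in zip(a, x)) - y
        # Gradient of the per-sample loss is residual * a.
        x = [xi - step_size * residual * ai for xi, ai in zip(x, a)]

    x_norm = math.sqrt(sum(xi * xi for xi in x))
    return sum(xi * ti for xi, ti in zip(x, truth)) / x_norm


if __name__ == "__main__":
    print(online_sgd_linear(dim=20, steps=5000, step_size=0.01))
```

The returned cosine similarity plays the role of the correlation with the ground truth that the abstract tracks. This linear example is among the simplest tasks, so it does not exhibit the long initial search phase discussed there.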