-
Weak recovery, hypothesis testing, and mutual information in stochastic block models and planted factor graphs
Authors:
Elchanan Mossel,
Allan Sly,
Youngtak Sohn
Abstract:
The stochastic block model is a canonical model of communities in random graphs. It was introduced in the social sciences and statistics as a model of communities, and in theoretical computer science as an average case model for graph partitioning problems under the name of the ``planted partition model.'' Given a sparse stochastic block model, the two standard inference tasks are: (i) Weak recovery: can we estimate the communities with nontrivial overlap with the true communities? (ii) Detection/Hypothesis testing: can we distinguish whether the sample was drawn from the block model or from a random graph with no community structure, with probability tending to $1$ as the graph size tends to infinity?
In this work, we show that for sparse stochastic block models, the two inference tasks are equivalent except at a critical point. That is, weak recovery is information theoretically possible if and only if detection is possible. We thus find a strong connection between these two notions of inference for the model. We further prove that when detection is impossible, an explicit hypothesis test based on low degree polynomials in the adjacency matrix of the observed graph achieves the optimal statistical power. This low degree test is efficient as opposed to the likelihood ratio test, which is not known to be efficient. Moreover, we prove that the asymptotic mutual information between the observed network and the community structure exhibits a phase transition at the weak recovery threshold.
Our results are proven in much broader settings including the hypergraph stochastic block models and general planted factor graphs. In these settings we prove that the impossibility of weak recovery implies contiguity and provide a condition which guarantees the equivalence of weak recovery and detection.
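As a minimal illustration of the objects named in this abstract, the sketch below samples a sparse two-community block model and measures the overlap between an estimate and the true communities. The parametrization (within-community edge probability a/n, across-community probability b/n, labels in {-1, +1}) and the function names `sample_sbm` and `overlap` are assumptions made for illustration; they are not taken from the paper.

```python
import random


def sample_sbm(n, a, b, seed=0):
    """Sample a sparse two-community SBM (assumed parametrization).

    Each vertex gets an independent uniform label in {-1, +1}. A pair is
    joined with probability a/n if the labels agree and b/n otherwise.
    """
    rng = random.Random(seed)
    labels = [rng.choice((-1, 1)) for _ in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = (a if labels[u] == labels[v] else b) / n
            if rng.random() < p:
                edges.append((u, v))
    return labels, edges


def overlap(truth, estimate):
    """Normalized agreement in [0, 1], maximized over the global label swap.

    A value of 1 means the partition is recovered exactly (up to swapping
    the two community names); a value of 0 means the estimate agrees with
    the truth on exactly half of the vertices.
    """
    n = len(truth)
    agree = sum(t == e for t, e in zip(truth, estimate)) / n
    return 2 * (max(agree, 1 - agree) - 0.5)
```

Under these assumptions, weak recovery asks for an estimator whose `overlap` with the true labels stays bounded away from 0 as n grows. The maximum over the label swap reflects that the communities are identifiable only up to renaming.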
Submitted 22 June, 2024;
originally announced June 2024.
-
Monotonicity of Recurrence in Random Walks
Authors:
Rupert Li,
Elchanan Mossel,
Benjamin Weiss
Abstract:
We consider non-homogeneous random walks on the positive quadrant in two dimensions. In the 1960's the following question was asked: is it true that if such a random walk $X$ is recurrent and $Y$ is another random walk that at every point is more likely to go down and more likely to go left than $X$, then $Y$ is also recurrent?
We provide an example showing that the answer is negative. We also show that if either the random walk $X$ or $Y$ is sufficiently homogeneous then the answer is in fact positive.
Submitted 10 April, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Finding Super-spreaders in Network Cascades
Authors:
Elchanan Mossel,
Anirudh Sridhar
Abstract:
Suppose that a cascade (e.g., an epidemic) spreads on an unknown graph, and only the infection times of vertices are observed. What can be learned about the graph from the infection times caused by multiple distinct cascades? Most of the literature on this topic focuses on the task of recovering the entire graph, which requires $Ω( \log n)$ cascades for an $n$-vertex bounded degree graph. Here we ask a different question: can the important parts of the graph be estimated from just a few (i.e., a constant number of) cascades, even as $n$ grows large?
In this work, we focus on identifying super-spreaders (i.e., high-degree vertices) from infection times caused by a Susceptible-Infected process on a graph. Our first main result shows that vertices of degree greater than $n^{3/4}$ can indeed be estimated from a constant number of cascades. Our algorithm for doing so leverages a novel connection between vertex degrees and the second derivative of the cumulative infection curve. Conversely, we show that estimating vertices of degree smaller than $n^{1/2}$ requires at least $\log(n) / \log \log (n)$ cascades. Surprisingly, this matches (up to $\log \log n$ factors) the number of cascades needed to learn the \emph{entire} graph if it is a tree.
Submitted 3 May, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Sample-Efficient Linear Regression with Self-Selection Bias
Authors:
Jason Gaitonde,
Elchanan Mossel
Abstract:
We consider the problem of linear regression with self-selection bias in the unknown-index setting, as introduced in recent work by Cherapanamjeri, Daskalakis, Ilyas, and Zampetakis [STOC 2023]. In this model, one observes $m$ i.i.d. samples $(\mathbf{x}_{\ell},z_{\ell})_{\ell=1}^m$ where $z_{\ell}=\max_{i\in [k]}\{\mathbf{x}_{\ell}^T\mathbf{w}_i+η_{i,\ell}\}$, but the maximizing index $i_{\ell}$ is unobserved. Here, the $\mathbf{x}_{\ell}$ are assumed to be $\mathcal{N}(0,I_n)$ and the noise distribution $\mathbf{η}_{\ell}\sim \mathcal{D}$ is centered and independent of $\mathbf{x}_{\ell}$. We provide a novel and near optimally sample-efficient (in terms of $k$) algorithm to recover $\mathbf{w}_1,\ldots,\mathbf{w}_k\in \mathbb{R}^n$ up to additive $\ell_2$-error $\varepsilon$ with polynomial sample complexity $\tilde{O}(n)\cdot \mathsf{poly}(k,1/\varepsilon)$ and significantly improved time complexity $\mathsf{poly}(n,k,1/\varepsilon)+O(\log(k)/\varepsilon)^{O(k)}$. When $k=O(1)$, our algorithm runs in $\mathsf{poly}(n,1/\varepsilon)$ time, generalizing the polynomial guarantee of an explicit moment matching algorithm of Cherapanamjeri, et al. for $k=2$ and when it is known that $\mathcal{D}=\mathcal{N}(0,I_k)$. Our algorithm succeeds under significantly relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression where the added noise is taken outside the maximum. For this problem, our algorithm is efficient in a much larger range of $k$ than the state-of-the-art due to Ghosh, Pananjady, Guntuboyina, and Ramchandran [IEEE Trans. Inf. Theory 2022] for not too small $\varepsilon$, and leads to improved algorithms for any $\varepsilon$ by providing a warm start for existing local convergence methods.
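The observation model in this abstract can be made concrete with a small data generator. The sketch below draws Gaussian covariates and returns only the maximum of the k noisy linear responses, discarding the maximizing index. The function name `sample_self_selection`, the choice of independent Gaussian noise with standard deviation `noise_std`, and the list-of-lists representation of the weight vectors are assumptions for illustration; the recovery algorithm itself is not sketched.

```python
import random


def sample_self_selection(m, W, noise_std=1.0, seed=0):
    """Draw m samples (x, z) with z = max_i (x . w_i + eta_i).

    W is a list of k weight vectors of equal length n. The covariates x
    are standard Gaussian in R^n. The noise is taken to be independent
    centered Gaussian (an assumed special case of the noise law D).
    The maximizing index is deliberately not returned.
    """
    rng = random.Random(seed)
    n = len(W[0])
    samples = []
    for _ in range(m):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        responses = [
            sum(xi * wi for xi, wi in zip(x, w)) + rng.gauss(0.0, noise_std)
            for w in W
        ]
        samples.append((x, max(responses)))
    return samples
```

With `noise_std=0.0` and the two weight vectors `[1, 0]` and `[-1, 0]`, each observed response equals the absolute value of the first covariate, which shows why the unobserved index makes the regression problem nonlinear.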
Submitted 21 February, 2024;
originally announced February 2024.
-
Low Degree Hardness for Broadcasting on Trees
Authors:
Han Huang,
Elchanan Mossel
Abstract:
We study the low-degree hardness of broadcasting on trees. Broadcasting on trees has been extensively studied in statistical physics, in computational biology in relation to phylogenetic reconstruction and in statistics and computer science in the context of block model inference, and as a simple data model for algorithms that may require depth for inference.
The inference of the root can be carried out by the celebrated Belief Propagation (BP) algorithm, which achieves Bayes-optimal performance. Despite the fact that this algorithm runs in linear time (using real operations), recent works indicated that this algorithm in fact requires a high level of complexity. Moitra, Mossel and Sandon constructed a chain for which estimating the root better than random (for a typical input) is $NC^1$-complete. Kohler and Mossel constructed chains such that for trees with $N$ leaves, recovering the root better than random requires a polynomial of degree $N^{Ω(1)}$. Both works above asked if such complexity bounds hold in general below the celebrated {\em Kesten-Stigum} bound.
In this work, we prove that this is indeed the case for low degree polynomials. We show that for the broadcast problem using any Markov chain on trees with $n$ leaves, below the Kesten-Stigum bound, any $O(\log n)$ degree polynomial has vanishing correlation with the root.
Our result is one of the first low-degree lower bounds proved in a setting that is not based on, or easily reduced to, a product measure.
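To make the broadcast process and the Kesten-Stigum bound concrete, the sketch below treats one special case: a symmetric q-state chain on a d-ary tree, where each child copies its parent with probability 1 - eps and otherwise moves to one of the other q - 1 states uniformly at random. For this chain the second eigenvalue is 1 - q*eps/(q-1), and the Kesten-Stigum condition reads d times its square greater than 1. The abstract covers general Markov chains; the symmetric chain, the regular tree, and all function names here are assumptions chosen for illustration.

```python
import random


def second_eigenvalue(q, eps):
    """Second eigenvalue of the symmetric q-state chain with flip rate eps."""
    return 1.0 - q * eps / (q - 1)


def above_kesten_stigum(d, q, eps):
    """Check the Kesten-Stigum condition d * lambda^2 > 1 on a d-ary tree."""
    return d * second_eigenvalue(q, eps) ** 2 > 1.0


def broadcast_leaves(d, depth, q, eps, seed=0):
    """Broadcast a uniform root state down a d-ary tree; return (root, leaves)."""
    rng = random.Random(seed)
    root = rng.randrange(q)
    layer = [root]
    for _ in range(depth):
        next_layer = []
        for state in layer:
            for _ in range(d):
                if rng.random() < eps:
                    # Pick uniformly among the q - 1 states other than `state`.
                    t = rng.randrange(q - 1)
                    next_layer.append(t if t < state else t + 1)
                else:
                    next_layer.append(state)
        layer = next_layer
    return root, layer
```

The inference task is to guess `root` from `leaves` alone. In this notation, the abstract's result concerns parameters for which `above_kesten_stigum` is false.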
Submitted 20 February, 2024;
originally announced February 2024.
-
Gaussian Broadcast on Grids
Authors:
Pakawut Jiradilok,
Elchanan Mossel
Abstract:
Motivated by the classical work on finite noisy automata (Gray 1982, Gács 2001, Gray 2001) and by the recent work on broadcasting on grids (Makur, Mossel, and Polyanskiy 2022), we introduce Gaussian variants of these models. These models are defined on graded posets. At time $0$, all nodes begin with $X_0$. At time $k\ge 1$, each node on layer $k$ computes a combination of its inputs at layer $k-1$ with independent Gaussian noise added. When is it possible to recover $X_0$ with non-vanishing correlation? We consider different notions of recovery including recovery from a single node, recovery from a bounded window, and recovery from an unbounded window.
Our main interest is in two models defined on grids:
In the infinite model, layer $k$ is the vertices of $\mathbb{Z}^{d+1}$ whose sum of entries is $k$ and for a vertex $v$ at layer $k \ge 1$, $X_v=α\sum (X_u + W_{u,v})$, summed over all $u$ on layer $k-1$ that differ from $v$ exactly in one coordinate, and $W_{u,v}$ are i.i.d. $\mathcal{N}(0,1)$. We show that when $α<1/(d+1)$, the correlation between $X_v$ and $X_0$ decays exponentially, and when $α>1/(d+1)$, the correlation is bounded away from $0$. The critical case when $α=1/(d+1)$ exhibits a phase transition in dimension, where $X_v$ has non-vanishing correlation with $X_0$ if and only if $d\ge 3$. The same results hold for any bounded window.
In the finite model, layer $k$ is the vertices of $\mathbb{Z}^{d+1}$ with nonnegative entries with sum $k$. We identify the sub-critical and the super-critical regimes. In the sub-critical regime, the correlation decays to $0$ for unbounded windows. In the super-critical regime, there exists for every $t$ a convex combination of $X_u$ on layer $t$ whose correlation is bounded away from $0$. We find that for the critical parameters, the correlation is vanishing in all dimensions and for unbounded window sizes.
Submitted 19 February, 2024;
originally announced February 2024.
-
Reconstructing the Geometry of Random Geometric Graphs
Authors:
Han Huang,
Pakawut Jiradilok,
Elchanan Mossel
Abstract:
Random geometric graphs are random graph models defined on metric spaces. Such a model is defined by first sampling points from a metric space and then connecting each pair of sampled points with probability that depends on their distance, independently among pairs. In this work, we show how to efficiently reconstruct the geometry of the underlying space from the sampled graph under the manifold assumption, i.e., assuming that the underlying space is a low dimensional manifold and that the connection probability is a strictly decreasing function of the Euclidean distance between the points in a given embedding of the manifold in $\mathbb{R}^N$. Our work complements a large body of work on manifold learning, where the goal is to recover a manifold from points sampled in the manifold along with their (approximate) distances.
Submitted 10 June, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Stable matchings with correlated Preferences
Authors:
Christopher Hoffman,
Avi Levy,
Elchanan Mossel
Abstract:
The stable matching problem has been the subject of intense theoretical and empirical study since the seminal 1962 paper by Gale and Shapley. The number of stable matchings for different systems of preferences has been studied in many contexts, going back to Donald Knuth in the 1970s. In this paper, we consider a family of distributions defined by the Mallows permutations and show that with high probability the number of stable matchings for these preferences is exponential in the number of people.
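The quantity studied here, the number of stable matchings of a preference system, can be computed by brute force on small instances. The sketch below enumerates all perfect matchings and keeps those with no blocking pair. It is only an illustration of the definition: the function names are hypothetical, the enumeration takes factorial time, and the Mallows preference distribution from the abstract is not modeled.

```python
from itertools import permutations


def is_stable(match, men_prefs, women_prefs):
    """Return True if no man-woman pair prefers each other to their partners.

    match[m] is the woman matched to man m. Each preference list ranks
    all members of the other side, most preferred first.
    """
    n = len(match)
    partner_of_woman = {w: m for m, w in enumerate(match)}
    man_rank = [{w: r for r, w in enumerate(p)} for p in men_prefs]
    woman_rank = [{m: r for r, m in enumerate(p)} for p in women_prefs]
    for m in range(n):
        for w in range(n):
            if w == match[m]:
                continue
            man_prefers = man_rank[m][w] < man_rank[m][match[m]]
            woman_prefers = woman_rank[w][m] < woman_rank[w][partner_of_woman[w]]
            if man_prefers and woman_prefers:
                return False  # (m, w) is a blocking pair
    return True


def count_stable_matchings(men_prefs, women_prefs):
    """Count stable matchings by checking every perfect matching."""
    n = len(men_prefs)
    return sum(
        is_stable(match, men_prefs, women_prefs)
        for match in permutations(range(n))
    )
```

For example, with two men whose first choices differ and two women who each rank first the man who ranks them last, both perfect matchings are stable, so the count is 2. When everyone shares the same ranking, the stable matching is unique.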
Submitted 28 December, 2023; v1 submitted 22 December, 2023;
originally announced December 2023.
-
Sharp Thresholds Imply Circuit Lower Bounds: from random 2-SAT to Planted Clique
Authors:
David Gamarnik,
Elchanan Mossel,
Ilias Zadik
Abstract:
We show that sharp thresholds for Boolean functions directly imply average-case circuit lower bounds. More formally we show that any Boolean function exhibiting a sharp enough threshold at \emph{arbitrary} critical density cannot be computed by Boolean circuits of bounded depth and polynomial size.
Our general result implies new average-case bounded depth circuit lower bounds in a variety of settings.
(a) ($k$-cliques) For $k=Θ(n)$, we prove that any circuit of depth $d$ deciding the presence of a size $k$ clique in a random graph requires exponential-in-$n^{Θ(1/d)}$ size. To the best of our knowledge, this is the first average-case exponential size lower bound for bounded depth (not necessarily monotone) circuits solving the fundamental $k$-clique problem (for any $k=k_n$).
(b) (random 2-SAT) We prove that any circuit of depth $d$ deciding the satisfiability of a random 2-SAT formula requires exponential-in-$n^{Θ(1/d)}$ size. To the best of our knowledge, this is the first bounded depth circuit lower bound for random $k$-SAT for any value of $k \geq 2.$ Our results also provide the first rigorous lower bound in agreement with a conjectured, but debated, ``computational hardness'' of random $k$-SAT around its satisfiability threshold.
(c) (Statistical estimation -- planted $k$-clique) In recent years, multiple statistical estimation problems have also been proven to exhibit a ``statistical'' sharp threshold, called the All-or-Nothing (AoN) phenomenon. We show that AoN also implies circuit lower bounds for statistical problems. As a simple corollary of that, we prove that any circuit of depth $d$ that solves to information-theoretic optimality a ``dense'' variant of the celebrated planted $k$-clique problem requires exponential-in-$n^{Θ(1/d)}$ size.
Submitted 30 November, 2023; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Errors are Robustly Tamed in Cumulative Knowledge Processes
Authors:
Anna Brandenberger,
Cassandra Marcussen,
Elchanan Mossel,
Madhu Sudan
Abstract:
We study processes of societal knowledge accumulation, where the validity of a new unit of knowledge depends both on the correctness of its derivation and on the validity of the units it depends on. A fundamental question in this setting is: If a constant fraction of the new derivations is wrong, can investing a constant fraction, bounded away from one, of effort ensure that a constant fraction of knowledge in society is valid? Ben-Eliezer, Mikulincer, Mossel, and Sudan (ITCS 2023) introduced a concrete probabilistic model to analyze such questions and showed an affirmative answer to this question. Their study, however, focuses on the simple case where each new unit depends on just one existing unit, and units attach according to a $\textit{preferential attachment rule}$.
In this work, we consider much more general families of cumulative knowledge processes, where new units may attach according to varied attachment mechanisms and depend on multiple existing units. We also allow a (random) fraction of insertions of adversarial nodes.
We give a robust affirmative answer to the above question by showing that for $\textit{all}$ of these models, as long as many of the units follow simple heuristics for checking a bounded number of units they depend on, all errors will be eventually eliminated. Our results indicate that preserving the quality of large interdependent collections of units of knowledge is feasible, as long as careful but not too costly checks are performed when new units are derived/deposited.
Submitted 11 June, 2024; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Influence Maximization in Ising Models
Authors:
Zongchen Chen,
Elchanan Mossel
Abstract:
Given a complex high-dimensional distribution over $\{\pm 1\}^n$, what is the best way to increase the expected number of $+1$'s by controlling the values of only a small number of variables? Such a problem is known as influence maximization and has been widely studied in social networks, biology, and computer science. In this paper, we consider influence maximization on the Ising model, which is a prototypical example of undirected graphical models and has wide applications in many real-world problems. We establish a sharp computational phase transition for influence maximization on sparse Ising models under a bounded budget: in the high-temperature regime, we give a linear-time algorithm for finding a small subset of variables and their values which achieve nearly optimal influence; in the low-temperature regime, we show that the influence maximization problem cannot be solved in polynomial time under a commonly believed complexity assumption. The critical temperature coincides with the tree uniqueness/non-uniqueness threshold for Ising models, which is also a critical point for other computational problems including approximate sampling and counting.
Submitted 3 January, 2024; v1 submitted 10 September, 2023;
originally announced September 2023.
-
Influences in Mixing Measures
Authors:
Frederic Koehler,
Noam Lifshitz,
Dor Minzer,
Elchanan Mossel
Abstract:
The theory of influences in product measures has profound applications in theoretical computer science, combinatorics, and discrete probability. This deep theory is intimately connected to functional inequalities and to the Fourier analysis of discrete groups. Originally, influences of functions were motivated by the study of social choice theory, wherein a Boolean function represents a voting scheme, its inputs represent the votes, and its output represents the outcome of the elections. Thus, product measures represent a scenario in which the votes of the parties are randomly and independently distributed, which is often far from the truth in real-life scenarios.
We begin to develop the theory of influences for more general measures under mixing or correlation decay conditions. More specifically, we prove analogues of the KKL and Talagrand influence theorems for Markov Random Fields on bounded degree graphs with correlation decay. We show how some of the original applications of the theory in terms of voting and coalitions extend to general measures with correlation decay. Our results thus shed light both on voting with correlated voters and on the behavior of general functions of Markov Random Fields (also called ``spin-systems'') with correlation decay.
Submitted 14 July, 2023;
originally announced July 2023.
-
Sharp thresholds in inference of planted subgraphs
Authors:
Elchanan Mossel,
Jonathan Niles-Weed,
Youngtak Sohn,
Nike Sun,
Ilias Zadik
Abstract:
A major question in the study of the Erdős--Rényi random graph is to understand the probability that it contains a given subgraph. This study originated in classical work of Erdős and Rényi (1960). More recent work studies this question both in building a general theory of sharp versus coarse transitions (Friedgut and Bourgain 1999; Hatami, 2012) and in results on the location of the transition (Kahn and Kalai, 2007; Talagrand, 2010; Frankston, Kahn, Narayanan, Park, 2019; Park and Pham, 2022).
In inference problems, one often studies the optimal accuracy of inference as a function of the amount of noise. In a variety of sparse recovery problems, an ``all-or-nothing (AoN) phenomenon'' has been observed: Informally, as the amount of noise is gradually increased, at some critical threshold the inference problem undergoes a sharp jump from near-perfect recovery to near-zero accuracy (Gamarnik and Zadik, 2017; Reeves, Xu, Zadik, 2021). We can regard AoN as the natural inference analogue of the sharp threshold phenomenon in random graphs. In contrast with the general theory developed for sharp thresholds of random graph properties, the AoN phenomenon has only been studied so far in specific inference settings.
In this paper we study the general problem of inferring a graph $H=H_n$ planted in an Erdős--Rényi random graph, thus naturally connecting the two lines of research mentioned above. We show that questions of AoN are closely connected to first moment thresholds, and to a generalization of the so-called Kahn--Kalai expectation threshold that scans over subgraphs of $H$ of edge density at least $q$. In a variety of settings we characterize AoN, by showing that AoN occurs if and only if this ``generalized expectation threshold'' is roughly constant in $q$. Our proofs combine techniques from random graph theory and Bayesian inference.
Submitted 28 February, 2023;
originally announced February 2023.
-
The Power of an Adversary in Glauber Dynamics
Authors:
Byron Chin,
Ankur Moitra,
Elchanan Mossel,
Colin Sandon
Abstract:
Glauber dynamics are a natural model of dynamics of dependent systems. While originally introduced in statistical physics, they have found important applications in the study of social networks, computer vision and other domains. In this work, we introduce a model of corrupted Glauber dynamics whereby instead of updating according to the prescribed conditional probabilities, some of the vertices and their updates are controlled by an adversary. We study the effect of such corruptions on global features of the system. Among the questions we study are: How many nodes need to be controlled in order to change the average statistics of the system in polynomial time? And how many nodes are needed to obstruct approximate convergence of the dynamics?
Our results can be viewed as studying the robustness of classical sampling methods and are thus related to robust inference. The proofs connect to classical theory of Glauber dynamics from statistical physics.
Submitted 1 May, 2023; v1 submitted 21 February, 2023;
originally announced February 2023.
-
When will (game) wars end?
Authors:
Manan Bhatia,
Byron Chin,
Nitya Mani,
Elchanan Mossel
Abstract:
We study several variants of the classical card game war. As anyone who played this game knows, the game can take some time to terminate, but it usually does. Here, we analyze a number of asymptotic variants of the game, where the number of cards is $n$, and show that all have expected termination time of order $n^2$. This is the same expected termination time as in the game where at each turn a fair coin toss decides which player wins a card, known as Gambler's Ruin and studied by Pascal, Fermat and others in the seventeenth century.
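The order $n^2$ termination time of the fair coin toss comparison game (Gambler's Ruin) can be checked exactly on small instances. If one player holds $k$ of the $N$ cards, the expected duration $E_k$ satisfies $E_k = 1 + (E_{k-1} + E_{k+1})/2$ with $E_0 = E_N = 0$, and the solution is $E_k = k(N-k)$. The sketch below solves this recurrence in exact rational arithmetic. The function name `expected_duration` is hypothetical, and the code covers only the coin toss game, not the variants of war analyzed in the paper.

```python
from fractions import Fraction


def expected_duration(k, total):
    """Expected number of fair coin tosses until one player holds all cards.

    One player starts with k cards out of `total` (total >= 1), and each
    toss moves a single card between the players. Writes E_j = a_j + b_j * x
    with x = E_1 unknown, propagates E_{j+1} = 2*E_j - E_{j-1} - 2, and then
    fixes x from the boundary condition E_total = 0.
    """
    a = [Fraction(0), Fraction(0)]  # constant parts of E_0 and E_1
    b = [Fraction(0), Fraction(1)]  # coefficients of x in E_0 and E_1
    for _ in range(2, total + 1):
        a.append(2 * a[-1] - a[-2] - 2)
        b.append(2 * b[-1] - b[-2])
    x = -a[total] / b[total]
    return a[k] + b[k] * x
```

With an even split, `expected_duration(n // 2, n)` equals $n^2/4$ for even $n$, which is the order $n^2$ behavior mentioned in the abstract.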
Submitted 29 February, 2024; v1 submitted 7 February, 2023;
originally announced February 2023.
-
Exact Phase Transitions for Stochastic Block Models and Reconstruction on Trees
Authors:
Elchanan Mossel,
Allan Sly,
Youngtak Sohn
Abstract:
In this paper we continue to rigorously establish the predictions in groundbreaking work in statistical physics by Decelle, Krzakala, Moore, Zdeborová (2011) regarding the block model, in particular in the case of $q=3$ and $q=4$ communities.
We prove that for $q=3$ and $q=4$ there is no computational-statistical gap if the average degree is above some constant by showing it is information theoretically impossible to detect below the Kesten-Stigum bound. The proof is based on showing that for the broadcast process on Galton-Watson trees, reconstruction is impossible for $q=3$ and $q=4$ if the average degree is sufficiently large. This improves on the result of Sly (2009), who proved similar results for regular trees for $q=3$. Our analysis of the critical case $q=4$ provides a detailed picture showing that the tightness of the Kesten-Stigum bound in the antiferromagnetic case depends on the average degree of the tree. We also prove that for $q\geq 5$, the Kesten-Stigum bound is not sharp.
Our results prove conjectures of Decelle, Krzakala, Moore, Zdeborová (2011), Moore (2017), Abbe and Sandon (2018) and Ricci-Tersenghi, Semerjian, and Zdeborová (2019). Our proofs are based on a new general coupling of the tree and graph processes and on a refined analysis of the broadcast process on the tree.
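For readers who want to locate the Kesten-Stigum bound numerically, the sketch below evaluates it for the symmetric block model with $q$ equal-sized communities. It assumes the common parametrization in which within-community edges appear with probability $a/n$ and across-community edges with probability $b/n$, so that the average degree is $d = (a + (q-1)b)/q$, the second eigenvalue is $λ = (a-b)/(a+(q-1)b)$, and the bound is $dλ^2 > 1$. This parametrization and the function names are assumptions; the abstract does not spell them out.

```python
def sbm_ks_parameters(q, a, b):
    """Return (d, lam, snr) for the symmetric q-community block model.

    d is the average degree, lam the second eigenvalue of the community
    transition matrix, and snr = d * lam**2 the quantity compared with 1.
    """
    d = (a + (q - 1) * b) / q
    lam = (a - b) / (a + (q - 1) * b)
    return d, lam, d * lam * lam


def above_sbm_kesten_stigum(q, a, b):
    """True when d * lam^2 > 1, i.e. above the Kesten-Stigum bound."""
    return sbm_ks_parameters(q, a, b)[2] > 1.0
```

A negative `lam` (that is, `b > a`) corresponds to the antiferromagnetic case mentioned in the abstract. The same formula applies there because only the square of `lam` enters the bound.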
Submitted 6 December, 2022;
originally announced December 2022.
-
Is this correct? Let's check!
Authors:
Omri Ben-Eliezer,
Dan Mikulincer,
Elchanan Mossel,
Madhu Sudan
Abstract:
Societal accumulation of knowledge is a complex process. The correctness of new units of knowledge depends not only on the correctness of new reasoning, but also on the correctness of old units that the new one builds on. The errors in such accumulation processes are often remedied by error correction and detection heuristics.
Motivating examples include the scientific process based on scientific publications, and software development based on libraries of code.
Natural processes that aim to keep errors under control, such as peer review in scientific publications, and testing and debugging in software development, would typically check existing pieces of knowledge -- both for the reasoning that generated them and the previous facts they rely on. In this work, we present a simple process that models such accumulation of knowledge and study the persistence (or lack thereof) of errors.
We consider a simple probabilistic model for the generation of new units of knowledge based on the preferential attachment growth model, which additionally allows for errors. Furthermore, the process includes checks aimed at catching these errors. We investigate when effects of errors persist forever in the system (with positive probability) and when they get rooted out completely by the checking process.
The two basic parameters associated with the checking process are the {\em probability} of conducting a check and the depth of the check. We show that errors are rooted out if checks are sufficiently frequent and sufficiently deep. In contrast, shallow or infrequent checks are insufficient to root out errors.
Submitted 17 June, 2024; v1 submitted 22 November, 2022;
originally announced November 2022.
-
The Power of Two Matrices in Spectral Algorithms
Authors:
Souvik Dhara,
Julia Gaudio,
Elchanan Mossel,
Colin Sandon
Abstract:
Spectral algorithms are some of the main tools in optimization and inference problems on graphs. Typically, the graph is encoded as a matrix and eigenvectors and eigenvalues of the matrix are then used to solve the given graph problem. Spectral algorithms have been successfully used for graph partitioning, hidden clique recovery and graph coloring. In this paper, we study the power of spectral algorithms using two matrices in a graph partitioning problem. We use two different matrices resulting from two different encodings of the same graph and then combine the spectral information coming from these two matrices.
We analyze a two-matrix spectral algorithm for the problem of identifying latent community structure in large random graphs. In particular, we consider the problem of recovering community assignments exactly in the censored stochastic block model, where each edge status is revealed independently with some probability. We show that spectral algorithms based on two matrices are optimal and succeed in recovering communities up to the information theoretic threshold. On the other hand, we show that for most choices of the parameters, any spectral algorithm based on one matrix is suboptimal. This is in contrast to our prior works (2022a, 2022b) which showed that for the symmetric Stochastic Block Model and the Planted Dense Subgraph problem, a spectral algorithm based on one matrix achieves the information theoretic threshold. We additionally provide more general geometric conditions for the (sub)-optimality of spectral algorithms.
Submitted 7 March, 2023; v1 submitted 11 October, 2022;
originally announced October 2022.
-
A second moment proof of the spread lemma
Authors:
Elchanan Mossel,
Jonathan Niles-Weed,
Nike Sun,
Ilias Zadik
Abstract:
This note concerns a well-known result which we term the ``spread lemma,'' which establishes the existence (with high probability) of a desired structure in a random set. The spread lemma was central to two recent celebrated results: (a) the improved bounds of Alweiss, Lovett, Wu, and Zhang (2019) on the Erdős-Rado sunflower conjecture; and (b) the proof of the fractional Kahn--Kalai conjecture by Frankston, Kahn, Narayanan and Park (2019). While the lemma was first proved (and later refined) by delicate counting arguments, alternative proofs have also been given, via Shannon's noiseless coding theorem (Rao, 2019), and also via manipulations of Shannon entropy bounds (Tao, 2020).
In this note we present a new proof of the spread lemma that takes advantage of an explicit recasting of the proof in the language of Bayesian statistical inference. We show that from this viewpoint the proof proceeds in a straightforward and principled probabilistic manner, leading to a truncated second moment calculation which concludes the proof. The proof can also be viewed as a demonstration of the ``planting trick'' introduced by Achlioptas and Coja-Oghlan (2008) in the study of random constraint satisfaction problems.
Submitted 10 October, 2022; v1 submitted 22 September, 2022;
originally announced September 2022.
-
On the Second Kahn--Kalai Conjecture
Authors:
Elchanan Mossel,
Jonathan Niles-Weed,
Nike Sun,
Ilias Zadik
Abstract:
For any given graph $H$, we are interested in $p_\mathrm{crit}(H)$, the minimal $p$ such that the Erdős-Rényi graph $G(n,p)$ contains a copy of $H$ with probability at least $1/2$. Kahn and Kalai (2007) conjectured that $p_\mathrm{crit}(H)$ is given up to a logarithmic factor by a simpler "subgraph expectation threshold" $p_\mathrm{E}(H)$, which is the minimal $p$ such that for every subgraph $H'\subseteq H$, the Erdős-Rényi graph $G(n,p)$ contains \emph{in expectation} at least $1/2$ copies of $H'$. It is trivial that $p_\mathrm{E}(H) \le p_\mathrm{crit}(H)$, and the so-called "second Kahn-Kalai conjecture" states that $p_\mathrm{crit}(H) \lesssim p_\mathrm{E}(H) \log e(H)$ where $e(H)$ is the number of edges in $H$.
In this article, we present a natural modification $p_\mathrm{E, new}(H)$ of the Kahn--Kalai subgraph expectation threshold, which we show is sandwiched between $p_\mathrm{E}(H)$ and $p_\mathrm{crit}(H)$. The new definition $p_\mathrm{E, new}(H)$ is based on the simple observation that if $G(n,p)$ contains a copy of $H$ and $H$ contains \emph{many} copies of $H'$, then $G(n,p)$ must also contain \emph{many} copies of $H'$. We then show that $p_\mathrm{crit}(H) \lesssim p_\mathrm{E, new}(H) \log e(H)$, thus proving a modification of the second Kahn--Kalai conjecture. The bound follows by a direct application of the set-theoretic "spread" property, which led to recent breakthroughs in the sunflower conjecture by Alweiss, Lovett, Wu and Zhang and the first fractional Kahn--Kalai conjecture by Frankston, Kahn, Narayanan and Park.
Submitted 7 September, 2022;
originally announced September 2022.
-
Agreement and Statistical Efficiency in Bayesian Perception Models
Authors:
Yash Deshpande,
Elchanan Mossel,
Youngtak Sohn
Abstract:
Bayesian models of group learning have been studied in Economics since the 1970s, and more recently in computational linguistics. The models from Economics postulate that agents maximize utility in their communication and actions. The Economics models do not explain the ``probability matching'' phenomena that are observed in many experimental studies. To address these observations, Bayesian models that do not formally fit into the economic utility maximization framework were introduced. In these models individuals sample from their posteriors in communication. In this work we study the asymptotic behavior of such models on connected networks with repeated communication. Perhaps surprisingly, despite the fact that individual agents are not utility maximizers in the classical sense, we establish that the individuals ultimately agree and furthermore show that the limiting posterior is Bayes optimal.
We explore the interpretation of our results in terms of Large Language Models (LLMs). In the positive direction our results can be interpreted as stating that interaction between different LLMs can lead to optimal learning. However, we provide an example showing how misspecification may lead LLM agents to be overconfident in their estimates.
Submitted 9 August, 2023; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Almost-Linear Planted Cliques Elude the Metropolis Process
Authors:
Zongchen Chen,
Elchanan Mossel,
Ilias Zadik
Abstract:
A seminal work of Jerrum (1992) showed that large cliques elude the Metropolis process. More specifically, Jerrum showed that the Metropolis algorithm cannot find a clique of size $k=Θ(n^α), α\in (0,1/2)$, which is planted in the Erdős-Rényi random graph $G(n,1/2)$, in polynomial time. Information theoretically it is possible to find such planted cliques as soon as $k \ge (2+ε) \log n$.
Since the work of Jerrum, the computational problem of finding a planted clique in $G(n,1/2)$ was studied extensively and many polynomial time algorithms were shown to find the planted clique if it is of size $k = Ω(\sqrt{n})$, while no polynomial-time algorithm is known to work when $k=o(\sqrt{n})$. Notably, the first evidence of the problem's algorithmic hardness is commonly attributed to the result of Jerrum from 1992.
In this paper we revisit the original Metropolis algorithm suggested by Jerrum. Interestingly, we find that the Metropolis algorithm actually fails to recover a planted clique of size $k=Θ(n^α)$ for any constant $0 \leq α< 1$.
Moreover, we strengthen Jerrum's results in a number of other ways. Like many results in the MCMC literature, the result of Jerrum shows that there exists a starting state (which may depend on the instance) for which the Metropolis algorithm fails. For a wide range of temperatures, we show that the algorithm fails when started at the most natural initial state, which is the empty clique. This answers an open problem stated in Jerrum (1992). We also show that the simulated tempering version of the Metropolis algorithm, a more sophisticated temperature-exchange variant of it, fails in the same regime of parameters.
Finally, our results confirm recent predictions by Gamarnik and Zadik (2019) and Angelini, Fachin, de Feo (2021).
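For readers unfamiliar with the dynamics discussed above, the following is a minimal Python sketch of a Metropolis process on cliques started from the empty clique. It assumes the commonly described update rule (pick a uniformly random vertex; add it if the result is still a clique; remove it with probability $1/λ$) and is not taken from the paper. The function names, the adjacency-list representation `adj`, and the parameter `lam` are illustrative assumptions.

```python
import random

def metropolis_clique_step(adj, clique, lam, rng):
    """One step of a Metropolis process on the cliques of a graph.

    adj: list of neighbor sets (assumed representation), lam >= 1.
    """
    v = rng.randrange(len(adj))
    if v in clique:
        # A removal is accepted with probability 1/lam.
        if rng.random() < 1.0 / lam:
            return clique - {v}
        return clique
    # An addition is accepted whenever the result is still a clique.
    if all(u in adj[v] for u in clique):
        return clique | {v}
    return clique

def run_metropolis(adj, lam, steps, seed=0):
    """Run the chain from the empty clique for a fixed number of steps."""
    rng = random.Random(seed)
    clique = frozenset()
    for _ in range(steps):
        clique = metropolis_clique_step(adj, clique, lam, rng)
    return clique

def is_clique(adj, vertices):
    """Check that every pair of the given vertices is adjacent."""
    vs = list(vertices)
    return all(vs[j] in adj[vs[i]]
               for i in range(len(vs)) for j in range(i + 1, len(vs)))
```

The stationary distribution of this chain weights a clique $K$ proportionally to $λ^{|K|}$, so larger `lam` (lower temperature) favors larger cliques; the sketch says nothing about mixing time, which is the subject of the abstract.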
Submitted 4 April, 2022;
originally announced April 2022.
-
Spectral Algorithms Optimally Recover Planted Sub-structures
Authors:
Souvik Dhara,
Julia Gaudio,
Elchanan Mossel,
Colin Sandon
Abstract:
Spectral algorithms are an important building block in machine learning and graph algorithms. We are interested in studying when such algorithms can be applied directly to provide optimal solutions to inference tasks. Previous works by Abbe, Fan, Wang and Zhong (2020) and by Dhara, Gaudio, Mossel and Sandon (2022) showed the optimality for community detection in the Stochastic Block Model (SBM), as well as in a censored variant of the SBM. Here we show that this optimality is somewhat universal as it carries over to other planted substructures such as the planted dense subgraph problem and submatrix localization problem, as well as to a censored version of the planted dense subgraph problem.
△ Less
Submitted 11 October, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Reconstruction on Trees and Low-Degree Polynomials
Authors:
Frederic Koehler,
Elchanan Mossel
Abstract:
The study of Markov processes and broadcasting on trees has deep connections to a variety of areas including statistical physics, graphical models, phylogenetic reconstruction, Markov Chain Monte Carlo, and community detection in random graphs. Notably, the celebrated Belief Propagation (BP) algorithm achieves Bayes-optimal performance for the reconstruction problem of predicting the value of the Markov process at the root of the tree from its values at the leaves.
Recently, the analysis of low-degree polynomials has emerged as a valuable tool for predicting computational-to-statistical gaps. In this work, we investigate the performance of low-degree polynomials for the reconstruction problem on trees. Perhaps surprisingly, we show that there are simple tree models with $N$ leaves and bounded arity where (1) nontrivial reconstruction of the root value is possible with a simple polynomial time algorithm and with robustness to noise, but not with any polynomial of degree $N^{c}$ for $c > 0$ a constant depending only on the arity, and (2) when the tree is unknown and given multiple samples with correlated root assignments, nontrivial reconstruction of the root value is possible with a simple Statistical Query algorithm but not with any polynomial of degree $N^c$. These results clarify some of the limitations of low-degree polynomials vs. polynomial time algorithms for Bayesian estimation problems. They also complement recent work of Moitra, Mossel, and Sandon who studied the circuit complexity of Belief Propagation. As a consequence of our main result, we show that for some $c' > 0$ depending only on the arity, $\exp(N^{c'})$ many samples are needed for RBF kernel regression to obtain nontrivial correlation with the true regression function (BP). We pose related open questions about low-degree polynomials and the Kesten-Stigum threshold.
Submitted 25 October, 2022; v1 submitted 14 September, 2021;
originally announced September 2021.
-
Information Spread with Error Correction
Authors:
Omri Ben-Eliezer,
Elchanan Mossel,
Madhu Sudan
Abstract:
We study the process of information dispersal in a network with communication errors and local error-correction. Specifically we consider a simple model where a single bit of information initially known to a single source is dispersed through the network, and communication errors lead to differences in the agents' opinions on this information.
Naturally, such errors can very quickly make the communication completely unreliable, and in this work we study to what extent this unreliability can be mitigated by local error-correction, where nodes periodically correct their opinion based on the opinion of (some subset of) their neighbors. We analyze how the error spreads in the "early stages" of information dispersal by monitoring the average opinion, i.e., the fraction of agents that have the correct information among all nodes that hold an opinion at a given time. Our main results show that even with significant effort in error-correction, tiny amounts of noise can lead the average opinion to be nearly uncorrelated with the truth in early stages. We also propose some local methods to help agents gauge when the information they have has stabilized.
Submitted 13 July, 2021;
originally announced July 2021.
-
Spectral Recovery of Binary Censored Block Models
Authors:
Souvik Dhara,
Julia Gaudio,
Elchanan Mossel,
Colin Sandon
Abstract:
Community detection is the problem of identifying community structure in graphs. Often the graph is modeled as a sample from the Stochastic Block Model, in which each vertex belongs to a community. The probability that two vertices are connected by an edge depends on the communities of those vertices. In this paper, we consider a model of {\em censored} community detection with two communities, where most of the data is missing as the status of only a small fraction of the potential edges is revealed. In this model, vertices in the same community are connected with probability $p$ while vertices in opposite communities are connected with probability $q$. The connectivity status of a given pair of vertices $\{u,v\}$ is revealed with probability $α$, independently across all pairs, where $α= \frac{t \log(n)}{n}$. We establish the information-theoretic threshold $t_c(p,q)$, such that no algorithm succeeds in recovering the communities exactly when $t < t_c(p,q)$. We show that when $t > t_c(p,q)$, a simple spectral algorithm based on a weighted, signed adjacency matrix succeeds in recovering the communities exactly.
While spectral algorithms are shown to have near-optimal performance in the symmetric case, we show that they may fail in the asymmetric case where the connection probabilities inside the two communities are allowed to be different. In particular, we show the existence of a parameter regime where a simple two-phase algorithm succeeds but any algorithm based on the top two eigenvectors of the weighted, signed adjacency matrix fails.
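The censored observation model described above can be sketched in a few lines of Python. This is only an illustration of the sampling step, not the paper's algorithm: the encoding of revealed edges as `+1`, revealed non-edges as `-1`, and censored pairs as absent is an assumption made here for concreteness (the abstract does not specify the weights of its signed adjacency matrix), the logarithm is assumed to be natural, and the function name is hypothetical.

```python
import math
import random

def sample_censored_sbm(labels, p, q, t, seed=0):
    """Sample censored observations for a two-community block model.

    labels: community label of each vertex.
    Each pair {u, v} is revealed with probability alpha = t*log(n)/n
    (capped at 1). A revealed pair is an edge with probability p if the
    labels agree and q otherwise.
    Returns a dict mapping revealed pairs (u, v), u < v, to +1 (edge)
    or -1 (non-edge); censored pairs are simply missing.
    """
    rng = random.Random(seed)
    n = len(labels)
    alpha = min(1.0, t * math.log(n) / n)
    observed = {}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() >= alpha:
                continue  # status of this pair is censored
            prob = p if labels[u] == labels[v] else q
            observed[(u, v)] = 1 if rng.random() < prob else -1
    return observed
```

A spectral method would then build a symmetric matrix from `observed` and inspect its leading eigenvectors; that step is omitted here because the specific weighting is not stated in the abstract.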
Submitted 10 November, 2021; v1 submitted 13 July, 2021;
originally announced July 2021.
-
Approximate polymorphisms
Authors:
Gilad Chase,
Yuval Filmus,
Dor Minzer,
Elchanan Mossel,
Nitin Saurabh
Abstract:
For a function $g\colon\{0,1\}^m\to\{0,1\}$, a function $f\colon \{0,1\}^n\to\{0,1\}$ is called a $g$-polymorphism if their actions commute: $f(g(\mathsf{row}_1(Z)),\ldots,g(\mathsf{row}_n(Z))) = g(f(\mathsf{col}_1(Z)),\ldots,f(\mathsf{col}_m(Z)))$ for all $Z\in\{0,1\}^{n\times m}$. The function $f$ is called an approximate polymorphism if this equality holds with probability close to $1$, when $Z$ is sampled uniformly.
We study the structure of exact polymorphisms as well as approximate polymorphisms. Our results include:
- We prove that an approximate polymorphism $f$ must be close to an exact polymorphism;
- We give a characterization of exact polymorphisms, showing that besides trivial cases, only the functions $g = \mathsf{AND}, \mathsf{XOR}, \mathsf{OR}, \mathsf{NXOR}$ admit non-trivial exact polymorphisms.
We also study the approximate polymorphism problem in the list-decoding regime (i.e., when the probability that the equality holds is not close to $1$, but is bounded away from some value). We show that if $f(x \land y) = f(x) \land f(y)$ with probability larger than $s_\land \approx 0.815$ then $f$ correlates with some low-degree character, and $s_\land$ is the optimal threshold for this property.
Our results generalize the classical linearity testing result of Blum, Luby and Rubinfeld, which in this language showed that the approximate polymorphisms of $g = \mathsf{XOR}$ are close to XORs, as well as a recent result of Filmus, Lifshitz, Minzer and Mossel, showing that the approximate polymorphisms of AND can only be close to AND functions.
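The definition of a $g$-polymorphism above can be checked by brute force for small $n$ and $m$. The Python sketch below enumerates every matrix $Z\in\{0,1\}^{n\times m}$ and reports the fraction on which the two sides agree; a fraction of exactly $1$ means $f$ is an exact polymorphism. The function names and the helper functions `AND`, `XOR`, and `MAJ` are illustrative and not from the paper.

```python
from itertools import product

def agreement_fraction(f, n, g, m):
    """Fraction of Z in {0,1}^(n x m) with f(g(rows)) == g(f(cols))."""
    agree = 0
    total = 0
    for bits in product((0, 1), repeat=n * m):
        # Z has n rows, each of length m.
        Z = [bits[i * m:(i + 1) * m] for i in range(n)]
        # Apply g to each row, then f to the n results.
        lhs = f(tuple(g(row) for row in Z))
        # Apply f to each column, then g to the m results.
        rhs = g(tuple(f(tuple(Z[i][j] for i in range(n))) for j in range(m)))
        agree += (lhs == rhs)
        total += 1
    return agree / total

def is_polymorphism(f, n, g, m):
    """True when the two sides agree on every matrix Z."""
    return agreement_fraction(f, n, g, m) == 1.0

def AND(x):
    return int(all(x))

def XOR(x):
    return sum(x) % 2

def MAJ(x):
    return int(2 * sum(x) > len(x))
```

For instance, `is_polymorphism(AND, 3, AND, 2)` holds because both sides compute the AND of all entries of $Z$, while majority on three bits is not an exact polymorphism of XOR on two bits. The enumeration costs $2^{nm}$ evaluations, so this is only practical for very small parameters.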
Submitted 20 June, 2021; v1 submitted 31 May, 2021;
originally announced June 2021.
-
Inference in Opinion Dynamics under Social Pressure
Authors:
Ali Jadbabaie,
Anuran Makur,
Elchanan Mossel,
Rabih Salhab
Abstract:
We introduce a new opinion dynamics model where a group of agents holds two kinds of opinions: inherent and declared. Each agent's inherent opinion is fixed and unobservable by the other agents. At each time step, agents broadcast their declared opinions on a social network, which are governed by the agents' inherent opinions and social pressure. In particular, we assume that agents may declare opinions that are not aligned with their inherent opinions to conform with their neighbors. This raises the natural question: Can we estimate the agents' inherent opinions from observations of declared opinions? For example, agents' inherent opinions may represent their true political alliances (Democrat or Republican), while their declared opinions may model the political inclinations of tweets on social media. In this context, we may seek to predict the election results by observing voters' tweets, which do not necessarily reflect their political support due to social pressure. We analyze this question in the special case where the underlying social network is a complete graph. We prove that, as long as the population does not include large majorities, estimation of aggregate and individual inherent opinions is possible. On the other hand, large majorities force minorities to lie over time, which makes asymptotic estimation impossible.
Submitted 3 May, 2022; v1 submitted 22 April, 2021;
originally announced April 2021.
-
Probabilistic Aspects of Voting, Intransitivity and Manipulation
Authors:
Elchanan Mossel
Abstract:
These lecture notes are based on lectures given at the 2019 Saint-Flour Probability School.
Submitted 18 December, 2020;
originally announced December 2020.
-
Shotgun Assembly of Erdos-Renyi Random Graphs
Authors:
Julia Gaudio,
Elchanan Mossel
Abstract:
Graph shotgun assembly refers to the problem of reconstructing a graph from a collection of local neighborhoods. In this paper, we consider shotgun assembly of Erdős-Rényi random graphs $G(n, p_n)$, where $p_n = n^{-α}$ for $0 < α< 1$. We consider both reconstruction up to isomorphism as well as exact reconstruction (recovering the vertex labels as well as the structure). We show that given the collection of distance-$1$ neighborhoods, $G$ is exactly reconstructable for $0 < α< \frac{1}{3}$, but not reconstructable for $\frac{1}{2} < α< 1$. Given the collection of distance-$2$ neighborhoods, $G$ is exactly reconstructable for $α\in \left(0, \frac{1}{2}\right) \cup \left(\frac{1}{2}, \frac{3}{5}\right)$, but not reconstructable for $\frac{3}{4} < α< 1$.
Submitted 12 January, 2022; v1 submitted 27 October, 2020;
originally announced October 2020.
-
Broadcasting on Two-Dimensional Regular Grids
Authors:
Anuran Makur,
Elchanan Mossel,
Yury Polyanskiy
Abstract:
We study a specialization of the problem of broadcasting on directed acyclic graphs, namely, broadcasting on 2D regular grids. Consider a 2D regular grid with source vertex $X$ at layer $0$ and $k+1$ vertices at layer $k\geq 1$, which are at distance $k$ from $X$. Every vertex of the 2D regular grid has outdegree $2$, the vertices at the boundary have indegree $1$, and all other vertices have indegree $2$. At time $0$, $X$ is given a random bit. At time $k\geq 1$, each vertex in layer $k$ receives transmitted bits from its parents in layer $k-1$, where the bits pass through binary symmetric channels with noise level $δ\in(0,1/2)$. Then, each vertex combines its received bits using a common Boolean processing function to produce an output bit. The objective is to recover $X$ with probability of error better than $1/2$ from all vertices at layer $k$ as $k \rightarrow \infty$. Besides their natural interpretation in communication networks, such broadcasting processes can be construed as 1D probabilistic cellular automata (PCA) with boundary conditions that limit the number of sites at each time $k$ to $k+1$. We conjecture that it is impossible to propagate information in a 2D regular grid regardless of the noise level and the choice of processing function. In this paper, we make progress towards establishing this conjecture, and prove using ideas from percolation and coding theory that recovery of $X$ is impossible for any $δ$ provided that all vertices use either AND or XOR processing functions. Furthermore, we propose a martingale-based approach that establishes the impossibility of recovering $X$ for any $δ$ when all NAND processing functions are used if certain supermartingales can be rigorously constructed. We also provide numerical evidence for the existence of these supermartingales by computing explicit examples for different values of $δ$ via linear programming.
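The broadcasting process described above is straightforward to simulate. The Python sketch below follows the abstract's description: layer $k$ has $k+1$ vertices, each bit sent along an edge passes through a binary symmetric channel with crossover probability $δ$, and interior vertices combine their two received bits with a common Boolean function. Two details are assumptions made here: vertex $j$ of layer $k$ is taken to have parents $j-1$ and $j$ in layer $k-1$, and boundary vertices (indegree $1$) simply keep the single bit they receive. All names are illustrative.

```python
import random

def bsc(bit, delta, rng):
    """Pass one bit through a binary symmetric channel with noise delta."""
    return bit ^ int(rng.random() < delta)

def broadcast_grid(x, depth, delta, combine, rng):
    """Simulate broadcasting on a 2D regular grid; return the last layer.

    x: source bit at layer 0; combine: Boolean function of two bits.
    """
    layer = [x]
    for k in range(1, depth + 1):
        nxt = []
        for j in range(k + 1):
            # Parents in layer k-1 (which has k vertices, indexed 0..k-1).
            parents = [layer[i] for i in (j - 1, j) if 0 <= i < k]
            received = [bsc(b, delta, rng) for b in parents]
            if len(received) == 1:
                nxt.append(received[0])  # boundary vertex: assumed to copy
            else:
                nxt.append(combine(received[0], received[1]))
        layer = nxt
    return layer
```

A call such as `broadcast_grid(1, 50, 0.1, lambda a, b: a & b, random.Random(0))` produces one sample of layer $50$ under AND processing; repeating this for both source values gives an empirical view of how quickly the two output distributions become indistinguishable.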
Submitted 16 September, 2022; v1 submitted 3 October, 2020;
originally announced October 2020.
-
A Phase Transition in Arrow's Theorem
Authors:
Frederic Koehler,
Elchanan Mossel
Abstract:
Arrow's Theorem concerns a fundamental problem in social choice theory: given the individual preferences of members of a group, how can they be aggregated to form rational group preferences? Arrow showed that in an election between three or more candidates, there are situations where any voting rule satisfying a small list of natural "fairness" axioms must produce an apparently irrational intransitive outcome. Furthermore, quantitative versions of Arrow's Theorem in the literature show that when voters choose rankings in an i.i.d.\ fashion, the outcome is intransitive with non-negligible probability.
It is natural to ask if such a quantitative version of Arrow's Theorem holds for non-i.i.d.\ models. To answer this question, we study Arrow's Theorem under a natural non-i.i.d.\ model of voters inspired by canonical models in statistical physics; indeed, a version of this model was previously introduced by Raffaelli and Marsili in the physics literature. This model has a parameter, temperature, that prescribes the correlation between different voters. We show that the behavior of Arrow's Theorem in this model undergoes a striking phase transition: in the entire high temperature regime of the model, a Quantitative Arrow's Theorem holds showing that the probability of paradox for any voting rule satisfying the axioms is non-negligible; this is tight because the probability of paradox under pairwise majority goes to zero when approaching the critical temperature, and becomes exponentially small in the number of voters beyond it. We prove this occurs in another natural model of correlated voters and conjecture this phenomenon is quite general.
Submitted 24 September, 2021; v1 submitted 27 April, 2020;
originally announced April 2020.
-
AND Testing and Robust Judgement Aggregation
Authors:
Yuval Filmus,
Noam Lifshitz,
Dor Minzer,
Elchanan Mossel
Abstract:
A function $f\colon\{0,1\}^n\to \{0,1\}$ is called an approximate AND-homomorphism if choosing ${\bf x},{\bf y}\in\{0,1\}^n$ randomly, we have that $f({\bf x}\land {\bf y}) = f({\bf x})\land f({\bf y})$ with probability at least $1-ε$, where $x\land y = (x_1\land y_1,\ldots,x_n\land y_n)$. We prove that if $f\colon \{0,1\}^n \to \{0,1\}$ is an approximate AND-homomorphism, then $f$ is $δ$-close to either a constant function or an AND function, where $δ(ε) \to 0$ as $ε\to0$. This improves on a result of Nehama, who proved a similar statement in which $δ$ depends on $n$.
Our theorem implies a strong result on judgement aggregation in computational social choice. In the language of social choice, our result shows that if $f$ is $ε$-close to satisfying judgement aggregation, then it is $δ(ε)$-close to an oligarchy (the name for the AND function in social choice theory). This improves on Nehama's result, in which $δ$ decays polynomially with $n$.
Our result follows from a more general one, in which we characterize approximate solutions to the eigenvalue equation $\mathrm T f = λg$, where $\mathrm T$ is the downwards noise operator $\mathrm T f(x) = \mathbb{E}_{\bf y}[f(x \land {\bf y})]$, $f$ is $[0,1]$-valued, and $g$ is $\{0,1\}$-valued. We identify all exact solutions to this equation, and show that any approximate solution in which $\mathrm T f$ and $λg$ are close is close to an exact solution.
Submitted 31 October, 2019;
originally announced November 2019.
-
Accuracy-Memory Tradeoffs and Phase Transitions in Belief Propagation
Authors:
Vishesh Jain,
Frederic Koehler,
Jingbo Liu,
Elchanan Mossel
Abstract:
The analysis of Belief Propagation and other algorithms for the {\em reconstruction problem} plays a key role in the analysis of community detection in inference on graphs, phylogenetic reconstruction in bioinformatics, and the cavity method in statistical physics.
We prove a conjecture of Evans, Kenyon, Peres, and Schulman (2000) which states that any bounded memory message passing algorithm is statistically much weaker than Belief Propagation for the reconstruction problem. More formally, any recursive algorithm with bounded memory for the reconstruction problem on the trees with the binary symmetric channel has a phase transition strictly below the Belief Propagation threshold, also known as the Kesten-Stigum bound. The proof combines in novel fashion tools from recursive reconstruction, information theory, and optimal transport, and also establishes an asymptotic normality result for BP and other message-passing algorithms near the critical threshold.
Submitted 24 May, 2019;
originally announced May 2019.
-
Seeding with Costly Network Information
Authors:
Dean Eckles,
Hossein Esfandiari,
Elchanan Mossel,
M. Amin Rahimian
Abstract:
We study the task of selecting $k$ nodes, in a social network of size $n$, to seed a diffusion with maximum expected spread size, under the independent cascade model with cascade probability $p$. Most of the previous work on this problem (known as influence maximization) focuses on efficient algorithms to approximate the optimal seed set with provable guarantees given knowledge of the entire network; however, obtaining full knowledge of the network is often very costly in practice. Here we develop algorithms and guarantees for approximating the optimal seed set while bounding how much network information is collected. First, we study the achievable guarantees using a sublinear influence sample size. We provide an almost tight approximation algorithm with an additive $εn$ loss and show that the squared dependence of sample size on $k$ is asymptotically optimal when $ε$ is small. We then propose a probing algorithm that queries edges from the graph and use them to find a seed set with the same almost tight approximation guarantee. We also provide a matching (up to logarithmic factors) lower-bound on the required number of edges. This algorithm is implementable in field surveys or in crawling online networks. Our probing takes $p$ as an input which may not be known in advance, and we show how to down-sample the probed edges to match the best estimate of $p$ if they are collected with a higher probability. Finally, we test our algorithms on an empirical network to quantify the tradeoff between the cost of obtaining more refined network information and the benefit of the added information for guiding improved seeding strategies.
Submitted 21 May, 2022; v1 submitted 10 May, 2019;
originally announced May 2019.
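The independent cascade model referred to in the abstract above can be simulated directly. This is a minimal sketch under stated assumptions: the adjacency-dictionary representation, the function names, and the toy path graph are all hypothetical, and the code estimates spread by plain Monte Carlo. It does not implement the paper's sampling or probing algorithms.

```python
import random


def simulate_cascade(adj, seeds, p, rng):
    """One run of the independent cascade; returns the number of active nodes.

    Each newly activated node gets a single chance to activate each
    inactive neighbour, succeeding independently with probability p.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(adj, seeds, p, trials=200, seed=0):
    """Monte Carlo estimate of the expected spread size of a seed set."""
    rng = random.Random(seed)
    runs = (simulate_cascade(adj, seeds, p, rng) for _ in range(trials))
    return sum(runs) / trials


# Toy undirected path 0 - 1 - 2 - 3 (hypothetical example graph).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
full_spread = expected_spread(path, [0], p=1.0)
no_spread = expected_spread(path, [0], p=0.0)
```

With $p=1$ the cascade reaches the whole connected component of the seed, and with $p=0$ only the seeds are active, which gives two easy sanity checks for any implementation.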
-
Regular graphs with linearly many triangles
Authors:
Pim van der Hoorn,
Gabor Lippner,
Elchanan Mossel
Abstract:
A $d$-regular graph on $n$ nodes has at most $T_{\max} = \frac{n}{3} \tbinom{d}{2}$ triangles. We compute the leading asymptotics of the probability that a large random $d$-regular graph has at least $c \cdot T_{\max}$ triangles, and provide a strong structural description of such graphs.
When $d$ is fixed, we show that such graphs typically consist of many disjoint $d+1$-cliques and an almost triangle-free part. When $d$ is allowed to grow with $n$, we show that such graphs typically consist of $d+o(d)$ sized almost cliques together with an almost triangle-free part.
This confirms a conjecture of Collet and Eckmann from 2002 and considerably strengthens their observation that the triangles cannot be totally scattered in typical instances of regular graphs with many triangles.
Submitted 14 April, 2021; v1 submitted 3 April, 2019;
originally announced April 2019.
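The bound $T_{\max} = \frac{n}{3}\tbinom{d}{2}$ quoted in the abstract above is attained by a disjoint union of $(d+1)$-cliques, which is the extremal structure the abstract describes for fixed $d$. A small brute-force check, with hypothetical helper names and a deliberately naive triangle counter:

```python
from itertools import combinations
from math import comb


def disjoint_cliques(num_cliques, d):
    """Edge set of num_cliques disjoint copies of K_{d+1} (a d-regular graph)."""
    edges = set()
    for c in range(num_cliques):
        nodes = range(c * (d + 1), (c + 1) * (d + 1))
        edges.update(combinations(nodes, 2))  # pairs (a, b) with a < b
    return num_cliques * (d + 1), edges


def count_triangles(n, edges):
    """Count triangles by checking every vertex triple (fine for tiny n)."""
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if (a, b) in edges and (a, c) in edges and (b, c) in edges
    )


def t_max(n, d):
    """Upper bound on the number of triangles in a d-regular graph on n nodes."""
    return n * comb(d, 2) / 3


n, edges = disjoint_cliques(num_cliques=4, d=3)
triangles = count_triangles(n, edges)
bound = t_max(n, 3)
```

Each vertex lies in at most $\tbinom{d}{2}$ triangles and each triangle is counted at three vertices, which is where the factor $n/3$ comes from.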
-
Broadcasting on Random Directed Acyclic Graphs
Authors:
Anuran Makur,
Elchanan Mossel,
Yury Polyanskiy
Abstract:
We study a generalization of the well-known model of broadcasting on trees. Consider a directed acyclic graph (DAG) with a unique source vertex $X$, and suppose all other vertices have indegree $d\geq 2$. Let the vertices at distance $k$ from $X$ be called layer $k$. At layer $0$, $X$ is given a random bit. At layer $k\geq 1$, each vertex receives $d$ bits from its parents in layer $k-1$, which are transmitted along independent binary symmetric channel edges, and combines them using a $d$-ary Boolean processing function. The goal is to reconstruct $X$ with probability of error bounded away from $1/2$ using the values of all vertices at an arbitrarily deep layer. This question is closely related to models of reliable computation and storage, and information flow in biological networks.
In this paper, we analyze randomly constructed DAGs, for which we show that broadcasting is only possible if the noise level is below a certain degree and function dependent critical threshold. For $d\geq 3$, and random DAGs with layer sizes $Ω(\log k)$ and majority processing functions, we identify the critical threshold. For $d=2$, we establish a similar result for NAND processing functions. We also prove a partial converse for odd $d\geq 3$ illustrating that the identified thresholds are impossible to improve by selecting different processing functions if the decoder is restricted to using a single vertex.
Finally, for any noise level, we construct explicit DAGs (using expander graphs) with bounded degree and layer sizes $Θ(\log k)$ admitting reconstruction. In particular, we show that such DAGs can be generated in deterministic quasi-polynomial time or randomized polylogarithmic time in the depth. These results portray a doubly-exponential advantage for storing a bit in DAGs compared to trees, where $d=1$ but layer sizes must grow exponentially with depth in order to enable broadcasting.
Submitted 9 March, 2020; v1 submitted 7 November, 2018;
originally announced November 2018.
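The broadcasting process described in the abstract above can be simulated for majority processing functions. This is a rough sketch under assumptions that are not taken from the paper: every layer after the source has the same size, each vertex picks its $d$ parents uniformly at random with replacement from the previous layer, and the decoder takes a majority vote over the final layer. All function and parameter names are hypothetical.

```python
import random


def broadcast(depth, layer_size, d, delta, source_bit, rng):
    """Propagate a bit through a random layered DAG with majority gates.

    Every edge is a binary symmetric channel that flips its bit with
    probability delta; d should be odd so that majority is well defined.
    """
    layer = [source_bit]
    for _ in range(depth):
        new_layer = []
        for _ in range(layer_size):
            received = [
                rng.choice(layer) ^ (rng.random() < delta) for _ in range(d)
            ]
            new_layer.append(int(2 * sum(received) > d))
        layer = new_layer
    return layer


def decode(layer):
    """Majority vote over a layer (ties decode to 0)."""
    return int(2 * sum(layer) > len(layer))


rng = random.Random(0)
noiseless_one = broadcast(20, 15, 3, 0.0, 1, rng)
noiseless_zero = broadcast(20, 15, 3, 0.0, 0, rng)
noisy = broadcast(20, 15, 3, 0.05, 1, rng)
```

With `delta=0.0` the source bit is copied faithfully to every vertex. Varying `delta` and `layer_size` in repeated runs gives an informal picture of the threshold behaviour the abstract describes, although a simulation of this kind proves nothing.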
-
How Many Subpopulations is Too Many? Exponential Lower Bounds for Inferring Population Histories
Authors:
Younhun Kim,
Frederic Koehler,
Ankur Moitra,
Elchanan Mossel,
Govind Ramnarayan
Abstract:
Reconstruction of population histories is a central problem in population genetics. Existing coalescent-based methods, like the seminal work of Li and Durbin (Nature, 2011), attempt to solve this problem using sequence data but have no rigorous guarantees. Determining the amount of data needed to correctly reconstruct population histories is a major challenge. Using a variety of tools from information theory, the theory of extremal polynomials, and approximation theory, we prove new sharp information-theoretic lower bounds on the problem of reconstructing population structure -- the history of multiple subpopulations that merge, split and change sizes over time. Our lower bounds are exponential in the number of subpopulations, even when reconstructing recent histories. We demonstrate the sharpness of our lower bounds by providing algorithms for distinguishing and learning population histories with matching dependence on the number of subpopulations. Along the way and of independent interest, we essentially determine the optimal number of samples needed to learn an exponential mixture distribution information-theoretically, proving the upper bound by analyzing natural (and efficient) algorithms for this problem.
Submitted 8 May, 2019; v1 submitted 7 November, 2018;
originally announced November 2018.
-
Long ties accelerate noisy threshold-based contagions
Authors:
Dean Eckles,
Elchanan Mossel,
M. Amin Rahimian,
Subhabrata Sen
Abstract:
Network structure can affect when and how widely new ideas, products, and behaviors are adopted. In widely-used models of biological contagion, interventions that randomly rewire edges (on average making them "longer") accelerate spread. However, there are other models relevant to social contagion, such as those motivated by myopic best-response in games with strategic complements, in which an individual's behavior is described by a threshold number of adopting neighbors above which adoption occurs (i.e., complex contagions). Recent work has argued that highly clustered, rather than random, networks facilitate spread of these complex contagions. Here we show that minor modifications to this model, which make it more realistic, reverse this result, thereby harmonizing qualitative facts about how network structure affects contagion. To model the trade-off between long and short edges we analyze the rate of spread over networks that are the union of circular lattices and random graphs on $n$ nodes. Allowing for noise in adoption decisions (i.e., adoptions below threshold) to occur with order ${n}^{-1/θ}$ probability along at least some "short" cycle edges is enough to ensure that random rewiring accelerates the spread of a noisy threshold-$θ$ contagion. This conclusion also holds under partial but frequent enough rewiring and when adoption decisions are reversible but infrequently so, as well as in high-dimensional lattice structures that facilitate faster-expanding contagions. Simulations illustrate the robustness of these results to several variations on this noisy best-response behavior. Hypothetical interventions that randomly rewire existing edges or add random edges (versus adding "short", triad-closing edges) in hundreds of empirical social networks reduce time to spread.
Submitted 20 August, 2023; v1 submitted 8 October, 2018;
originally announced October 2018.
-
Reasoning in Bayesian Opinion Exchange Networks Is PSPACE-Hard
Authors:
Jan Hązła,
Ali Jadbabaie,
Elchanan Mossel,
M. Amin Rahimian
Abstract:
We study the Bayesian model of opinion exchange of fully rational agents arranged on a network. In this model, the agents receive private signals that are indicative of an unknown state of the world. Then, they repeatedly announce the state of the world they consider most likely to their neighbors, at the same time updating their beliefs based on their neighbors' announcements.
This model is extensively studied in economics since the work of Aumann (1976) and Geanakoplos and Polemarchakis (1982). It is known that the agents eventually agree with high probability on any network. It is often argued that the computations needed by agents in this model are difficult, but prior to our results there was no rigorous work showing this hardness.
We show that it is PSPACE-hard for the agents to compute their actions in this model. Furthermore, we show that it is equally difficult even to approximate an agent's posterior: It is PSPACE-hard to distinguish between the posterior being almost entirely concentrated on one state of the world or another.
Submitted 4 September, 2018;
originally announced September 2018.
-
Learning Restricted Boltzmann Machines via Influence Maximization
Authors:
Guy Bresler,
Frederic Koehler,
Ankur Moitra,
Elchanan Mossel
Abstract:
Graphical models are a rich language for describing high-dimensional distributions in terms of their dependence structure. While there are algorithms with provable guarantees for learning undirected graphical models in a variety of settings, there has been much less progress in the important scenario when there are latent variables. Here we study Restricted Boltzmann Machines (or RBMs), which are a popular model with wide-ranging applications in dimensionality reduction, collaborative filtering, topic modeling, feature extraction and deep learning.
The main message of our paper is a strong dichotomy in the feasibility of learning RBMs, depending on the nature of the interactions between variables: ferromagnetic models can be learned efficiently, while general models cannot. In particular, we give a simple greedy algorithm based on influence maximization to learn ferromagnetic RBMs with bounded degree. In fact, we learn a description of the distribution on the observed variables as a Markov Random Field. Our analysis is based on tools from mathematical physics that were developed to show the concavity of magnetization. Our algorithm extends straightforwardly to general ferromagnetic Ising models with latent variables.
Conversely, we show that even for a constant number of latent variables with constant degree, without ferromagneticity the problem is as hard as sparse parity with noise. This hardness result is based on a sharp and surprising characterization of the representational power of bounded degree RBMs: the distribution on their observed variables can simulate any bounded order MRF. This result is of independent interest since RBMs are the building blocks of deep belief networks.
Submitted 5 November, 2018; v1 submitted 25 May, 2018;
originally announced May 2018.
-
The Probability of Intransitivity in Dice and Close Elections
Authors:
Jan Hązła,
Elchanan Mossel,
Nathan Ross,
Guangqu Zheng
Abstract:
We study the phenomenon of intransitivity in models of dice and voting.
First, we follow a recent thread of research for $n$-sided dice with pairwise ordering induced by the probability, relative to $1/2$, that a throw from one die is higher than the other. We build on a recent result of Polymath showing that three dice with i.i.d. faces drawn from the uniform distribution on $\{1,\ldots,n\}$ and conditioned on the average of faces equal to $(n+1)/2$ are intransitive with asymptotic probability $1/4$. We show that if dice faces are drawn from a non-uniform continuous mean zero distribution conditioned on the average of faces equal to $0$, then three dice are transitive with high probability. We also extend our results to stationary Gaussian dice, whose faces, for example, can be the fractional Brownian increments with Hurst index $H\in(0,1)$.
Second, we pose an analogous model in the context of Condorcet voting. We consider $n$ voters who rank $k$ alternatives independently and uniformly at random. The winner between each two alternatives is decided by a majority vote based on the preferences. We show that in this model, if all pairwise elections are close to tied, then the asymptotic probability of obtaining any tournament on the $k$ alternatives is equal to $2^{-k(k-1)/2}$, which markedly differs from known results in the model without conditioning. We also explore the Condorcet voting model where methods other than simple majority are used for pairwise elections. We investigate some natural definitions of "close to tied" for general functions and exhibit an example where the distribution over tournaments is not uniform under those definitions.
Submitted 13 August, 2020; v1 submitted 2 April, 2018;
originally announced April 2018.
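The abstract above states that, conditioned on all pairwise elections being close to tied, every tournament on the $k$ alternatives has asymptotic probability $2^{-k(k-1)/2}$. Under that uniform limit, the chance of an intransitive outcome reduces to counting. The enumeration below is only an illustration of the limiting distribution, with hypothetical function names, and says nothing about how the limit is proved.

```python
from fractions import Fraction
from itertools import combinations, permutations, product


def tournaments(k):
    """Yield every tournament on k alternatives as a dict beats[(i, j)] -> bool."""
    pairs = list(combinations(range(k), 2))
    for bits in product((0, 1), repeat=len(pairs)):
        beats = {}
        for (i, j), b in zip(pairs, bits):
            beats[(i, j)] = bool(b)
            beats[(j, i)] = not b
        yield beats


def is_transitive(beats, k):
    """True when some linear order is consistent with every pairwise result."""
    return any(
        all(beats[(order[a], order[b])] for a in range(k) for b in range(a + 1, k))
        for order in permutations(range(k))
    )


def intransitive_fraction(k):
    """Fraction of all tournaments on k alternatives that contain a cycle."""
    all_tournaments = list(tournaments(k))
    cyclic = sum(not is_transitive(t, k) for t in all_tournaments)
    return Fraction(cyclic, len(all_tournaments))


three = intransitive_fraction(3)
four = intransitive_fraction(4)
```

For $k=3$ there are $2^3 = 8$ tournaments, of which $3! = 6$ are transitive, so the intransitive fraction is $1/4$. For general $k$ the transitive tournaments number $k!$ out of $2^{k(k-1)/2}$.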
-
Broadcasting on Bounded Degree DAGs
Authors:
Anuran Makur,
Elchanan Mossel,
Yury Polyanskiy
Abstract:
We study the following generalization of the well-known model of broadcasting on trees. Consider an infinite directed acyclic graph (DAG) with a unique source node $X$. Let the collection of nodes at distance $k$ from $X$ be called the $k$th layer. At time zero, the source node is given a bit. At time $k\geq 1$, each node in the $(k-1)$th layer inspects its inputs and sends a bit to its descendants in the $k$th layer. Each bit is flipped with a probability of error $δ\in \left(0,\frac{1}{2}\right)$ in the process of transmission. The goal is to be able to recover the original bit with probability of error better than $\frac{1}{2}$ from the values of all nodes at an arbitrarily deep layer $k$.
Besides its natural broadcast interpretation, the DAG broadcast is a natural model of noisy computation. Some special cases of the model represent information flow in biological networks, and other cases represent noisy finite automata models.
We show that there exist DAGs with bounded degree and layers of size $ω(\log(k))$ that permit recovery provided $δ$ is sufficiently small and find the critical $δ$ for the DAGs constructed. Our result demonstrates a doubly-exponential advantage for storing a bit in bounded degree DAGs compared to trees. On the negative side, we show that if the DAG is a two-dimensional regular grid, then recovery is impossible for any $δ\in \left(0,\frac{1}{2}\right)$ provided all nodes use either AND or XOR for their processing functions.
Submitted 20 March, 2018;
originally announced March 2018.
-
The Vertex Sample Complexity of Free Energy is Polynomial
Authors:
Vishesh Jain,
Frederic Koehler,
Elchanan Mossel
Abstract:
We study the following question: given a massive Markov random field on $n$ nodes, can a small sample from it provide a rough approximation to the free energy $\mathcal{F}_n = \log{Z_n}$?
Results in graph limit literature by Borgs, Chayes, Lovász, Sós, and Vesztergombi show that for Ising models on $n$ nodes and interactions of strength $Θ(1/n)$, an $ε$ approximation to $\log Z_n / n$ can be achieved by sampling a randomly induced model on $2^{O(1/ε^2)}$ nodes. We show that the sampling complexity of this problem is {\em polynomial in} $1/ε$. We further show a polynomial dependence on $ε$ cannot be avoided.
Our results are very general as they apply to higher order Markov random fields. For Markov random fields of order $r$, we obtain an algorithm that achieves $ε$ approximation using a number of samples polynomial in $r$ and $1/ε$ and running time that is $2^{O(1/ε^2)}$ up to polynomial factors in $r$ and $ε$. For ferromagnetic Ising models, the running time is polynomial in $1/ε$.
Our results are intimately connected to recent research on the regularity lemma and property testing, where the interest is in finding which properties can be tested within $ε$ error in time polynomial in $1/ε$. In particular, our proofs build on results from a recent work by Alon, de la Vega, Kannan and Karpinski, who also introduced the notion of polynomial vertex sample complexity. Another critical ingredient of the proof is an effective bound by the authors of the paper relating the variational free energy and the free energy.
Submitted 23 February, 2018; v1 submitted 16 February, 2018;
originally announced February 2018.
-
The Mean-Field Approximation: Information Inequalities, Algorithms, and Complexity
Authors:
Vishesh Jain,
Frederic Koehler,
Elchanan Mossel
Abstract:
The mean field approximation to the Ising model is a canonical variational tool that is used for analysis and inference in Ising models. We provide a simple and optimal bound for the KL error of the mean field approximation for Ising models on general graphs, and extend it to higher order Markov random fields. Our bound improves on previous bounds obtained in work in the graph limit literature by Borgs, Chayes, Lovász, Sós, and Vesztergombi and another recent work by Basak and Mukherjee. Our bound is tight up to lower order terms. Building on the methods used to prove the bound, along with techniques from combinatorics and optimization, we study the algorithmic problem of estimating the (variational) free energy for Ising models and general Markov random fields. For a graph $G$ on $n$ vertices and interaction matrix $J$ with Frobenius norm $\| J \|_F$, we provide algorithms that approximate the free energy within an additive error of $εn \|J\|_F$ in time $\exp(poly(1/ε))$. We also show that approximation within $(n \|J\|_F)^{1-δ}$ is NP-hard for every $δ> 0$. Finally, we provide more efficient approximation algorithms, which find the optimal mean field approximation, for ferromagnetic Ising models and for Ising models satisfying Dobrushin's condition.
Submitted 20 February, 2018; v1 submitted 16 February, 2018;
originally announced February 2018.
-
Coalescent-based species tree estimation: a stochastic Farris transform
Authors:
Gautam Dasarathy,
Elchanan Mossel,
Robert Nowak,
Sebastien Roch
Abstract:
The reconstruction of a species phylogeny from genomic data faces two significant hurdles: 1) the trees describing the evolution of each individual gene--i.e., the gene trees--may differ from the species phylogeny and 2) the molecular sequences corresponding to each gene often provide limited information about the gene trees themselves. In this paper we consider an approach to species tree reconstruction that addresses both these hurdles. Specifically, we propose an algorithm for phylogeny reconstruction under the multispecies coalescent model with a standard model of site substitution. The multispecies coalescent is commonly used to model gene tree discordance due to incomplete lineage sorting, a well-studied population-genetic effect.
In previous work, an information-theoretic trade-off was derived in this context between the number of loci, $m$, needed for an accurate reconstruction and the length of the locus sequences, $k$. It was shown that to reconstruct an internal branch of length $f$, one needs $m$ to be of the order of $1/[f^{2} \sqrt{k}]$. That previous result was obtained under the molecular clock assumption, i.e., under the assumption that mutation rates (as well as population sizes) are constant across the species phylogeny.
Here we generalize this result beyond the restrictive molecular clock assumption, and obtain a new reconstruction algorithm that has the same data requirement (up to log factors). Our main contribution is a novel reduction to the molecular clock case under the multispecies coalescent. As a corollary, we also obtain a new identifiability result of independent interest: for any species tree with $n \geq 3$ species, the rooted species tree can be identified from the distribution of its unrooted weighted gene trees even in the absence of a molecular clock.
Submitted 13 July, 2017;
originally announced July 2017.
-
Bayesian Decision Making in Groups is Hard
Authors:
Jan Hązła,
Ali Jadbabaie,
Elchanan Mossel,
M. Amin Rahimian
Abstract:
We study the computations that Bayesian agents undertake when exchanging opinions over a network. The agents act repeatedly on their private information and take myopic actions that maximize their expected utility according to a fully rational posterior belief. We show that such computations are NP-hard for two natural utility functions: one with binary actions, and another where agents reveal their posterior beliefs. In fact, we show that distinguishing between posteriors that are concentrated on different states of the world is NP-hard. Therefore, even approximating the Bayesian posterior beliefs is hard. We also describe a natural search algorithm to compute agents' actions, which we call elimination of impossible signals, and show that if the network is transitive, the algorithm can be modified to run in polynomial time.
Submitted 27 July, 2019; v1 submitted 12 May, 2017;
originally announced May 2017.
-
Gaussian Bounds for Noise Correlation of Resilient Functions
Authors:
Elchanan Mossel
Abstract:
Gaussian bounds on noise correlation of functions play an important role in hardness of approximation, in quantitative social choice theory and in testing. The author (2008) obtained sharp gaussian bounds for the expected correlation of $\ell$ low influence functions $f^{(1)},\ldots, f^{(\ell)} : Ω^n \to [0,1]$, where the inputs to the functions are correlated via the $n$-fold tensor of distribution $\mathcal{P}$ on $Ω^{\ell}$.
It is natural to ask if the condition of low influences can be relaxed to the condition that the function has vanishing Fourier coefficients. Here we answer this question affirmatively. For the case of two functions $f$ and $g$, we further show that if $f,g$ have a noisy inner product that exceeds the gaussian bound, then the Fourier supports of their large coefficients intersect.
Submitted 23 October, 2017; v1 submitted 16 April, 2017;
originally announced April 2017.
-
Non interactive simulation of correlated distributions is decidable
Authors:
Anindya De,
Elchanan Mossel,
Joe Neeman
Abstract:
A basic problem in information theory is the following: Let $\mathbf{P} = (\mathbf{X}, \mathbf{Y})$ be an arbitrary distribution where the marginals $\mathbf{X}$ and $\mathbf{Y}$ are (potentially) correlated. Let Alice and Bob be two players where Alice gets samples $\{x_i\}_{i \ge 1}$ and Bob gets samples $\{y_i\}_{i \ge 1}$ and for all $i$, $(x_i, y_i) \sim \mathbf{P}$. What joint distributions $\mathbf{Q}$ can be simulated by Alice and Bob without any interaction?
Classical works in information theory by Gács-Körner and Wyner answer this question when at least one of $\mathbf{P}$ or $\mathbf{Q}$ is the distribution on $\{0,1\} \times \{0,1\}$ where each marginal is unbiased and identical. However, other than this special case, the answer to this question is understood in very few cases. Recently, Ghazi, Kamath and Sudan showed that this problem is decidable for $\mathbf{Q}$ supported on $\{0,1\} \times \{0,1\}$. We extend their result to $\mathbf{Q}$ supported on any finite alphabet.
We rely on recent results in Gaussian geometry (by the authors) as well as a new \emph{smoothing argument} inspired by the method of \emph{boosting} from learning theory and potential function arguments from complexity theory and additive combinatorics.
Submitted 15 February, 2017; v1 submitted 5 January, 2017;
originally announced January 2017.
-
Noise Stability is computable and low dimensional
Authors:
Anindya De,
Elchanan Mossel,
Joe Neeman
Abstract:
Questions of noise stability play an important role in hardness of approximation in computer science as well as in the theory of voting. In many applications, the goal is to find an optimizer of noise stability among all possible partitions of $\mathbb{R}^n$ for $n \geq 1$ to $k$ parts with given Gaussian measures $μ_1,\ldots,μ_k$. We call a partition $ε$-optimal, if its noise stability is optimal up to an additive $ε$. In this paper, we give an explicit, computable function $n(ε)$ such that an $ε$-optimal partition exists in $\mathbb{R}^{n(ε)}$. This result has implications for the computability of certain problems in non-interactive simulation, which are addressed in a subsequent work.
Submitted 15 February, 2017; v1 submitted 5 January, 2017;
originally announced January 2017.