-
The mighty force: statistical inference and high-dimensional statistics
Authors:
Erik Aurell,
Jean Barbier,
Aurelien Decelle,
Roberto Mulet
Abstract:
This is a review to appear as a contribution to the edited volume "Spin Glass Theory & Far Beyond - Replica Symmetry Breaking after 40 Years", World Scientific. It showcases a selection of contributions from the spin glass community at large to high-dimensional statistics, by focusing on three important graph-based models and methodologies having deeply impacted the field: inference of graphs (a.k.a. direct coupling analysis), inference from graphs (the community detection problem), and the dynamic cavity method, which in particular allows for inference from graphs encoding causal relations.
Submitted 2 May, 2022;
originally announced May 2022.
-
A novel local search based on variable-focusing for random K-SAT
Authors:
Rémi Lemoy,
Mikko Alava,
Erik Aurell
Abstract:
We introduce a new local search algorithm for satisfiability problems. Usual approaches focus uniformly on unsatisfied clauses. The new method works by picking uniformly random variables in unsatisfied clauses. A Variable-based Focused Metropolis Search (V-FMS) is then applied to random 3-SAT. We show that it is quite comparable in performance to the clause-based FMS. Consequences for algorithmic design are discussed.
Submitted 12 December, 2013; v1 submitted 9 October, 2013;
originally announced November 2013.
-
Witness of unsatisfiability for a random 3-satisfiability formula
Authors:
Lu-Lu Wu,
Hai-Jun Zhou,
Mikko Alava,
Erik Aurell,
Pekka Orponen
Abstract:
The random 3-satisfiability (3-SAT) problem is in the unsatisfiable (UNSAT) phase when the clause density $α$ exceeds a critical value $α_s \approx 4.267$. However, rigorously proving the unsatisfiability of a given large 3-SAT instance is extremely difficult. In this paper we apply the mean-field theory of statistical physics to the unsatisfiability problem, and show that a specific type of UNSAT witness (Feige-Kim-Ofek witnesses) can in principle be constructed when the clause density $α > 19$. We then construct Feige-Kim-Ofek witnesses for single 3-SAT instances through a simple random sampling algorithm and a focused local search algorithm. The random sampling algorithm works only when $α$ scales at least linearly with the variable number $N$, but the focused local search algorithm works for clause density $α > c N^{b}$ with $b \approx 0.59$ and prefactor $c \approx 8$. The exponent $b$ can be further decreased by enlarging the single parameter $S$ of the focused local search algorithm.
Submitted 10 March, 2013;
originally announced March 2013.
-
Analysis of Sparse Representations Using Bi-Orthogonal Dictionaries
Authors:
Mikko Vehkaperä,
Yoshiyuki Kabashima,
Saikat Chatterjee,
Erik Aurell,
Mikael Skoglund,
Lars Rasmussen
Abstract:
The sparse representation problem of recovering an N-dimensional sparse vector x from M < N linear observations y = Dx given a dictionary D is considered. The standard approach is to let the elements of the dictionary be independent and identically distributed (IID) zero-mean Gaussian and minimize the l1-norm of x under the constraint y = Dx. In this paper, the performance of l1-reconstruction is analyzed when the dictionary is bi-orthogonal D = [O1 O2], where O1,O2 are independent and drawn uniformly according to the Haar measure on the group of orthogonal M x M matrices. By an application of the replica method, we obtain the critical conditions under which perfect l1-recovery is possible with bi-orthogonal dictionaries.
Submitted 10 July, 2012; v1 submitted 18 April, 2012;
originally announced April 2012.
-
The Accuracy of Tree-based Counting in Dynamic Networks
Authors:
Supriya Krishnamurthy,
John Ardelius,
Erik Aurell,
Mads Dam,
Rolf Stadler,
Fetahi Wuhib
Abstract:
Tree-based protocols are ubiquitous in distributed systems. They are flexible, they generally perform well, and, in static conditions, their analysis is mostly simple. Under churn, however, node joins and failures can have complex global effects on the tree overlays, making analysis surprisingly subtle. To our knowledge, few prior analytic results for performance estimation of tree-based protocols under churn are currently known. We study a simple Bellman-Ford-like protocol which performs network size estimation over a tree-shaped overlay. A continuous time Markov model is constructed which allows key protocol characteristics to be estimated, including the expected number of nodes at a given (perceived) distance to the root and, for each such node, the expected (perceived) size of the subnetwork rooted at that node. We validate the model by simulation, using a range of network sizes, node degrees, and churn-to-protocol rates, with convincing results.
Submitted 26 April, 2010;
originally announced April 2010.
-
Bounds on Thresholds Related to Maximum Satisfiability of Regular Random Formulas
Authors:
Vishwambhar Rathi,
Erik Aurell,
Lars Rasmussen,
Mikael Skoglund
Abstract:
We consider the regular balanced model of formula generation in conjunctive normal form (CNF) introduced by Boufkhad, Dubois, Interian, and Selman. We say that a formula is $p$-satisfying if there is a truth assignment satisfying a $1-2^{-k}+p 2^{-k}$ fraction of the clauses. Using the first moment method we determine an upper bound on the threshold clause density such that there are no $p$-satisfying assignments with high probability above this upper bound. There are two aspects in deriving the lower bound using the second moment method. The first aspect is, given any $p \in (0,1)$ and $k$, to evaluate the lower bound on the threshold. This evaluation is numerical in nature. The second aspect is to derive the lower bound as a function of $p$ for large enough $k$. We address the first aspect and evaluate the lower bound on the $p$-satisfying threshold using the second moment method. We observe that as $k$ increases the lower bound seems to converge to the asymptotically derived lower bound for the uniform model of formula generation by Achlioptas, Naor, and Peres.
Submitted 14 April, 2010;
originally announced April 2010.
-
Bounds on Threshold of Regular Random $k$-SAT
Authors:
Vishwambhar Rathi,
Erik Aurell,
Lars Rasmussen,
Mikael Skoglund
Abstract:
We consider the regular model of formula generation in conjunctive normal form (CNF) introduced by Boufkhad et al. We derive an upper bound on the satisfiability threshold and NAE-satisfiability threshold for regular random $k$-SAT for any $k \geq 3$. We show that these bounds match the corresponding bounds for the uniform model of formula generation.
We derive a lower bound on the threshold by applying the second moment method to the number of satisfying assignments. For large $k$, we note that the obtained lower bounds on the threshold of a regular random formula converge to the lower bound obtained for the uniform model. Thus, we answer the question posed in \cite{AcM06} regarding the performance of the second moment method for regular random formulas.
Submitted 23 April, 2010; v1 submitted 5 February, 2010;
originally announced February 2010.
-
Gaussian Belief with dynamic data and in dynamic network
Authors:
Erik Aurell,
René Pfitzner
Abstract:
In this paper we analyse Belief Propagation over a Gaussian model in a dynamic environment. Recently, this has been proposed as a method to average local measurement values by a distributed protocol ("Consensus Propagation", Moallemi & Van Roy, 2006), where the average is available for read-out at every single node. In the case that the underlying network is constant but the values to be averaged fluctuate ("dynamic data"), convergence and accuracy are determined by the spectral properties of an associated Ruelle-Perron-Frobenius operator. For Gaussian models on Erdos-Renyi graphs, numerical computation points to a spectral gap remaining in the large-size limit, implying exceptionally good scalability. In a model where the underlying network also fluctuates ("dynamic network"), averaging is more effective than in the dynamic data case. Altogether, this implies very good performance of these methods in very large systems, and opens a new field of statistical physics of large (and dynamic) information systems.
Submitted 3 May, 2009;
originally announced May 2009.
-
Circumspect descent prevails in solving random constraint satisfaction problems
Authors:
Mikko Alava,
John Ardelius,
Erik Aurell,
Petteri Kaski,
Supriya Krishnamurthy,
Pekka Orponen,
Sakari Seitz
Abstract:
We study the performance of stochastic local search algorithms for random instances of the $K$-satisfiability ($K$-SAT) problem. We introduce a new stochastic local search algorithm, ChainSAT, which moves in the energy landscape of a problem instance by {\em never going upwards} in energy. ChainSAT is a \emph{focused} algorithm in the sense that it considers only variables occurring in unsatisfied clauses. We show by extensive numerical investigations that ChainSAT and other focused algorithms solve large $K$-SAT instances almost surely in linear time, up to high clause-to-variable ratios $α$; for example, for $K=4$ we observe linear-time performance well beyond the recently postulated clustering and condensation transitions in the solution space. The performance of ChainSAT is a surprise given that by design the algorithm gets trapped in the first local energy minimum it encounters, yet no such minima are encountered. We also study the geometry of the solution space as accessed by stochastic local search algorithms.
Submitted 30 November, 2007;
originally announced November 2007.
-
Comparing Maintenance Strategies for Overlays
Authors:
Supriya Krishnamurthy,
Sameh El-Ansary,
Erik Aurell,
Seif Haridi
Abstract:
In this paper, we present an analytical tool for understanding the performance of structured overlay networks under churn based on the master-equation approach of physics. We motivate and derive an equation for the average number of hops taken by lookups during churn, for the Chord network. We analyse this equation in detail to understand the behaviour with and without churn. We then use this understanding to predict how lookups will scale for varying peer population as well as varying the sizes of the routing tables. We then consider a change in the maintenance algorithm of the overlay, from periodic stabilisation to a reactive one which corrects fingers only when a change is detected. We generalise our earlier analysis to understand how the reactive strategy compares with the periodic one.
Submitted 1 October, 2007;
originally announced October 2007.
-
An Analytical Study of a Structured Overlay in the presence of Dynamic Membership
Authors:
Supriya Krishnamurthy,
Sameh El-Ansary,
Erik Aurell,
Seif Haridi
Abstract:
In this paper we present an analytical study of dynamic membership (aka churn) in structured peer-to-peer networks. We use a fluid model approach to describe steady-state or transient phenomena, and apply it to the Chord system. For any rate of churn and stabilization rates, and any system size, we accurately account for the functional form of the probability of network disconnection as well as the fraction of failed or incorrect successor and finger pointers. We show how we can use these quantities to predict both the performance and consistency of lookups under churn. All theoretical predictions match simulation results. The analysis includes both features that are generic to structured overlays deploying a ring as well as Chord-specific details, and opens the door to a systematic comparative analysis of, at least, ring-based structured overlay systems under churn.
Submitted 1 October, 2007;
originally announced October 2007.
-
A Statistical Theory of Chord under Churn
Authors:
Supriya Krishnamurthy,
Sameh El-Ansary,
Erik Aurell,
Seif Haridi
Abstract:
Most earlier studies of Distributed Hash Tables (DHTs) under churn have either depended on simulations as the primary investigation tool, or on establishing bounds for DHTs to function. In this paper, we present a complete analytical study of churn using a master-equation-based approach, used traditionally in non-equilibrium statistical mechanics to describe steady-state or transient phenomena. Simulations are used to verify all theoretical predictions. We demonstrate the application of our methodology to the Chord system. For any rate of churn and stabilization rates, and any system size, we accurately predict the fraction of failed or incorrect successor and finger pointers and show how we can use these quantities to predict the performance and consistency of lookups under churn. We also discuss briefly how churn may actually be of different 'types' and the implications this will have for the functioning of DHTs in general.
Submitted 24 January, 2005;
originally announced January 2005.
-
A Price Dynamics in Bandwidth Markets for Point-to-point Connections
Authors:
Lars Rasmusson,
Erik Aurell
Abstract:
We simulate a network of N routers and M network users making concurrent point-to-point connections by buying and selling router capacity from each other. The resources need to be acquired in complete sets, but there is only one spot market for each router. In order to describe the internal dynamics of the market, we model the observed prices by N-dimensional Ito-processes. Modeling using stochastic processes is novel in this context of describing interactions between end-users in a system with shared resources, and allows a standard set of mathematical tools to be applied. The derived models can also be used to price contingent claims on network capacity and thus to price complex network services such as quality of service levels, multicast, etc.
Submitted 15 February, 2001;
originally announced February 2001.