The mighty force: statistical inference and high-dimensional statistics

Authors: Erik Aurell, Jean Barbier, Aurelien Decelle, Roberto Mulet

Abstract: This is a review to appear as a contribution to the edited volume "Spin Glass Theory & Far Beyond - Replica Symmetry Breaking after 40 Years", World Scientific. It showcases a selection of contributions from the spin glass community at large to high-dimensional statistics, by focusing on three important graph-based models and methodologies having deeply impacted the field: inference of graphs (a.k.a. direct coupling analysis), inference from graphs (the community detection problem), and the dynamic cavity method, which in particular allows for inference from graphs encoding causal relations.

Submitted 2 May, 2022; originally announced May 2022.

Comments: To appear as a contribution to the edited volume "Spin Glass Theory & Far Beyond - Replica Symmetry Breaking after 40 Years", World Scientific

arXiv:1311.2914 [pdf, other]

A novel local search based on variable-focusing for random K-SAT

Authors: Rémi Lemoy, Mikko Alava, Erik Aurell

Abstract: We introduce a new local search algorithm for satisfiability problems. Usual approaches focus uniformly on unsatisfied clauses. The new method works by picking uniformly random variables in unsatisfied clauses. A Variable-based Focused Metropolis Search (V-FMS) is then applied to random 3-SAT. We show that it is quite comparable in performance to the clause-based FMS. Consequences for algorithmic design are discussed.

Submitted 12 December, 2013; v1 submitted 9 October, 2013; originally announced November 2013.

Comments: 7 pages, 3 figures
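
The variable-based focusing described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The acceptance rule (always accept a flip that does not raise the number of unsatisfied clauses, otherwise accept with probability eta**delta), the parameter name eta, the choice to sample uniformly over distinct variables rather than over variable occurrences, and the helper names random_3sat, unsat_clauses and v_fms are all assumptions made for illustration.

```python
import random


def random_3sat(n_vars, n_clauses, seed=0):
    """Random 3-SAT instance; literals are signed ints (DIMACS style)."""
    rng = random.Random(seed)
    return [[v if rng.random() < 0.5 else -v
             for v in rng.sample(range(1, n_vars + 1), 3)]
            for _ in range(n_clauses)]


def unsat_clauses(clauses, assign):
    """Indices of clauses in which no literal is true under `assign`."""
    return [i for i, clause in enumerate(clauses)
            if not any(assign[abs(lit)] == (lit > 0) for lit in clause)]


def v_fms(clauses, n_vars, eta=0.3, max_flips=100_000, seed=0):
    """Variable-focused Metropolis search (illustrative sketch).

    Returns (assignment, number of unsatisfied clauses at exit).
    assignment[0] is unused so that variables are 1-indexed.
    """
    rng = random.Random(seed)
    assign = [False] + [rng.random() < 0.5 for _ in range(n_vars)]
    for _ in range(max_flips):
        unsat = unsat_clauses(clauses, assign)
        if not unsat:
            break
        # Focus on variables: sample uniformly among the distinct
        # variables that appear in at least one unsatisfied clause.
        candidates = sorted({abs(lit) for i in unsat for lit in clauses[i]})
        v = rng.choice(candidates)
        assign[v] = not assign[v]
        delta = len(unsat_clauses(clauses, assign)) - len(unsat)
        # Metropolis-style rule (assumed): reject an uphill move
        # with probability 1 - eta**delta.
        if delta > 0 and rng.random() >= eta ** delta:
            assign[v] = not assign[v]
    return assign, len(unsat_clauses(clauses, assign))
```

Calling v_fms(random_3sat(50, 150, seed=1), 50) returns the final assignment together with its count of unsatisfied clauses. The sketch recomputes that count from scratch after every flip, which keeps the code short at the cost of efficiency; a practical solver would update it incrementally.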

arXiv:1303.2413 [pdf, other]

doi 10.1103/PhysRevE.87.052807

Witness of unsatisfiability for a random 3-satisfiability formula

Authors: Lu-Lu Wu, Hai-Jun Zhou, Mikko Alava, Erik Aurell, Pekka Orponen

Abstract: The random 3-satisfiability (3-SAT) problem is in the unsatisfiable (UNSAT) phase when the clause density $α$ exceeds a critical value $α_s \approx 4.267$. However, rigorously proving the unsatisfiability of a given large 3-SAT instance is extremely difficult. In this paper we apply the mean-field theory of statistical physics to the unsatisfiability problem, and show that a specific type of UNSAT witnesses (Feige-Kim-Ofek witnesses) can in principle be constructed when the clause density $α > 19$. We then construct Feige-Kim-Ofek witnesses for single 3-SAT instances through a simple random sampling algorithm and a focused local search algorithm. The random sampling algorithm works only when $α$ scales at least linearly with the variable number $N$, but the focused local search algorithm works for clause density $α > c N^{b}$ with $b \approx 0.59$ and prefactor $c \approx 8$. The exponent $b$ can be further decreased by enlarging the single parameter $S$ of the focused local search algorithm.

Submitted 10 March, 2013; originally announced March 2013.

Comments: 9 pages, 7 figures included. Submitted to Physical Review E

Journal ref: Physical Review E 87, 052807 (2013)

arXiv:1204.4065 [pdf, other]

doi 10.1109/ITW.2012.6404757

Analysis of Sparse Representations Using Bi-Orthogonal Dictionaries

Authors: Mikko Vehkaperä, Yoshiyuki Kabashima, Saikat Chatterjee, Erik Aurell, Mikael Skoglund, Lars Rasmussen

Abstract: The sparse representation problem of recovering an N dimensional sparse vector x from M < N linear observations y = Dx given dictionary D is considered. The standard approach is to let the elements of the dictionary be independent and identically distributed (IID) zero-mean Gaussian and minimize the l1-norm of x under the constraint y = Dx. In this paper, the performance of l1-reconstruction is analyzed, when the dictionary is bi-orthogonal D = [O1 O2], where O1,O2 are independent and drawn uniformly according to the Haar measure on the group of orthogonal M x M matrices. By an application of the replica method, we obtain the critical conditions under which perfect l1-recovery is possible with bi-orthogonal dictionaries.

Submitted 10 July, 2012; v1 submitted 18 April, 2012; originally announced April 2012.

Comments: 5 pages, 2 figures. The main result and numerical examples have been revised

arXiv:1004.4559 [pdf, ps, other]

The Accuracy of Tree-based Counting in Dynamic Networks

Authors: Supriya Krishnamurthy, John Ardelius, Erik Aurell, Mads Dam, Rolf Stadler, Fetahi Wuhib

Abstract: Tree-based protocols are ubiquitous in distributed systems. They are flexible, they perform generally well, and, in static conditions, their analysis is mostly simple. Under churn, however, node joins and failures can have complex global effects on the tree overlays, making analysis surprisingly subtle. To our knowledge, few prior analytic results for performance estimation of tree based protocols under churn are currently known. We study a simple Bellman-Ford-like protocol which performs network size estimation over a tree-shaped overlay. A continuous time Markov model is constructed which allows key protocol characteristics to be estimated, including the expected number of nodes at a given (perceived) distance to the root and, for each such node, the expected (perceived) size of the subnetwork rooted at that node. We validate the model by simulation, using a range of network sizes, node degrees, and churn-to-protocol rates, with convincing results.

Submitted 26 April, 2010; originally announced April 2010.

Comments: 15 pages, 3 figures

Report number: KTH Technical Report TRITA-EE 2010:011

arXiv:1004.2425 [pdf, ps, other]

Bounds on Thresholds Related to Maximum Satisfiability of Regular Random Formulas

Authors: Vishwambhar Rathi, Erik Aurell, Lars Rasmussen, Mikael Skoglund

Abstract: We consider the regular balanced model of formula generation in conjunctive normal form (CNF) introduced by Boufkhad, Dubois, Interian, and Selman. We say that a formula is $p$-satisfying if there is a truth assignment satisfying a $1-2^{-k}+p 2^{-k}$ fraction of clauses. Using the first moment method we determine an upper bound on the threshold clause density such that there are no $p$-satisfying assignments with high probability above this upper bound. There are two aspects in deriving the lower bound using the second moment method. The first aspect is, given any $p \in (0,1)$ and $k$, to evaluate the lower bound on the threshold. This evaluation is numerical in nature. The second aspect is to derive the lower bound as a function of $p$ for large enough $k$. We address the first aspect and evaluate the lower bound on the $p$-satisfying threshold using the second moment method. We observe that as $k$ increases the lower bound seems to converge to the asymptotically derived lower bound for the uniform model of formula generation by Achlioptas, Naor, and Peres.

Submitted 14 April, 2010; originally announced April 2010.

Comments: 6th International symposium on turbo codes & iterative information processing, 2010

arXiv:1002.1290 [pdf, ps, other]

Bounds on Threshold of Regular Random $k$-SAT

Authors: Vishwambhar Rathi, Erik Aurell, Lars Rasmussen, Mikael Skoglund

Abstract: We consider the regular model of formula generation in conjunctive normal form (CNF) introduced by Boufkhad et al. We derive an upper bound on the satisfiability threshold and NAE-satisfiability threshold for regular random $k$-SAT for any $k \geq 3$. We show that these bounds match the corresponding bounds for the uniform model of formula generation. We derive a lower bound on the threshold by applying the second moment method to the number of satisfying assignments. For large $k$, we note that the obtained lower bounds on the threshold of a regular random formula converge to the lower bound obtained for the uniform model. Thus, we answer the question posed in \cite{AcM06} regarding the performance of the second moment method for regular random formulas.

Submitted 23 April, 2010; v1 submitted 5 February, 2010; originally announced February 2010.

Comments: Accepted to SAT 2010
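
For orientation, the uniform-model upper bound that the abstract above compares against can be recovered from the textbook first-moment argument. The derivation below is a sketch for the uniform model only, in which the $m = αn$ clauses are drawn independently; it is not the paper's calculation for the regular model, and the symbols $Z$, $n$ and $m$ are notation introduced here for illustration.

```latex
% Z = number of satisfying assignments of a uniform random k-SAT formula
% with n variables and m = \alpha n independently drawn clauses.
% A fixed assignment violates a given clause with probability 2^{-k}, so
\mathbb{E}[Z] \;=\; 2^{n}\,\bigl(1 - 2^{-k}\bigr)^{\alpha n}
             \;=\; \exp\!\Bigl( n \bigl[\, \ln 2 + \alpha \ln\bigl(1 - 2^{-k}\bigr) \bigr] \Bigr).
% By Markov's inequality, P(Z \ge 1) \le E[Z], which tends to 0 whenever
\alpha \;>\; \frac{\ln 2}{-\ln\bigl(1 - 2^{-k}\bigr)} .
% For k = 3 this gives \alpha > \ln 2 / \ln(8/7) \approx 5.19.
```

The resulting value for $k = 3$ lies well above the estimate $α_s \approx 4.267$ quoted elsewhere in this list, which shows how loose the plain first-moment bound is at small $k$.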

arXiv:0905.0266 [pdf, ps, other]

doi 10.1209/0295-5075/87/68004

Gaussian Belief with dynamic data and in dynamic network

Authors: Erik Aurell, René Pfitzner

Abstract: In this paper we analyse Belief Propagation over a Gaussian model in a dynamic environment. Recently, this has been proposed as a method to average local measurement values by a distributed protocol ("Consensus Propagation", Moallemi & Van Roy, 2006), where the average is available for read-out at every single node. In the case that the underlying network is constant but the values to be averaged fluctuate ("dynamic data"), convergence and accuracy are determined by the spectral properties of an associated Ruelle-Perron-Frobenius operator. For Gaussian models on Erdos-Renyi graphs, numerical computation points to a spectral gap remaining in the large-size limit, implying exceptionally good scalability. In a model where the underlying network also fluctuates ("dynamic network"), averaging is more effective than in the dynamic data case. Altogether, this implies very good performance of these methods in very large systems, and opens a new field of statistical physics of large (and dynamic) information systems.

Submitted 3 May, 2009; originally announced May 2009.

Comments: 5 pages, 7 figures

Journal ref: EPL (Europhysics Letters) 87, 68004, 2009

arXiv:0711.4902 [pdf, ps, other]

doi 10.1073/pnas.0712263105

Circumspect descent prevails in solving random constraint satisfaction problems

Authors: Mikko Alava, John Ardelius, Erik Aurell, Petteri Kaski, Supriya Krishnamurthy, Pekka Orponen, Sakari Seitz

Abstract: We study the performance of stochastic local search algorithms for random instances of the $K$-satisfiability ($K$-SAT) problem. We introduce a new stochastic local search algorithm, ChainSAT, which moves in the energy landscape of a problem instance by never going upwards in energy. ChainSAT is a focused algorithm in the sense that it considers only variables occurring in unsatisfied clauses. We show by extensive numerical investigations that ChainSAT and other focused algorithms solve large $K$-SAT instances almost surely in linear time, up to high clause-to-variable ratios $α$; for example, for $K=4$ we observe linear-time performance well beyond the recently postulated clustering and condensation transitions in the solution space. The performance of ChainSAT is a surprise given that by design the algorithm gets trapped in the first local energy minimum it encounters, yet no such minima are encountered. We also study the geometry of the solution space as accessed by stochastic local search algorithms.

Submitted 30 November, 2007; originally announced November 2007.

Comments: 6 figures, about 17 pages

arXiv:0710.0386 [pdf, ps, other]

Comparing Maintenance Strategies for Overlays

Authors: Supriya Krishnamurthy, Sameh El-Ansary, Erik Aurell, Seif Haridi

Abstract: In this paper, we present an analytical tool for understanding the performance of structured overlay networks under churn based on the master-equation approach of physics. We motivate and derive an equation for the average number of hops taken by lookups during churn, for the Chord network. We analyse this equation in detail to understand the behaviour with and without churn. We then use this understanding to predict how lookups will scale for varying peer population as well as varying the sizes of the routing tables. We then consider a change in the maintenance algorithm of the overlay, from periodic stabilisation to a reactive one which corrects fingers only when a change is detected. We generalise our earlier analysis to understand how the reactive strategy compares with the periodic one.

Submitted 1 October, 2007; originally announced October 2007.

Comments: 10 pages, 8 figures

Report number: Tech. Report TR-2007-01, Swedish Institute of Computer Science

arXiv:0710.0270 [pdf, ps, other]

doi 10.1109/TNET.2007.905590

An Analytical Study of a Structured Overlay in the presence of Dynamic Membership

Authors: Supriya Krishnamurthy, Sameh El-Ansary, Erik Aurell, Seif Haridi

Abstract: In this paper we present an analytical study of dynamic membership (aka churn) in structured peer-to-peer networks. We use a fluid model approach to describe steady-state or transient phenomena, and apply it to the Chord system. For any rate of churn and stabilization rates, and any system size, we accurately account for the functional form of the probability of network disconnection as well as the fraction of failed or incorrect successor and finger pointers. We show how we can use these quantities to predict both the performance and consistency of lookups under churn. All theoretical predictions match simulation results. The analysis includes both features that are generic to structured overlays deploying a ring as well as Chord-specific details, and opens the door to a systematic comparative analysis of, at least, ring-based structured overlay systems under churn.

Submitted 1 October, 2007; originally announced October 2007.

Comments: 12 pages, 14 figures, to appear in IEEE/ACM Transactions on Networking

arXiv:cs/0501069 [pdf, ps, other]

A Statistical Theory of Chord under Churn

Authors: Supriya Krishnamurthy, Sameh El-Ansary, Erik Aurell, Seif Haridi

Abstract: Most earlier studies of Distributed Hash Tables (DHTs) under churn have either depended on simulations as the primary investigation tool, or on establishing bounds for DHTs to function. In this paper, we present a complete analytical study of churn using a master-equation-based approach, used traditionally in non-equilibrium statistical mechanics to describe steady-state or transient phenomena. Simulations are used to verify all theoretical predictions. We demonstrate the application of our methodology to the Chord system. For any rate of churn and stabilization rates, and any system size, we accurately predict the fraction of failed or incorrect successor and finger pointers and show how we can use these quantities to predict the performance and consistency of lookups under churn. We also discuss briefly how churn may actually be of different 'types' and the implications this will have for the functioning of DHTs in general.

Submitted 24 January, 2005; originally announced January 2005.

Comments: 6 pages. In the 4th International Workshop on Peer-to-Peer Systems (IPTPS'05), Ithaca, New York, USA, 2005

ACM Class: I.6; G.3; E.1

arXiv:cs/0102011 [pdf, ps, other]

A Price Dynamics in Bandwidth Markets for Point-to-point Connections

Authors: Lars Rasmusson, Erik Aurell

Abstract: We simulate a network of N routers and M network users making concurrent point-to-point connections by buying and selling router capacity from each other. The resources need to be acquired in complete sets, but there is only one spot market for each router. In order to describe the internal dynamics of the market, we model the observed prices by N-dimensional Ito-processes. Modeling using stochastic processes is novel in this context of describing interactions between end-users in a system with shared resources, and allows a standard set of mathematical tools to be applied. The derived models can also be used to price contingent claims on network capacity and thus to price complex network services such as quality of service levels, multicast, etc.

Submitted 15 February, 2001; originally announced February 2001.

Comments: 18 pages, 10 postscript figures

ACM Class: C.2.3; C.4

Showing 1–13 of 13 results for author: Aurell, E