Local weak convergence and its applications

Authors: Sayan Banerjee, Shankar Bhamidi, Jianan Shen, Seth Parker Young

Abstract: Motivated in part by understanding average case analysis of fundamental algorithms in computer science, and in part by the wide array of network data available over the last decade, a variety of random graph models, with corresponding processes on these objects, have been proposed over the last few years. The main goal of this paper is to give an overview of local weak convergence, which has emerged as a major technique for understanding large network asymptotics for a wide array of functionals and models. As opposed to a survey, the main goal is to try to explain some of the major concepts and their use to junior researchers in the field and indicate potential resources for further reading.

Submitted 3 March, 2024; originally announced March 2024.

Comments: 33 pages. Submitted to a special issue in honor of K.R. Parthasarathy

arXiv:2309.00578 [pdf, other]

Consistency of Lloyd's Algorithm Under Perturbations

Authors: Dhruv Patel, Hui Shen, Shankar Bhamidi, Yufeng Liu, Vladas Pipiras

Abstract: In the context of unsupervised learning, Lloyd's algorithm is one of the most widely used clustering algorithms. It has inspired a plethora of work investigating the correctness of the algorithm under various settings with ground truth clusters. In particular, in 2016, Lu and Zhou have shown that the mis-clustering rate of Lloyd's algorithm on $n$ independent samples from a sub-Gaussian mixture is exponentially bounded after $O(\log(n))$ iterations, assuming proper initialization of the algorithm. However, in many applications, the true samples are unobserved and need to be learned from the data via pre-processing pipelines such as spectral methods on appropriate data matrices. We show that the mis-clustering rate of Lloyd's algorithm on perturbed samples from a sub-Gaussian mixture is also exponentially bounded after $O(\log(n))$ iterations under the assumptions of proper initialization and that the perturbation is small relative to the sub-Gaussian noise. In canonical settings with ground truth clusters, we derive bounds for algorithms such as $k$-means$++$ to find good initializations and thus leading to the correctness of clustering via the main result. We show the implications of the results for pipelines measuring the statistical significance of derived clusters from data such as SigClust.
We use these general results to derive implications in providing theoretical guarantees on the misclustering rate for Lloyd's algorithm in a host of applications, including high-dimensional time series, multi-dimensional scaling, and community detection for sparse networks via spectral clustering.

Submitted 1 September, 2023; originally announced September 2023.

Comments: Preprint version 1

MSC Class: 62E20; 60C05
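
For readers unfamiliar with the iteration analysed in the entry above, the following is a minimal sketch of a plain Lloyd iteration run on perturbed samples from a two-component Gaussian mixture. It is not the authors' code: the function names `lloyd` and `perturbed_mixture`, the two-dimensional setup, the separation `sep`, the noise level `noise`, and the uniform perturbation of size `eps` are all assumptions chosen for illustration.

```python
import random


def lloyd(points, centers, iterations=10):
    """Plain Lloyd iteration: assign points to the nearest center, then recompute means."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center in squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])),
            )
        # Update step: each center becomes the mean of its assigned points.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def perturbed_mixture(n, sep=6.0, noise=1.0, eps=0.1, seed=0):
    """Hypothetical data model: two Gaussian clusters plus a small bounded perturbation."""
    rng = random.Random(seed)
    truth, pts = [], []
    for _ in range(n):
        z = rng.randrange(2)
        mean = (sep, 0.0) if z else (0.0, 0.0)
        # Gaussian noise of scale `noise`, then a perturbation of size at most `eps`.
        pts.append(tuple(m + rng.gauss(0, noise) + rng.uniform(-eps, eps) for m in mean))
        truth.append(z)
    return pts, truth


pts, truth = perturbed_mixture(200)
# Initial centers placed roughly near the true means, standing in for "proper initialization".
labels, centers = lloyd(pts, [(1.0, 0.0), (5.0, 0.0)])
errors = sum(l != t for l, t in zip(labels, truth))
print("misclustered points:", errors, "of", len(pts))
```

The sketch only illustrates the assign-then-average loop and how a mis-clustering count could be measured against ground truth; it makes no claim about the rates proved in the paper.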

arXiv:2307.09974 [pdf, ps, other]

Dynamic factor and VARMA models: equivalent representations, dimension reduction and nonlinear matrix equations

Authors: Shankar Bhamidi, Dhruv Patel, Vladas Pipiras

Abstract: A dynamic factor model with factor series following a VAR$(p)$ model is shown to have a VARMA$(p,p)$ model representation. Reduced-rank structures are identified for the VAR and VMA components of the resulting VARMA model. It is also shown how the VMA component parameters can be computed numerically from the original model parameters via the innovations algorithm, and connections of this approach to non-linear matrix equations are made. Some VAR models related to the resulting VARMA model are also discussed.

Submitted 19 July, 2023; originally announced July 2023.

MSC Class: Primary: 62M10. Secondary: 15A24; 65F45

arXiv:2307.09970 [pdf, other]

Correlation networks, dynamic factor models and community detection

Authors: Shankar Bhamidi, Dhruv Patel, Vladas Pipiras, Guorong Wu

Abstract: A dynamic factor model with a mixture distribution of the loadings is introduced and studied for multivariate, possibly high-dimensional time series. The correlation matrix of the model exhibits a block structure, reminiscent of correlation patterns for many real multivariate time series. A standard $k$-means algorithm on the loadings estimated through principal components is used to cluster component time series into communities with accompanying bounds on the misclustering rate. This is one standard method of community detection applied to correlation matrices viewed as weighted networks. This work puts a mixture model, a dynamic factor model and network community detection in one interconnected framework. Performance of the proposed methodology is illustrated on simulated and real data.

Submitted 19 July, 2023; originally announced July 2023.

MSC Class: Primary: 62M10; 62H30; 05C22. Secondary: 62H20

arXiv:2304.08565 [pdf, ps, other]

Attribute network models, stochastic approximation, and network sampling and ranking algorithms

Authors: Nelson Antunes, Sayan Banerjee, Shankar Bhamidi, Vladas Pipiras

Abstract: We analyze dynamic random network models where younger vertices connect to older ones with probabilities proportional to their degrees as well as a propensity kernel governed by their attribute types. Using stochastic approximation techniques we show that, in the large network limit, such networks converge in the local weak sense to randomly stopped multitype branching processes whose explicit description allows for the derivation of asymptotics for a wide class of network functionals. These asymptotics imply that while degree distribution tail exponents depend on the attribute type (already derived by Jordan (2013)), Page-rank centrality scores have the \emph{same} tail exponent across attributes. Moreover, the mean behavior of the limiting Page-rank score distribution can be explicitly described and shown to depend on the attribute type. The limit results also give explicit formulae for the performance of various network sampling mechanisms. One surprising consequence is the efficacy of Page-rank and walk based network sampling schemes for directed networks in the setting of rare minorities. The results also allow one to evaluate the impact of various proposed mechanisms to increase degree centrality of minority attributes in the network, and to quantify the bias in inferring about the network from an observed sample.
Further, we formalize the notion of resolvability of such models where, owing to propagation of chaos type phenomenon in the evolution dynamics for such models, one can set up a correspondence to models driven by continuous time branching process dynamics.

Submitted 17 April, 2023; originally announced April 2023.

Comments: 48 pages

MSC Class: Primary: 60C05; 05C80

arXiv:2303.10082 [pdf, other]

Scaling limits and universality: Critical percolation on weighted graphs converging to an $L^3$ graphon

Authors: Jnaneshwar Baslingker, Shankar Bhamidi, Nicolas Broutin, Sanchayan Sen, Xuan Wang

Abstract: We develop a general universality technique for establishing metric scaling limits of critical random discrete structures exhibiting mean-field behavior that requires four ingredients: (i) from the barely subcritical regime to the critical window, components merge approximately like the multiplicative coalescent, (ii) asymptotics of the susceptibility functions are the same as that of the Erdos-Renyi random graph, (iii) asymptotic negligibility of the maximal component size and the diameter in the barely subcritical regime, and (iv) macroscopic averaging of distances between vertices in the barely subcritical regime. As an application of the general universality theorem, we establish, under some regularity conditions, the critical percolation scaling limit of graphs that converge, in a suitable topology, to an $L^3$ graphon. In particular, we define a notion of the critical window in this setting. The $L^3$ assumption ensures that the model is in the Erdos-Renyi universality class and that the scaling limit is Brownian. Our results do not assume any specific functional form for the graphon. As a consequence of our results on graphons, we obtain the metric scaling limit for Aldous-Pittel's RGIV model [9] inside the critical window. Our universality principle has applications in a number of other problems including in the study of noise sensitivity of critical random graphs [52]. In [10], we use our universality theorem to establish the metric scaling limit of critical bounded size rules.
Our method should yield the critical metric scaling limit of Rucinski and Wormald's random graph process with degree restrictions [56] provided an additional technical condition about the barely subcritical behavior of this model can be proved.

Submitted 20 June, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: 67 pages, 1 figure, to appear in Transactions of the American Mathematical Society. The universality principle (Theorem 3.4) from arXiv:1411.3417 has now been included in this paper

arXiv:2203.11877 [pdf, ps, other]

Co-evolving dynamic networks

Authors: Sayan Banerjee, Shankar Bhamidi, Xiangying Huang

Abstract: We propose a general class of co-evolving tree network models driven by local exploration where new vertices attach to the current network via randomly sampling a vertex and then exploring the graph for a random number of steps in the direction of the root, connecting to the terminal vertex. Specific choices of the exploration step distribution lead to the well-studied affine preferential attachment and uniform attachment models, as well as less well understood dynamic network models with global attachment functionals such as PageRank scores [Chebolu-Melsted (2008)]. We obtain local weak limits for such networks and use them to derive asymptotics for the limiting empirical degree and PageRank distribution. We also quantify asymptotics for the degree and PageRank of fixed vertices, including the root, and the height of the network. Two distinct regimes are seen to emerge, based on the expected exploration distance of incoming vertices, which we call the `fringe' and `non-fringe' regimes. These regimes are shown to exhibit different qualitative and quantitative properties. In particular, networks in the non-fringe regime undergo `condensation' where the root degree grows at the same rate as the network size. Networks in the fringe regime do not exhibit condensation. Non-trivial phase transition phenomena are displayed for the height and the PageRank distribution, the latter connecting to the well known power-law hypothesis.
In the process, we develop a general set of techniques involving local limits, infinite-dimensional urn models, related multitype branching processes and corresponding Perron-Frobenius theory, branching random walks, and in particular relating tail exponents of various functionals to the scaling exponents of quasi-stationary distributions of associated random walks. These techniques are expected to shed light on a variety of other co-evolving network models.

Submitted 1 March, 2024; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: 60 pages. To appear in Prob. Th. Rel. Flds

MSC Class: 60K35; 05C80

arXiv:2111.05267 [pdf, other]

Community detection using low-dimensional network embedding algorithms

Authors: Aman Barot, Shankar Bhamidi, Souvik Dhara

Abstract: With the increasing relevance of large networks in important areas such as the study of contact networks for spread of disease, or social networks for their impact on geopolitics, it has become necessary to study machine learning tools that are scalable to very large networks, often containing millions of nodes. One major class of such scalable algorithms is known as network representation learning or network embedding. These algorithms try to learn representations of network functionals (e.g.~nodes) by first running multiple random walks and then using the number of co-occurrences of each pair of nodes in observed random walk segments to obtain a low-dimensional representation of nodes on some Euclidean space. The aim of this paper is to rigorously understand the performance of two major algorithms, DeepWalk and node2vec, in recovering communities for canonical network models with ground truth communities. Depending on the sparsity of the graph, we find the length of the random walk segments required such that the corresponding observed co-occurrence window is able to perform almost exact recovery of the underlying community assignments. We prove that, given some fixed co-occurrence window, node2vec using random walks with a low non-backtracking probability can succeed for much sparser networks compared to DeepWalk using simple random walks. Moreover, if the sparsity parameter is low, we provide evidence that these algorithms might not succeed in almost exact recovery.
The analysis requires developing general tools for path counting on random networks having an underlying low-rank structure, which are of independent interest.

Submitted 4 November, 2021; originally announced November 2021.

arXiv:2107.04103 [pdf, other]

Multiscale genesis of a tiny giant for percolation on scale-free random graphs

Authors: Shankar Bhamidi, Souvik Dhara, Remco van der Hofstad

Abstract: We study the critical behavior for percolation on inhomogeneous random networks on $n$ vertices, where the weights of the vertices follow a power-law distribution with exponent $τ\in (2,3)$. Such networks, often referred to as scale-free networks, exhibit critical behavior when the percolation probability tends to zero at an appropriate rate, as $n\to\infty$. We identify the critical window for a host of scale-free random graph models such as the Norros-Reittu model, Chung-Lu model and generalized random graphs. Surprisingly, there exists a finite time inside the critical window, after which, we see a sudden emergence of a tiny giant component. This is a novel behavior which is in contrast with the critical behavior in other known universality classes with $τ\in (3,4)$ and $τ>4$. Precisely, for edge-retention probabilities $π_n = λn^{-(3-τ)/2}$, there is an explicitly computable $λ_c>0$ such that the critical window is of the form $λ\in (0,λ_c),$ where the largest clusters have size of order $n^β$ with $β=(τ^2-4τ+5)/[2(τ-1)]\in[\sqrt{2}-1, \tfrac{1}{2})$ and have non-degenerate scaling limits, while in the supercritical regime $λ> λ_c$, a unique `tiny giant' component of size $\sqrt{n}$ emerges. For $λ\in (0,λ_c),$ the scaling limit of the maximum component sizes can be described in terms of components of a one-dimensional inhomogeneous percolation model on $\mathbb{Z}_+$ studied in a seminal work by Durrett and Kesten.
For $λ>λ_c$, we prove that the sudden emergence of the tiny giant is caused by a phase transition inside a smaller core of vertices of weight $Ω(\sqrt{n})$.

Submitted 8 July, 2021; originally announced July 2021.

Comments: 46 pages, 1 figure

MSC Class: 60C05; 05C80

arXiv:2009.10696 [pdf, other]

Geometry of the minimal spanning tree in the heavy-tailed regime: new universality classes

Authors: Shankar Bhamidi, Sanchayan Sen

Abstract: A well-known open problem on the behavior of optimal paths in random graphs in the strong disorder regime, formulated by statistical physicists, and supported by a large amount of numerical evidence over the last decade [31,32,38,70] is as follows: for a large class of random graph models with degree exponent $τ\in (3,4)$, the distance between two typical points on the minimal spanning tree (MST) on the giant component in the supercritical regime scales like $n^{(τ-3)/(τ-1)}$. The aim of this paper is to make progress towards a proof of this conjecture. We consider a supercritical inhomogeneous random graph model with degree exponent $τ\in(3, 4)$ that is closely related to Aldous's multiplicative coalescent, and show that the MST constructed by assigning i.i.d. continuous weights to the edges in its giant component, endowed with the tree distance scaled by $n^{-(τ-3)/(τ-1)}$, converges in distribution with respect to the Gromov-Hausdorff topology to a random compact real tree. Further, almost surely, every point in this limiting space either has degree one (leaf), or two, or infinity (hub), both the set of leaves and the set of hubs are dense in this space, and the Minkowski dimension of this space equals $(τ-1)/(τ-3)$. The multiplicative coalescent, in an asymptotic sense, describes the evolution of the component sizes of various near-critical random graph processes. We expect the limiting spaces in this paper to be the candidates for the scaling limit of the MST constructed for a wide array of other heavy-tailed random graph models.

Submitted 12 January, 2024; v1 submitted 22 September, 2020; originally announced September 2020.

Comments: 62 pages, 3 figures, to appear in Probability Theory and Related Fields

arXiv:2006.15609 [pdf, ps, other]

Root finding algorithms and persistence of Jordan centrality in growing random trees

Authors: Sayan Banerjee, Shankar Bhamidi

Abstract: We consider models of growing random trees $\{\mathcal{T}_f(n):n\geq 1\}$ with model dynamics driven by an attachment function $f:\mathbb{Z}_+\to \mathbb{R}_+$. At each stage a new vertex enters the system and connects to a vertex $v$ in the current tree with probability proportional to $f(\text{degree}(v))$. The main goal of this study is to understand the performance of root finding algorithms. A large body of work (e.g. the work of Bubeck, Devroye and Lugosi or Jog and Loh) has emerged in the last few years in using techniques based on the Jordan centrality measure and its variants to develop root finding algorithms. Given an unlabelled unrooted tree, one computes the Jordan centrality for each vertex in the tree and for a fixed budget $K$ outputs the optimal $K$ vertices (as measured by Jordan centrality). Under general conditions on the attachment function $f$, we derive necessary and sufficient bounds on the budget $K(ε)$ in order to recover the root with probability at least $1-ε$. For canonical examples such as linear preferential attachment and uniform attachment, these general results give matching upper and lower bounds for the budget. We also prove persistence of the optimal $K$ Jordan centers for any $K$, i.e. the existence of an almost surely finite random time $n^*$ such that for $n \geq n^*$ the identity of the $K$-optimal Jordan centers in $\{\mathcal{T}_f(n):n\geq n^*\}$ does not change, thus describing robustness properties of this measure.
Key technical ingredients in the proofs of independent interest include sufficient conditions for the existence of exponential moments for limits of (appropriately normalized) continuous time branching processes within which the models $\{\mathcal{T}_f(n):n\geq n^*\}$ can be embedded, as well as rates of convergence results to these limits.

Submitted 9 August, 2021; v1 submitted 28 June, 2020; originally announced June 2020.

Comments: Final accepted version

MSC Class: 60C05; 05C80
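
The growth rule and the root-finding procedure described in the entry above can be sketched as follows. This is an illustration under stated assumptions, not the paper's construction: the tree is started from a single edge, the Jordan centrality of a vertex is taken to be the size of the largest component left after deleting that vertex (smaller is more central), and the names `grow_tree`, `jordan_centrality` and `top_k_candidates` are hypothetical. The arrival labels are used only as a convenience for computing subtree sizes; the centrality values themselves do not depend on them.

```python
import random


def grow_tree(n, f, seed=0):
    """Grow a tree on vertices 0..n-1 (n >= 2); returns the parent list.

    Each new vertex attaches to an existing vertex v with probability
    proportional to f(degree(v)). Starting from a single edge 0-1 is an assumption.
    """
    rng = random.Random(seed)
    parent = [None, 0]
    degree = [1, 1]
    for new in range(2, n):
        weights = [f(d) for d in degree]
        target = rng.choices(range(new), weights=weights)[0]
        parent.append(target)
        degree[target] += 1
        degree.append(1)
    return parent


def jordan_centrality(parent):
    """psi[v] = size of the largest component of the tree after deleting v."""
    n = len(parent)
    size = [1] * n
    for v in range(n - 1, 0, -1):  # children always have larger labels than parents
        size[parent[v]] += size[v]
    psi = [n - size[v] for v in range(n)]  # component on the parent's side of v
    for v in range(1, n):
        psi[parent[v]] = max(psi[parent[v]], size[v])  # components hanging below
    return psi


def top_k_candidates(parent, k):
    """Return the k vertices with the smallest psi as candidate roots."""
    psi = jordan_centrality(parent)
    return sorted(range(len(parent)), key=lambda v: psi[v])[:k]


# Linear preferential attachment f(d) = d + 1 as one example of an attachment function.
tree = grow_tree(500, lambda d: d + 1.0, seed=1)
print("root (vertex 0) among the 5 candidates:", 0 in top_k_candidates(tree, 5))
```

Swapping the lambda for `lambda d: 1.0` gives uniform attachment, the other canonical example named in the abstract.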

arXiv:2006.03621 [pdf, ps, other]

Near Equilibrium Fluctuations for Supermarket Models with Growing Choices

Authors: Shankar Bhamidi, Amarjit Budhiraja, Miheer Dewaskar

Abstract: We consider the supermarket model in the usual Markovian setting where jobs arrive at rate $n λ_n$ for some $λ_n > 0$, with $n$ parallel servers each processing jobs in its queue at rate 1. An arriving job joins the shortest among $d_n \le n$ randomly selected service queues. We show that when $d_n \to \infty$ and $λ_n \to λ\in (0, \infty)$, under natural conditions on the initial queues, the state occupancy process converges in probability, in a suitable path space, to the unique solution of an infinite system of constrained ordinary differential equations parametrized by $λ$. Our main interest is in the study of fluctuations of the state process about its near equilibrium state in the critical regime, namely when $λ_n \to 1$. Previous papers have considered the regime $\frac{d_n}{\sqrt{n}\log n} \to \infty$ while the objective of the current work is to develop diffusion approximations for the state occupancy process that allow for all possible rates of growth of $d_n$. In particular we consider the three canonical regimes (a) ${d_n}/{\sqrt{n}} \to 0$; (b) ${d_n}/{\sqrt{n}} \to c\in (0,\infty)$ and, (c) ${d_n}/{\sqrt{n}} \to \infty$. In all three regimes we show, by establishing suitable functional limit theorems, that (under conditions on $λ_n$) fluctuations of the state process about its near equilibrium are of order $n^{-1/2}$ and are governed asymptotically by a one dimensional Brownian motion.
The forms of the limit processes in the three regimes are quite different; in the first case we get a linear diffusion; in the second case we get a diffusion with an exponential drift; and in the third case we obtain a reflected diffusion in a half space. In the special case ${d_n}/({\sqrt{n}\log n}) \to \infty$ our work gives alternative proofs for the universality results established by Mukherjee et al in 2018.

Submitted 5 June, 2020; originally announced June 2020.

Comments: 45 pages with a 4 page Appendix

MSC Class: 60K25; 68Q87
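
The dynamics of the supermarket model in the entry above can be simulated event by event. The sketch below is only an illustration: it assumes all queues start empty, that the $d$ queues inspected by an arriving job are sampled without replacement, and that ties among shortest queues are broken by sampling order. The names `simulate_supermarket` and `occupancy` are hypothetical.

```python
import random


def simulate_supermarket(n, lam, d, t_end, seed=0):
    """Simulate n rate-1 servers with Poisson arrivals at total rate n*lam (lam > 0).

    Each arriving job inspects d distinct queues chosen at random and joins
    the shortest one. Returns the queue lengths at time t_end.
    """
    rng = random.Random(seed)
    queues = [0] * n
    busy = 0  # number of non-empty queues
    t = 0.0
    while True:
        total_rate = n * lam + busy  # arrivals plus one unit of rate per busy server
        t += rng.expovariate(total_rate)
        if t > t_end:
            break
        if rng.random() < n * lam / total_rate:
            # Arrival: join the shortest of d sampled queues.
            choice = min(rng.sample(range(n), d), key=lambda i: queues[i])
            if queues[choice] == 0:
                busy += 1
            queues[choice] += 1
        else:
            # Departure from a uniformly chosen busy server.
            i = rng.choice([j for j, q in enumerate(queues) if q > 0])
            queues[i] -= 1
            if queues[i] == 0:
                busy -= 1
    return queues


def occupancy(queues, k):
    """Fraction of queues with at least k jobs (one coordinate of the state occupancy)."""
    return sum(q >= k for q in queues) / len(queues)


final = simulate_supermarket(n=200, lam=0.9, d=14, t_end=50.0, seed=1)
print([round(occupancy(final, k), 3) for k in range(4)])
```

Varying `d` relative to the square root of `n` gives a rough empirical feel for the three regimes listed in the abstract, although a simulation of this size says nothing rigorous about the limit theorems.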

arXiv:2005.02566 [pdf, ps, other]

Global lower mass-bound for critical configuration models in the heavy-tailed regime

Authors: Shankar Bhamidi, Souvik Dhara, Remco van der Hofstad, Sanchayan Sen

Abstract: We establish the global lower mass-bound property for the largest connected components in the critical window for the configuration model when the degree distribution has an infinite third moment. The scaling limit of the critical percolation clusters, viewed as measured metric spaces, was established in [7] with respect to the Gromov-weak topology. Our result extends those scaling limit results to the stronger Gromov-Hausdorff-Prokhorov topology under slightly stronger assumptions on the degree distribution. This implies the distributional convergence of global functionals such as the diameter of the largest critical components. Further, our result gives a sufficient condition for compactness of the random metric spaces that arise as scaling limits of critical clusters in the heavy-tailed regime.

Submitted 14 July, 2022; v1 submitted 5 May, 2020; originally announced May 2020.

Comments: 28 pages

arXiv:2004.13785 [pdf, ps, other]

Persistence of hubs in growing random networks

Authors: Sayan Banerjee, Shankar Bhamidi

Abstract: We consider models of evolving networks $\{\mathcal{G}_n:n\geq 0\}$ modulated by two parameters: an attachment function $f:\mathbb{N}_0\to\mathbb{R}_+$ and a (possibly random) attachment sequence $\{m_i:i\geq 1\}$. Starting with a single vertex, at each discrete step $i\geq 1$ a new vertex $v_i$ enters the system with $m_i\geq 1$ edges which it sequentially connects to a pre-existing vertex $v\in \mathcal{G}_{i-1}$ with probability proportional to $f(\operatorname{degree}(v))$. We consider the problem of emergence of persistent hubs: existence of a finite (a.s.) time $n^*$ such that for all $n\geq n^*$ the identity of the maximal degree vertex (or in general the $K$ largest degree vertices for $K\geq 1$) does not change. We obtain general conditions on $f$ and $\{m_i:i\geq 1\}$ under which a persistent hub emerges, and also those under which a persistent hub fails to emerge. In the case of lack of persistence, for the specific case of trees ($m_i\equiv 1$ for all $i$), we derive asymptotics for the maximal degree and the index of the maximal degree vertex (time at which the vertex with current maximal degree entered the system) to understand the movement of the maximal degree vertex as the network evolves. A key role in the analysis is played by an inverse rate weighted martingale constructed from a continuous time embedding of the discrete time model. Asymptotics for this martingale, including concentration inequalities and moderate deviations, play a major role in the analysis of the model.

Submitted 14 May, 2021; v1 submitted 28 April, 2020; originally announced April 2020.

Comments: Minor revision, 49 pages, to appear in Prob. Theory and Rel. Fields

MSC Class: 60C05; 60J85; 60J28

arXiv:2004.02697 [pdf, other]

Community modulated recursive trees and population dependent branching processes

Authors: Shankar Bhamidi, Ruituo Fan, Nicolas Fraiman, Andrew Nobel

Abstract: We consider random recursive trees that are grown via community modulated schemes that involve random attachment or degree based attachment. The aim of this paper is to derive general techniques based on continuous time embedding to study such models. The associated continuous time embeddings are not branching processes: individual reproductive rates at each time depend on the composition of the entire population at that time. Using stochastic analytic techniques we show that various key macroscopic statistics of the continuous time embedding stabilize, allowing asymptotics for a host of functionals of the original models to be derived.

Submitted 4 August, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

MSC Class: 60C05; 05C80; 60F05; 60K35; 60H30; 60J70

arXiv:1912.04714 [pdf, ps, other]

Rare event asymptotics for exploration processes for random graphs

Authors: Shankar Bhamidi, Amarjit Budhiraja, Paul Dupuis, Ruoyu Wu

Abstract: Much work in the study of large deviations for random graph models is focused on the dense regime where the theory of graphons has emerged as a principal tool. These tools do not give a good approach to large deviation problems for random graph models in the sparse regime. The aim of this paper is to study an approach for large deviation problems in this regime by establishing Large Deviation Principles (LDP) on suitable path spaces for certain exploration processes of the associated random graph sequence. Our work focuses on the study of one particular class of random graph models, namely the configuration model; however the general approach of using exploration processes for studying large deviation properties of sparse random graph models has broader applicability. The goal is to study asymptotics of probabilities of non-typical behavior in the large network limit. The first key step for this is to establish a LDP for an exploration process associated with the configuration model. A suitable exploration process here turns out to be an infinite dimensional Markov process with transition probability rates that diminish to zero in certain parts of the state space. Large deviation properties of such Markovian models is challenging due to poor regularity behavior of the associated local rate functions. Next, using the rate function in the LDP for the exploration process we formulate a calculus of variations problem associated with the asymptotics of component degree distributions.
The second key ingredient in our study is a careful analysis of the infinite dimensional Euler-Lagrange equations associated with this calculus of variations problem. Exact solutions are identified which then provide explicit formulas for decay rates of probabilities of non-typical component degree distributions and related quantities. Please see the paper for the complete abstract.

Submitted 3 July, 2020; v1 submitted 8 December, 2019; originally announced December 2019.

Comments: 62 pages. arXiv admin note: substantial text overlap with arXiv:1708.01832

MSC Class: 60F10; 60C05; 05C80; 90B15

arXiv:1906.04582 [pdf, other]

Intertemporal Community Detection in Human Mobility Networks

Authors: Mark He, Joseph Glasser, Shankar Bhamidi, Nikhil Kaza

Abstract: We introduce a community detection method that finds clusters in network time-series by introducing an algorithm that finds significantly interconnected nodes across time. These connections are either increasing, decreasing, or constant over time. Significance of nodal connectivity within a set is judged using the Weighted Configuration Null Model at each time-point, then a novel significance-testing scheme is used to assess connectivity at all time points and the direction of its time-trend. We apply this method to bikeshare networks in New York City and Chicago and taxicab pickups and dropoffs in New York to find and illustrate patterns in human mobility in urban zones. Results show stark geographical patterns in clusters that are growing and declining in relative usage across time and potentially elucidate latent economic or demographic trends.

Submitted 3 April, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

Comments: 29 pages

arXiv:1903.06029 [pdf, other]

doi 10.1371/journal.pone.0230941

Demarcating Geographic Regions using Community Detection in Commuting Networks with Significant Self-Loops

Authors: Mark He, Joseph Glasser, Nathaniel Pritchard, Shankar Bhamidi, Nikhil Kaza

Abstract: We develop a method to identify statistically significant communities in a weighted network with a high proportion of self-looping weights. We use this method to find overlapping agglomerations of U.S. counties by representing inter-county commuting as a weighted network. We identify three types of communities: non-nodal, nodal and monads, which correspond to different types of regions. The results suggest that traditional regional delineations that rely on ad hoc thresholds do not account for important and pervasive connections that extend far beyond expected metropolitan boundaries or megaregions.

Submitted 26 March, 2020; v1 submitted 13 March, 2019; originally announced March 2019.

Comments: 38 pages

arXiv:1902.03263 [pdf, other]

Survival and extinction of epidemics on random graphs with general degrees

Authors: Shankar Bhamidi, Danny Nam, Oanh Nguyen, Allan Sly

Abstract: In this paper, we establish the necessary and sufficient criterion for the contact process on Galton-Watson trees (resp. random graphs) to exhibit the phase of extinction (resp. short survival). We prove that the survival threshold $λ_1$ for a Galton-Watson tree is strictly positive if and only if its offspring distribution $ξ$ has an exponential tail, i.e., $\mathbb{E} e^{cξ}<\infty$ for some $c>0$, settling a conjecture by Huang and Durrett [12]. On the random graph with degree distribution $μ$, we show that if $μ$ has an exponential tail, then for small enough $λ$ the contact process with the all-infected initial condition survives for $n^{1+o(1)}$-time w.h.p. (short survival), while for large enough $λ$ it runs over $e^{Θ(n)}$-time w.h.p. (long survival). When $μ$ is subexponential, we prove that the contact process w.h.p. displays long survival for any fixed $λ>0$.

Submitted 17 January, 2020; v1 submitted 8 February, 2019; originally announced February 2019.

Comments: 39 pages

arXiv:1810.01300 [pdf, other]

Sampling-based Estimation of In-degree Distribution with Applications to Directed Complex Networks

Authors: Nelson Antunes, Shankar Bhamidi, Tianjian Guo, Vladas Pipiras, Bang Wang

Abstract: The focus of this work is on estimation of the in-degree distribution in directed networks from sampling network nodes or edges. A number of sampling schemes are considered, including random sampling with and without replacement, and several approaches based on random walks with possible jumps. When sampling nodes, it is assumed that only the out-edges of that node are visible, that is, the in-degree of that node is not observed. The suggested estimation of the in-degree distribution is based on two approaches. The inversion approach exploits the relation between the original and sample in-degree distributions, and can estimate the bulk of the in-degree distribution, but not the tail of the distribution. The tail of the in-degree distribution is estimated through an asymptotic approach, which itself has two versions: one assuming a power-law tail and the other for a tail of general form. The two estimation approaches are examined on synthetic and real networks, with good performance results, especially striking for the asymptotic approach.

Submitted 2 October, 2018; originally announced October 2018.

Comments: 30 pages, 6 figures

arXiv:1808.02439 [pdf, ps, other]

Fluctuation Bounds for Continuous Time Branching Processes and Evolution of Growing Trees With a Change Point

Authors: Sayan Banerjee, Shankar Bhamidi, Iain Carmichael

Abstract: We consider dynamic random trees constructed using an attachment function $f : \mathbb{N} \to \mathbb{R}_+$ where, at each step of the evolution, a new vertex attaches to an existing vertex $v$ in the current tree with probability proportional to $f(\mathrm{degree}(v))$. We explore the effect of a change point in the system; the dynamics are initially driven by a function $f$ until the tree reaches size $τ(n) \in (0,n)$, at which point the attachment function switches to another function, $g$, until the tree reaches size $n$. Two change point time scales are considered, namely the standard model where $τ(n) = γn$, and the quick big bang model where $τ(n) = n^γ$, for some $0 < γ < 1$. In the former case, we obtain deterministic approximations for the evolution of the empirical degree distribution (EDF) in sup-norm and use these to devise a provably consistent non-parametric estimator for the change point $γ$. In the latter case, we show that the effect of pre-change point dynamics asymptotically vanishes in the EDF, although this effect persists in functionals such as the maximal degree. Our proofs rely on embedding the discrete time tree dynamics in an associated (time) inhomogeneous continuous time branching process (CTBP). In the course of proving the above results, we develop novel mathematical techniques to analyze both homogeneous and inhomogeneous CTBPs and obtain rates of convergence for functionals of such processes, which are of independent interest.
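The tree dynamics described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `grow_tree_with_change_point`, the starting tree (a single edge), and the use of total degree as the argument of the attachment function are all assumptions made for illustration, and the paper's own degree convention may differ.

```python
import random


def grow_tree_with_change_point(n, tau, f, g, seed=None):
    """Grow a random tree on n >= 2 vertices (hypothetical helper).

    While the tree has fewer than tau vertices, a new vertex attaches to an
    existing vertex v with probability proportional to f(degree(v)); from
    size tau onward the weight function is g instead.
    Returns (parent, degree) lists indexed by vertex label.
    """
    rng = random.Random(seed)
    # Assumption: start from one edge joining vertices 0 and 1.
    degree = [1, 1]
    parent = [None, 0]
    while len(degree) < n:
        # Switch attachment function once the tree has reached size tau.
        attach = f if len(degree) < tau else g
        weights = [attach(d) for d in degree]
        # Pick the target vertex with probability proportional to its weight.
        v = rng.choices(range(len(degree)), weights=weights, k=1)[0]
        degree[v] += 1
        degree.append(1)   # the new vertex has a single edge, to v
        parent.append(v)
    return parent, degree


if __name__ == "__main__":
    # Standard model with gamma = 0.5: linear attachment, then uniform.
    n = 200
    parent, degree = grow_tree_with_change_point(
        n, tau=n // 2, f=lambda d: float(d), g=lambda d: 1.0, seed=0
    )
    print("maximal degree:", max(degree))
```

Recomputing all weights at every step makes this sketch quadratic in `n`; that is acceptable for small experiments but not for large simulations.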

Submitted 15 August, 2022; v1 submitted 7 August, 2018; originally announced August 2018.

Comments: 57 pages, Accepted version in Annals of Applied Probability

MSC Class: 60C05; 05C80

arXiv:1708.05587 [pdf, ps, other]

doi 10.1007/s10955-018-2103-0

Weighted Exponential Random graph models: Scope and large network limits

Authors: Shankar Bhamidi, Suman Chakraborty, Skyler Cranmer, Bruce Desmarais

Abstract: We study models of weighted exponential random graphs in the large network limit. These models have recently been proposed to model weighted network data arising from a host of applications including socio-econometric data such as migration flows and neuroscience. Analogous to fundamental results derived for standard (unweighted) exponential random graph models in the work of Chatterjee and Diaconis, we derive limiting results for the structure of these models as the number of nodes goes to infinity. Our results are applicable for a wide variety of base measures including measures with unbounded support. We also derive sufficient conditions for continuity of functionals in the specification of the model including conditions on nodal covariates. Finally we include a number of open problems to spur further understanding of this model especially in the context of applications.

Submitted 10 July, 2018; v1 submitted 18 August, 2017; originally announced August 2017.

Comments: 27 pages

MSC Class: 60C05; 05C80

arXiv:1708.01832 [pdf, ps, other]

Large Deviation Principle for the Exploration Process of the Configuration Model

Authors: Shankar Bhamidi, Amarjit Budhiraja, Paul Dupuis, Ruoyu Wu

Abstract: The configuration model is a sequence of random graphs constructed such that in the large network limit the degree distribution converges to a pre-specified probability distribution. The component structure of such random graphs can be obtained from an infinite dimensional Markov chain referred to as the exploration process. We establish a large deviation principle for the exploration process associated with the configuration model. Proofs rely on a representation of the exploration process as a system of stochastic differential equations driven by Poisson random measures and variational formulas for moments of nonnegative functionals of Poisson random measures. Uniqueness results for certain controlled systems of deterministic equations play a key role in the analysis. Applications of the large deviation results, for studying asymptotic behavior of the degree sequence in large components of the random graphs, are discussed.

Submitted 10 December, 2019; v1 submitted 5 August, 2017; originally announced August 2017.

Comments: 36 pages; this submission has now been replaced with new url arXiv:1912.04714 with new results

MSC Class: 60F10; 60C05; 05C80; 90B15

arXiv:1703.09908 [pdf, ps, other]

A probabilistic approach to the leader problem in random graphs

Authors: Louigi Addario-Berry, Shankar Bhamidi, Sanchayan Sen

Abstract: We study the fixation time of the identity of the leader, i.e., the most massive component, in the general setting of Aldous's multiplicative coalescent [4, 5], which in an asymptotic sense describes the evolution of the component sizes of a wide array of near-critical coalescent processes, including the classical Erdős-Rényi process. We show tightness of the fixation time in the "Brownian" regime, explicitly determining the median value of the fixation time to within an optimal $O(1)$ window. This generalizes Łuczak's result [31] for the Erdős-Rényi random graph using completely different techniques. In the heavy-tailed case, in which the limit of the component sizes can be encoded using a thinned pure-jump Lévy process, we prove that only one-sided tightness holds. This shows a genuine difference in the possible behavior in the two regimes. The solution to the leader problem in the setting of the Erdős-Rényi random graph played an important role in the study of the scaling limit of the minimal spanning tree on the complete graph [2]. We believe that analogous results, such as those proved herein, will be useful in establishing universality of the intrinsic geometry of the minimal spanning tree across a large class of models.

Submitted 26 May, 2020; v1 submitted 29 March, 2017; originally announced March 2017.

Comments: 31 pages; to appear in Random Structures & Algorithms

MSC Class: 60C05; 05C80

arXiv:1703.07145 [pdf, other]

doi 10.1214/19-EJP408

Universality for critical heavy-tailed network models: Metric structure of maximal components

Authors: Shankar Bhamidi, Souvik Dhara, Remco van der Hofstad, Sanchayan Sen

Abstract: We study limits of the largest connected components (viewed as metric spaces) obtained by critical percolation on uniformly chosen graphs and configuration models with heavy-tailed degrees. For rank-one inhomogeneous random graphs, such results were derived by Bhamidi, van der Hofstad, Sen [Probab. Theory Relat. Fields 2018]. We develop general principles under which the identical scaling limits as the rank-one case can be obtained. Of independent interest, we derive refined asymptotics for various susceptibility functions and the maximal diameter in the barely subcritical regime.

Submitted 7 May, 2020; v1 submitted 21 March, 2017; originally announced March 2017.

Comments: Final published version. 47 pages, 6 figures

Journal ref: Electron. J. Probab. 25, no. 47, 1-57 (2020)

arXiv:1612.00801 [pdf, ps, other]

Weakly interacting particle systems on inhomogeneous random graphs

Authors: Shankar Bhamidi, Amarjit Budhiraja, Ruoyu Wu

Abstract: We consider weakly interacting diffusions on time varying random graphs. The system consists of a large number of nodes in which the state of each node is governed by a diffusion process that is influenced by the neighboring nodes. The collection of neighbors of a given node changes dynamically over time and is determined through a time evolving random graph process. A law of large numbers and a propagation of chaos result are established for a multi-type population setting where at each instant the interaction between nodes is given by an inhomogeneous random graph which may change over time. This result covers the setting in which the edge probabilities between any two nodes are allowed to decay to $0$ as the size of the system grows. A central limit theorem is established for the single-type population case under stronger conditions on the edge probability function.

Submitted 15 February, 2017; v1 submitted 2 December, 2016; originally announced December 2016.

Comments: 31 pages

arXiv:1610.06511 [pdf, other]

Community extraction in multilayer networks with heterogeneous community structure

Authors: James D. Wilson, John Palowitch, Shankar Bhamidi, Andrew B. Nobel

Abstract: Multilayer networks are a useful way to capture and model multiple, binary or weighted relationships among a fixed group of objects. While community detection has proven to be a useful exploratory technique for the analysis of single-layer networks, the development of community detection methods for multilayer networks is still in its infancy. We propose and investigate a procedure, called Multilayer Extraction, that identifies densely connected vertex-layer sets in multilayer networks. Multilayer Extraction makes use of a significance based score that quantifies the connectivity of an observed vertex-layer set through comparison with a fixed degree random graph model. Multilayer Extraction directly handles networks with heterogeneous layers where community structure may be different from layer to layer. The procedure can capture overlapping communities, as well as background vertex-layer pairs that do not belong to any community. We establish consistency of the vertex-layer set optimizer of our proposed multilayer score under the multilayer stochastic block model. We investigate the performance of Multilayer Extraction on three applications and a test bed of simulations. Our theoretical and numerical evaluations suggest that Multilayer Extraction is an effective exploratory tool for analyzing complex multilayer networks. R software for Multilayer Extraction is publicly available at https://github.com/jdwilson4/MultilayerExtraction.

Submitted 7 November, 2017; v1 submitted 20 October, 2016; originally announced October 2016.

Comments: 46 pages. Accepted at the Journal of Machine Learning Research (11/17)

arXiv:1610.03762 [pdf, other]

Large subgraphs in pseudo-random graphs

Authors: Anirban Basak, Shankar Bhamidi, Suman Chakraborty, Andrew Nobel

Abstract: We consider classes of pseudo-random graphs on $n$ vertices for which the degree of every vertex and the co-degree between every pair of vertices are in the intervals $(np - Cn^δ,np+Cn^δ)$ and $(np^2- C n^δ, np^2 +C n^δ)$ respectively, for some absolute constant $C$, and $p, δ\in (0,1)$. We show that for such pseudo-random graphs the number of induced isomorphic copies of subgraphs of size $s$ is approximately the same as that of an Erdős-Rényi random graph with edge connectivity probability $p$ as long as $s \le (((1-δ)\wedge \frac{1}{2})-o(1))\log n/\log (1/p)$, when $p \in (0,1/2]$. When $p \in (1/2,1)$ we obtain a similar result. Our result is applicable for a large class of random and deterministic graphs including exponential random graph models (ERGMs), thresholded graphs from high-dimensional correlation networks, Erdős-Rényi random graphs conditioned on large cliques, random $d$-regular graphs and graphs obtained from vector spaces over binary fields. In the context of the last example, the results obtained are optimal. Straightforward extensions using the proof techniques in this paper imply strengthening of the above results in the context of larger motifs if a model allows control over higher co-degree type functionals.

Submitted 12 October, 2016; originally announced October 2016.

Comments: 52 pages

arXiv:1608.07153 [pdf, other]

Geometry of the vacant set left by random walk on random graphs, Wright's constants, and critical random graphs with prescribed degrees

Authors: Shankar Bhamidi, Sanchayan Sen

Abstract: We provide an explicit algorithm for sampling a uniform simple connected random graph with a given degree sequence. By-products of this central result include: (i) continuum scaling limits of uniform simple connected graphs with given degree sequence and asymptotics for the number of simple connected graphs with given degree sequence under some regularity conditions, and (ii) scaling limits for the metric space structure of the maximal components in the critical regime of both the configuration model and the uniform simple random graph model with prescribed degree sequence under finite third moment assumption on the degree sequence. As a substantive application we answer a question raised by Cerny and Teixeira by obtaining the metric space scaling limit of maximal components in the vacant set left by random walks on random regular graphs.

Submitted 19 June, 2019; v1 submitted 25 August, 2016; originally announced August 2016.

Comments: 44 pages, 5 figures; to appear in Random Structures & Algorithms

MSC Class: 60C05; 05C80

arXiv:1601.05630 [pdf, other]

Significance-based community detection in weighted networks

Authors: John Palowitch, Shankar Bhamidi, Andrew B. Nobel

Abstract: Community detection is the process of grouping strongly connected nodes in a network. Many community detection methods for un-weighted networks have a theoretical basis in a null model. Communities discovered by these methods therefore have interpretations in terms of statistical significance. In this paper, we introduce a null for weighted networks called the continuous configuration model. We use the model both as a tool for community detection and for simulating weighted networks with null nodes. First, we propose a community extraction algorithm for weighted networks which incorporates iterative hypothesis testing under the null. We prove a central limit theorem for edge-weight sums and asymptotic consistency of the algorithm under a weighted stochastic block model. We then incorporate the algorithm in a community detection method called CCME. To benchmark the method, we provide a simulation framework incorporating the null to plant "background" nodes in weighted networks with communities. We show that the empirical performance of CCME on these simulations is competitive with existing methods, particularly when overlapping communities and background nodes are present. To further validate the method, we present two real-world networks with potential background nodes and analyze them with CCME, yielding results that reveal macro-features of the corresponding systems.

Submitted 23 October, 2017; v1 submitted 21 January, 2016; originally announced January 2016.

Comments: Code and supplemental info available at http://stats.johnpalowitch.com/ccme. V3 changes: based on lengthy referee revision process, new theoretical sections added, + major organizational changes. V2 changes: grant info added, 1 reference added, bibliography section moved to end, condensed bib line spacing, corrected typos

arXiv:1508.04645 [pdf, other]

The multiplicative coalescent, inhomogeneous continuum random trees, and new universality classes for critical random graphs

Authors: Shankar Bhamidi, Remco van der Hofstad, Sanchayan Sen

Abstract: One major open conjecture in the area of critical random graphs, formulated by statistical physicists, and supported by a large amount of numerical evidence over the last decade [23, 24, 28, 63] is as follows: for a wide array of random graph models with degree exponent $τ\in (3,4)$, distances between typical points both within maximal components in the critical regime as well as on the minimal spanning tree on the giant component in the supercritical regime scale like $n^{(τ-3)/(τ-1)}$. In this paper we study the metric space structure of maximal components of the multiplicative coalescent, in the regime where the sizes converge to excursions of Lévy processes "without replacement" [10], yielding a completely new class of limiting random metric spaces. A by-product of the analysis yields the continuum scaling limit of one fundamental class of random graph models with degree exponent $τ\in (3,4)$ where edges are rescaled by $n^{-(τ-3)/(τ-1)}$ yielding the first rigorous proof of the above conjecture. The limits in this case are compact "tree-like" random fractals with finite fractal dimensions and with a dense collection of hubs (infinite degree vertices) a finite number of which are identified with leaves to form shortcuts. In a special case, we show that the Minkowski dimension of the limiting spaces equals $(τ-2)/(τ-3)$ a.s., in stark contrast to the Erdős-Rényi scaling limit whose Minkowski dimension is 2 a.s.
It is generally believed that dynamic versions of a number of fundamental random graph models, as one moves from the barely subcritical to the critical regime can be approximated by the multiplicative coalescent. In work in progress, the general theory developed in this paper is used to prove analogous limit results for other random graph models with degree exponent $τ\in (3,4)$.

Submitted 14 January, 2017; v1 submitted 19 August, 2015; originally announced August 2015.

Comments: 71 pages, 5 figures, To appear in Probability Theory and Related Fields

MSC Class: 60C05; 05C80

arXiv:1508.02043 [pdf, other]

Change point detection in Network models: Preferential attachment and long range dependence

Authors: Shankar Bhamidi, Jimmy Jin, Andrew Nobel

Abstract: Inspired by empirical data on real world complex networks, the last few years have seen an explosion in proposed generative models to understand and explain observed properties of real world networks, including power law degree distribution and "small world" distance scaling. In this context, a natural question is the phenomenon of {\it change point}, understanding how abrupt changes in parameters driving the network model change structural properties of the network. We study this phenomenon in one popular class of dynamically evolving networks: preferential attachment models. We derive asymptotic properties of various functionals of the network including the degree distribution as well as maximal degree asymptotics, in essence showing that the change point does affect the degree distribution but does {\bf not} change the degree exponent. This provides further evidence for long range dependence and sensitive dependence of the evolution of the process on the initial evolution of the process in such self-reinforced systems. We then propose an estimator for the change point and prove consistency properties of this estimator. The methodology developed highlights the effect of the non-ergodic nature of the evolution of the network on classical change point estimators.

Submitted 9 August, 2015; originally announced August 2015.

Comments: 35 pages, 6 figures

MSC Class: 60C05; 05C80

arXiv:1506.02811 [pdf, other]

Exceptional rotations of random graphs: a VC theory

Authors: Louigi Addario-Berry, Shankar Bhamidi, Sébastien Bubeck, Luc Devroye, Gabor Lugosi, Roberto Imbuzeiro Oliveira

Abstract: In this paper we explore maximal deviations of large random structures from their typical behavior. We introduce a model for a high-dimensional random graph process and ask analogous questions to those of Vapnik and Chervonenkis for deviations of averages: how "rich" does the process have to be so that one sees atypical behavior. In particular, we study a natural process of Erdős-Rényi random graphs indexed by unit vectors in $\mathbb{R}^d$. We investigate the deviations of the process with respect to three fundamental properties: clique number, chromatic number, and connectivity. In all cases we establish upper and lower bounds for the minimal dimension $d$ that guarantees the existence of "exceptional directions" in which the random graph behaves atypically with respect to the property. For each of the three properties, four theorems are established, to describe upper and lower bounds for the threshold dimension in the subcritical and supercritical regimes.

Submitted 9 June, 2015; originally announced June 2015.

arXiv:1505.04015 [pdf, other]

Stochastic Weighted Graphs: Flexible Model Specification and Simulation

Authors: James D. Wilson, Matthew J. Denny, Shankar Bhamidi, Skyler Cranmer, Bruce Desmarais

Abstract: In most domains of network analysis researchers consider networks that arise in nature with weighted edges. Such networks are routinely dichotomized in the interest of using available methods for statistical inference with networks. The generalized exponential random graph model (GERGM) is a recently proposed method used to simulate and model the edges of a weighted graph. The GERGM specifies a joint distribution for an exponential family of graphs with continuous-valued edge weights. However, current estimation algorithms for the GERGM only allow inference on a restricted family of model specifications. To address this issue, we develop a Metropolis-Hastings method that can be used to estimate any GERGM specification, thereby significantly extending the family of weighted graphs that can be modeled with the GERGM. We show that new flexible model specifications are capable of avoiding likelihood degeneracy and efficiently capturing network structure in applications where such models were not previously available. We demonstrate the utility of this new class of GERGMs through application to two real network data sets, and we further assess the effectiveness of our proposed methodology by simulating non-degenerate model specifications from the well-studied two-stars model. A working R version of the GERGM code is available in the supplement and will be incorporated in the gergm CRAN package.

Submitted 9 November, 2016; v1 submitted 15 May, 2015; originally announced May 2015.

Comments: 33 pages, 6 figures. To appear in Social Networks

arXiv:1411.3417 [pdf, ps, other]

Scaling limits of random graph models at criticality: Universality and the basin of attraction of the Erdős-Rényi random graph

Authors: Shankar Bhamidi, Nicolas Broutin, Sanchayan Sen, Xuan Wang

Abstract: A wide array of random graph models have been postulated to understand properties of observed networks. Typically these models have a parameter $t$ and a critical time $t_c$ when a giant component emerges. It is conjectured that for a large class of models, the nature of this emergence is similar to that of the Erdős-Rényi random graph, in the sense that (a) the sizes of the maximal components in the critical regime scale like $n^{2/3}$, and (b) the structure of the maximal components at criticality (rescaled by $n^{-1/3}$) converges to random fractals. To date, (a) has been proven for a number of models using different techniques. This paper develops a general program for proving (b) that requires three ingredients: (i) in the critical scaling window, components merge approximately like the multiplicative coalescent, (ii) scaling exponents of susceptibility functions are the same as those of the Erdős-Rényi random graph, and (iii) macroscopic averaging of distances between vertices in the barely subcritical regime. We show that these apply to two fundamental random graph models: the configuration model and inhomogeneous random graphs with a finite ground space. For these models, we also obtain new results for component sizes at criticality and structural properties in the barely subcritical regime.

Submitted 13 June, 2021; v1 submitted 12 November, 2014; originally announced November 2014.

Comments: 68 pages. The section on bounded-size rules has been omitted. It will be included in a future paper

MSC Class: 60C05; 05C80

arXiv:1404.4118 [pdf, other]

Continuum limit of critical inhomogeneous random graphs

Authors: Shankar Bhamidi, Sanchayan Sen, Xuan Wang

Abstract: Motivated by applications, the last few years have witnessed tremendous interest in understanding the structure as well as the behavior of dynamics for inhomogeneous random graph models. In this study we analyze the maximal components at criticality of one famous class of such models, the rank-one inhomogeneous random graph model. Viewing these components as measured random metric spaces, under finite moment assumptions for the weight distribution, we show that the components in the critical scaling window with distances scaled by $n^{-1/3}$ converge in the Gromov-Hausdorff-Prokhorov metric to rescaled versions of the limit objects identified for the Erdős-Rényi random graph components at criticality by Addario-Berry, Broutin and Goldschmidt (2012). A key step is the construction of connected components of the random graph through an appropriate tilt of a famous class of random trees called $\mathbf{p}$-trees (studied previously by Aldous, Miermont and Pitman (2004) and by Camarri and Pitman (2000)). This is the first step in rigorously understanding the scaling limits of objects such as the minimal spanning tree and other strong disorder models from statistical physics (see Braunstein et al., 2003) for such graph models. By asymptotic equivalence (Janson, 2010), the same results are true for the Chung-Lu model and the Britton-Deijfen-Lof model. A crucial ingredient of the proof of independent interest is tail bounds for the height of $\mathbf{p}$-trees.
The techniques developed in this paper form the main technical bedrock for proving continuum scaling limits in the critical regime for a wide array of other random graph models (Bhamidi, Broutin, Sen and Wang, 2014) including the configuration model and inhomogeneous random graphs with general kernels which were introduced by Bollobás, Janson and Riordan (2007).

Submitted 29 July, 2016; v1 submitted 15 April, 2014; originally announced April 2014.

Comments: 57 pages, 2 figures. To appear in Probability Theory and Related Fields

MSC Class: 60C05; 05C80

arXiv:1401.0914 [pdf, ps, other]

doi 10.1239/jap/1417528470

The front of the epidemic spread and first passage percolation

Authors: Shankar Bhamidi, Remco van der Hofstad, Julia Komjathy

Abstract: In this paper we establish a connection between epidemic models on random networks with general infection times considered in Barbour and Reinert 2013 and first passage percolation. Using techniques developed in Bhamidi, van der Hofstad, Hooghiemstra 2012, when each vertex has infinite contagious periods, we extend results on the epidemic curve in Barbour and Reinert 2013 from bounded degree graphs to general sparse random graphs with degrees having finite third moments as the number of vertices tends to infinity. We also study the epidemic trail between the source and typical vertices in the graph. This connection to first passage percolation can also be used to study epidemic models with general contagious periods as in Barbour and Reinert 2013 without bounded degree assumptions.

Submitted 5 January, 2014; originally announced January 2014.

Comments: 14 pages

Journal ref: Journal of Applied Probability Volume 51A (2014), 101-121

arXiv:1310.5672 [pdf, ps, other]

doi 10.1214/14-AAP1036

Degree distribution of shortest path trees and bias of network sampling algorithms

Authors: Shankar Bhamidi, Jesse Goodman, Remco van der Hofstad, Júlia Komjáthy

Abstract: In this article, we explicitly derive the limiting degree distribution of the shortest path tree from a single source on various random network models with edge weights. We determine the asymptotics of the degree distribution for large degrees of this tree and compare it to the degree distribution of the original graph. We perform this analysis for the complete graph with edge weights that are powers of exponential random variables (weak disorder in the stochastic mean-field model of distance), as well as on the configuration model with edge-weights drawn according to any continuous distribution. In the latter, the focus is on settings where the degrees obey a power law, and we show that the shortest path tree again obeys a power law with the same degree power-law exponent. We also consider random $r$-regular graphs for large $r$, and show that the degree distribution of the shortest path tree is closely related to the shortest path tree for the stochastic mean-field model of distance. We use our results to shed light on an empirically observed bias in network sampling methods. This is part of a general program initiated in previous works by Bhamidi, van der Hofstad and Hooghiemstra [Ann. Appl. Probab. 20 (2010) 1907-1965], [Combin. Probab. Comput. 20 (2011) 683-707], [Adv. in Appl. Probab. 42 (2010) 706-738] of analyzing the effect of attaching random edge lengths on the geometry of random network models.

Submitted 19 June, 2015; v1 submitted 21 October, 2013; originally announced October 2013.

Comments: Published at http://dx.doi.org/10.1214/14-AAP1036 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP1036

Journal ref: Annals of Applied Probability 2015, Vol. 25, No. 4, 1780-1826

arXiv:1308.0777 [pdf, ps, other]

doi 10.1214/14-AOAS760

A testing based extraction algorithm for identifying significant communities in networks

Authors: James D. Wilson, Simi Wang, Peter J. Mucha, Shankar Bhamidi, Andrew B. Nobel

Abstract: A common and important problem arising in the study of networks is how to divide the vertices of a given network into one or more groups, called communities, in such a way that vertices of the same community are more interconnected than vertices belonging to different ones. We propose and investigate a testing based community detection procedure called Extraction of Statistically Significant Communities (ESSC). The ESSC procedure is based on $p$-values for the strength of connection between a single vertex and a set of vertices under a reference distribution derived from a conditional configuration network model. The procedure automatically selects both the number of communities in the network and their size. Moreover, ESSC can handle overlapping communities and, unlike the majority of existing methods, identifies "background" vertices that do not belong to a well-defined community. The method has only one parameter, which controls the stringency of the hypothesis tests. We investigate the performance and potential use of ESSC and compare it with a number of existing methods, through a validation study using four real network data sets. In addition, we carry out a simulation study to assess the effectiveness of ESSC in networks with various types of community structure, including networks with overlapping communities and those with background vertices. These results suggest that ESSC is an effective exploratory tool for the discovery of relevant community structure in complex network systems.
Data and software are available at http://www.unc.edu/~jameswd/research.html.

Submitted 3 December, 2014; v1 submitted 4 August, 2013; originally announced August 2013.

Comments: Published at http://dx.doi.org/10.1214/14-AOAS760 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS760

Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 3, 1853-1891

arXiv:1306.0208 [pdf, ps, other]

Diameter of the stochastic mean-field model of distance

Authors: Shankar Bhamidi, Remco van der Hofstad

Abstract: We consider the complete graph $\mathcal{K}_n$ on $n$ vertices with exponential mean $n$ edge lengths. Writing $C_{ij}$ for the weight of the smallest-weight path between vertices $i,j\in [n]$, Janson showed that $\max_{i,j\in [n]} C_{ij}/\log{n}$ converges in probability to 3. We extend this result by showing that $\max_{i,j\in [n]} C_{ij} - 3\log{n}$ converges in distribution to a limiting random variable that can be identified via a maximization procedure on a limiting infinite random structure. Interestingly, this limiting random variable has also appeared as the weak limit of the re-centered graph diameter of the barely supercritical Erdős-Rényi random graph in work by Riordan and Wormald.
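
The quantity $\max_{i,j} C_{ij}$ in this abstract can be explored numerically for small $n$. The following is a minimal sketch, not the authors' method: it draws i.i.d. exponential edge lengths with mean $n$ on the complete graph and computes all smallest-weight path lengths with the Floyd-Warshall algorithm. The function name `mean_field_distances` and the `seed` parameter are hypothetical choices made for illustration.

```python
import math
import random


def mean_field_distances(n, seed=0):
    """Return (max_{i,j} C_ij, matrix of C_ij) for K_n with Exp(mean n) edge lengths."""
    rng = random.Random(seed)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # expovariate takes the rate, so rate 1/n gives mean n.
            w = rng.expovariate(1.0 / n)
            dist[i][j] = dist[j][i] = w
    # Floyd-Warshall: allow intermediate vertex k on every i -> j path.
    for k in range(n):
        dk = dist[k]
        for i in range(n):
            di = dist[i]
            dik = di[k]
            for j in range(n):
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]
    diameter = max(max(row) for row in dist)
    return diameter, dist


if __name__ == "__main__":
    n = 200
    diameter, _ = mean_field_distances(n, seed=1)
    # The abstract states this ratio converges in probability to 3 as n grows;
    # for small n the simulated value can differ noticeably.
    print(diameter / math.log(n))
```

Floyd-Warshall costs $O(n^3)$ time, so this sketch is only practical for a few hundred vertices; larger experiments would need a different all-pairs strategy.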

Submitted 2 June, 2013; originally announced June 2013.

Comments: 27 pages

MSC Class: 60C05; 05C80; 90B15

arXiv:1302.6551 [pdf, ps, other]

The importance sampling technique for understanding rare events in Erdős-Rényi random graphs

Authors: Shankar Bhamidi, Jan Hannig, Chia Ying Lee, James Nolen

Abstract: In dense Erdős-Rényi random graphs, we are interested in the events where large numbers of a given subgraph occur. The mean behavior of subgraph counts is known, and only recently were the related large deviations results discovered. Consequently, it is natural to ask, can one develop efficient numerical schemes to estimate the probability of an Erdős-Rényi graph containing an excessively large number of a fixed given subgraph? Using the large deviation principle we study an importance sampling scheme as a method to numerically compute the small probabilities of large triangle counts occurring within Erdős-Rényi graphs. We show that the exponential tilt suggested directly by the large deviation principle does not always yield an optimal scheme. The exponential tilt used in the importance sampling scheme comes from a generalized class of exponential random graphs. Asymptotic optimality, a measure of the efficiency of the importance sampling scheme, is achieved by a special choice of the parameters in the exponential random graph that makes it indistinguishable from an Erdős-Rényi graph conditioned to have many triangles in the large network limit. We show how this choice can be made for the conditioned Erdős-Rényi graphs both in the replica symmetric phase as well as in parts of the replica breaking phase to yield asymptotically optimal numerical schemes to estimate this rare event probability.

Submitted 2 April, 2014; v1 submitted 26 February, 2013; originally announced February 2013.

Comments: 31 pages, 4 figures, 4 tables

MSC Class: 65C05; 05C80; 60F10

arXiv:1212.5493 [pdf, ps, other]

The augmented multiplicative coalescent and critical dynamic random graph models

Authors: Shankar Bhamidi, Amarjit Budhiraja, Xuan Wang

Abstract: Random graph models with limited choice have been studied extensively with the goal of understanding the mechanism of the emergence of the giant component. One of the standard models is the class of Achlioptas random graph processes on a fixed set of $n$ vertices. Here at each step, one chooses two edges uniformly at random and then decides which one to add to the existing configuration according to some criterion. An important class of such rules are the bounded-size rules where for a fixed $K\geq 1$, all components of size greater than $K$ are treated equally. While a great deal of work has gone into analyzing the subcritical and supercritical regimes, the nature of the critical scaling window, the size and complexity (deviation from trees) of the components in the critical regime and the nature of the merging dynamics have not been well understood. In this work we study such questions for general bounded-size rules. Our first main contribution is the construction of an extension of Aldous's standard multiplicative coalescent process which describes the asymptotic evolution of the vector of sizes and surplus of all components. We show that this process, referred to as the standard augmented multiplicative coalescent (AMC), is "nearly" Feller with a suitable topology on the state space. Our second main result proves the convergence of suitably scaled component size and surplus vector, for any bounded-size rule, to the standard AMC.
The key ingredients here are a precise analysis of the asymptotic behavior of various susceptibility functions near criticality and certain bounds from [8], on the size of the largest component in the barely subcritical regime.

Submitted 21 December, 2012; originally announced December 2012.

Comments: 49 pages

MSC Class: 60C05; 05C80; 90B15

arXiv:1212.5480 [pdf, ps, other]

doi 10.1017/S0963548314000261

Bounded-size rules: The barely subcritical regime

Authors: Shankar Bhamidi, Amarjit Budhiraja, Xuan Wang

Abstract: Bounded-size rules are dynamic random graph processes which incorporate limited choice along with randomness in the evolution of the system. One starts with the empty graph and at each stage two edges are chosen uniformly at random. One of the two edges is then placed into the system according to a decision rule based on the sizes of the components containing the four vertices. For bounded-size rules, all components of size greater than some fixed $K\geq 1$ are accorded the same treatment. Writing $\mathrm{BS}(t)$ for the state of the system with $nt/2$ edges, Spencer and Wormald proved that for such rules, there exists a critical time $t_c$ such that when $t< t_c$ the size of the largest component is of order $\log{n}$ while for $t> t_c$, the size of the largest component is of order $n$. In this work we obtain upper bounds (that hold with high probability) of order $n^{2γ} \log ^4 n$, on the size of the largest component, at time instants $t_n = t_c-n^{-γ}$, where $γ\in (0,1/4)$. This result for the barely subcritical regime forms a key ingredient in the study undertaken in \cite{amc-2012}, of the asymptotic dynamic behavior of the process describing the vector of component sizes and associated complexity of the components for such random graph models in the critical scaling window.
The proof uses a coupling of BSR processes with a certain family of inhomogeneous random graphs with vertices in the type space $\mathbb{R}_+\times \mathcal{D}([0,\infty):\mathbb{N}_0)$ where $\mathcal{D}([0,\infty):\mathbb{N}_0)$ is the Skorohod $D$-space of functions that are right continuous and have left limits, equipped with the usual Skorohod topology. The coupling construction also gives a characterization of the critical time $t_c$ for the emergence of the giant component, alternative to the usual explosion time of the susceptibility function, in terms of the operator norm of integral operators on certain $L^2$ spaces.

Submitted 21 December, 2012; originally announced December 2012.

Comments: 28 pages

MSC Class: 60C05; 05C80; 90B15

Journal ref: Combinator. Probab. Comp. 23 (2014) 505-538

arXiv:1211.3090 [pdf, ps, other]

doi 10.1214/14-AAP1053

Twitter event networks and the Superstar model

Authors: Shankar Bhamidi, J. Michael Steele, Tauhid Zaman

Abstract: A condensation phenomenon is often observed in social networks such as Twitter where one "superstar" vertex gains a positive fraction of the edges, while the remaining empirical degree distribution still exhibits a power law tail. We formulate a mathematically tractable model for this phenomenon that provides a better fit to empirical data than the standard preferential attachment model across an array of networks observed in Twitter. Using embeddings in an equivalent continuous time version of the process, and adapting techniques from the stable age-distribution theory of branching processes, we prove limit results for the proportion of edges that condense around the superstar, the degree distribution of the remaining vertices, maximal nonsuperstar degree asymptotics and height of these random trees in the large network limit.

Submitted 9 September, 2015; v1 submitted 13 November, 2012; originally announced November 2012.

Comments: Published at http://dx.doi.org/10.1214/14-AAP1053 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP1053

Journal ref: Annals of Applied Probability 2015, Vol. 25, No. 5, 2462-2502

arXiv:1211.2284 [pdf, other]

Energy Landscape for large average submatrix detection problems in Gaussian random matrices

Authors: Shankar Bhamidi, Partha S. Dey, Andrew B. Nobel

Abstract: The problem of finding large average submatrices of a real-valued matrix arises in the exploratory analysis of data from a variety of disciplines, ranging from genomics to social sciences. In this paper we provide a detailed asymptotic analysis of large average submatrices of an $n \times n$ Gaussian random matrix. The first part of the paper addresses global maxima. For fixed $k$ we identify the average and the joint distribution of the $k \times k$ submatrix having largest average value. As a dual result, we establish that the size of the largest square sub-matrix with average bigger than a fixed positive constant is, with high probability, equal to one of two consecutive integers that depend on the threshold and the matrix dimension $n$. The second part of the paper addresses local maxima. Specifically we consider submatrices with dominant row and column sums that arise as the local optima of iterative search procedures for large average submatrices. For fixed $k$, we identify the limiting average value and joint distribution of a $k \times k$ submatrix conditioned to be a local maximum. In order to understand the density of such local optima and explain the quick convergence of such iterative procedures, we analyze the number $L_n(k)$ of local maxima, beginning with exact asymptotic expressions for the mean and fluctuation behavior of $L_n(k)$. For fixed $k$, the mean of $L_{n}(k)$ is $Θ(n^{k}/(\log{n})^{(k-1)/2})$ while the standard deviation is $Θ(n^{2k^2/(k+1)}/(\log{n})^{k^2/(k+1)})$.
Our principal result is a Gaussian central limit theorem for $L_n(k)$ that is based on a new variant of Stein's method.

Submitted 13 June, 2013; v1 submitted 9 November, 2012; originally announced November 2012.

Comments: Proofs simplified, 49 pages, 3 figures

MSC Class: 62G32; 60F05; 60G70

arXiv:1210.6839 [pdf, ps, other]

Universality for first passage percolation on sparse random graphs

Authors: Shankar Bhamidi, Remco van der Hofstad, Gerard Hooghiemstra

Abstract: We consider first passage percolation on sparse random graphs with prescribed degree distributions and general independent and identically distributed edge weights assumed to have a density. Assuming that the degree distribution satisfies a uniform $X^2\log{X}$-condition, we analyze the asymptotic distribution for the minimal weight path between a pair of typical vertices, as well as the number of edges on this path, or hopcount. The hopcount satisfies a central limit theorem where the norming constants are expressible in terms of the parameters of an associated continuous-time branching process. Centered by a multiple of $\log{n}$, where the constant is the inverse of the Malthusian rate of growth of the associated branching process, the minimal weight converges in distribution. The limiting random variable equals the sum of the logarithms of the martingale limits of the branching processes that measure the relative growth of neighborhoods about the two vertices, and a Gumbel random variable, and thus shows a remarkably universal behavior. The proofs rely on a refined coupling between the shortest path problems on these graphs and continuous-time branching processes, and on a Poisson point process limit for the potential closing edges of shortest-weight paths between the source and destination. The results extend to a host of related random graph models, ranging from random $r$-regular graphs and inhomogeneous random graphs to uniform random graphs with a prescribed degree sequence.

Submitted 25 October, 2012; originally announced October 2012.

MSC Class: 60C05; 05C80; 90B15

arXiv:1106.1022 [pdf, ps, other]

Bohman-Frieze processes at criticality and emergence of the giant component

Authors: Shankar Bhamidi, Amarjit Budhiraja, Xuan Wang

Abstract: The evolution of the usual Erdős-Rényi random graph model on $n$ vertices can be described as follows: At time 0 start with the empty graph, with $n$ vertices and no edges. Now at each time $k$, choose 2 vertices uniformly at random and attach an edge between these two vertices. Let $\mathbf{G}_n(k)$ be the graph obtained at step $k$. Refined analysis in random graph theory now shows that for fixed $t\in \mathbb{R}$, when $k(n) = n/2+ n^{2/3} t/2$, the sizes of the components in $\mathbf{G}_n(k(n))$ scale like $n^{2/3}$ and rescaled component sizes converge to the standard multiplicative coalescent at time $t$. The last decade has seen variants of this process introduced, under the name Achlioptas processes, to understand the effect of simple changes in the edge formation scheme on the emergence of the giant component. One of the simplest and most popular of such models, stimulated by a question of Achlioptas, is the Bohman-Frieze (BF) model, wherein at each stage $k$, 2 edges $e_1(k)=(v_1,v_2)$ and $e_2(k) = (v_3, v_4)$ are chosen uniformly at random. If at this time $v_1, v_2$ are both isolated then $e_1$ is added, otherwise $e_2$ is added. Then Bohman and Frieze (2001) (and further analysis in Spencer and Wormald (2007)) show that once again there is a critical parameter, which is larger than 1, above and below which the asymptotic behavior is as in the Erdős-Rényi setting.
While an intense study for this and related models seems to suggest that at criticality, this model should be in the same universality class as the original Erdős-Rényi process, a precise mathematical treatment of the dynamics in the critical window has to date escaped analysis. In this work we study the component structure of the BF model in the critical window and show that at criticality the sizes of components properly rescaled and re-centered converge to the standard multiplicative coalescent.
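
The Bohman-Frieze rule described in this abstract is simple enough to simulate directly. The sketch below is an illustration under stated assumptions rather than code from the paper: each candidate edge is taken to be a uniformly chosen pair of distinct vertices, repeated edges are allowed, and components are tracked with a union-find structure. The names `UnionFind` and `bohman_frieze` and the `seed` parameter are hypothetical.

```python
import random


class UnionFind:
    """Disjoint-set forest with union by size and path halving."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def bohman_frieze(n, steps, seed=0):
    """Run `steps` rounds of the BF rule; return component sizes, largest first."""
    rng = random.Random(seed)
    uf = UnionFind(n)
    degree = [0] * n
    for _ in range(steps):
        v1, v2 = rng.sample(range(n), 2)  # candidate edge e_1
        v3, v4 = rng.sample(range(n), 2)  # candidate edge e_2
        # Add e_1 only if both of its endpoints are isolated; otherwise add e_2.
        if degree[v1] == 0 and degree[v2] == 0:
            a, b = v1, v2
        else:
            a, b = v3, v4
        degree[a] += 1
        degree[b] += 1
        uf.union(a, b)
    roots = {uf.find(v) for v in range(n)}
    return sorted((uf.size[r] for r in roots), reverse=True)


if __name__ == "__main__":
    n = 10000
    for c in (0.5, 1.0, 1.5):  # number of edges added is c * n / 2
        sizes = bohman_frieze(n, int(c * n / 2), seed=1)
        print(c, sizes[:3])
```

Running the loop at several edge densities gives a rough picture of how the largest components grow; locating the critical window precisely would need much larger $n$ and many repetitions.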

Submitted 8 June, 2011; v1 submitted 6 June, 2011; originally announced June 2011.

Comments: version 2, 54 pages, new references added

MSC Class: 60C05; 05C80; 90B15

arXiv:1009.4025 [pdf, ps, other]

doi 10.3150/11-BEJ402

Weak disorder in the stochastic mean-field model of distance II

Authors: Shankar Bhamidi, Remco van der Hofstad, Gerard Hooghiemstra

Abstract: In this paper, we study the complete graph $K_n$ with $n$ vertices, where we attach an independent and identically distributed (i.i.d.) weight to each of the $n(n-1)/2$ edges. We focus on the weight $W_n$ and the number of edges $H_n$ of the minimal weight path between vertex 1 and vertex $n$. It is shown in (Ann. Appl. Probab. 22 (2012) 29-69) that when the weights on the edges are i.i.d. with distribution equal to that of $E^s$, where $s>0$ is some parameter, and $E$ has an exponential distribution with mean 1, then $H_n$ is asymptotically normal with asymptotic mean $s\log n$ and asymptotic variance $s^2\log n$. In this paper, we analyze the situation when the weights have distribution $E^{-s},s>0$, in which case the behavior of $H_n$ is markedly different as $H_n$ is a tight sequence of random variables. More precisely, we use the method of Stein-Chen for Poisson approximations to show that, for almost all $s>0$, the hopcount $H_n$ converges in probability to the nearest integer to $s+1$ that is greater than or equal to 2, and identify the limiting distribution of the recentered and rescaled minimal weight. For a countable set of special $s$ values denoted by $\mathcal{S}=\{s_j\}_{j\geq2}$, the hopcount $H_n$ takes on the values $j$ and $j+1$ each with positive probability.
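
The pair $(W_n, H_n)$ from this abstract can be computed for a single sample by running Dijkstra's algorithm while recording the number of edges on the current best path. The following is a small sketch under assumptions, not the authors' code: vertices are indexed from 0, so the path runs from vertex 0 to vertex $n-1$, and the function name `weak_disorder_path` and the `seed` parameter are hypothetical.

```python
import heapq
import random


def weak_disorder_path(n, s, seed=0):
    """Return (W_n, H_n) between vertices 0 and n-1 on K_n with weights E^{-s}."""
    if n < 2:
        raise ValueError("need at least two vertices")
    rng = random.Random(seed)
    weight = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            e = rng.expovariate(1.0)  # E ~ exponential with mean 1
            while e == 0.0:  # guard against a zero draw before taking E^{-s}
                e = rng.expovariate(1.0)
            weight[i][j] = weight[j][i] = e ** (-s)
    source, target = 0, n - 1
    dist = [float("inf")] * n
    hops = [0] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == target:
            return d, hops[u]
        for v in range(n):
            if v == u:
                continue
            nd = d + weight[u][v]
            if nd < dist[v]:
                dist[v] = nd
                hops[v] = hops[u] + 1  # edges on the new best path to v
                heapq.heappush(heap, (nd, v))
    raise RuntimeError("target not reached")  # cannot happen on a complete graph


if __name__ == "__main__":
    w, h = weak_disorder_path(500, s=1.2, seed=1)
    print(w, h)
```

Repeating the call over many seeds and tabulating `h` gives an empirical hopcount distribution that can be compared against the limiting behavior stated in the abstract; agreement at small $n$ is not guaranteed.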

Submitted 20 March, 2013; v1 submitted 21 September, 2010; originally announced September 2010.

Comments: Published at http://dx.doi.org/10.3150/11-BEJ402 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

Report number: IMS-BEJ-BEJ402

Journal ref: Bernoulli 2013, Vol. 19, No. 2, 363-386

arXiv:1005.4104 [pdf, ps, other]

First passage percolation on the Erdős-Rényi random graph

Authors: Shankar Bhamidi, Remco van der Hofstad, Gerard Hooghiemstra

Abstract: In this paper we explore first passage percolation (FPP) on the Erdős-Rényi random graph $G_n(p_n)$, where each edge is given an independent exponential edge weight with rate 1. In the sparse regime, i.e., when $np_n\to λ>1$, we find refined asymptotics both for the minimal weight of the path between uniformly chosen vertices in the giant component, as well as for the hopcount (i.e., the number of edges) on this minimal weight path. More precisely, we prove a central limit theorem for the hopcount, with asymptotic mean and variance both equal to $λ/(λ-1)\log{n}$. Furthermore, we prove that the minimal weight centered by $\log{n}/(λ-1)$ converges in distribution. We also investigate the dense regime, where $np_n \to \infty$. We find that although the base graph is {\it ultra small} (meaning that graph distances between uniformly chosen vertices are $o(\log{n})$), attaching random edge weights changes the geometry of the network completely. Indeed, the hopcount $H_n$ satisfies the universality property that, whatever the value of $p_n$, $H_n/\log{n}\to 1$ in probability and, more precisely, $(H_n-β_n\log{n})/\sqrt{\log{n}}$, where $β_n=λ_n/(λ_n-1)$, has a limiting standard normal distribution. The constant $β_n$ can be replaced by 1 precisely when $λ_n\gg \sqrt{\log{n}}$, a case that has appeared in the literature (under stronger conditions on $λ_n$). We also find bounds for the maximal weight and maximal hopcount between vertices in the graph. This paper continues the investigation of FPP initiated by the authors. Compared to the setting on the configuration model studied in \cite{BHHS08}, the proofs presented here are much simpler due to a direct relation between FPP on the Erdős-Rényi random graph and thinned continuous-time branching processes.

Submitted 22 May, 2010; originally announced May 2010.

MSC Class: 60C05; 05C80; 90B15

arXiv:1002.4362 [pdf, ps, other]

doi 10.1214/10-AAP753

Weak disorder asymptotics in the stochastic mean-field model of distance

Authors: Shankar Bhamidi, Remco van der Hofstad

Abstract: In the recent past, there has been a concerted effort to develop mathematical models for real-world networks and to analyze various dynamics on these models. One particular problem of significant importance is to understand the effect of random edge lengths or costs on the geometry and flow transporting properties of the network. Two different regimes are of great interest, the weak disorder regime where optimality of a path is determined by the sum of edge weights on the path and the strong disorder regime where optimality of a path is determined by the maximal edge weight on the path. In the context of the stochastic mean-field model of distance, we provide the first mathematically tractable model of weak disorder and show that no transition occurs at finite temperature. Indeed, we show that for every finite temperature, the number of edges on the minimal weight path (i.e., the hopcount) is $Θ(\log{n})$ and satisfies a central limit theorem with asymptotic means and variances of order $Θ(\log{n})$, with limiting constants expressible in terms of the Malthusian rate of growth and the mean of the stable-age distribution of an associated continuous-time branching process. More precisely, we take independent and identically distributed edge weights with distribution $E^s$ for some parameter $s>0$, where $E$ is an exponential random variable with mean 1. Then the asymptotic mean and variance of the central limit theorem for the hopcount are $s\log{n}$ and $s^2\log{n}$, respectively. We also find limiting distributional asymptotics for the value of the minimal weight path in terms of extreme value distributions and martingale limits of branching processes.

Submitted 7 March, 2012; v1 submitted 23 February, 2010; originally announced February 2010.

Comments: Published at http://dx.doi.org/10.1214/10-AAP753 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AAP-AAP753

Journal ref: Annals of Applied Probability 2012, Vol. 22, No. 1, 29-69

Showing 1–50 of 59 results for author: Bhamidi, S