
Computational Inference for Directions in Canonical Correlation Analysis

Authors: Daniel Kessler, Elizaveta Levina

Abstract: Canonical Correlation Analysis (CCA) is a method for analyzing pairs of random vectors; it learns a sequence of paired linear transformations such that the resultant canonical variates are maximally correlated within pairs while uncorrelated across pairs. CCA outputs both canonical correlations as well as the canonical directions which define the transformations. While inference for canonical correlations is well developed, conducting inference for canonical directions is more challenging and not well-studied, but is key to interpretability. We propose a computational bootstrap method (combootcca) for inference on CCA directions. We conduct thorough simulation studies that range from simple and well-controlled to complex but realistic and validate the statistical properties of combootcca while comparing it to several competitors. We also apply the combootcca method to a brain imaging dataset and discover linked patterns in brain connectivity and behavioral scores.

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2305.08791 [pdf, other]

Fair Information Spread on Social Networks with Community Structure

Authors: Octavio Mesner, Elizaveta Levina, Ji Zhu

Abstract: Information spread through social networks is ubiquitous. Influence maximization (IM) algorithms aim to identify individuals who will generate the greatest spread through the social network if provided with information, and have been largely developed with marketing in mind. In social networks with community structure, which are very common, IM algorithms focused solely on maximizing spread may yield significant disparities in information coverage between communities, which is problematic in settings such as public health messaging. While some IM algorithms aim to remedy disparity in information coverage using node attributes, none use the empirical community structure within the network itself, which may be beneficial since communities directly affect the spread of information. Further, the use of empirical network structure allows us to leverage community detection techniques, making it possible to run fair-aware algorithms when there are no relevant node attributes available, or when node attributes do not accurately capture network community structure. In contrast to other fair IM algorithms, this work relies on fitting a model to the social network which is then used to determine a seed allocation strategy for optimal fair information spread. We develop an algorithm to determine optimal seed allocations for expected fair coverage, defined through maximum entropy, provide some theoretical guarantees under appropriate conditions, and demonstrate its empirical accuracy on both simulated and real networks. Because this algorithm relies on a fitted network model and not on the network directly, it is well-suited for partially observed and noisy social networks.

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2303.05909 [pdf, other]

A pseudo-likelihood approach to community detection in weighted networks

Authors: Andressa Cerqueira, Elizaveta Levina

Abstract: Community structure is common in many real networks, with nodes clustered in groups sharing the same connection patterns. While many community detection methods have been developed for networks with binary edges, few of them are applicable to networks with weighted edges, which are common in practice. We propose a pseudo-likelihood community estimation algorithm derived under the weighted stochastic block model for networks with normally distributed edge weights, extending the pseudo-likelihood algorithm for binary networks, which offers some of the best combinations of accuracy and computational efficiency. We prove that the estimates obtained by the proposed method are consistent under the assumption of homogeneous networks, a weighted analogue of the planted partition model, and show that they work well in practice for both homogeneous and heterogeneous networks. We illustrate the method on simulated networks and on an fMRI dataset, where edge weights represent connectivity between brain regions and are expected to be close to normal in distribution by construction.

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2302.10095 [pdf, other]

Conformal Prediction for Network-Assisted Regression

Authors: Robert Lunde, Elizaveta Levina, Ji Zhu

Abstract: An important problem in network analysis is predicting a node attribute using both network covariates, such as graph embedding coordinates or local subgraph counts, and conventional node covariates, such as demographic characteristics. While standard regression methods that make use of both types of covariates may be used for prediction, statistical inference is complicated by the fact that the nodal summary statistics are often dependent in complex ways. We show that under a mild joint exchangeability assumption, a network analog of conformal prediction achieves finite sample validity for a wide range of network covariates. We also show that a form of asymptotic conditional validity is achievable. The methods are illustrated on both simulated networks and a citation network dataset.

Submitted 22 February, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: Typos in Appendix corrected

arXiv:2210.17519 [pdf, other]

Predicting Responses from Weighted Networks with Node Covariates in an Application to Neuroimaging

Authors: Daniel Kessler, Keith Levin, Elizaveta Levina

Abstract: We consider the setting where many networks are observed on a common node set, and each observation comprises edge weights of a network, covariates observed at each node, and an overall response. The goal is to use the edge weights and node covariates to predict the response while identifying an interpretable set of predictive features. Our motivating application is neuroimaging, where edge weights encode functional connectivity measured between brain regions, node covariates encode task activations at each brain region, and the response is disease status or score on a behavioral task. We propose an approach that constructs feature groups based on assumed community structure (naturally occurring in neuroimaging applications). We propose two feature grouping schemes that incorporate both edge weights and node covariates, and we derive algorithms for optimization using an overlapping group LASSO penalty. Empirical results on synthetic data show that our method, relative to competing approaches, has similar or improved prediction error along with superior support recovery, enabling a more interpretable and potentially more accurate understanding of the underlying process. We also apply the method to neuroimaging data from the Human Connectome Project. Our approach is widely applicable in neuroimaging where interpretability is highly desired.

Submitted 22 August, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

arXiv:2210.07491 [pdf, other]

Latent process models for functional network data

Authors: Peter W. MacDonald, Elizaveta Levina, Ji Zhu

Abstract: Network data are often sampled with auxiliary information or collected through the observation of a complex system over time, leading to multiple network snapshots indexed by a continuous variable. Many methods in statistical network analysis are traditionally designed for a single network, and can be applied to an aggregated network in this setting, but that approach can miss important functional structure. Here we develop an approach to estimating the expected network explicitly as a function of a continuous index, be it time or another indexing variable. We parameterize the network expectation through low dimensional latent processes, whose components we represent with a fixed, finite-dimensional functional basis. We derive a gradient descent estimation algorithm, establish theoretical guarantees for recovery of the low-dimensional structure, compare our method to competitors, and apply it to a dataset of international political interactions over time, showing our proposed method to adapt well to data, outperform competitors, and provide interpretable and meaningful results.

Submitted 30 April, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: 64 pages, 19 figures; typos corrected, literature review updated

arXiv:2206.13088 [pdf, other]

Network resampling for estimating uncertainty

Authors: Qianhua Shan, Elizaveta Levina

Abstract: With network data becoming ubiquitous in many applications, many models and algorithms for network analysis have been proposed. Yet methods for providing uncertainty estimates in addition to point estimates of network parameters are much less common. While bootstrap and other resampling procedures have been an effective general tool for estimating uncertainty from i.i.d. samples, adapting them to networks is highly nontrivial. In this work, we study three different network resampling procedures for uncertainty estimation, and propose a general algorithm to construct confidence intervals for network parameters through network resampling. We also propose an algorithm for selecting the sampling fraction, which has a substantial effect on performance. We find that, unsurprisingly, no one procedure is empirically best for all tasks, but that selecting an appropriate sampling fraction substantially improves performance in many cases. We illustrate this on simulated networks and on Facebook data.

Submitted 27 June, 2022; originally announced June 2022.

arXiv:2205.14220 [pdf, other]

Selective Inference for Sparse Multitask Regression with Applications in Neuroimaging

Authors: Snigdha Panigrahi, Natasha Stewart, Chandra Sekhar Sripada, Elizaveta Levina

Abstract: Multi-task learning is frequently used to model a set of related response variables from the same set of features, improving predictive performance and modeling accuracy relative to methods that handle each response variable separately. Despite the potential of multi-task learning to yield more powerful inference than single-task alternatives, prior work in this area has largely omitted uncertainty quantification. Our focus in this paper is a common multi-task problem in neuroimaging, where the goal is to understand the relationship between multiple cognitive task scores (or other subject-level assessments) and brain connectome data collected from imaging. We propose a framework for selective inference to address this problem, with the flexibility to: (i) jointly identify the relevant covariates for each task through a sparsity-inducing penalty, and (ii) conduct valid inference in a model based on the estimated sparsity structure. Our framework offers a new conditional procedure for inference, based on a refinement of the selection event that yields a tractable selection-adjusted likelihood. This gives an approximate system of estimating equations for maximum likelihood inference, solvable via a single convex optimization problem, and enables us to efficiently form confidence intervals with approximately the correct coverage. Applied to both simulated data and data from the Adolescent Brain Cognitive Development (ABCD) study, our selective inference methods yield tighter confidence intervals than commonly used alternatives, such as data splitting. We also demonstrate through simulations that multi-task learning with selective inference can more accurately recover true signals than single-task methods.

Submitted 9 August, 2023; v1 submitted 27 May, 2022; originally announced May 2022.

Comments: 46 Pages, 11 Figures, 3 Tables

arXiv:2012.14409 [pdf, other]

Latent space models for multiplex networks with shared structure

Authors: Peter W. MacDonald, Elizaveta Levina, Ji Zhu

Abstract: Latent space models are frequently used for modeling single-layer networks and include many popular special cases, such as the stochastic block model and the random dot product graph. However, they are not well-developed for more complex network structures, which are becoming increasingly common in practice. Here we propose a new latent space model for multiplex networks: multiple, heterogeneous networks observed on a shared node set. Multiplex networks can represent a network sample with shared node labels, a network evolving over time, or a network with multiple types of edges. The key feature of our model is that it learns from data how much of the network structure is shared between layers and pools information across layers as appropriate. We establish identifiability, develop a fitting procedure using convex optimization in combination with a nuclear norm penalty, and prove a guarantee of recovery for the latent positions as long as there is sufficient separation between the shared and the individual latent subspaces. We compare the model to competing methods in the literature on simulated networks and on a multiplex network describing the worldwide trade of agricultural products.

Submitted 7 July, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

Comments: 41 pages, 8 figures

arXiv:2009.10641 [pdf, other]

doi 10.1007/s13171-021-00245-4

Overlapping community detection in networks via sparse spectral decomposition

Authors: Jesús Arroyo, Elizaveta Levina

Abstract: We consider the problem of estimating overlapping community memberships in a network, where each node can belong to multiple communities. More than a few communities per node are difficult to both estimate and interpret, so we focus on sparse node membership vectors. Our algorithm is based on sparse principal subspace estimation with iterative thresholding. The method is computationally efficient, with a computational cost equivalent to estimating the leading eigenvectors of the adjacency matrix, and does not require an additional clustering step, unlike spectral clustering methods. We show that a fixed point of the algorithm corresponds to correct node memberships under a version of the stochastic block model. The methods are evaluated empirically on simulated and real-world networks, showing good statistical performance and computational efficiency.

Submitted 15 February, 2021; v1 submitted 20 September, 2020; originally announced September 2020.

arXiv:2008.03652 [pdf, other]

Community models for networks observed through edge nominations

Authors: Tianxi Li, Elizaveta Levina, Ji Zhu

Abstract: Communities are a common and widely studied structure in networks, typically under the assumption that the network is fully and correctly observed. In practice, network data are often collected by querying nodes about their connections. In some settings, all edges of a sampled node will be recorded, and in others, a node may be asked to name its connections. These sampling mechanisms introduce noise and bias which can obscure the community structure and invalidate assumptions underlying standard community detection methods. We propose a general model for a class of network sampling mechanisms based on recording edges via querying nodes, designed to improve community detection for network data collected in this fashion. We model edge sampling probabilities as a function of both individual preferences and community parameters, and show community detection can be performed by spectral clustering under this general class of models. We also propose, as a special case of the general framework, a parametric model for directed networks we call the nomination stochastic block model, which allows for meaningful parameter interpretations and can be fitted by the method of moments. Both spectral clustering and the method of moments in this case are computationally efficient and come with theoretical guarantees of consistency. We evaluate the proposed model in simulation studies on both unweighted and weighted networks and apply it to a faculty hiring dataset, discovering a meaningful hierarchy of communities among US business schools.

Submitted 18 March, 2021; v1 submitted 9 August, 2020; originally announced August 2020.

arXiv:2002.01645 [pdf, other]

Simultaneous prediction and community detection for networks with application to neuroimaging

Authors: Jesús Arroyo, Elizaveta Levina

Abstract: Community structure in networks is observed in many different domains, and unsupervised community detection has received a lot of attention in the literature. Increasingly the focus of network analysis is shifting towards using network information in some other prediction or inference task rather than just analyzing the network itself. In particular, in neuroimaging applications brain networks are available for multiple subjects and the goal is often to predict a phenotype of interest. Community structure is well known to be a feature of brain networks, typically corresponding to different regions of the brain responsible for different functions. There are standard parcellations of the brain into such regions, usually obtained by applying clustering methods to brain connectomes of healthy subjects. However, when the goal is predicting a phenotype or distinguishing between different conditions, these static communities from an unrelated set of healthy subjects may not be the most useful for prediction. Here we present a method for supervised community detection, aiming to find a partition of the network into communities that is most useful for predicting a particular response. We use a block-structured regularization penalty combined with a prediction loss function, and compute the solution with a combination of a spectral method and an ADMM optimization algorithm. We show that the spectral clustering method recovers the correct communities under a weighted stochastic block model. The method performs well on both simulated and real brain networks, providing support for the idea of task-dependent brain regions.

Submitted 27 February, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

arXiv:1910.07434 [pdf, other]

Matrix Means and a Novel High-Dimensional Shrinkage Phenomenon

Authors: Asad Lodhia, Keith Levin, Elizaveta Levina

Abstract: Many statistical settings call for estimating a population parameter, most typically the population mean, based on a sample of matrices. The most natural estimate of the population mean is the arithmetic mean, but there are many other matrix means that may behave differently, especially in high dimensions. Here we consider the matrix harmonic mean as an alternative to the arithmetic matrix mean. We show that in certain high-dimensional regimes, the harmonic mean yields an improvement over the arithmetic mean in estimation error as measured by the operator norm. Counter-intuitively, studying the asymptotic behavior of these two matrix means in a spiked covariance estimation problem, we find that this improvement in operator norm error does not imply better recovery of the leading eigenvector. We also show that a Rao-Blackwellized version of the harmonic mean is equivalent to a linear shrinkage estimator studied previously in the high-dimensional covariance estimation literature, while applying a similar Rao-Blackwellization to regularized sample covariance matrices yields a novel nonlinear shrinkage estimator. Simulations complement the theoretical results, illustrating the conditions under which the harmonic matrix mean yields an empirically better estimate.

Submitted 15 July, 2021; v1 submitted 16 October, 2019; originally announced October 2019.

Comments: 29 pages, 5 figures

MSC Class: 60B20; 62H12; 47N30

arXiv:1907.10821 [pdf, other]

Bootstrapping Networks with Latent Space Structure

Authors: Keith Levin, Elizaveta Levina

Abstract: A core problem in statistical network analysis is to develop network analogues of classical techniques. The problem of bootstrapping network data stands out as especially challenging, since typically one observes only a single network, rather than a sample. Here we propose two methods for obtaining bootstrap samples for networks drawn from latent space models. The first method generates bootstrap replicates of network statistics that can be represented as U-statistics in the latent positions, and avoids actually constructing new bootstrapped networks. The second method generates bootstrap replicates of whole networks, and thus can be used for bootstrapping any network function. Commonly studied network quantities that can be represented as U-statistics include many popular summaries, such as average degree and subgraph counts, but other equally popular summaries, such as the clustering coefficient, are not expressible as U-statistics and thus require the second bootstrap method. Under the assumption of a random dot product graph, a type of latent space network model, we show consistency of the proposed bootstrap methods. We give motivating examples throughout and demonstrate the effectiveness of our methods on synthetic data.

Submitted 11 October, 2021; v1 submitted 24 July, 2019; originally announced July 2019.

arXiv:1907.02443 [pdf, other]

High-dimensional Gaussian graphical model for network-linked data

Authors: Tianxi Li, Cheng Qian, Elizaveta Levina, Ji Zhu

Abstract: Graphical models are commonly used to represent conditional dependence relationships between variables. There are multiple methods available for exploring them from high-dimensional data, but almost all of them rely on the assumption that the observations are independent and identically distributed. At the same time, observations connected by a network are becoming increasingly common, and tend to violate these assumptions. Here we develop a Gaussian graphical model for observations connected by a network with potentially different mean vectors, varying smoothly over the network. We propose an efficient estimation algorithm and demonstrate its effectiveness on both simulated and real data, obtaining meaningful and interpretable results on a statistics coauthorship network. We also prove that our method estimates both the inverse covariance matrix and the corresponding graph structure correctly under the assumption of network "cohesion", which refers to the empirically observed phenomenon of network neighbors sharing similar traits.

Submitted 21 April, 2020; v1 submitted 4 July, 2019; originally announced July 2019.

arXiv:1906.07265 [pdf, other]

Recovering shared structure from multiple networks with unknown edge distributions

Authors: Keith Levin, Asad Lodhia, Elizaveta Levina

Abstract: In increasingly many settings, data sets consist of multiple samples from a population of networks, with vertices aligned across these networks. For example, brain connectivity networks in neuroscience consist of measures of interaction between brain regions that have been aligned to a common template. We consider the setting where the observed networks have a shared expectation, but may differ in the noise structure on their edges. Our approach exploits the shared mean structure to denoise edge-level measurements of the observed networks and estimate the underlying population-level parameters. We also explore the extent to which edge-level errors influence estimation and downstream inference. We establish a finite-sample concentration inequality for the low-rank eigenvalue truncation of a random weighted adjacency matrix that may be of independent interest. The proposed approach is illustrated on synthetic networks and on data from an fMRI study of schizophrenia.

Submitted 8 May, 2021; v1 submitted 12 June, 2019; originally announced June 2019.

arXiv:1903.02129 [pdf, other]

Graph-aware Modeling of Brain Connectivity Networks

Authors: Yura Kim, Daniel Kessler, Elizaveta Levina

Abstract: Functional connections in the brain are frequently represented by weighted networks, with nodes representing locations in the brain, and edges representing the strength of connectivity between these locations. One challenge in analyzing such data is that inference at the individual edge level is not particularly biologically meaningful; interpretation is more useful at the level of so-called functional regions, or groups of nodes and connections between them; this is often called "graph-aware" inference in the neuroimaging literature. However, pooling over functional regions leads to significant loss of information and lower accuracy. Another challenge is correlation among edge weights within a subject, which makes inference based on independence assumptions unreliable. We address both these challenges with a linear mixed effects model, which accounts for functional regions and for edge dependence, while still modeling individual edge weights to avoid loss of information. The model allows for comparing two populations, such as patients and healthy controls, both at the functional regions level and at individual edge level, leading to biologically meaningful interpretations. We fit this model to resting state fMRI data on schizophrenics and healthy controls, obtaining interpretable results consistent with the schizophrenia literature.

Submitted 26 September, 2022; v1 submitted 5 March, 2019; originally announced March 2019.

arXiv:1810.01509 [pdf, other]

Hierarchical community detection by recursive partitioning

Authors: Tianxi Li, Lihua Lei, Sharmodeep Bhattacharyya, Koen Van den Berge, Purnamrita Sarkar, Peter J. Bickel, Elizaveta Levina

Abstract: The problem of community detection in networks is usually formulated as finding a single partition of the network into some "correct" number of communities. We argue that it is more interpretable and in some regimes more accurate to construct a hierarchical tree of communities instead. This can be done with a simple top-down recursive partitioning algorithm, starting with a single community and separating the nodes into two communities by spectral clustering repeatedly, until a stopping rule suggests there are no further communities. This class of algorithms is model-free, computationally efficient, and requires no tuning other than selecting a stopping rule. We show that there are regimes where this approach outperforms K-way spectral clustering, and propose a natural framework for analyzing the algorithm's theoretical performance, the binary tree stochastic block model. Under this model, we prove that the algorithm correctly recovers the entire community tree under relatively mild assumptions. We apply the algorithm to a gene network based on gene co-occurrence in 1580 research papers on anemia, and identify six clusters of genes in a meaningful hierarchy. We also illustrate the algorithm on a dataset of statistics papers.

Submitted 14 May, 2020; v1 submitted 2 October, 2018; originally announced October 2018.

arXiv:1803.04084 [pdf, other]

Link prediction for egocentrically sampled networks

Authors: Yun-Jhong Wu, Elizaveta Levina, Ji Zhu

Abstract: Link prediction in networks is typically accomplished by estimating or ranking the probabilities of edges for all pairs of nodes. In practice, especially for social networks, the data are often collected by egocentric sampling, which means selecting a subset of nodes and recording all of their edges. This sampling mechanism requires different prediction tools than the typical assumption of links missing at random. We propose a new computationally efficient link prediction algorithm for egocentrically sampled networks, which estimates the underlying probability matrix by estimating its row space. For networks created by sampling rows, our method outperforms many popular link prediction and graphon estimation techniques.

Submitted 11 March, 2018; originally announced March 2018.

arXiv:1801.08724 [pdf, ps, other]

Concentration of random graphs and application to community detection

Authors: Can M. Le, Elizaveta Levina, Roman Vershynin

Abstract: Random matrix theory has played an important role in recent work on statistical network analysis. In this paper, we review recent results on regimes of concentration of random graphs around their expectation, showing that dense graphs concentrate and sparse graphs concentrate after regularization. We also review relevant network models that may be of interest to probabilists considering directions for new random matrix theory developments, and random matrix theory tools that may be of interest to statisticians looking to prove properties of network algorithms. Applications of concentration results to the problem of community detection in networks are discussed in detail.

Submitted 26 January, 2018; originally announced January 2018.

Comments: Submission for International Congress of Mathematicians, Rio de Janeiro, Brazil 2018

arXiv:1710.04765 [pdf, ps, other]

Estimating a network from multiple noisy realizations

Authors: Can M. Le, Keith Levin, Elizaveta Levina

Abstract: Complex interactions between entities are often represented as edges in a network. In practice, the network is often constructed from noisy measurements and inevitably contains some errors. In this paper we consider the problem of estimating a network from multiple noisy observations where edges of the original network are recorded with both false positives and false negatives. This problem is motivated by neuroimaging applications where brain networks of a group of patients with a particular brain condition could be viewed as noisy versions of an unobserved true network corresponding to the disease. The key to optimally leveraging these multiple observations is to take advantage of network structure, and here we focus on the case where the true network contains communities. Communities are common in real networks in general and in particular are believed to be present in brain networks. Under a community structure assumption on the truth, we derive an efficient method to estimate the noise levels and the original network, with theoretical guarantees on the convergence of our estimates. We show on synthetic networks that the performance of our method is close to an oracle method using the true parameter values, and apply our method to fMRI brain data, demonstrating that it constructs stable and plausible estimates of the population network.

Submitted 10 December, 2018; v1 submitted 12 October, 2017; originally announced October 2017.

MSC Class: 62H12; 62H30; 62F12

arXiv:1705.06772 [pdf, other]

Generalized linear models with low rank effects for network data

Authors: Yun-Jhong Wu, Elizaveta Levina, Ji Zhu

Abstract: Networks are a useful representation for data on connections between units of interest, but the observed connections are often noisy and/or include missing values. One common approach to network analysis is to treat the network as a realization from a random graph model, and estimate the underlying edge probability matrix, which is sometimes referred to as network denoising. Here we propose a generalized linear model with low rank effects to model network edges. This model can be applied to various types of networks, including directed and undirected, binary and weighted, and it can naturally utilize additional information such as node and/or edge covariates. We develop an efficient projected gradient ascent algorithm to fit the model, establish asymptotic consistency, and demonstrate empirical performance of the method on both simulated and real networks.

Submitted 18 May, 2017; originally announced May 2017.

arXiv:1701.08140 [pdf, other]

doi 10.1214/19-AOAS1252

Network classification with applications to brain connectomics

Authors: Jesús D. Arroyo-Relión, Daniel Kessler, Elizaveta Levina, Stephan F. Taylor

Abstract: While statistical analysis of a single network has received a lot of attention in recent years, with a focus on social networks, analysis of a sample of networks presents its own challenges which require a different set of analytic tools. Here we study the problem of classification of networks with labeled nodes, motivated by applications in neuroimaging. Brain networks are constructed from imaging data to represent functional connectivity between regions of the brain, and previous work has shown the potential of such networks to distinguish between various brain disorders, giving rise to a network classification problem. Existing approaches tend to either treat all edge weights as a long vector, ignoring the network structure, or focus on graph topology as represented by summary measures while ignoring the edge weights. Our goal is to design a classification method that uses both the individual edge information and the network structure of the data in a computationally efficient way, and that can produce a parsimonious and interpretable representation of differences in brain connectivity patterns between classes. We propose a graph classification method that uses edge weights as predictors but incorporates the network nature of the data via penalties that promote sparsity in the number of nodes, in addition to the usual sparsity penalties that encourage selection of edges. We implement the method via efficient convex optimization and provide a detailed analysis of data from two fMRI studies of schizophrenia.

Submitted 1 February, 2019; v1 submitted 27 January, 2017; originally announced January 2017.

arXiv:1612.04717 [pdf, other]

Network cross-validation by edge sampling

Authors: Tianxi Li, Elizaveta Levina, Ji Zhu

Abstract: While many statistical models and methods are now available for network analysis, resampling network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning, but is not directly applicable to networks since splitting network nodes into groups requires deleting edges and destroys some of the network structure. Here we propose a new network resampling strategy based on splitting node pairs rather than nodes, applicable to cross-validation for a wide range of network model selection tasks. We provide a theoretical justification for our method in a general setting and examples of how our method can be used in specific network model selection and parameter tuning tasks. Numerical results on simulated networks and on a citation network of statisticians show that this cross-validation approach works well for model selection.

Submitted 1 May, 2020; v1 submitted 14 December, 2016; originally announced December 2016.

arXiv:1602.01192 [pdf, other]

Prediction models for network-linked data

Authors: Tianxi Li, Elizaveta Levina, Ji Zhu

Abstract: Prediction algorithms typically assume the training data are independent samples, but in many modern applications samples come from individuals connected by a network. For example, in adolescent health studies of risk-taking behaviors, information on the subjects' social network is often available and plays an important role through network cohesion, the empirically observed phenomenon of friends behaving similarly. Taking cohesion into account in prediction models should allow us to improve their performance. Here we propose a network-based penalty on individual node effects to encourage similarity between predictions for linked nodes, and show that incorporating it into prediction leads to improvement over traditional models both theoretically and empirically when network cohesion is present. The penalty can be used with many loss-based prediction methods, such as regression, generalized linear models, and Cox's proportional hazard model. Applications to predicting levels of recreational activity and marijuana usage among teenagers from the AddHealth study based on both demographic covariates and friendship networks are discussed in detail and show that our approach to taking friendships into account can significantly improve predictions of behavior while providing interpretable estimates of covariate effects.

Submitted 25 June, 2018; v1 submitted 3 February, 2016; originally announced February 2016.

arXiv:1509.08588 [pdf, other]

Estimating network edge probabilities by neighborhood smoothing

Authors: Yuan Zhang, Elizaveta Levina, Ji Zhu

Abstract: The estimation of probabilities of network edges from the observed adjacency matrix has important applications to predicting missing links and network denoising. It has usually been addressed by estimating the graphon, a function that determines the matrix of edge probabilities, but this is ill-defined without strong assumptions on the network structure. Here we propose a novel computationally efficient method, based on neighborhood smoothing, to estimate the expectation of the adjacency matrix directly, without making the structural assumptions that graphon estimation requires. The neighborhood smoothing method requires little tuning, has a competitive mean-squared error rate, and outperforms many benchmark methods on link prediction in simulated and real networks.

Submitted 8 July, 2017; v1 submitted 29 September, 2015; originally announced September 2015.

Comments: 22 pages, 4 figures, 3 tables

arXiv:1509.04828 [pdf, ps, other]

doi 10.1214/13-AOAS700

Estimating heterogeneous graphical models for discrete data with an application to roll call voting

Authors: Jian Guo, Jie Cheng, Elizaveta Levina, George Michailidis, Ji Zhu

Abstract: We consider the problem of jointly estimating a collection of graphical models for discrete data, corresponding to several categories that share some common structure. An example for such a setting is voting records of legislators on different issues, such as defense, energy, and healthcare. We develop a Markov graphical model to characterize the heterogeneous dependence structures arising from such data. The model is fitted via a joint estimation method that preserves the underlying common graph structure, but also allows for differences between the networks. The method employs a group penalty that targets the common zero interaction effects across all the networks. We apply the method to describe the internal networks of the U.S. Senate on several important issues. Our analysis reveals individual structure for each issue, distinct from the underlying well-known bipartisan structure common to all categories which we are able to extract separately. We also establish consistency of the proposed method both for parameter estimation and model selection, and evaluate its numerical performance on a number of simulated examples.

Submitted 16 September, 2015; originally announced September 2015.

Comments: Published at http://dx.doi.org/10.1214/13-AOAS700 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS700

Journal ref: Annals of Applied Statistics 2015, Vol. 9, No. 2, 821-848

arXiv:1509.01173 [pdf, other]

Community Detection in Networks with Node Features

Authors: Yuan Zhang, Elizaveta Levina, Ji Zhu

Abstract: Many methods have been proposed for community detection in networks, but most of them do not take into account additional information on the nodes that is often available in practice. In this paper, we propose a new joint community detection criterion that uses both the network edge information and the node features to detect community structures. One advantage our method has over existing joint detection approaches is the flexibility of learning the impact of different features which may differ across communities. Another advantage is the flexibility of choosing the amount of influence the feature information has on communities. The method is asymptotically consistent under the block model with additional assumptions on the feature distributions, and performs well on simulated and real networks.

Submitted 3 September, 2015; originally announced September 2015.

Comments: 16 pages, 5 pages

Journal ref: Electronic Journal of Statistics, Volume 10, Number 2 (2016), 3153-3178

arXiv:1507.00827 [pdf, ps, other]

Estimating the number of communities in networks by spectral methods

Authors: Can M. Le, Elizaveta Levina

Abstract: Community detection is a fundamental problem in network analysis with many methods available to estimate communities. Most of these methods assume that the number of communities is known, which is often not the case in practice. We study a simple and very fast method for estimating the number of communities based on the spectral properties of certain graph operators, such as the non-backtracking matrix and the Bethe Hessian matrix. We show that the method performs well under several models and a wide range of parameters, and is guaranteed to be consistent under several asymptotic regimes. We compare this method to several existing methods for estimating the number of communities and show that it is both more accurate and more computationally efficient.

Submitted 14 November, 2019; v1 submitted 3 July, 2015; originally announced July 2015.

MSC Class: 62H12; 62H30

arXiv:1506.00669 [pdf, other]

Concentration and regularization of random graphs

Authors: Can M. Le, Elizaveta Levina, Roman Vershynin

Abstract: This paper studies how close random graphs are typically to their expectations. We interpret this question through the concentration of the adjacency and Laplacian matrices in the spectral norm. We study inhomogeneous Erdös-Rényi random graphs on $n$ vertices, where edges form independently and possibly with different probabilities $p_{ij}$. Sparse random graphs whose expected degrees are $o(\log n)$ fail to concentrate; the obstruction is caused by vertices with abnormally high and low degrees. We show that concentration can be restored if we regularize the degrees of such vertices, and one can do this in various ways. As an example, let us reweight or remove enough edges to make all degrees bounded above by $O(d)$ where $d=\max np_{ij}$. Then we show that the resulting adjacency matrix $A'$ concentrates with the optimal rate: $\|A' - \mathbb{E} A\| = O(\sqrt{d})$. Similarly, if we make all degrees bounded below by $d$ by adding weight $d/n$ to all edges, then the resulting Laplacian concentrates with the optimal rate: $\|L(A') - L(\mathbb{E} A')\| = O(1/\sqrt{d})$. Our approach is based on Grothendieck-Pietsch factorization, using which we construct a new decomposition of random graphs. We illustrate the concentration results with an application to the community detection problem in the analysis of networks.

Submitted 9 August, 2016; v1 submitted 1 June, 2015; originally announced June 2015.

Comments: 21 pages. Elizaveta Levina is added as a co-author. Application to community detection of networks is expanded

MSC Class: 05C80; 60B20; 05C85

arXiv:1502.03049 [pdf, other]

Sparse random graphs: regularization and concentration of the Laplacian

Authors: Can M. Le, Elizaveta Levina, Roman Vershynin

Abstract: We study random graphs with possibly different edge probabilities in the challenging sparse regime of bounded expected degrees. Unlike in the dense case, neither the graph adjacency matrix nor its Laplacian concentrate around their expectations due to the highly irregular distribution of node degrees. It has been empirically observed that simply adding a constant of order $1/n$ to each entry of the adjacency matrix substantially improves the behavior of the Laplacian. Here we prove that this regularization indeed forces the Laplacian to concentrate even in sparse graphs. As an immediate consequence in network analysis, we establish the validity of one of the simplest and fastest approaches to community detection -- regularized spectral clustering, under the stochastic block model. Our proof of concentration of the regularized Laplacian is based on Grothendieck's inequality and factorization, combined with paving arguments.

Submitted 23 April, 2015; v1 submitted 10 February, 2015; originally announced February 2015.

Comments: Added references

MSC Class: 05C80; 05C85; 60B20; 62H30

arXiv:1412.3432 [pdf, other]

Detecting Overlapping Communities in Networks Using Spectral Methods

Authors: Yuan Zhang, Elizaveta Levina, Ji Zhu

Abstract: Community detection is a fundamental problem in network analysis which is made more challenging by overlaps between communities which often occur in practice. Here we propose a general, flexible, and interpretable generative model for overlapping communities, which can be thought of as a generalization of the degree-corrected stochastic block model. We develop an efficient spectral algorithm for estimating the community memberships, which deals with the overlaps by employing the K-medians algorithm rather than the usual K-means for clustering in the spectral domain. We show that the algorithm is asymptotically consistent when networks are not too sparse and the overlaps between communities not too large. Numerical experiments on both simulated networks and many real social networks demonstrate that our method performs very well compared to a number of benchmark methods for overlapping community detection.

Submitted 12 March, 2015; v1 submitted 10 December, 2014; originally announced December 2014.

Comments: 29 pages, 2 figures, 3 tables

arXiv:1406.5647 [pdf, ps, other]

On semidefinite relaxations for the block model

Authors: Arash A. Amini, Elizaveta Levina

Abstract: The stochastic block model (SBM) is a popular tool for community detection in networks, but fitting it by maximum likelihood (MLE) involves a computationally infeasible optimization problem. We propose a new semidefinite programming (SDP) solution to the problem of fitting the SBM, derived as a relaxation of the MLE. We put ours and previously proposed SDPs in a unified framework, as relaxations of the MLE over various sub-classes of the SBM, revealing a connection to sparse PCA. Our main relaxation, which we call SDP-1, is tighter than other recently proposed SDP relaxations, and thus previously established theoretical guarantees carry over. However, we show that SDP-1 exactly recovers true communities over a wider class of SBMs than those covered by current results. In particular, the assumption of strong assortativity of the SBM, implicit in consistency conditions for previously proposed SDPs, can be relaxed to weak assortativity for our approach, thus significantly broadening the class of SBMs covered by the consistency results. We also show that strong assortativity is indeed a necessary condition for exact recovery for previously proposed SDP approaches and not an artifact of the proofs. Our analysis of SDPs is based on primal-dual witness constructions, which provides some insight into the nature of the solutions of various SDPs.
We show how to combine features from SDP-1 and already available SDPs to achieve the most flexibility in terms of both assortativity and block-size constraints, as our relaxation has the tendency to produce communities of similar sizes. This tendency makes it the ideal tool for fitting network histograms, a method gaining popularity in the graphon estimation literature, as we illustrate on an example of a social network of dolphins. We also provide empirical evidence that SDPs outperform spectral methods for fitting SBMs with a large number of blocks.

Submitted 16 March, 2016; v1 submitted 21 June, 2014; originally announced June 2014.

arXiv:1406.0067 [pdf, other]

Optimization via Low-rank Approximation for Community Detection in Networks

Authors: Can M. Le, Elizaveta Levina, Roman Vershynin

Abstract: Community detection is one of the fundamental problems of network analysis, for which a number of methods have been proposed. Most model-based or criteria-based methods have to solve an optimization problem over a discrete set of labels to find communities, which is computationally infeasible. Some fast spectral algorithms have been proposed for specific methods or models, but only on a case-by-case basis. Here we propose a general approach for maximizing a function of a network adjacency matrix over discrete labels by projecting the set of labels onto a subspace approximating the leading eigenvectors of the expected adjacency matrix. This projection onto a low-dimensional space makes the feasible set of labels much smaller and the optimization problem much easier. We prove a general result about this method and show how to apply it to several previously proposed community detection criteria, establishing its consistency for label estimation in each case and demonstrating the fundamental connection between spectral properties of the network and various model-based approaches to community detection. Simulations and applications to real-world data are included to demonstrate our method performs well for multiple problems over a wide range of parameters.

Submitted 10 May, 2015; v1 submitted 31 May, 2014; originally announced June 2014.

Comments: 45 pages, 7 figures; added discussions about computational complexity and extension to more than two communities

MSC Class: 62E10; 62G05

arXiv:1311.0416 [pdf, ps, other]

Structured functional regression models for high-dimensional spatial spectroscopy data

Authors: Arash A. Amini, Elizaveta Levina, Kerby A. Shedden

Abstract: Modeling and analysis of spectroscopy data is an active area of research with applications to chemistry and biology. This paper focuses on analyzing Raman spectra obtained from a bone fracture healing experiment, although the functional regression model for predicting a scalar response from high-dimensional tensors can be applied to any spectroscopy data. The regression model is built on a sparse functional representation of the spectra, and accommodates multiple spatial dimensions. We apply our models to the task of predicting bone-mineral-density (BMD), an important indicator of fracture healing, from Raman spectra, in both the in vivo and ex vivo settings of the bone fracture healing experiment. To illustrate the general applicability of the method, we also use it to predict lipoprotein concentrations from spectra obtained by nuclear magnetic resonance (NMR) spectroscopy.

Submitted 2 November, 2013; originally announced November 2013.

arXiv:1304.2810 [pdf, other]

High-dimensional Mixed Graphical Models

Authors: Jie Cheng, Tianxi Li, Elizaveta Levina, Ji Zhu

Abstract: While graphical models for continuous data (Gaussian graphical models) and discrete data (Ising models) have been extensively studied, there is little work on graphical models linking both continuous and discrete variables (mixed data), which are common in many scientific applications. We propose a novel graphical model for mixed data, which is simple enough to be suitable for high-dimensional data, yet flexible enough to represent all possible graph structures. We develop a computationally efficient regression-based algorithm for fitting the model by focusing on the conditional log-likelihood of each variable given the rest. The parameters have a natural group structure, and sparsity in the fitted graph is attained by incorporating a group lasso penalty, approximated by a weighted $\ell_1$ penalty for computational efficiency. We demonstrate the effectiveness of our method through an extensive simulation study and apply it to a music annotation data set (CAL500), obtaining a sparse and interpretable graphical model relating the continuous features of the audio signal to categorical variables such as genre, emotions, and usage associated with particular songs. While we focus on binary discrete variables, we also show that the proposed methodology can be easily extended to general discrete variables.

Submitted 19 August, 2016; v1 submitted 9 April, 2013; originally announced April 2013.

arXiv:1301.7047 [pdf, ps, other]

Link prediction for partially observed networks

Authors: Yunpeng Zhao, Elizaveta Levina, Ji Zhu

Abstract: Link prediction is one of the fundamental problems in network analysis. In many applications, notably in genetics, a partially observed network may not contain any negative examples of absent edges, which creates a difficulty for many existing supervised learning approaches. We develop a new method which treats the observed network as a sample of the true network with different sampling rates for positive and negative examples. We obtain a relative ranking of potential links by their probabilities, utilizing information on node covariates as well as on network topology. Empirically, the method performs well under many settings, including when the observed network is sparse. We apply the method to a protein-protein interaction network and a school friendship network.

Submitted 29 January, 2013; originally announced January 2013.

arXiv:1209.6342 [pdf, other]

Sparse Ising Models with Covariates

Authors: Jie Cheng, Elizaveta Levina, Pei Wang, Ji Zhu

Abstract: There has been a lot of work fitting Ising models to multivariate binary data in order to understand the conditional dependency relationships between the variables. However, additional covariates are frequently recorded together with the binary data, and may influence the dependence relationships. Motivated by such a dataset on genomic instability collected from tumor samples of several types, we propose a sparse covariate dependent Ising model to study both the conditional dependency within the binary data and its relationship with the additional covariates. This results in subject-specific Ising models, where the subject's covariates influence the strength of association between the genes. As in all exploratory data analysis, interpretability of results is important, and we use L1 penalties to induce sparsity in the fitted graphs and in the number of selected covariates. Two algorithms to fit the model are proposed and compared on a set of simulated data, and asymptotic results are established. The results on the tumor dataset and their biological significance are discussed in detail.

Submitted 27 September, 2012; originally announced September 2012.

Comments: 32 pages (including 5 pages of appendix), 3 figures, 2 tables

arXiv:1207.2340 [pdf, ps, other]

doi 10.1214/13-AOS1138

Pseudo-likelihood methods for community detection in large sparse networks

Authors: Arash A. Amini, Aiyou Chen, Peter J. Bickel, Elizaveta Levina

Abstract: Many algorithms have been proposed for fitting network models with communities, but most of them do not scale well to large networks, and often fail on sparse networks. Here we propose a new fast pseudo-likelihood method for fitting the stochastic block model for networks, as well as a variant that allows for an arbitrary degree distribution by conditioning on degrees. We show that the algorithms perform well under a range of settings, including on very sparse networks, and illustrate on the example of a network of political blogs. We also propose spectral clustering with perturbations, a method of independent interest, which works well on sparse networks where regular spectral clustering fails, and use it to provide an initial value for pseudo-likelihood. We prove that pseudo-likelihood provides consistent estimates of the communities under a mild condition on the starting value, for the case of a block model with two communities.

Submitted 5 November, 2013; v1 submitted 10 July, 2012; originally announced July 2012.

Comments: Published at http://dx.doi.org/10.1214/13-AOS1138 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1138

Journal ref: Annals of Statistics 2013, Vol. 41, No. 4, 2097-2122

arXiv:1202.5101 [pdf, ps, other]

doi 10.1214/11-AOS904

The method of moments and degree distributions for network models

Authors: Peter J. Bickel, Aiyou Chen, Elizaveta Levina

Abstract: Probability models on graphs are becoming increasingly important in many applications, but statistical tools for fitting such models are not yet well developed. Here we propose a general method of moments approach that can be used to fit a large class of probability models through empirical counts of certain patterns in a graph. We establish some general asymptotic properties of empirical graph moments and prove consistency of the estimates as the graph size grows for all ranges of the average degree including $\Omega(1)$. Additional results are obtained for the important special case of degree distributions.

Submitted 23 February, 2012; originally announced February 2012.

Comments: Published at http://dx.doi.org/10.1214/11-AOS904 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS904

Journal ref: Annals of Statistics 2011, Vol. 39, No. 5, 2280-2301

arXiv:1110.3854 [pdf, ps, other]

doi 10.1214/12-AOS1036

Consistency of community detection in networks under degree-corrected stochastic block models

Authors: Yunpeng Zhao, Elizaveta Levina, Ji Zhu

Abstract: Community detection is a fundamental problem in network analysis, with applications in many diverse areas. The stochastic block model is a common tool for model-based community detection, and asymptotic tools for checking consistency of community detection under the block model have been recently developed. However, the block model is limited by its assumption that all nodes within a community are stochastically equivalent, and provides a poor fit to networks with hubs or highly varying node degrees within communities, which are common in practice. The degree-corrected stochastic block model was proposed to address this shortcoming and allows variation in node degrees within a community while preserving the overall block community structure. In this paper we establish general theory for checking consistency of community detection under the degree-corrected stochastic block model and compare several community detection criteria under both the standard and the degree-corrected models. We show which criteria are consistent under which models and constraints, as well as compare their relative performance in practice. We find that methods based on the degree-corrected block model, which includes the standard block model as a special case, are consistent under a wider class of models and that modularity-type methods require parameter constraints for consistency, whereas likelihood-based methods do not. On the other hand, in practice, the degree correction involves estimating many more parameters, and empirically we find it is only worth doing if the node degrees within communities are indeed highly variable. We illustrate the methods on simulated networks and on a network of political blogs.

Submitted 17 March, 2015; v1 submitted 17 October, 2011; originally announced October 2011.

Comments: Published at http://dx.doi.org/10.1214/12-AOS1036 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). With Corrections

Report number: IMS-AOS-AOS1036

Journal ref: Annals of Statistics 2012, Vol. 40, No. 4, 2266-2292

arXiv:1008.1716 [pdf, ps, other]

Partial estimation of covariance matrices

Authors: Elizaveta Levina, Roman Vershynin

Abstract: A classical approach to accurately estimating the covariance matrix Σ of a p-variate normal distribution is to draw a sample of size n > p and form a sample covariance matrix. However, many modern applications operate with much smaller sample sizes, thus calling for estimation guarantees in the regime n << p. We show that a sample of size n = O(m log^6 p) is sufficient to accurately estimate in operator norm an arbitrary symmetric part of Σ consisting of m < n entries per row. This follows from a general result on estimating Hadamard products M.Σ, where M is an arbitrary symmetric matrix.

Submitted 11 February, 2011; v1 submitted 10 August, 2010; originally announced August 2010.

Comments: 15 pages, to appear in PTRF. Small changes in light of comments from the referee

MSC Class: 62H12 (primary); 60B20 (secondary)

Journal ref: Probability Theory and Related Fields 153 (2012), 405--419

arXiv:1005.3265 [pdf, ps, other]

doi 10.1073/pnas.1006642108

Community extraction for social networks

Authors: Yunpeng Zhao, Elizaveta Levina, Ji Zhu

Abstract: Analysis of networks and in particular discovering communities within networks has been a focus of recent work in several fields, with applications ranging from citation and friendship networks to food webs and gene regulatory networks. Most of the existing community detection methods focus on partitioning the entire network into communities, with the expectation of many ties within communities and few ties between. However, many networks contain nodes that do not fit in with any of the communities, and forcing every node into a community can distort results. Here we propose a new framework that focuses on community extraction instead of partition, extracting one community at a time. The main idea behind extraction is that the strength of a community should not depend on ties between members of other communities, but only on ties within that community and its ties to the outside world. We show that the new extraction criterion performs well on simulated and real networks, and establish asymptotic consistency of our method under the block model assumption.

Submitted 18 May, 2010; originally announced May 2010.

arXiv:0903.0645 [pdf, ps, other]

A new approach to Cholesky-based covariance regularization in high dimensions

Authors: Adam J. Rothman, Elizaveta Levina, Ji Zhu

Abstract: In this paper we propose a new regression interpretation of the Cholesky factor of the covariance matrix, as opposed to the well known regression interpretation of the Cholesky factor of the inverse covariance, which leads to a new class of regularized covariance estimators suitable for high-dimensional problems. Regularizing the Cholesky factor of the covariance via this regression interpretation always results in a positive definite estimator. In particular, one can obtain a positive definite banded estimator of the covariance matrix at the same computational cost as the popular banded estimator proposed by Bickel and Levina (2008b), which is not guaranteed to be positive definite. We also establish theoretical connections between banding Cholesky factors of the covariance matrix and its inverse and constrained maximum likelihood estimation under the banding constraint, and compare the numerical performance of several methods in simulations and on a sonar data example.

Submitted 3 March, 2009; originally announced March 2009.

Comments: Submitted for publication on Feb. 28, 2009

arXiv:0901.3079 [pdf, ps, other]

doi 10.1214/08-AOS600

Covariance regularization by thresholding

Authors: Peter J. Bickel, Elizaveta Levina

Abstract: This paper considers regularizing a covariance matrix of $p$ variables estimated from $n$ observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and $(\log p)/n\to0$, and obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general cross-validation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data.

Submitted 20 January, 2009; originally announced January 2009.

Comments: Published at http://dx.doi.org/10.1214/08-AOS600 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS600

MSC Class: 62H12 (Primary) 62F12; 62G09 (Secondary)

Journal ref: Annals of Statistics 2008, Vol. 36, No. 6, 2577-2604

arXiv:0803.3872 [pdf, ps, other]

doi 10.1214/07-AOAS139

Sparse estimation of large covariance matrices via a nested Lasso penalty

Authors: Elizaveta Levina, Adam Rothman, Ji Zhu

Abstract: The paper proposes a new covariance estimator for large covariance matrices when the variables have a natural ordering. Using the Cholesky decomposition of the inverse, we impose a banded structure on the Cholesky factor, and select the bandwidth adaptively for each row of the Cholesky factor, using a novel penalty we call nested Lasso. This structure has more flexibility than regular banding, but, unlike regular Lasso applied to the entries of the Cholesky factor, results in a sparse estimator for the inverse of the covariance matrix. An iterative algorithm for solving the optimization problem is developed. The estimator is compared to a number of other covariance estimators and is shown to do best, both in simulations and on a real data example. Simulations show that the margin by which the estimator outperforms its competitors tends to increase with dimension.

Submitted 27 March, 2008; originally announced March 2008.

Comments: Published at http://dx.doi.org/10.1214/07-AOAS139 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS139

Journal ref: Annals of Applied Statistics 2008, Vol. 2, No. 1, 245-263

arXiv:0803.1909 [pdf, ps, other]

doi 10.1214/009053607000000758

Regularized estimation of large covariance matrices

Authors: Peter J. Bickel, Elizaveta Levina

Abstract: This paper considers estimating a covariance matrix of $p$ variables from $n$ observations by either banding or tapering the sample covariance matrix, or estimating a banded version of the inverse of the covariance. We show that these estimates are consistent in the operator norm as long as $(\log p)/n\to0$, and obtain explicit rates. The results are uniform over some fairly natural well-conditioned families of covariance matrices. We also introduce an analogue of the Gaussian white noise model and show that if the population covariance is embeddable in that model and well-conditioned, then the banded approximations produce consistent estimates of the eigenvalues and associated eigenvectors of the covariance matrix. The results can be extended to smooth versions of banding and to non-Gaussian distributions with sufficiently short tails. A resampling approach is proposed for choosing the banding parameter in practice. This approach is illustrated numerically on both simulated and real data.

Submitted 13 March, 2008; originally announced March 2008.

Comments: Published at http://dx.doi.org/10.1214/009053607000000758 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS0298

MSC Class: 62H12 (Primary) 62F12; 62G09 (Secondary)

Journal ref: Annals of Statistics 2008, Vol. 36, No. 1, 199-227

arXiv:0801.4837 [pdf, ps, other]

doi 10.1214/08-EJS176

Sparse permutation invariant covariance estimation

Authors: Adam J. Rothman, Peter J. Bickel, Elizaveta Levina, Ji Zhu

Abstract: The paper proposes a method for constructing a sparse estimator for the inverse covariance (concentration) matrix in high-dimensional settings. The estimator uses a penalized normal likelihood approach and forces sparsity by using a lasso-type penalty. We establish a rate of convergence in the Frobenius norm as both data dimension $p$ and sample size $n$ are allowed to grow, and show that the rate depends explicitly on how sparse the true concentration matrix is. We also show that a correlation-based version of the method exhibits better rates in the operator norm. We also derive a fast iterative algorithm for computing the estimator, which relies on the popular Cholesky decomposition of the inverse but produces a permutation-invariant estimator. The method is compared to other estimators on simulated data and on a real data example of tumor tissue classification using gene expression data.

Submitted 26 June, 2008; v1 submitted 31 January, 2008; originally announced January 2008.

Comments: Published at http://dx.doi.org/10.1214/08-EJS176 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-EJS-EJS_2008_176

MSC Class: 62H20 (Primary) 62H12 (Secondary)

Journal ref: Electronic Journal of Statistics 2008, Vol. 2, 494-515

arXiv:0709.2108 [pdf, ps, other]

doi 10.1103/PhysRevE.77.046119

Robustness of community structure in networks

Authors: Brian Karrer, Elizaveta Levina, M. E. J. Newman

Abstract: The discovery of community structure is a common challenge in the analysis of network data. Many methods have been proposed for finding community structure, but few have been proposed for determining whether the structure found is statistically significant or whether, conversely, it could have arisen purely as a result of chance. In this paper we show that the significance of community structure can be effectively quantified by measuring its robustness to small perturbations in network structure. We propose a suitable method for perturbing networks and a measure of the resulting change in community structure and use them to assess the significance of community structure in a variety of networks, both real and computer generated.

Submitted 13 September, 2007; originally announced September 2007.

Comments: 10 pages, 2 figures

Journal ref: Phys. Rev. E 77, 046119 (2008)

arXiv:math/0611258 [pdf, ps, other]

doi 10.1214/009053606000000588

Texture synthesis and nonparametric resampling of random fields

Authors: Elizaveta Levina, Peter J. Bickel

Abstract: This paper introduces a nonparametric algorithm for bootstrapping a stationary random field and proves certain consistency properties of the algorithm for the case of mixing random fields. The motivation for this paper comes from relating a heuristic texture synthesis algorithm popular in computer vision to general nonparametric bootstrapping of stationary random fields. We give a formal resampling scheme for the heuristic texture algorithm and prove that it produces a consistent estimate of the joint distribution of pixels in a window of certain size under mixing and regularity conditions on the random field. The joint distribution of pixels is the quantity of interest here because theories of human perception of texture suggest that two textures with the same joint distribution of pixel values in a suitably chosen window will appear similar to a human. Thus we provide theoretical justification for an algorithm that has already been very successful in practice, and suggest an explanation for its perceptually good results.

Submitted 9 November, 2006; originally announced November 2006.

Comments: Published at http://dx.doi.org/10.1214/009053606000000588 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS0179

MSC Class: 62M40 (Primary) 62G09 (Secondary)

Journal ref: Annals of Statistics 2006, Vol. 34, No. 4, 1751-1773

Showing 1–50 of 50 results for author: Levina, E