
Scalable Bayesian Transformed Gaussian Processes

Authors: Xinran Zhu, Leo Huang, Cameron Ibrahim, Eric Hans Lee, David Bindel

Abstract: The Bayesian transformed Gaussian process (BTG) model, proposed by Kedem and Oliveira, is a fully Bayesian counterpart to the warped Gaussian process (WGP) and marginalizes out a joint prior over input warping and kernel hyperparameters. This fully Bayesian treatment of hyperparameters often provides more accurate regression estimates and superior uncertainty propagation, but is prohibitively expensive. The BTG posterior predictive distribution, itself estimated through high-dimensional integration, must be inverted in order to perform model prediction. To make the Bayesian approach practical and comparable in speed to maximum-likelihood estimation (MLE), we propose principled and fast techniques for computing with BTG. Our framework uses doubly sparse quadrature rules, tight quantile bounds, and rank-one matrix algebra to enable both fast model prediction and model selection. These scalable methods allow us to regress over higher-dimensional datasets and apply BTG with layered transformations that greatly improve its expressiveness. We demonstrate that BTG achieves superior empirical performance over MLE-based models.

Submitted 19 October, 2022; originally announced October 2022.
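The abstract notes that the BTG predictive distribution must be inverted to make a prediction. The following minimal sketch shows only that inversion step, under a simplifying assumption: the predictive density has already been approximated by a quadrature-weighted mixture of Gaussians. The actual BTG mixture components need not be Gaussian, and the paper's quadrature rules and quantile bounds are not reproduced here. The names `mixture_cdf` and `mixture_quantile` are hypothetical. The starting bracket relies on a simple property of mixtures: the mixture's alpha-quantile always lies between the smallest and largest component alpha-quantiles.

```python
from statistics import NormalDist


def mixture_cdf(x, weights, comps):
    """CDF at x of a mixture with normalized weights and NormalDist components."""
    return sum(w * c.cdf(x) for w, c in zip(weights, comps))


def mixture_quantile(alpha, weights, means, stds, tol=1e-10):
    """alpha-quantile of sum_i w_i N(means[i], stds[i]^2) by bracketed bisection.

    Assumption: the predictive distribution is (approximately) this Gaussian mixture.
    """
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize the quadrature weights
    comps = [NormalDist(m, s) for m, s in zip(means, stds)]
    # Each component satisfies F_i(q_i) = alpha, so F(min q_i) <= alpha <= F(max q_i).
    qs = [c.inv_cdf(alpha) for c in comps]
    lo, hi = min(qs), max(qs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, comps) < alpha:
            lo = mid  # quantile lies to the right of mid
        else:
            hi = mid  # quantile lies at or to the left of mid
    return 0.5 * (lo + hi)
```

For example, the median of the symmetric mixture with `weights=[0.5, 0.5]`, `means=[-1.0, 1.0]` and unit standard deviations comes out within about `tol` of 0. Each bisection step costs one pass over the mixture components, which is why a tight initial bracket matters when the mixture has many quadrature nodes.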

arXiv:2111.06580

On-the-Fly Rectification for Robust Large-Vocabulary Topic Inference

Authors: Moontae Lee, Sungjun Cho, Kun Dong, David Mimno, David Bindel

Abstract: Across many data domains, co-occurrence statistics about the joint appearance of objects are powerfully informative. By transforming unsupervised learning problems into decompositions of co-occurrence statistics, spectral algorithms provide transparent and efficient algorithms for posterior inference such as latent topic analysis and community detection. As object vocabularies grow, however, it becomes rapidly more expensive to store and run inference algorithms on co-occurrence statistics. Rectifying co-occurrence, the key process to uphold model assumptions, becomes increasingly vital in the presence of rare terms, but current techniques cannot scale to large vocabularies. We propose novel methods that simultaneously compress and rectify co-occurrence statistics, scaling gracefully with the size of the vocabulary and the dimension of the latent space. We also present new algorithms for learning latent variables from the compressed statistics, and verify that our methods perform comparably to previous approaches on both textual and non-textual data.

Submitted 12 November, 2021; originally announced November 2021.

arXiv:2110.14972

Streaming Local Community Detection through Approximate Conductance

Authors: Yanhao Yang, Meng Wang, David Bindel, Kun He

Abstract: Community structure is ubiquitous in complex networks, and community detection is a fundamental task for network analysis. As networks grow in scale, they become massive and change rapidly, and can naturally be modeled as graph streams. Because of the limited memory and the access constraints of graph streams, existing non-streaming community detection methods are no longer applicable, which creates a need for online approaches. In this work, we consider the problem of uncovering the local community containing a few query nodes in a graph stream, termed streaming local community detection. This recently raised problem is more challenging than conventional community detection, and only a few works address the online setting. We design an online, single-pass streaming local community detection approach. Inspired by the "local" property of communities, our method samples the local structure around the query nodes in the graph stream and extracts the target community from the sampled subgraph using a proposed metric called the approximate conductance. Comprehensive experiments show that our method remarkably outperforms the streaming baseline in both effectiveness and efficiency, and even achieves accuracy similar to that of state-of-the-art non-streaming local community detection methods that use static and complete graphs.

Submitted 28 October, 2021; originally announced October 2021.
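The proposed "approximate conductance" is not defined in the abstract, so the sketch below computes only the standard graph conductance that the name refers to: phi(S) = cut(S, V \ S) / min(vol(S), vol(V \ S)), where vol is the sum of degrees. The function name and the edge-list input are illustrative assumptions. The function accumulates degrees and the cut in a single pass over the edge list, but, unlike a true streaming method, it needs the candidate set S up front and keeps a degree entry for every node.

```python
from collections import defaultdict


def conductance(edges, community):
    """Standard conductance of node set S in an undirected graph given as an edge list.

    phi(S) = cut(S, V \\ S) / min(vol(S), vol(V \\ S)); lower means a better-separated set.
    """
    S = set(community)
    degree = defaultdict(int)
    cut = 0
    for u, v in edges:  # single pass over the edges
        degree[u] += 1
        degree[v] += 1
        if (u in S) != (v in S):
            cut += 1  # this edge crosses the boundary of S
    vol_S = sum(d for node, d in degree.items() if node in S)
    vol_rest = sum(degree.values()) - vol_S
    denom = min(vol_S, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when S or its complement has zero volume")
    return cut / denom
```

For two triangles joined by a single edge, taking S to be one triangle gives one cut edge and volume 7 on each side, so phi(S) = 1/7.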

arXiv:2109.14811

Surveillance Evasion Through Bayesian Reinforcement Learning

Authors: Dongping Qi, David Bindel, Alexander Vladimirsky

Abstract: We consider a task of surveillance-evading path-planning in a continuous setting. An Evader strives to escape from a 2D domain while minimizing the risk of detection (and immediate capture). The probability of detection is path-dependent and determined by the spatially inhomogeneous surveillance intensity, which is fixed but a priori unknown and gradually learned in the multi-episodic setting. We introduce a Bayesian reinforcement learning algorithm that relies on Gaussian process regression (to model the surveillance intensity function based on information from prior episodes), numerical methods for Hamilton-Jacobi PDEs (to plan the best continuous trajectories based on the current model), and confidence bounds (to balance exploration and exploitation). We use numerical experiments and regret metrics to highlight the significant advantages of our approach compared to traditional graph-based reinforcement learning algorithms.

Submitted 23 February, 2023; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: 6 pages, 3 figures; accepted for presentation and publication at AISTATS 2023

MSC Class: 93E35; 49L20; 68W27; 68T37; 60G15; 62N02
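To illustrate how Gaussian process regression and confidence bounds fit together, the following minimal sketch fits a GP to observed surveillance-intensity samples and returns an optimistic lower confidence bound, mean minus beta times standard deviation. An evader who plans against this bound is drawn toward poorly explored regions. The RBF kernel, the noise level, `beta`, and all function names are assumptions, not the paper's choices. How the paper's bound is passed to the Hamilton-Jacobi planner is not described in the abstract and is not shown.

```python
import numpy as np


def rbf(A, B, lengthscale=0.5, variance=1.0):
    """RBF kernel matrix between the rows of A (n x d) and B (m x d)."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(d2, 0) / lengthscale**2)


def gp_posterior(X, y, Xs, noise=1e-4, **kw):
    """Posterior mean and standard deviation at Xs of a zero-mean GP fit to (X, y)."""
    K = rbf(X, X, **kw) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + noise I)^{-1} y
    Ks = rbf(X, Xs, **kw)
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(Xs, Xs, **kw)) - np.sum(v**2, 0)
    return mean, np.sqrt(np.maximum(var, 0))


def lower_confidence_bound(X, y, Xs, beta=2.0, **kw):
    """Optimistic (low) estimate of the intensity: mean - beta * std."""
    mean, std = gp_posterior(X, y, Xs, **kw)
    return mean - beta * std
```

Far from every observation, the standard deviation returns to the prior value (1 here), so the bound is lowest in unexplored areas. This is the exploration incentive that a confidence-bound strategy relies on.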

arXiv:2107.04061

Scaling Gaussian Processes with Derivative Information Using Variational Inference

Authors: Misha Padidar, Xinran Zhu, Leo Huang, Jacob R. Gardner, David Bindel

Abstract: Gaussian processes with derivative information are useful in many settings where derivative information is available, including numerous Bayesian optimization and regression tasks that arise in the natural sciences. Incorporating derivative observations, however, comes with a dominating $O(N^3D^3)$ computational cost when training on $N$ points in $D$ input dimensions. This is intractable for even moderately sized problems. While recent work has addressed this intractability in the low-$D$ setting, the high-$N$, high-$D$ setting is still unexplored and of great value, particularly as machine learning problems increasingly become high dimensional. In this paper, we introduce methods to achieve fully scalable Gaussian process regression with derivatives using variational inference. Analogous to the use of inducing values to sparsify the labels of a training set, we introduce the concept of inducing directional derivatives to sparsify the partial derivative information of a training set. This enables us to construct a variational posterior that incorporates derivative information but whose size depends neither on the full dataset size $N$ nor on the full dimensionality $D$. We demonstrate the full scalability of our approach on a variety of tasks, ranging from a high-dimensional stellarator fusion regression task to training graph convolutional neural networks on PubMed using Bayesian optimization. Surprisingly, we find that our approach can improve regression performance even in settings where only label data is available.

Submitted 8 July, 2021; originally announced July 2021.
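Inducing directional derivatives require covariances between function values and directional derivatives. For an RBF kernel k(x, z) = exp(-||x - z||^2 / (2 ell^2)), the covariance between f(x) and the derivative of f at z along v is v . grad_z k(x, z) = k(x, z) v . (x - z) / ell^2. The following minimal sketch evaluates that cross-covariance. The RBF kernel and the pairing of one direction with each inducing location are assumptions for illustration, not necessarily the paper's parameterization. With one direction per inducing point, the block is n x m, independent of the input dimension d, rather than n x md for full gradients.

```python
import numpy as np


def rbf(X, Z, ell=1.0):
    """RBF kernel matrix between the rows of X (n x d) and Z (m x d)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)


def value_dirderiv_cov(X, Z, V, ell=1.0):
    """Cov(f(X[i]), directional derivative of f at Z[j] along V[j]).

    Returns an n x m matrix: one directional derivative per inducing location.
    """
    K = rbf(X, Z, ell)                        # (n, m)
    diff = X[:, None, :] - Z[None, :, :]      # (n, m, d): x_i - z_j
    proj = np.einsum("nmd,md->nm", diff, V)   # v_j . (x_i - z_j)
    return K * proj / ell**2
```

A central finite difference of `rbf` in the direction `V[j]` reproduces each column, which is a quick way to check the sign convention.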

arXiv:2010.11341

Density of States Graph Kernels

Authors: Leo Huang, Andrew Graven, David Bindel

Abstract: A fundamental problem on graph-structured data is that of quantifying similarity between graphs. Graph kernels are an established technique for such tasks; in particular, those based on random walks and return probabilities have proven to be effective in wide-ranging applications, from bioinformatics to social networks to computer vision. However, random walk kernels generally suffer from slowness and tottering, an effect that causes walks to overemphasize local graph topology, undercutting the importance of global structure. To correct for these issues, we recast return probability graph kernels under the more general framework of density of states (a framework that uses the lens of spectral analysis to uncover graph motifs and properties hidden within the interior of the spectrum), and use our interpretation to construct scalable, composite density-of-states-based graph kernels that balance local and global information, leading to higher classification accuracies on a host of benchmark datasets.

Submitted 20 January, 2021; v1 submitted 21 October, 2020; originally announced October 2020.
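Return probabilities connect to the density of states because (1/n) trace(P^k) for the random-walk matrix P = D^{-1} A equals the average of lambda^k over the eigenvalues of P, which is the k-th moment of its spectral density. The following minimal sketch builds the classical return-probability features and a linear kernel on them, using dense matrix powers that are suitable only for small graphs. It is not the paper's composite density-of-states kernel, and the function names and the choice of `K` are hypothetical.

```python
import numpy as np


def return_probability_features(A, K=6):
    """p_k = (1/n) trace(P^k), k = 1..K, with P = D^{-1} A.

    p_k is the average probability that a k-step random walk returns to its start;
    it also equals the k-th spectral moment of P.
    """
    deg = A.sum(axis=1)
    P = A / deg[:, None]  # row-stochastic transition matrix
    n = A.shape[0]
    feats, Pk = [], np.eye(n)
    for _ in range(K):
        Pk = Pk @ P
        feats.append(np.trace(Pk) / n)
    return np.array(feats)


def return_probability_kernel(A1, A2, K=6):
    """Linear kernel on return-probability features; graph sizes may differ."""
    return float(return_probability_features(A1, K) @ return_probability_features(A2, K))
```

On a triangle, the features start 0, 0.5, 0.25 (no 1-step returns, a 2-step return with probability 1/2, and two 3-step cycles each with probability 1/8). On a 4-cycle, every odd feature is 0 because the graph is bipartite.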

arXiv:2003.08310

On the Distribution of Minima in Intrinsic-Metric Rotation Averaging

Authors: Kyle Wilson, David Bindel

Abstract: Rotation averaging is a non-convex optimization problem that determines the orientations of a collection of cameras from their images of a 3D scene. The problem has been studied using a variety of distances and robustifiers. The intrinsic (or geodesic) distance on SO(3) is geometrically meaningful, but while some extrinsic distance-based solvers admit (conditional) guarantees of correctness, no comparable results have been found under the intrinsic metric. In this paper, we study the spatial distribution of local minima. First, we present a novel empirical study demonstrating sharp transitions in qualitative behavior: as problems become noisier, they transition from a single (easy-to-find) dominant minimum to a cost surface filled with minima. In the second part of this paper, we derive a theoretical bound for when this transition occurs. This is an extension of the results of [24], which used local convexity as a proxy to study the difficulty of the problem. By recognizing the underlying quotient manifold geometry of the problem, we achieve an n-fold improvement over prior work. Incidentally, our analysis also extends the prior $\ell_2$ work to general $\ell_p$ costs. Our results suggest using algebraic connectivity as an indicator of problem difficulty.

Submitted 18 March, 2020; originally announced March 2020.

Comments: To be published in CVPR 2020
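The abstract suggests algebraic connectivity as an indicator of problem difficulty. Algebraic connectivity is the second-smallest eigenvalue of the graph Laplacian L = D - A. In rotation averaging, the graph has one node per camera and one edge per measured relative rotation. The following minimal sketch computes it for an unweighted camera graph. The unweighted graph and the function name are assumptions, and the paper's bound that uses this quantity is not reproduced.

```python
import numpy as np


def algebraic_connectivity(n, edges):
    """Second-smallest eigenvalue (Fiedler value) of L = D - A for an unweighted graph."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0  # one relative-rotation measurement between cameras i and j
    L = np.diag(A.sum(axis=1)) - A
    return np.linalg.eigvalsh(L)[1]  # eigvalsh returns eigenvalues in ascending order
```

A disconnected camera graph has algebraic connectivity 0. A 3-camera chain gives 1, and a fully connected set of 4 cameras gives 4. Better-connected measurement graphs therefore score higher.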

arXiv:2002.10539

Efficient Rollout Strategies for Bayesian Optimization

Authors: Eric Hans Lee, David Eriksson, Bolong Cheng, Michael McCourt, David Bindel

Abstract: Bayesian optimization (BO) is a class of sample-efficient global optimization methods, where a probabilistic model conditioned on previous observations is used to determine future evaluations via the optimization of an acquisition function. Most acquisition functions are myopic, meaning that they only consider the impact of the next function evaluation. Non-myopic acquisition functions consider the impact of the next $h$ function evaluations and are typically computed through rollout, in which $h$ steps of BO are simulated. These rollout acquisition functions are defined as $h$-dimensional integrals, and are expensive to compute and optimize. We show that a combination of quasi-Monte Carlo, common random numbers, and control variates significantly reduces the computational burden of rollout. We then formulate a policy-search-based approach that removes the need to optimize the rollout acquisition function. Finally, we discuss the qualitative behavior of rollout policies in the setting of multi-modal objectives and model error.

Submitted 18 June, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

Comments: To appear in UAI 2020
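Of the variance-reduction tools named in the abstract, control variates are the simplest to show in isolation. To estimate E[f(Z)], subtract beta * (g(Z) - E[g]), where g is correlated with f and has a known mean. The estimator stays unbiased, and with the optimal beta = Cov(f, g) / Var(g) its variance drops. The following one-dimensional illustration is a generic example, not the paper's rollout estimator. The function name, the choice Z ~ N(0, 1), and the sample size are assumptions.

```python
import numpy as np


def mc_with_control_variate(f, g, g_mean, n=10_000, seed=0):
    """Estimate E[f(Z)] for Z ~ N(0, 1), plainly and with the control variate g.

    Returns (plain mean, control-variate mean, plain sample variance, adjusted sample variance).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    fz, gz = f(z), g(z)
    plain = fz.mean()
    c = np.cov(fz, gz)
    beta = c[0, 1] / c[1, 1]  # estimated optimal coefficient Cov(f, g) / Var(g)
    adjusted = fz - beta * (gz - g_mean)
    return plain, adjusted.mean(), fz.var(ddof=1), adjusted.var(ddof=1)
```

With f = exp and g(z) = z (known mean 0), the per-sample variance falls from about e^2 - e (about 4.67) to about 1.95, while both estimates target e^{1/2}. Reusing the same draws `z` when comparing two candidate decisions would be the common-random-numbers idea, which reduces the variance of their difference.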

arXiv:1912.12834

Randomly Projected Additive Gaussian Processes for Regression

Authors: Ian A. Delbridge, David S. Bindel, Andrew Gordon Wilson

Abstract: Gaussian processes (GPs) provide flexible distributions over functions, with inductive biases controlled by a kernel. However, in many applications Gaussian processes can struggle with even moderate input dimensionality. Learning a low-dimensional projection can help alleviate this curse of dimensionality, but introduces many trainable hyperparameters, which can be cumbersome, especially in the small-data regime. We use additive sums of kernels for GP regression, where each kernel operates on a different random projection of its inputs. Surprisingly, we find that as the number of random projections increases, the predictive performance of this approach quickly converges to the performance of a kernel operating on the original full-dimensional inputs, over a wide range of data sets, even if we are projecting into a single dimension. As a consequence, many problems can, remarkably, be reduced to one-dimensional input spaces without learning a transformation. We prove this convergence and its rate, and additionally propose a deterministic approach that converges more quickly than purely random projections. Moreover, we demonstrate that our approach can achieve faster inference and improved predictive accuracy for high-dimensional inputs compared to kernels in the original input space.

Submitted 30 December, 2019; originally announced December 2019.
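The additive construction can be sketched as an average of one-dimensional kernels, where each kernel sees only the projection w_j^T x of the inputs. The sketch assumes directions drawn uniformly on the unit sphere, a 1-D RBF base kernel with a shared lengthscale, and equal weights 1/J. These choices are illustrative, the paper's scaling and its deterministic variant are not shown, and the function name is hypothetical.

```python
import numpy as np


def rp_additive_kernel(X1, X2, num_proj=20, ell=1.0, seed=0):
    """Average of 1-D RBF kernels, each acting on a random projection w_j^T x.

    The same seed must be used for train/train and train/test calls so that
    both use identical projection directions.
    """
    d = X1.shape[1]
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_proj, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm projection directions
    P1, P2 = X1 @ W.T, X2 @ W.T                     # projected inputs, (n1, J) and (n2, J)
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for j in range(num_proj):
        diff = P1[:, j][:, None] - P2[:, j][None, :]
        K += np.exp(-0.5 * diff**2 / ell**2)        # 1-D RBF on projection j
    return K / num_proj
```

A sum of valid kernels is itself a valid kernel, so the result is symmetric positive semidefinite with unit diagonal. The convergence result in the abstract concerns how this average behaves as `num_proj` grows.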

arXiv:1908.00420

pySOT and POAP: An event-driven asynchronous framework for surrogate optimization

Authors: David Eriksson, David Bindel, Christine A. Shoemaker

Abstract: This paper describes Plumbing for Optimization with Asynchronous Parallelism (POAP) and the Python Surrogate Optimization Toolbox (pySOT). POAP is an event-driven framework for building and combining asynchronous optimization strategies, designed for global optimization of expensive functions where concurrent function evaluations are useful. POAP consists of three components: a worker pool capable of performing function evaluations, strategies that propose evaluations or other actions, and a controller that mediates the interaction between the workers and strategies. pySOT is a collection of synchronous and asynchronous surrogate optimization strategies, implemented in the POAP framework. We support the stochastic RBF method by Regis and Shoemaker along with various extensions of this method, and a general surrogate optimization strategy that covers most Bayesian optimization methods. We have implemented many different surrogate models, experimental designs, acquisition functions, and a large set of test problems. We make an extensive comparison between synchronous and asynchronous parallelism and find that the advantage of asynchronous computation increases as the variance of the evaluation time or the number of processors increases. We observe a close-to-linear speed-up with 4, 8, and 16 processors in both the synchronous and asynchronous settings.

Submitted 30 July, 2019; originally announced August 2019.
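The three-part design (worker pool, strategy, controller) can be sketched with Python's standard `concurrent.futures`. This is not the POAP or pySOT API: `RandomSearchStrategy`, `propose`, `on_complete`, and `run_controller` are hypothetical names, and a random-search strategy stands in for a surrogate-based one. The controller is asynchronous: as soon as any evaluation finishes, it reports the result to the strategy and gives the free worker a new proposal.

```python
import concurrent.futures as cf
import random


class RandomSearchStrategy:
    """Hypothetical strategy: proposes uniform random points until a budget is spent."""

    def __init__(self, lo, hi, budget, seed=0):
        self.rng = random.Random(seed)
        self.lo, self.hi, self.budget = lo, hi, budget
        self.proposed = 0
        self.best = (None, float("inf"))

    def propose(self):
        """Return the next point to evaluate, or None when the budget is exhausted."""
        if self.proposed >= self.budget:
            return None
        self.proposed += 1
        return self.rng.uniform(self.lo, self.hi)

    def on_complete(self, x, fx):
        """Record a finished evaluation, keeping the best point seen so far."""
        if fx < self.best[1]:
            self.best = (x, fx)


def run_controller(strategy, objective, num_workers=4):
    """Mediate between a worker pool and a strategy (all strategy calls stay on this thread)."""
    with cf.ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = {}  # future -> point being evaluated
        for _ in range(num_workers):  # fill every worker once
            x = strategy.propose()
            if x is None:
                break
            pending[pool.submit(objective, x)] = x
        while pending:
            done, _ = cf.wait(list(pending), return_when=cf.FIRST_COMPLETED)
            for fut in done:
                x = pending.pop(fut)
                strategy.on_complete(x, fut.result())
                x_new = strategy.propose()  # refill the freed worker immediately
                if x_new is not None:
                    pending[pool.submit(objective, x_new)] = x_new
    return strategy.best
```

A synchronous controller would instead wait for a whole batch before proposing again, leaving workers idle behind the slowest evaluation. That idle time grows with the variance of evaluation times, consistent with the trend the abstract reports.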

arXiv:1905.09758

doi 10.1145/3292500.3330891

Network Density of States

Authors: Kun Dong, Austin R. Benson, David Bindel

Abstract: Spectral analysis connects graph structure to the eigenvalues and eigenvectors of associated matrices. Much of spectral graph theory descends directly from spectral geometry, the study of differentiable manifolds through the spectra of associated differential operators. But the translation from spectral geometry to spectral graph theory has largely focused on results involving only a few extreme eigenvalues and their associated eigenvectors. Unlike in geometry, the study of graphs through the overall distribution of eigenvalues (the spectral density) is largely limited to simple random graph models. The interior of the spectrum of real-world graphs remains largely unexplored, difficult to compute and to interpret. In this paper, we delve into the heart of spectral densities of real-world graphs. We borrow tools developed in condensed matter physics, and add novel adaptations to handle the spectral signatures of common graph motifs. The resulting methods are highly efficient, as we illustrate by computing spectral densities for graphs with over a billion edges on a single compute node. Beyond providing visually compelling fingerprints of graphs, we show how the estimation of spectral densities facilitates the computation of many common centrality measures, and use spectral densities to estimate meaningful information about graph structure that cannot be inferred from the extremal eigenpairs alone.

Submitted 23 May, 2019; originally announced May 2019.

Comments: 10 pages, 7 figures
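One standard condensed-matter tool for spectral densities is the kernel polynomial method (KPM). Using it here is an assumption, since the abstract names no specific method. KPM needs the Chebyshev moments mu_k = (1/N) trace(T_k(H)) of a matrix H whose eigenvalues lie in [-1, 1], such as the normalized adjacency D^{-1/2} A D^{-1/2}. These moments can be estimated with random +/-1 probe vectors (Hutchinson's estimator) and the three-term recurrence T_{k+1}(H) z = 2 H T_k(H) z - T_{k-1}(H) z. The following minimal sketch uses a dense H for clarity. Only products with H are needed, which is what makes large sparse graphs feasible. The function name and parameters are hypothetical.

```python
import numpy as np


def chebyshev_dos_moments(A, num_moments=10, num_probes=500, seed=0):
    """Estimate mu_k = (1/N) trace(T_k(H)), H = D^{-1/2} A D^{-1/2}, with Rademacher probes."""
    deg = A.sum(axis=1)
    dinv = 1.0 / np.sqrt(deg)
    H = dinv[:, None] * A * dinv[None, :]  # eigenvalues lie in [-1, 1]
    N = A.shape[0]
    rng = np.random.default_rng(seed)
    Z = rng.choice([-1.0, 1.0], size=(N, num_probes))  # one probe vector per column
    mu = np.zeros(num_moments)
    T_prev, T_curr = Z, H @ Z  # T_0(H) Z and T_1(H) Z
    mu[0] = np.sum(Z * T_prev) / (N * num_probes)
    if num_moments > 1:
        mu[1] = np.sum(Z * T_curr) / (N * num_probes)
    for k in range(2, num_moments):
        T_prev, T_curr = T_curr, 2 * H @ T_curr - T_prev  # Chebyshev three-term recurrence
        mu[k] = np.sum(Z * T_curr) / (N * num_probes)
    return mu
```

In KPM, these moments are then damped (for example with a Jackson kernel) and summed against Chebyshev polynomials to produce a smooth density estimate. That reconstruction step is omitted here.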

arXiv:1810.12283

Scaling Gaussian Process Regression with Derivatives

Authors: David Eriksson, Kun Dong, Eric Hans Lee, David Bindel, Andrew Gordon Wilson

Abstract: Gaussian processes (GPs) with derivatives are useful in many applications, including Bayesian optimization, implicit surface reconstruction, and terrain reconstruction. Fitting a GP to function values and derivatives at $n$ points in $d$ dimensions requires linear solves and log determinants with an ${n(d+1) \times n(d+1)}$ positive definite matrix, leading to prohibitive $\mathcal{O}(n^3d^3)$ computations for standard direct methods. We propose iterative solvers using fast $\mathcal{O}(nd)$ matrix-vector multiplications (MVMs), together with pivoted Cholesky preconditioning that cuts the iterations to convergence by several orders of magnitude, allowing for fast kernel learning and prediction. Our approaches, together with dimensionality reduction, enable Bayesian optimization with derivatives to scale to high-dimensional problems and large evaluation budgets.

Submitted 29 October, 2018; originally announced October 2018.

Comments: Appears at Advances in Neural Information Processing Systems 31 (NIPS), 2018

Journal ref: Advances in Neural Information Processing Systems 31 (NIPS), 2018
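The n(d+1) x n(d+1) size comes from each point contributing one function value and d partial derivatives. The following minimal sketch builds this joint covariance for d = 1 with an RBF kernel k(x, x') = exp(-(x - x')^2 / (2 ell^2)), using Cov(f(x), f'(x')) = dk/dx' = k (x - x') / ell^2 and Cov(f'(x), f'(x')) = d^2k/(dx dx') = k (1/ell^2 - (x - x')^2 / ell^4). The 1-D setting, the kernel, and the function name are assumptions. The paper's contribution is avoiding dense factorization of this matrix, which the sketch does not attempt.

```python
import numpy as np


def rbf_with_derivatives(x, ell=1.0):
    """Joint covariance of [f(x_1..x_n), f'(x_1..x_n)] for a 1-D RBF kernel.

    Returns a 2n x 2n matrix, i.e. n(d+1) x n(d+1) with d = 1.
    """
    r = x[:, None] - x[None, :]                # r_ij = x_i - x_j
    K = np.exp(-0.5 * r**2 / ell**2)           # Cov(f(x_i), f(x_j))
    K_fd = K * r / ell**2                      # Cov(f(x_i), f'(x_j)) = dk/dx'
    K_dd = K * (1.0 / ell**2 - r**2 / ell**4)  # Cov(f'(x_i), f'(x_j)) = d^2k/dx dx'
    return np.block([[K, K_fd], [K_fd.T, K_dd]])
```

With d input dimensions, the derivative block alone grows to nd x nd. A dense Cholesky factorization of the full matrix therefore costs O(n^3 d^3), the cost that the abstract's MVM-based solvers are designed to avoid.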

arXiv:1809.11165

GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration

Authors: Jacob R. Gardner, Geoff Pleiss, David Bindel, Kilian Q. Weinberger, Andrew Gordon Wilson

Abstract: Despite advances in scalable models, the inference tools used for Gaussian processes (GPs) have yet to fully capitalize on developments in computing hardware. We present an efficient and general approach to GP inference based on Blackbox Matrix-Matrix multiplication (BBMM). BBMM inference uses a modified batched version of the conjugate gradients algorithm to derive all terms for training and inference in a single call. BBMM reduces the asymptotic complexity of exact GP inference from $O(n^3)$ to $O(n^2)$. Adapting this algorithm to scalable approximations and complex GP models simply requires a routine for efficient matrix-matrix multiplication with the kernel and its derivative. In addition, BBMM uses a specialized preconditioner to substantially speed up convergence. In experiments we show that BBMM effectively uses GPU hardware to dramatically accelerate both exact GP inference and scalable approximations. Additionally, we provide GPyTorch, a software platform for scalable GP inference via BBMM, built on PyTorch.

Submitted 29 June, 2021; v1 submitted 28 September, 2018; originally announced September 2018.

Comments: NeurIPS 2018. Most recent version includes additional details on preconditioned BBMM
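The "blackbox" idea is that the solver touches the kernel matrix only through a matrix-matrix product, so several right-hand sides can share each product, which is what makes the method GPU-friendly. The following minimal sketch of plain batched conjugate gradients runs independent CG iterations for every column of B in vectorized form. It assumes K is symmetric positive definite. It omits the preconditioner and the Lanczos-based log-determinant terms that the modified algorithm also produces, and it is not GPyTorch's implementation. `batched_cg` and its parameters are hypothetical.

```python
import numpy as np


def batched_cg(matmul, B, max_iter=100, tol=1e-10):
    """Solve K X = B column by column with CG, accessing K only via matmul(V) = K @ V."""
    X = np.zeros_like(B, dtype=float)
    R = B - matmul(X)
    P = R.copy()
    rs = np.sum(R * R, axis=0)              # squared residual norm of each column
    bnorm = np.linalg.norm(B, axis=0)
    for _ in range(max_iter):
        active = np.sqrt(rs) > tol * bnorm  # columns that have not converged yet
        if not np.any(active):
            break
        KP = matmul(P)                      # the only access to K: one mat-mat product
        pKp = np.sum(P * KP, axis=0)
        alpha = np.where(active, rs / np.where(active, pKp, 1.0), 0.0)
        X += alpha * P
        R -= alpha * KP
        rs_new = np.sum(R * R, axis=0)
        beta = np.where(active, rs_new / np.where(active, rs, 1.0), 0.0)
        P = R + beta * P                    # next conjugate search direction per column
        rs = rs_new
    return X
```

Converged (or zero) columns are frozen with `alpha = 0`, so they never cause a 0/0 division while other columns are still iterating.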

arXiv:1712.04823

Krylov Subspace Approximation for Local Community Detection in Large Networks

Authors: Kun He, Pan Shi, David Bindel, John E. Hopcroft

Abstract: Community detection is an important information mining task to uncover modular structures in large networks. For increasingly common large network data sets, global community detection is prohibitively expensive, and attention has shifted to methods that mine local communities, i.e., methods that identify all latent members of a particular community from a few labeled seed members. To address this semi-supervised mining task, we systematically develop a local spectral subspace-based community detection method, called LOSP. We define a family of local spectral subspaces based on Krylov subspaces, and seek a sparse indicator for the target community via an $\ell_1$ norm minimization over the Krylov subspace. Variants of LOSP depend on the type of random walk (with different diffusion speeds), the dimension of the local spectral subspace, and the number of diffusion steps. The effectiveness of the proposed LOSP approach is theoretically analyzed based on Rayleigh quotients, and it is experimentally verified on a wide variety of real-world networks across social, production, and biological domains, as well as on an extensive set of synthetic LFR benchmark datasets.

Submitted 16 May, 2019; v1 submitted 13 December, 2017; originally announced December 2017.

Comments: Submitted to ACM Transactions on Knowledge Discovery from Data (under revision)
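A local spectral subspace of this kind can be sketched as an orthonormal basis for the Krylov subspace span{p0, P p0, ..., P^(k-1) p0}. Here p0 spreads unit mass over the seed nodes, and P is a random-walk transition matrix. The column-stochastic walk P = A D^{-1}, the uniform seed vector, the value of k, and the function name are assumptions. LOSP's variants use several walk types, and its l1-minimization step is not shown.

```python
import numpy as np


def local_krylov_basis(A, seeds, k=3):
    """Orthonormal n x k basis of span{p0, P p0, ..., P^(k-1) p0}, P = A D^{-1}.

    p0 places mass 1/|seeds| on each seed node.
    """
    n = A.shape[0]
    P = A / A.sum(axis=0, keepdims=True)  # P[i, j] = A[i, j] / deg(j)
    p = np.zeros(n)
    p[list(seeds)] = 1.0 / len(seeds)
    vectors = []
    for _ in range(k):
        vectors.append(p)
        p = P @ p  # one more diffusion step away from the seeds
    Q, _ = np.linalg.qr(np.column_stack(vectors))
    return Q
```

Because k vectors reach at most k - 1 hops from the seeds, every row of `Q` for a node farther away is exactly zero. This locality is what lets the method ignore most of a large network. A sparse community indicator would then be sought within the span of `Q`.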

arXiv:1711.07065

Prior-aware Dual Decomposition: Document-specific Topic Inference for Spectral Topic Models

Authors: Moontae Lee, David Bindel, David Mimno

Abstract: Spectral topic modeling algorithms operate on matrices/tensors of word co-occurrence statistics to learn topic-specific word distributions. This approach removes the dependence on the original documents and produces substantial gains in efficiency and provable topic inference, but at a cost: the model can no longer provide information about the topic composition of individual documents. Recently, Thresholded Linear Inverse (TLI) was proposed to map the observed words of each document back to its topic composition. However, its linear characteristics limit the inference quality, since it does not consider the important prior information over topics. In this paper, we evaluate the Simple Probabilistic Inverse (SPI) method and a novel Prior-aware Dual Decomposition (PADD) method that is capable of learning document-specific topic compositions in parallel. Experiments show that PADD successfully leverages topic correlations as a prior, notably outperforming TLI and learning high-quality topic compositions comparable to Gibbs sampling on various data.

Submitted 19 November, 2017; originally announced November 2017.
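To show what a purely linear inverse does, here is a simplified TLI-style estimator. It maps a document's word-frequency vector w through a left inverse of the V x K word-topic matrix A, zeroes small entries, and renormalizes. The Moore-Penrose pseudoinverse is a stand-in assumption, since TLI as proposed uses its own specific left inverse. The threshold value and the function name are also hypothetical. Note that nothing here uses a prior over topics, which is the limitation that PADD addresses.

```python
import numpy as np


def thresholded_linear_inverse(A, w, threshold=0.0):
    """Estimate topic proportions from word frequencies w, given a V x K topic matrix A.

    Assumption: the pseudoinverse stands in for TLI's left inverse.
    """
    theta = np.linalg.pinv(A) @ w                     # linear left inverse
    theta = np.where(theta > threshold, theta, 0.0)   # threshold small or negative weights
    total = theta.sum()
    if total <= 0:
        return np.full(A.shape[1], 1.0 / A.shape[1])  # fall back to uniform proportions
    return theta / total                              # renormalize to a distribution
```

When w is exactly A @ theta and A has full column rank, the estimate recovers theta. With sampling noise, the linear map can produce negative weights, and thresholding simply discards them.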

arXiv:1711.03481

Scalable Log Determinants for Gaussian Process Kernel Learning

Authors: Kun Dong, David Eriksson, Hannes Nickisch, David Bindel, Andrew Gordon Wilson

Abstract: For applications as varied as Bayesian neural networks, determinantal point processes, elliptical graphical models, and kernel learning for Gaussian processes (GPs), one must compute a log determinant of an $n \times n$ positive definite matrix, and its derivatives, leading to prohibitive $\mathcal{O}(n^3)$ computations. We propose novel $\mathcal{O}(n)$ approaches to estimating these quantities from only fast matrix vector multiplications (MVMs). These stochastic approximations are based on Chebyshev, Lanczos, and surrogate models, and converge quickly even for kernel matrices that have challenging spectra. We leverage these approximations to develop a scalable Gaussian process approach to kernel learning. We find that Lanczos is generally superior to Chebyshev for kernel learning, and that a surrogate approach can be highly efficient and accurate with popular kernels.

Submitted 9 November, 2017; originally announced November 2017.

Comments: Appears at Advances in Neural Information Processing Systems 30 (NIPS), 2017

Journal ref: Advances in Neural Information Processing Systems 30 (NIPS), 2017
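The Lanczos variant can be sketched as stochastic Lanczos quadrature. The identity log det A = trace(log A) is estimated with random +/-1 probes z. Each quadratic form z^T log(A) z is approximated by running Lanczos from z and applying Gauss quadrature to the small tridiagonal matrix T. The Ritz values of T serve as nodes, and the squared first components of its eigenvectors serve as weights. The following minimal sketch uses a dense A and full reorthogonalization for clarity. The probe count, Lanczos steps, and function names are assumptions, and the derivative estimates and surrogate approach from the abstract are not shown.

```python
import numpy as np


def lanczos_tridiag(A, v, m):
    """Run m Lanczos steps from v (with full reorthogonalization) and return tridiagonal T."""
    n = v.shape[0]
    m = min(m, n)
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m)
    Q[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ Q[:, j]  # the only access to A: a matrix-vector product
        alpha[j] = Q[:, j] @ w
        for _ in range(2):  # Gram-Schmidt twice to keep the basis orthogonal
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        if j + 1 == m or beta[j] < 1e-12:
            m = j + 1
            break
        Q[:, j + 1] = w / beta[j]
    return np.diag(alpha[:m]) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)


def slq_logdet(A, num_probes=100, m=30, seed=0):
    """Stochastic Lanczos quadrature estimate of log det A for symmetric positive definite A."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    est = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe, ||z||^2 = n
        T = lanczos_tridiag(A, z, m)
        theta, S = np.linalg.eigh(T)         # Gauss quadrature nodes (Ritz values)
        weights = S[0, :] ** 2               # Gauss quadrature weights
        est += n * np.sum(weights * np.log(theta))  # approx z^T log(A) z
    return est / num_probes
```

Each probe costs m matrix-vector products, so with a fast MVM the overall cost scales with the number of probes and Lanczos steps rather than with n^3.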

arXiv:1611.00175

Robust Spectral Inference for Joint Stochastic Matrix Factorization

Authors: Moontae Lee, David Bindel, David Mimno

Abstract: Spectral inference provides fast algorithms and provable optimality for latent topic analysis. But for real data these algorithms require additional ad-hoc heuristics, and even then often produce unusable results. We explain this poor performance by casting the problem of topic inference in the framework of Joint Stochastic Matrix Factorization (JSMF) and showing that previous methods violate the theoretical conditions necessary for a good solution to exist. We then propose a novel rectification method that learns high quality topics and their interactions even on small, noisy data. This method achieves results comparable to probabilistic techniques in several domains while maintaining scalability and provable optimality.

Submitted 1 November, 2016; originally announced November 2016.
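The abstract does not spell out the rectification procedure. One natural approach, assumed here, is alternating projection onto three sets that a valid K-topic joint-stochastic co-occurrence matrix should belong to: symmetric positive semidefinite matrices of rank at most K, matrices whose entries sum to 1, and entrywise-nonnegative matrices. The following minimal sketch implements that loop. The iteration count and the function name are hypothetical.

```python
import numpy as np


def rectify_alternating_projection(C, K, iters=100):
    """Alternately project C onto rank-K PSD, sum-to-one, and nonnegative matrices."""
    C = np.array(C, dtype=float)
    n = C.shape[0]
    for _ in range(iters):
        w, V = np.linalg.eigh(0.5 * (C + C.T))
        top = np.argsort(w)[::-1][:K]          # indices of the K largest eigenvalues
        w_top = np.maximum(w[top], 0.0)        # drop negative eigenvalues
        C = (V[:, top] * w_top) @ V[:, top].T  # nearest rank-K PSD matrix (Frobenius norm)
        C += (1.0 - C.sum()) / n**2            # nearest matrix whose entries sum to 1
        C = np.maximum(C, 0.0)                 # nearest entrywise-nonnegative matrix
    return C
```

A matrix that already satisfies all three constraints, such as the outer product of a probability vector with itself, is a fixed point of the loop. A noisy, indefinite input comes back symmetric and nonnegative.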

arXiv:1509.08065

Detecting Overlapping Communities from Local Spectral Subspaces

Authors: Kun He, Yiwei Sun, David Bindel, John E Hopcroft, Yixuan Li

Abstract: Based on the definition of local spectral subspace, we propose a novel approach called LOSP for local overlapping community detection. Instead of using the invariant subspace spanned by the dominant eigenvectors of the entire network, we run the power method for a few steps to approximate the leading eigenvectors that depict the embedding of the local neighborhood structure around seeds of interest. We then seek a sparse approximate indicator vector in the local spectral subspace spanned by these vectors such that the seeds are in its support. We evaluate LOSP on five large real-world networks across various domains with labeled ground-truth communities and compare the results with state-of-the-art community detection approaches. LOSP identifies the members of a target community with high accuracy from very few seed members, and outperforms the local Heat Kernel and PageRank diffusions as well as the global baselines. Two candidate definitions of the local spectral subspace are analyzed, and different community scoring functions for determining the community boundary, including two new metrics, are thoroughly evaluated. The structural properties of different seed sets and the impact of the seed set size are discussed. We observe that low-degree seeds behave better, and that LOSP is robust even when started from a single random seed. Using LOSP as a subroutine and starting from each ego connected component, we tackle the harder but important task of identifying all communities to which a single vertex belongs. Experiments show that the proposed method achieves high F1 measures on the detected multiple local overlapping communities containing the seed vertex.
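The first LOSP step, running the power method for a few steps from the seeds to approximate leading eigenvectors, can be sketched as block power (orthogonal) iteration. This is a minimal NumPy sketch under assumptions the abstract does not state: a dense adjacency matrix, the symmetric normalized matrix N = D^{-1/2}(A + I)D^{-1/2}, and an initial block built from a few diffusion steps out of the seeds. The helper `local_spectral_subspace` and its parameters `dim` and `steps` are hypothetical, and the sketch works on the whole (toy) graph; a scalable version would restrict the computation to vertices near the seeds.

```python
import numpy as np


def local_spectral_subspace(A, seeds, dim=3, steps=4):
    """Hypothetical helper: approximate a local spectral subspace around `seeds`.

    A     -- dense symmetric 0/1 adjacency matrix (n x n)
    seeds -- list of seed vertex indices
    dim   -- number of basis vectors to keep
    steps -- number of block power-method iterations
    """
    n = A.shape[0]
    A_hat = A + np.eye(n)                    # self-loops keep every degree positive
    d = A_hat.sum(axis=1)
    N = A_hat / np.sqrt(np.outer(d, d))      # N = D^{-1/2} (A + I) D^{-1/2}, symmetric

    # Start from the seed distribution; a few diffusion steps form the initial block.
    p = np.zeros(n)
    p[seeds] = 1.0 / len(seeds)
    block = []
    for _ in range(dim):
        p = N @ p
        block.append(p)
    V, _ = np.linalg.qr(np.column_stack(block))

    # Block power (orthogonal) iteration: multiply by N, then re-orthonormalize.
    for _ in range(steps):
        V, _ = np.linalg.qr(N @ V)
    return V


if __name__ == "__main__":
    # Toy graph (assumption): two 5-cliques joined by one bridge edge; seed = vertex 0.
    m = 5
    A = np.zeros((2 * m, 2 * m))
    A[:m, :m] = 1.0
    A[m:, m:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[m - 1, m] = A[m, m - 1] = 1.0
    V = local_spectral_subspace(A, seeds=[0])
    print(V.shape)  # (10, 3)
```

The columns of `V` span the approximate local subspace in which LOSP then looks for a sparse indicator vector containing the seeds; re-orthonormalizing after every multiplication keeps the basis vectors from collapsing onto the single dominant eigenvector.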

Submitted 27 September, 2015; originally announced September 2015.

Comments: 11 pages, 8 figures

ACM Class: H.3.3; G.2.2

Journal ref: ICDM 2015

arXiv:1509.07996

Overlapping Community Detection via Local Spectral Clustering

Authors: Yixuan Li, Kun He, David Bindel, John Hopcroft

Abstract: Large graphs arise in a number of contexts, and understanding their structure and extracting information from them is an important research area. Early community-mining algorithms focused on global structure and often run in time that scales with the size of the entire graph. Nowadays, as we often explore networks with billions of vertices and look for communities of only hundreds of vertices, it is crucial to shift our attention from macroscopic structure to microscopic structure in large networks. A growing body of work has adopted local expansion methods to identify community members from a few exemplary seed members. In this paper, we propose a novel approach for finding overlapping communities called LEMON (Local Expansion via Minimum One Norm). The algorithm finds the community by seeking a sparse vector in the span of the local spectra such that the seeds are in its support. We show that LEMON can achieve the highest detection accuracy among state-of-the-art proposals. Its running time depends on the size of the community rather than that of the entire graph. The algorithm is easy to implement and highly parallelizable. We further provide a theoretical analysis of the local spectral properties, bounding the tightness of the extracted community in terms of the eigenvalues of the graph Laplacian. Moreover, because networks are not all similar in nature and a comprehensive analysis of how well the local expansion approach uncovers communities in different networks is still lacking, we thoroughly evaluate our approach on both synthetic and real-world datasets across different domains and analyze the empirical variations when applying our method to inherently different networks in practice. In addition, we provide heuristics on how the quality and quantity of the seed set affect performance.
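LEMON's core step, finding a sparse vector in the span of the local spectra whose support contains the seeds, can be posed as a one-norm minimization. The sketch below assumes the formulation min ||y||_1 subject to y = Vx, y ≥ 0, y_S ≥ 1, where V is an orthonormal basis of a local subspace and S is the seed set. The basis here is simply the span of a few random-walk distributions started at the seeds, a simplification rather than the paper's exact construction. `walk_subspace` and `min_one_norm_vector` are hypothetical helpers; the code relies on NumPy and SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def walk_subspace(A, seeds, k=3):
    """Orthonormal basis for the span of k random-walk distributions started at the seeds."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                              # self-loops: the walk may stay put
    P = A_hat / A_hat.sum(axis=1, keepdims=True)       # row-stochastic transition matrix
    p = np.zeros(n)
    p[seeds] = 1.0 / len(seeds)
    vecs = []
    for _ in range(k):
        p = p @ P                                      # one walk step: p <- p P
        vecs.append(p)
    Q, _ = np.linalg.qr(np.column_stack(vecs))
    return Q


def min_one_norm_vector(V, seeds):
    """Solve  min ||y||_1  s.t.  y = V x,  y >= 0,  y[seeds] >= 1.

    Since y >= 0, ||y||_1 = sum(y) = (1^T V) x, so this is a linear program in x.
    """
    n, k = V.shape
    c = V.sum(axis=0)                                  # objective coefficients: 1^T V
    A_ub = np.vstack([-V, -V[seeds]])                  # -Vx <= 0 and -(Vx)[seeds] <= -1
    b_ub = np.concatenate([np.zeros(n), -np.ones(len(seeds))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * k, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return V @ res.x


if __name__ == "__main__":
    # Toy graph (assumption): two 5-cliques joined by one bridge edge; seed = vertex 0.
    m = 5
    A = np.zeros((2 * m, 2 * m))
    A[:m, :m] = 1.0
    A[m:, m:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[m - 1, m] = A[m, m - 1] = 1.0
    y = min_one_norm_vector(walk_subspace(A, [0]), [0])
    print(np.flatnonzero(y > 1e-6))  # vertices in the support of y
```

Because the objective is linear and the constraints are linear inequalities, optimal solutions sit at vertices of the feasible region, which is what pushes many entries of y to zero. The vertices with the largest entries of y form the candidate community; in practice a threshold or a community scoring function would decide where its boundary lies.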

Submitted 26 September, 2015; originally announced September 2015.

Comments: Extended version of the conference paper in WWW'15

ACM Class: I.5.3, G.2.2

arXiv:1509.07715

Uncovering the Small Community Structure in Large Networks: A Local Spectral Approach

Authors: Yixuan Li, Kun He, David Bindel, John Hopcroft

Abstract: Large graphs arise in a number of contexts, and understanding their structure and extracting information from them is an important research area. Early community-mining algorithms focused on global structure and often run in time that scales with the size of the entire graph. Nowadays, as we often explore networks with billions of vertices and look for communities of only hundreds of vertices, it is crucial to shift our attention from macroscopic structure to microscopic structure when dealing with large networks. A growing body of work has adopted local expansion methods to identify the community from a few exemplary seed members. In this paper, we propose a novel approach for finding overlapping communities called LEMON (Local Expansion via Minimum One Norm). Unlike PageRank-like diffusion methods, LEMON finds the community by seeking a sparse vector in the span of the local spectra such that the seeds are in its support. We show that LEMON can achieve the highest detection accuracy among state-of-the-art proposals. Its running time depends on the size of the community rather than that of the entire graph. The algorithm is easy to implement and highly parallelizable. Moreover, because networks are not all similar in nature and a comprehensive analysis of how well the local expansion approach uncovers communities in different networks is still lacking, we thoroughly evaluate our approach on both synthetic and real-world datasets across different domains and analyze the empirical variations when applying our method to inherently different networks in practice. In addition, we provide heuristics on how the quality and quantity of the seed set affect performance.

Submitted 25 September, 2015; originally announced September 2015.

Comments: 10 pages, published in WWW 2015 proceedings

ACM Class: G.2.2; H.3.3

arXiv:1203.2973

How Bad is Forming Your Own Opinion?

Authors: David Bindel, Jon Kleinberg, Sigal Oren

Abstract: The question of how people form their opinions has fascinated economists and sociologists for quite some time. In many of the models, a group of people in a social network, each holding a numerical opinion, arrive at a shared opinion through repeated averaging with their neighbors in the network. Motivated by the observation that consensus is rarely reached in real opinion dynamics, we study a related sociological model in which individuals' intrinsic beliefs counterbalance the averaging process and yield a diversity of opinions. By interpreting the repeated averaging as best-response dynamics in an underlying game with natural payoffs, and the limit of the process as an equilibrium, we are able to study the cost of disagreement in these models relative to a social optimum. We provide a tight bound on the cost at equilibrium relative to the optimum; our analysis draws a connection between these agreement models and extremal problems that lead to generalized eigenvalues. We also consider a natural network design problem in this setting: which links can we add to the underlying network to reduce the cost of disagreement at equilibrium?
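To make the equilibrium-versus-optimum comparison concrete, the sketch below assumes a standard form of this model: on an undirected weighted graph, player i with intrinsic belief s_i chooses opinion z_i to minimize (z_i - s_i)^2 + Σ_j w_ij (z_i - z_j)^2. The best response is z_i = (s_i + Σ_j w_ij z_j) / (1 + Σ_j w_ij), so repeated averaging converges to the solution of (I + L)z = s, where L is the graph Laplacian. The total cost ||z - s||^2 + 2 zᵀLz is instead minimized by (I + 2L)z = s. Working in the eigenbasis of L, the cost ratio on a component with eigenvalue λ is (λ + 2)(2λ + 1) / (2(λ + 1)^2), which peaks at 9/8 when λ = 1, so under these assumptions the equilibrium cost is at most 9/8 of the optimum. All function names are hypothetical.

```python
import numpy as np


def laplacian(W):
    """Graph Laplacian L = D - W of a symmetric, nonnegative weight matrix."""
    return np.diag(W.sum(axis=1)) - W


def best_response_dynamics(W, s, iters=500):
    """Repeated averaging: z_i <- (s_i + sum_j w_ij z_j) / (1 + sum_j w_ij)."""
    z = np.array(s, dtype=float)
    deg = W.sum(axis=1)
    for _ in range(iters):
        z = (s + W @ z) / (1.0 + deg)          # simultaneous (Jacobi-style) update
    return z


def social_cost(W, s, z):
    """Sum of player costs (z_i - s_i)^2 + sum_j w_ij (z_i - z_j)^2.

    Each edge appears in both endpoints' costs, so this equals ||z - s||^2 + 2 z^T L z.
    """
    diff = z[:, None] - z[None, :]
    return float(np.sum((z - s) ** 2) + np.sum(W * diff ** 2))


def equilibrium_and_optimum(W, s):
    L = laplacian(W)
    I = np.eye(len(s))
    z_eq = np.linalg.solve(I + L, s)           # fixed point of best-response dynamics
    z_opt = np.linalg.solve(I + 2 * L, s)      # minimizer of the social cost
    return z_eq, z_opt


if __name__ == "__main__":
    # Toy example (assumption): unit-weight path on 4 vertices with polarized beliefs.
    W = np.zeros((4, 4))
    for i in range(3):
        W[i, i + 1] = W[i + 1, i] = 1.0
    s = np.array([1.0, 0.0, 0.0, -1.0])
    z_eq, z_opt = equilibrium_and_optimum(W, s)
    print(np.allclose(best_response_dynamics(W, s), z_eq))  # True
    print(social_cost(W, s, z_eq) / social_cost(W, s, z_opt))  # ratio in [1, 9/8]
```

The simultaneous update converges because each player puts weight 1 on its own belief, which makes the averaging operator a strict contraction; solving the linear systems directly gives the same equilibrium in one step.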

Submitted 13 March, 2012; originally announced March 2012.
