-
Local2Global: A distributed approach for scaling representation learning on graphs
Authors:
Lucas G. S. Jeub,
Giovanni Colavizza,
Xiaowen Dong,
Marya Bazzi,
Mihai Cucuringu
Abstract:
We propose a decentralised "local2global" approach to graph representation learning that one can, a priori, use to scale any embedding technique. Our local2global approach proceeds by first dividing the input graph into overlapping subgraphs (or "patches") and training local representations for each patch independently. In a second step, we combine the local representations into a globally consistent representation by estimating the set of rigid motions that best align the local representations using information from the patch overlaps, via group synchronization. A key distinguishing feature of local2global relative to existing work is that patches are trained independently without the need for the often costly parameter synchronization during distributed training. This allows local2global to scale to large-scale industrial applications, where the input graph may not even fit into memory and may be stored in a distributed manner. We apply local2global on data sets of different sizes and show that our approach achieves a good trade-off between scale and accuracy on edge reconstruction and semi-supervised classification. We also consider the downstream task of anomaly detection and show how one can use local2global to highlight anomalies in cybersecurity networks.
Submitted 12 January, 2022;
originally announced January 2022.
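The alignment step described in the abstract above can be illustrated with a minimal sketch for the simplest case: two patches with 2D embeddings, aligned through their shared nodes. This is not the authors' implementation. It assumes two-dimensional embeddings, restricts the rigid motion to a rotation plus a translation (no reflections), and aligns a single pair of patches instead of performing group synchronization over many patches. The function name `align_patch` and the toy coordinates are hypothetical.

```python
import math


def align_patch(source, target, overlap):
    """Rotate and translate the 2D embedding `source` onto `target`.

    source, target: dicts mapping node -> (x, y) coordinates.
    overlap: nodes present in both patches, used to estimate the motion.
    Returns a new dict with every node of `source` in the frame of `target`.
    """
    n = len(overlap)
    # Centroids of the overlap nodes in each patch.
    sx = sum(source[v][0] for v in overlap) / n
    sy = sum(source[v][1] for v in overlap) / n
    tx = sum(target[v][0] for v in overlap) / n
    ty = sum(target[v][1] for v in overlap) / n

    # Least-squares rotation angle for centred 2D point sets.
    dot = 0.0
    cross = 0.0
    for v in overlap:
        ax, ay = source[v][0] - sx, source[v][1] - sy
        bx, by = target[v][0] - tx, target[v][1] - ty
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    aligned = {}
    for v, (x, y) in source.items():
        x0, y0 = x - sx, y - sy
        aligned[v] = (c * x0 - s * y0 + tx, s * x0 + c * y0 + ty)
    return aligned


# Toy example: patch_b holds the shared nodes rotated by 90 degrees
# and shifted by (5, 5) relative to patch_a.
patch_a = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0), "d": (2.0, 2.0)}
patch_b = {"b": (5.0, 6.0), "c": (4.0, 5.0), "d": (3.0, 7.0), "e": (9.0, 9.0)}
aligned = align_patch(patch_a, patch_b, ["b", "c", "d"])
```

After alignment, node "a" (which appears only in `patch_a`) is expressed in the coordinate frame of `patch_b`, so the two patches can be merged into one embedding.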
-
Local2Global: Scaling global representation learning on graphs via local training
Authors:
Lucas G. S. Jeub,
Giovanni Colavizza,
Xiaowen Dong,
Marya Bazzi,
Mihai Cucuringu
Abstract:
We propose a decentralised "local2global" approach to graph representation learning that one can, a priori, use to scale any embedding technique. Our local2global approach proceeds by first dividing the input graph into overlapping subgraphs (or "patches") and training local representations for each patch independently. In a second step, we combine the local representations into a globally consistent representation by estimating the set of rigid motions that best align the local representations using information from the patch overlaps, via group synchronization. A key distinguishing feature of local2global relative to existing work is that patches are trained independently without the need for the often costly parameter synchronisation during distributed training. This allows local2global to scale to large-scale industrial applications, where the input graph may not even fit into memory and may be stored in a distributed manner. Preliminary results on medium-scale data sets (up to $\sim$7K nodes and $\sim$200K edges) are promising, with a graph reconstruction performance for local2global that is comparable to that of globally trained embeddings. A thorough evaluation of local2global on large-scale data and applications to downstream tasks, such as node classification and link prediction, constitutes ongoing work.
Submitted 26 July, 2021;
originally announced July 2021.
-
Weight Thresholding on Complex Networks
Authors:
Xiaoran Yan,
Lucas G. S. Jeub,
Alessandro Flammini,
Filippo Radicchi,
Santo Fortunato
Abstract:
Weight thresholding is a simple technique that aims at reducing the number of edges in weighted networks that are otherwise too dense for the application of standard graph theoretical methods. We show that the group structure of real weighted networks is very robust under weight thresholding, as it is maintained even when most of the edges are removed. This appears to be related to the correlation between topology and weight that characterizes real networks. On the other hand, the behavior of other properties is generally system dependent.
Submitted 5 October, 2018; v1 submitted 19 June, 2018;
originally announced June 2018.
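As a rough illustration of weight thresholding as described in the abstract above, the sketch below keeps only the heaviest edges of a weighted network. The abstract does not specify the exact procedure, so the choice to keep a fixed fraction of edges ranked by weight, the function name `threshold_edges`, and the toy network are all assumptions made for illustration.

```python
def threshold_edges(edges, keep_fraction):
    """Keep the heaviest `keep_fraction` of the edges.

    edges: list of (u, v, weight) tuples for an undirected weighted network.
    keep_fraction: value in [0, 1]; the number kept is rounded down.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must lie in [0, 1]")
    ranked = sorted(edges, key=lambda e: e[2], reverse=True)
    n_keep = int(keep_fraction * len(ranked))
    return ranked[:n_keep]


# Toy network: two triangles with heavy internal edges,
# joined by two light edges between the groups.
edges = [
    ("a", "b", 0.9), ("b", "c", 0.8), ("a", "c", 0.7),
    ("d", "e", 0.9), ("e", "f", 0.85), ("d", "f", 0.75),
    ("c", "d", 0.1), ("a", "f", 0.05),
]
kept = threshold_edges(edges, 0.75)
```

In this toy case the two light between-group edges are the ones removed, so the two triangles survive intact, which mirrors the kind of weight-topology correlation the abstract refers to.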
-
Multiresolution Consensus Clustering in Networks
Authors:
Lucas G. S. Jeub,
Olaf Sporns,
Santo Fortunato
Abstract:
Networks often exhibit structure at disparate scales. We propose a method for identifying community structure at different scales based on multiresolution modularity and consensus clustering. Our contribution consists of two parts. First, we propose a strategy for sampling the entire range of possible resolutions for the multiresolution modularity quality function. Our approach is directly based on the properties of modularity and, in particular, provides a natural way of avoiding the need to increase the resolution parameter by several orders of magnitude to break a few remaining small communities, a need that, with standard sampling approaches, necessitates the introduction of ad hoc limits to the resolution range. Second, we propose a hierarchical consensus clustering procedure, based on a modified modularity, that allows one to construct a hierarchical consensus structure given a set of input partitions. While here we are interested in its application to partitions sampled using multiresolution modularity, this consensus clustering procedure can be applied to the output of any clustering algorithm. As such, we see many potential applications of the individual parts of our multiresolution consensus clustering procedure in addition to using the procedure itself to identify hierarchical structure in networks.
Submitted 30 January, 2018; v1 submitted 5 October, 2017;
originally announced October 2017.
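For readers unfamiliar with the multiresolution modularity quality function mentioned in the abstract above, the sketch below evaluates its standard form, Q(gamma) = sum over communities of [ m_c / m - gamma * (K_c / 2m)^2 ], for an unweighted, undirected network without self-loops. Here m is the number of edges, m_c the number of edges inside community c, K_c the total degree of community c, and gamma the resolution parameter. It does not reproduce the paper's sampling strategy or its consensus procedure; the function name `modularity` and the toy network are assumptions.

```python
def modularity(edges, membership, gamma=1.0):
    """Multiresolution modularity of a partition.

    edges: list of (u, v) pairs, undirected, unweighted, no self-loops.
    membership: dict mapping node -> community label.
    gamma: resolution parameter (gamma = 1 gives standard modularity).
    """
    m = len(edges)
    internal = {}     # number of edges inside each community
    comm_degree = {}  # total degree of each community
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        comm_degree[cu] = comm_degree.get(cu, 0) + 1
        comm_degree[cv] = comm_degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    q = 0.0
    for c, k in comm_degree.items():
        q += internal.get(c, 0) / m - gamma * (k / (2 * m)) ** 2
    return q


# Toy network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in range(6)}

q_split = modularity(edges, split, gamma=1.0)
q_merged = modularity(edges, merged, gamma=1.0)
q_split_high = modularity(edges, split, gamma=2.0)
```

Larger values of gamma penalise large communities more strongly, which is why sweeping gamma exposes structure at different scales.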
-
A Framework for the Construction of Generative Models for Mesoscale Structure in Multilayer Networks
Authors:
Marya Bazzi,
Lucas G. S. Jeub,
Alex Arenas,
Sam D. Howison,
Mason A. Porter
Abstract:
Multilayer networks allow one to represent diverse and coupled connectivity patterns --- e.g., time-dependence, multiple subsystems, or both --- that arise in many applications and which are difficult or awkward to incorporate into standard network representations. In the study of multilayer networks, it is important to investigate mesoscale (i.e., intermediate-scale) structures, such as dense sets of nodes known as communities, to discover network features that are not apparent at the microscale or the macroscale. The ill-defined nature of mesoscale structure and its ubiquity in empirical networks make it crucial to develop generative models that can produce the features that one encounters in empirical networks. Key purposes of such generative models include generating synthetic networks with empirical properties of interest, benchmarking mesoscale-detection methods and algorithms, and inferring structure in empirical multilayer networks. In this paper, we introduce a framework for the construction of generative models for mesoscale structures in multilayer networks. Our framework provides a standardized set of generative models, together with an associated set of principles from which they are derived, for studies of mesoscale structures in multilayer networks. It unifies and generalizes many existing models for mesoscale structures in fully-ordered (e.g., temporal) and unordered (e.g., multiplex) multilayer networks. One can also use it to construct generative models for mesoscale structures in partially-ordered multilayer networks (e.g., networks that are both temporal and multiplex). Our framework has the ability to produce many features of empirical multilayer networks, and it explicitly incorporates a user-specified dependency structure between layers.
Submitted 11 December, 2019; v1 submitted 22 August, 2016;
originally announced August 2016.
-
A Local Perspective on Community Structure in Multilayer Networks
Authors:
Lucas G. S. Jeub,
Michael W. Mahoney,
Peter J. Mucha,
Mason A. Porter
Abstract:
The analysis of multilayer networks is among the most active areas of network science, and there are now several methods to detect dense "communities" of nodes in multilayer networks. One way to define a community is as a set of nodes that trap a diffusion-like dynamical process (usually a random walk) for a long time. In this view, communities are sets of nodes that create bottlenecks to the spreading of a dynamical process on a network. We analyze the local behavior of different random walks on multiplex networks (which are multilayer networks in which different layers correspond to different types of edges) and show that they have very different bottlenecks that hence correspond to rather different notions of what it means for a set of nodes to be a good community. This has direct implications for the behavior of community-detection methods that are based on these random walks.
Submitted 22 May, 2016; v1 submitted 17 October, 2015;
originally announced October 2015.
-
Think Locally, Act Locally: The Detection of Small, Medium-Sized, and Large Communities in Large Networks
Authors:
Lucas G. S. Jeub,
Prakash Balachandran,
Mason A. Porter,
Peter J. Mucha,
Michael W. Mahoney
Abstract:
It is common in the study of networks to investigate meso-scale features to try to gain an understanding of network structure and function. For example, numerous algorithms have been developed to try to identify "communities," which are typically construed as sets of nodes with denser connections internally than with the remainder of a network. In this paper, we adopt a complementary perspective that "communities" are associated with bottlenecks of locally-biased dynamical processes that begin at seed sets of nodes, and we employ several different community-identification procedures (using diffusion-based and geodesic-based dynamics) to investigate community quality as a function of community size. Using several empirical and synthetic networks, we identify several distinct scenarios for "size-resolved community structure" that can arise in real (and realistic) networks. Depending on which scenario holds, one may or may not be able to successfully identify "good" communities in a given network, the manner in which different small communities fit together to form meso-scale network structures can be very different, and processes such as viral propagation and information diffusion can exhibit very different dynamics. In addition, our results suggest that, for many large realistic networks, the output of locally-biased methods that focus on communities that are centered around a given seed node might have better conceptual grounding and greater practical utility than the output of global community-detection methods. They also illustrate subtler structural properties that are important to consider in the development of better benchmark networks to test methods for community detection.
[Note: Because of space limitations in the arXiv's abstract field, this is an abridged version of the paper's abstract.]
Submitted 8 October, 2014; v1 submitted 15 March, 2014;
originally announced March 2014.