-
Efficient community detection of network flows for varying Markov times and bipartite networks
Authors:
Masoumeh Kheirkhahzadeh,
Andrea Lancichinetti,
Martin Rosvall
Abstract:
Community detection of network flows conventionally assumes one-step dynamics on the links. For sparse networks and interest in large-scale structures, longer timescales may be more appropriate. Oppositely, for large networks and interest in small-scale structures, shorter timescales may be better. However, current methods for analyzing networks at different timescales require expensive and often…
▽ More
Community detection of network flows conventionally assumes one-step dynamics on the links. For sparse networks and interest in large-scale structures, longer timescales may be more appropriate. Oppositely, for large networks and interest in small-scale structures, shorter timescales may be better. However, current methods for analyzing networks at different timescales require expensive and often infeasible network reconstructions. To overcome this problem, we introduce a method that takes advantage of the inner-workings of the map equation and evades the reconstruction step. This makes it possible to efficiently analyze large networks at different Markov times with no extra overhead cost. The method also evades the costly unipartite projection for identifying flow modules in bipartite networks.
△ Less
Submitted 19 March, 2016; v1 submitted 4 November, 2015;
originally announced November 2015.
-
Map** bilateral information interests using the activity of Wikipedia editors
Authors:
Fariba Karimi,
Ludvig Bohlin,
Anna Samoilenko,
Martin Rosvall,
Andrea Lancichinetti
Abstract:
We live in a global village where electronic communication has eliminated the geographical barriers of information exchange. The road is now open to worldwide convergence of information interests, shared values, and understanding. Nevertheless, interests still vary between countries around the world. This raises important questions about what today's world map of in- formation interests actually l…
▽ More
We live in a global village where electronic communication has eliminated the geographical barriers of information exchange. The road is now open to worldwide convergence of information interests, shared values, and understanding. Nevertheless, interests still vary between countries around the world. This raises important questions about what today's world map of in- formation interests actually looks like and what factors cause the barriers of information exchange between countries. To quantitatively construct a world map of information interests, we devise a scalable statistical model that identifies countries with similar information interests and measures the countries' bilateral similarities. From the similarities we connect countries in a global network and find that countries can be mapped into 18 clusters with similar information interests. Through regression we find that language and religion best explain the strength of the bilateral ties and formation of clusters. Our findings provide a quantitative basis for further studies to better understand the complex interplay between shared interests and conflict on a global scale. The methodology can also be extended to track changes over time and capture important trends in global information exchange.
△ Less
Submitted 25 January, 2016; v1 submitted 18 March, 2015;
originally announced March 2015.
-
Identifying modular flows on multilayer networks reveals highly overlap** organization in social systems
Authors:
Manlio De Domenico,
Andrea Lancichinetti,
Alex Arenas,
Martin Rosvall
Abstract:
Unveiling the community structure of networks is a powerful methodology to comprehend interconnected systems across the social and natural sciences. To identify different types of functional modules in interaction data aggregated in a single network layer, researchers have developed many powerful methods. For example, flow-based methods have proven useful for identifying modular dynamics in weight…
▽ More
Unveiling the community structure of networks is a powerful methodology to comprehend interconnected systems across the social and natural sciences. To identify different types of functional modules in interaction data aggregated in a single network layer, researchers have developed many powerful methods. For example, flow-based methods have proven useful for identifying modular dynamics in weighted and directed networks that capture constraints on flow in the systems they represent. However, many networked systems consist of agents or components that exhibit multiple layers of interactions. Inevitably, representing this intricate network of networks as a single aggregated network leads to information loss and may obscure the actual organization. Here we propose a method based on compression of network flows that can identify modular flows in non-aggregated multilayer networks. Our numerical experiments on synthetic networks show that the method can accurately identify modules that cannot be identified in aggregated networks or by analyzing the layers separately. We capitalize on our findings and reveal the community structure of two multilayer collaboration networks: scientists affiliated to the Pierre Auger Observatory and scientists publishing works on networks on the arXiv. Compared to conventional aggregated methods, the multilayer method reveals smaller modules with more overlap that better capture the actual organization.
△ Less
Submitted 13 August, 2014;
originally announced August 2014.
-
Robustness of journal rankings by network flows with different amounts of memory
Authors:
Ludvig Bohlin,
Alcides Viamontes Esquivel,
Andrea Lancichinetti,
Martin Rosvall
Abstract:
As the number of scientific journals has multiplied, journal rankings have become increasingly important for scientific decisions. From submissions and subscriptions to grants and hirings, researchers, policy makers, and funding agencies make important decisions with influence from journal rankings such as the ISI journal impact factor. Typically, the rankings are derived from the citation network…
▽ More
As the number of scientific journals has multiplied, journal rankings have become increasingly important for scientific decisions. From submissions and subscriptions to grants and hirings, researchers, policy makers, and funding agencies make important decisions with influence from journal rankings such as the ISI journal impact factor. Typically, the rankings are derived from the citation network between a selection of journals and unavoidably depend on this selection. However, little is known about how robust rankings are to the selection of included journals. Here we compare the robustness of three journal rankings based on network flows induced on citation networks. They model pathways of researchers navigating scholarly literature, step** between journals and remembering their previous steps to different degree: zero-step memory as impact factor, one-step memory as Eigenfactor, and two-step memory, corresponding to zero-, first-, and second-order Markov models of citation flow between journals. We conclude that higher-order Markov models perform better and are more robust to the selection of journals. Whereas our analysis indicates that higher-order models perform better, the performance gain for the second-order Markov model comes at the cost of requiring more citation data over a longer time period.
△ Less
Submitted 9 April, 2015; v1 submitted 30 May, 2014;
originally announced May 2014.
-
A high-reproducibility and high-accuracy method for automated topic classification
Authors:
Andrea Lancichinetti,
M. Irmak Sirer,
Jane X. Wang,
Daniel Acuna,
Konrad Körding,
Luís A. Nunes Amaral
Abstract:
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent search, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state-of-the-art in topic classification. Here, we perf…
▽ More
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent search, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state-of-the-art in topic classification. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results which are not accurate in inferring the most suitable model parameters. Adapting approaches for community detection in networks, we propose a new algorithm which displays high-reproducibility and high-accuracy, and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure. Our algorithm promises to make "big data" text analysis systems more reliable.
△ Less
Submitted 3 February, 2014;
originally announced February 2014.
-
Memory in network flows and its effects on spreading dynamics and community detection
Authors:
Martin Rosvall,
Alcides V. Esquivel,
Andrea Lancichinetti,
Jevin D. West,
Renaud Lambiotte
Abstract:
Random walks on networks is the standard tool for modelling spreading processes in social and biological systems. This first-order Markov approach is used in conventional community detection, ranking, and spreading analysis although it ignores a potentially important feature of the dynamics: where flow moves to may depend on where it comes from. Here we analyse pathways from different systems, and…
▽ More
Random walks on networks is the standard tool for modelling spreading processes in social and biological systems. This first-order Markov approach is used in conventional community detection, ranking, and spreading analysis although it ignores a potentially important feature of the dynamics: where flow moves to may depend on where it comes from. Here we analyse pathways from different systems, and while we only observe marginal consequences for disease spreading, we show that ignoring the effects of second-order Markov dynamics has important consequences for community detection, ranking, and information spreading. For example, capturing dynamics with a second-order Markov model allows us to reveal actual travel patterns in air traffic and to uncover multidisciplinary journals in scientific communication. These findings were achieved only by using more available data and making no additional assumptions, and therefore suggest that accounting for higher-order memory in network flows can help us better understand how real systems are organized and function.
△ Less
Submitted 12 August, 2014; v1 submitted 21 May, 2013;
originally announced May 2013.
-
Consensus clustering in complex networks
Authors:
Andrea Lancichinetti,
Santo Fortunato
Abstract:
The community structure of complex networks reveals both their organization and hidden relationships among their constituents. Most community detection methods currently available are not deterministic, and their results typically depend on the specific random seeds, initial conditions and tie-break rules adopted for their execution. Consensus clustering is used in data analysis to generate stable…
▽ More
The community structure of complex networks reveals both their organization and hidden relationships among their constituents. Most community detection methods currently available are not deterministic, and their results typically depend on the specific random seeds, initial conditions and tie-break rules adopted for their execution. Consensus clustering is used in data analysis to generate stable results out of a set of partitions delivered by stochastic methods. Here we show that consensus clustering can be combined with any existing method in a self-consistent way, enhancing considerably both the stability and the accuracy of the resulting partitions. This framework is also particularly suitable to monitor the evolution of community structure in temporal networks. An application of consensus clustering to a large citation network of physics papers demonstrates its capability to keep track of the birth, death and diversification of topics.
△ Less
Submitted 27 March, 2012;
originally announced March 2012.
-
Limits of modularity maximization in community detection
Authors:
Andrea Lancichinetti,
Santo Fortunato
Abstract:
Modularity maximization is the most popular technique for the detection of community structure in graphs. The resolution limit of the method is supposedly solvable with the introduction of modified versions of the measure, with tunable resolution parameters. We show that multiresolution modularity suffers from two opposite coexisting problems: the tendency to merge small subgraphs, which dominates…
▽ More
Modularity maximization is the most popular technique for the detection of community structure in graphs. The resolution limit of the method is supposedly solvable with the introduction of modified versions of the measure, with tunable resolution parameters. We show that multiresolution modularity suffers from two opposite coexisting problems: the tendency to merge small subgraphs, which dominates when the resolution is low; the tendency to split large subgraphs, which dominates when the resolution is high. In benchmark networks with heterogeneous distributions of cluster sizes, the simultaneous elimination of both biases is not possible and multiresolution modularity is not capable to recover the planted community structure, not even when it is pronounced and easily detectable by other methods, for any value of the resolution parameter. This holds for other multiresolution techniques and it is likely to be a general problem of methods based on global optimization.
△ Less
Submitted 12 February, 2012; v1 submitted 6 July, 2011;
originally announced July 2011.
-
Finding statistically significant communities in networks
Authors:
Andrea Lancichinetti,
Filippo Radicchi,
Jose' Javier Ramasco,
Santo Fortunato
Abstract:
Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a big need for multi-purpose techniques, able to handle different types of datasets and the subtleties of community structure. In this paper we present…
▽ More
Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a big need for multi-purpose techniques, able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable to detect clusters in networks accounting for edge directions, edge weights, overlap** communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure of partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method has a comparable performance as the best existing algorithms on artificial benchmark graphs. Several applications on real networks are shown as well. OSLOM is implemented in a freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks.
△ Less
Submitted 4 May, 2011; v1 submitted 10 December, 2010;
originally announced December 2010.
-
Characterizing the community structure of complex networks
Authors:
Andrea Lancichinetti,
Mikko Kivela,
Jari Saramaki,
Santo Fortunato
Abstract:
Community structure is one of the key properties of complex networks and plays a crucial role in their topology and function. While an impressive amount of work has been done on the issue of community detection, very little attention has been so far devoted to the investigation of communities in real networks. We present a systematic empirical analysis of the statistical properties of communities…
▽ More
Community structure is one of the key properties of complex networks and plays a crucial role in their topology and function. While an impressive amount of work has been done on the issue of community detection, very little attention has been so far devoted to the investigation of communities in real networks. We present a systematic empirical analysis of the statistical properties of communities in large information, communication, technological, biological, and social networks. We find that the mesoscopic organization of networks of the same category is remarkably similar. This is reflected in several characteristics of community structure, which can be used as ``fingerprints'' of specific network categories. While community size distributions are always broad, certain categories of networks consist mainly of tree-like communities, while others have denser modules. Average path lengths within communities initially grow logarithmically with community size, but the growth saturates or slows down for communities larger than a characteristic size. This behaviour is related to the presence of hubs within communities, whose roles differ across categories. Also the community embeddedness of nodes, measured in terms of the fraction of links within their communities, has a characteristic distribution for each category. Our findings are verified by the use of two fundamentally different community detection methods.
△ Less
Submitted 24 May, 2010;
originally announced May 2010.