Search | arXiv e-print repository

Generating realistic scaled complex networks

Authors: Christian L. Staudt, Michael Hamann, Alexander Gutfraind, Ilya Safro, Henning Meyerhenke

Abstract: Research on generative models is a central project in the emerging field of network science, and it studies how statistical patterns found in real networks could be generated by formal rules. Output from these generative models is then the basis for designing and evaluating computational methods on networks, and for verification and simulation studies. During the last two decades, a variety of mod… ▽ More Research on generative models is a central project in the emerging field of network science, and it studies how statistical patterns found in real networks could be generated by formal rules. Output from these generative models is then the basis for designing and evaluating computational methods on networks, and for verification and simulation studies. During the last two decades, a variety of models has been proposed with an ultimate goal of achieving comprehensive realism for the generated networks. In this study, we (a) introduce a new generator, termed ReCoN; (b) explore how ReCoN and some existing models can be fitted to an original network to produce a structurally similar replica, (c) use ReCoN to produce networks much larger than the original exemplar, and finally (d) discuss open problems and promising research directions. In a comparative experimental study, we find that ReCoN is often superior to many other state-of-the-art network generation methods. We argue that ReCoN is a scalable and effective tool for modeling a given network while preserving important properties at both micro- and macroscopic scales, and for scaling the exemplar data by orders of magnitude in size. △ Less

Submitted 23 March, 2017; v1 submitted 7 September, 2016; originally announced September 2016.

Comments: 26 pages, 13 figures, extended version, a preliminary version of the paper was presented at the 5th International Workshop on Complex Networks and their Applications

arXiv:1601.00289 [pdf, other]

An Empirical Comparison of Big Graph Frameworks in the Context of Network Analysis

Authors: Jannis Koch, Christian L. Staudt, Maximilian Vogel, Henning Meyerhenke

Abstract: Complex networks are relational data sets commonly represented as graphs. The analysis of their intricate structure is relevant to many areas of science and commerce, and data sets may reach sizes that require distributed storage and processing. We describe and compare programming models for distributed computing with a focus on graph algorithms for large-scale complex network analysis. Four frame… ▽ More Complex networks are relational data sets commonly represented as graphs. The analysis of their intricate structure is relevant to many areas of science and commerce, and data sets may reach sizes that require distributed storage and processing. We describe and compare programming models for distributed computing with a focus on graph algorithms for large-scale complex network analysis. Four frameworks - GraphLab, Apache Giraph, Giraph++ and Apache Flink - are used to implement algorithms for the representative problems Connected Components, Community Detection, PageRank and Clustering Coefficients. The implementations are executed on a computer cluster to evaluate the frameworks' suitability in practice and to compare their performance to that of the single-machine, shared-memory parallel network analysis package NetworKit. Out of the distributed frameworks, GraphLab and Apache Giraph generally show the best performance. In our experiments a cluster of eight computers running Apache Giraph enables the analysis of a network with about 2 billion edges, which is too large for a single machine of the same type. However, for networks that fit into memory of one machine, the performance of the shared-memory parallel implementation is far better than the distributed ones. The study provides experimental evidence for selecting the appropriate framework depending on the task and data volume. △ Less

Submitted 3 January, 2016; originally announced January 2016.

arXiv:1601.00286 [pdf, other]

Structure-Preserving Sparsification Methods for Social Networks

Authors: Michael Hamann, Gerd Lindner, Henning Meyerhenke, Christian L. Staudt, Dorothea Wagner

Abstract: Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of \textit{edge sparsification} methods on a diverse set of network properties. It is shown that they can be understood as methods for rating… ▽ More Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of \textit{edge sparsification} methods on a diverse set of network properties. It is shown that they can be understood as methods for rating edges by importance and then filtering globally or locally by these scores. We show that applying a local filtering technique improves the preservation of all kinds of properties. In addition, we propose a new sparsification method (\textit{Local Degree}) which preserves edges leading to local hub nodes. All methods are evaluated on a set of social networks from Facebook, Google+, Twitter and LiveJournal with respect to network properties including diameter, connected components, community structure, multiple node centrality measures and the behavior of epidemic simulations. In order to assess the preservation of the community structure, we also include experiments on synthetically generated networks with ground truth communities. Experiments with our implementations of the sparsification methods (included in the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20\% of the original set of edges for sparse graphs with a reasonable density. The experimental results allow us to differentiate the behavior of different methods and show which method is suitable with respect to which property. While our Local Degree method is best for preserving connectivity and short distances, other newly introduced local variants are best for preserving the community structure. △ Less

Submitted 3 January, 2016; originally announced January 2016.

arXiv:1505.00564 [pdf, other]

Structure-Preserving Sparsification of Social Networks

Authors: Gerd Lindner, Christian L. Staudt, Michael Hamann, Henning Meyerhenke, Dorothea Wagner

Abstract: Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of \textit{edge sparsification} methods on a diverse set of network properties. It is shown that they can be understood as methods for rating… ▽ More Sparsification reduces the size of networks while preserving structural and statistical properties of interest. Various sparsifying algorithms have been proposed in different contexts. We contribute the first systematic conceptual and experimental comparison of \textit{edge sparsification} methods on a diverse set of network properties. It is shown that they can be understood as methods for rating edges by importance and then filtering globally by these scores. In addition, we propose a new sparsification method (\textit{Local Degree}) which preserves edges leading to local hub nodes. All methods are evaluated on a set of 100 Facebook social networks with respect to network properties including diameter, connected components, community structure, and multiple node centrality measures. Experiments with our implementations of the sparsification methods (using the open-source network analysis tool suite NetworKit) show that many network properties can be preserved down to about 20\% of the original set of edges. Furthermore, the experimental results allow us to differentiate the behavior of different methods and show which method is suitable with respect to which property. Our Local Degree method is fast enough for large-scale networks and performs well across a wider range of properties than previously proposed methods. △ Less

Submitted 4 May, 2015; originally announced May 2015.

Comments: 8 pages

arXiv:1501.03545 [pdf, other]

Fast generation of complex networks with underlying hyperbolic geometry

Authors: Moritz von Looz, Christian L. Staudt, Henning Meyerhenke, Roman Prutkin

Abstract: Complex networks have become increasingly popular for modeling various real-world phenomena. Realistic generative network models are important in this context as they avoid privacy concerns of real data and simplify complex network research regarding data sharing, reproducibility, and scalability studies. \emph{Random hyperbolic graphs} are a well-analyzed family of geometric graphs. Previous work… ▽ More Complex networks have become increasingly popular for modeling various real-world phenomena. Realistic generative network models are important in this context as they avoid privacy concerns of real data and simplify complex network research regarding data sharing, reproducibility, and scalability studies. \emph{Random hyperbolic graphs} are a well-analyzed family of geometric graphs. Previous work provided empirical and theoretical evidence that this generative graph model creates networks with non-vanishing clustering and other realistic features. However, the investigated networks in previous applied work were small, possibly due to the quadratic running time of a previous generator. In this work we provide the first generation algorithm for these networks with subquadratic running time. We prove a time complexity of $O((n^{3/2}+m) \log n)$ with high probability for the generation process. This running time is confirmed by experimental data with our implementation. The acceleration stems primarily from the reduction of pairwise distance computations through a polar quadtree, which we adapt to hyperbolic space for this purpose. In practice we improve the running time of a previous implementation by at least two orders of magnitude this way. Networks with billions of edges can now be generated in a few minutes. Finally, we evaluate the largest networks of this model published so far. Our empirical analysis shows that important features are retained over different graph densities and degree distributions. △ Less

Submitted 23 April, 2015; v1 submitted 14 January, 2015; originally announced January 2015.

arXiv:1409.6241 [pdf, other]

Approximating Betweenness Centrality in Large Evolving Networks

Authors: Elisabetta Bergamini, Henning Meyerhenke, Christian L. Staudt

Abstract: Betweenness centrality ranks the importance of nodes by their participation in all shortest paths of the network. Therefore computing exact betweenness values is impractical in large networks. For static networks, approximation based on randomly sampled paths has been shown to be significantly faster in practice. However, for dynamic networks, no approximation algorithm for betweenness centrality… ▽ More Betweenness centrality ranks the importance of nodes by their participation in all shortest paths of the network. Therefore computing exact betweenness values is impractical in large networks. For static networks, approximation based on randomly sampled paths has been shown to be significantly faster in practice. However, for dynamic networks, no approximation algorithm for betweenness centrality is known that improves on static recomputation. We address this deficit by proposing two incremental approximation algorithms (for weighted and unweighted connected graphs) which provide a provable guarantee on the absolute approximation error. Processing batches of edge insertions, our algorithms yield significant speedups up to a factor of $10^4$ compared to restarting the approximation. This is enabled by investing memory to store and efficiently update shortest paths. As a building block, we also propose an asymptotically faster algorithm for updating the SSSP problem in unweighted graphs. Our experimental study shows that our algorithms are the first to make in-memory computation of a betweenness ranking practical for million-edge semi-dynamic networks. Moreover, our results show that the accuracy is even better than the theoretical guarantees in terms of absolutes errors and the rank of nodes is well preserved, in particular for those with high betweenness. △ Less

Submitted 22 September, 2014; originally announced September 2014.

Comments: 17 pages

arXiv:1403.3005 [pdf, other]

NetworKit: A Tool Suite for Large-scale Complex Network Analysis

Authors: Christian L. Staudt, Aleksejs Sazonovs, Henning Meyerhenke

Abstract: We introduce NetworKit, an open-source software package for analyzing the structure of large complex networks. Appropriate algorithmic solutions are required to handle increasingly common large graph data sets containing up to billions of connections. We describe the methodology applied to develop scalable solutions to network analysis problems, including techniques like parallelization, heuristic… ▽ More We introduce NetworKit, an open-source software package for analyzing the structure of large complex networks. Appropriate algorithmic solutions are required to handle increasingly common large graph data sets containing up to billions of connections. We describe the methodology applied to develop scalable solutions to network analysis problems, including techniques like parallelization, heuristics for computationally expensive problems, efficient data structures, and modular software architecture. Our goal for the software is to package results of our algorithm engineering efforts and put them into the hands of domain experts. NetworKit is implemented as a hybrid combining the kernels written in C++ with a Python front end, enabling integration into the Python ecosystem of tested tools for data analysis and scientific computing. The package provides a wide range of functionality (including common and novel analytics algorithms and graph generators) and does so via a convenient interface. In an experimental comparison with related software, NetworKit shows the best performance on a range of typical analysis tasks. △ Less

Submitted 13 November, 2015; v1 submitted 12 March, 2014; originally announced March 2014.

Comments: 21 pages

arXiv:1304.4453 [pdf, other]

doi 10.1109/TPDS.2015.2390633

Engineering Parallel Algorithms for Community Detection in Massive Networks

Authors: Christian L. Staudt, Henning Meyerhenke

Abstract: The amount of graph-structured data has recently experienced an enormous growth in many applications. To transform such data into useful information, fast analytics algorithms and software tools are necessary. One common graph analytics kernel is disjoint community detection (or graph clustering). Despite extensive research on heuristic solvers for this task, only few parallel codes exist, althoug… ▽ More The amount of graph-structured data has recently experienced an enormous growth in many applications. To transform such data into useful information, fast analytics algorithms and software tools are necessary. One common graph analytics kernel is disjoint community detection (or graph clustering). Despite extensive research on heuristic solvers for this task, only few parallel codes exist, although parallelism will be necessary to scale to the data volume of real-world applications. We address the deficit in computing capability by a flexible and extensible community detection framework with shared-memory parallelism. Within this framework we design and implement efficient parallel community detection heuristics: A parallel label propagation scheme; the first large-scale parallelization of the well-known Louvain method, as well as an extension of the method adding refinement; and an ensemble scheme combining the above. In extensive experiments driven by the algorithm engineering paradigm, we identify the most successful parameters and combinations of these algorithms. We also compare our implementations with state-of-the-art competitors. The processing rate of our fastest algorithm often reaches 50M edges/second. We recommend the parallel Louvain method and our variant with refinement as both qualitatively strong and fast. Our methods are suitable for massive data sets with billions of edges. △ Less

Submitted 2 February, 2015; v1 submitted 16 April, 2013; originally announced April 2013.

Comments: 14 pages, first presented at the 2013 International Conference on Parallel Processing, revised accepted for IEEE Transactions on Parallel and Distributed Systems (TPDS)

Showing 1–8 of 8 results for author: Staudt, C L