-
Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance
Authors:
Roya Aliakbarisani,
Robert Jankowski,
M. Ángeles Serrano,
Marián Boguñá
Abstract:
Graph Neural Networks (GNNs) have excelled in predicting graph properties in various applications ranging from identifying trends in social networks to drug discovery and malware detection. With the abundance of new architectures and increased complexity, GNNs are becoming highly specialized when tested on a few well-known datasets. However, how the performance of GNNs depends on the topological and feature properties of graphs is still an open question. In this work, we introduce a comprehensive benchmarking framework for graph machine learning, focusing on the performance of GNNs across varied network structures. Utilizing the geometric soft configuration model in hyperbolic space, we generate synthetic networks with realistic topological properties and node feature vectors. This approach enables us to assess the impact of network properties, such as topology-feature correlation, degree distributions, local density of triangles (or clustering), and homophily, on the effectiveness of different GNN architectures. Our results highlight the dependency of model performance on the interplay between network structure and node features, providing insights for model selection in various scenarios. This study contributes to the field by offering a versatile tool for evaluating GNNs, thereby assisting in developing and selecting suitable models based on specific data characteristics.
Submitted 4 June, 2024;
originally announced June 2024.
-
Feature-aware ultra-low dimensional reduction of real networks
Authors:
Robert Jankowski,
Pegah Hozhabrierdi,
Marián Boguñá,
M. Ángeles Serrano
Abstract:
In existing models and embedding methods of networked systems, node features describing their qualities are usually overlooked in favor of focusing solely on node connectivity. This study introduces $FiD$-Mercator, a model-based ultra-low dimensional reduction technique that integrates node features with network structure to create $D$-dimensional maps of complex networks in a hyperbolic space. This embedding method efficiently uses features as an initial condition, guiding the search of nodes' coordinates towards an optimal solution. The research reveals that downstream task performance improves with the correlation between network connectivity and features, emphasizing the importance of such correlation for enhancing the description and predictability of real networks. Simultaneously, hyperbolic embedding's ability to reproduce local network properties remains unaffected by the inclusion of features. The findings highlight the necessity for developing network embedding techniques capable of exploiting such correlations to optimize both network structure and feature association jointly in the future.
Submitted 10 June, 2024; v1 submitted 17 January, 2024;
originally announced January 2024.
-
Geometric description of clustering in directed networks
Authors:
Antoine Allard,
M. Ángeles Serrano,
Marián Boguñá
Abstract:
First principle network models are crucial to make sense of the intricate topology of real complex networks. While modeling efforts have been quite successful in undirected networks, generative models for networks with asymmetric interactions are still not well developed and are unable to reproduce several basic topological properties. This is particularly disconcerting considering that real directed networks are the norm rather than the exception in many natural and human-made complex systems. In this paper, we fill this gap and show how the network geometry paradigm can be elegantly extended to the case of directed networks. We define a maximum entropy ensemble of geometric (directed) random graphs with a given sequence of in- and out-degrees. Beyond these local properties, the ensemble requires only two additional parameters to fix the level of reciprocity and the seven possible types of 3-node cycles in directed networks. A systematic comparison with several representative empirical datasets shows that fixing the level of reciprocity alongside the coupling with an underlying geometry is able to reproduce the wide diversity of clustering patterns observed in real complex directed networks.
Submitted 17 February, 2023;
originally announced February 2023.
-
Detecting the ultra low dimensionality of real networks
Authors:
Pedro Almagro,
Marian Boguna,
M. Angeles Serrano
Abstract:
Reducing dimension redundancy to find simplifying patterns in high-dimensional datasets and complex networks has become a major endeavor in many scientific fields. However, detecting the dimensionality of their latent space is challenging but necessary to generate efficient embeddings to be used in a multitude of downstream tasks. Here, we propose a method to infer the dimensionality of networks without the need for any a priori spatial embedding. Due to the ability of hyperbolic geometry to capture the complex connectivity of real networks, we detect ultra low dimensionality far below values reported using other approaches. We applied our method to real networks from different domains and found unexpected regularities, including: tissue-specific biomolecular networks being extremely low dimensional; brain connectomes being close to the three dimensions of their anatomical embedding; and social networks and the Internet requiring slightly higher dimensionality. Beyond paving the way towards an ultra efficient dimensional reduction, our findings help address fundamental issues that hinge on dimensionality, such as universality in critical behavior.
Submitted 27 October, 2021;
originally announced October 2021.
-
How to help university students to manage their interruptions and improve their attention and time management
Authors:
Aurora Vizcaíno,
Ignacio García-Rodríguez de Guzmán,
Antonio Manjavacas,
Félix García,
José A. Cruz-Lemus,
Manuel Ángel Serrano
Abstract:
Technology has changed both our way of life and the way in which we learn. Students now attend lectures with laptops and mobile phones, and this situation is accentuated in the case of students on Computer Science degrees, since they require their computers in order to participate in both theoretical and practical lessons. Problems, however, arise when the students' social networks are opened on their computers and they receive notifications that interrupt their work. We set up a workshop regarding time, thoughts and attention management with the objective of teaching our students techniques that would allow them to manage interruptions, concentrate better and ultimately make better use of their time. Those who took part in the workshop were then evaluated to discover its effects. The results obtained are quite optimistic and are described in this paper with the objective of encouraging other universities to undertake similar initiatives.
Submitted 23 April, 2021;
originally announced April 2021.
-
Geometric detection of hierarchical backbones in real networks
Authors:
Elisenda Ortiz,
Guillermo García-Pérez,
M. Ángeles Serrano
Abstract:
Hierarchies permeate the structure of real networks, whose nodes can be ranked according to different features. However, networks are far from tree-like structures and the detection of hierarchical ordering remains a challenge, hindered by the small-world property and the presence of a large number of cycles, in particular clustering. Here, we use geometric representations of undirected networks to achieve an enriched interpretation of hierarchy that integrates features defining popularity of nodes and similarity between them, such that the more similar a node is to a less popular neighbor the higher the hierarchical load of the relationship. The geometric approach allows us to measure the local contribution of nodes and links to the hierarchy within a unified framework. Additionally, we propose a link filtering method, the similarity filter, able to extract hierarchical backbones containing the links that represent statistically significant deviations with respect to the maximum entropy null model for geometric heterogeneous networks. We applied our geometric approach to the detection of similarity backbones of real networks in different domains and found that the backbones preserve local topological features at all scales. Interestingly, we also found that similarity backbones favor cooperation in evolutionary dynamics modelling social dilemmas.
Submitted 1 September, 2020; v1 submitted 4 June, 2020;
originally announced June 2020.
-
Network Geometry
Authors:
Marian Boguna,
Ivan Bonamassa,
Manlio De Domenico,
Shlomo Havlin,
Dmitri Krioukov,
M. Angeles Serrano
Abstract:
Real networks are finite metric spaces. Yet the geometry induced by shortest path distances in a network is definitely not its only geometry. Other forms of network geometry are the geometry of latent spaces underlying many networks, and the effective geometry induced by dynamical processes in networks. These three approaches to network geometry are all intimately related, and all three of them have been found to be exceptionally efficient in discovering fractality, scale-invariance, self-similarity, and other forms of fundamental symmetries in networks. Network geometry is also of great utility in a variety of practical applications, ranging from understanding how the brain works to routing in the Internet. Here, we review the most important theoretical and practical developments dealing with these approaches to network geometry in the last two decades, and offer perspectives on future research directions and challenges in this novel frontier in the study of complexity.
Submitted 27 October, 2020; v1 submitted 9 January, 2020;
originally announced January 2020.
-
Small worlds and clustering in spatial networks
Authors:
Marian Boguna,
Dmitri Krioukov,
Pedro Almagro,
M. Angeles Serrano
Abstract:
Networks with underlying metric spaces attract increasing research attention in network science, statistical physics, applied mathematics, computer science, sociology, and other fields. This attention is further amplified by the current surge of activity in graph embedding. In the vast realm of spatial network models, only a few reproduce even the most basic properties of real-world networks. Here, we focus on three such properties---sparsity, small worldness, and clustering---and identify the general subclass of spatial homogeneous and heterogeneous network models that are sparse small worlds and that have nonzero clustering in the thermodynamic limit. We rely on the maximum entropy approach where network links correspond to noninteracting fermions whose energy dependence on spatial distances determines network small worldness and clustering.
Submitted 31 August, 2019;
originally announced September 2019.
-
Mercator: uncovering faithful hyperbolic embeddings of complex networks
Authors:
Guillermo García-Pérez,
Antoine Allard,
M. Ángeles Serrano,
Marián Boguñá
Abstract:
We introduce Mercator, a reliable embedding method to map real complex networks into their hyperbolic latent geometry. The method assumes that the structure of networks is well described by the Popularity$\times$Similarity $\mathbb{S}^1/\mathbb{H}^2$ static geometric network model, which can accommodate arbitrary degree distributions and reproduces many pivotal properties of real networks, including self-similarity patterns. The algorithm mixes machine learning and maximum likelihood approaches to infer the coordinates of the nodes in the underlying hyperbolic disk with the best matching between the observed network topology and the geometric model. In its fast mode, Mercator uses a model-adjusted machine learning technique performing dimensional reduction to produce a fast and accurate map, whose quality already outperforms other embedding algorithms in the literature. In the refined Mercator mode, the fast-mode embedding result is taken as an initial condition in a Maximum Likelihood estimation, which significantly improves the quality of the final embedding. Apart from its accuracy as an embedding tool, Mercator has the clear advantage of systematically inferring not only node orderings, or angular positions, but also the hidden degrees and global model parameters, and has the ability to embed networks with arbitrary degree distributions. Overall, our results suggest that mixing machine learning and maximum likelihood techniques in a model-dependent framework can boost the meaningful mapping of complex networks.
Submitted 24 April, 2019;
originally announced April 2019.
-
Predictability of missing links in complex networks
Authors:
Guillermo García-Pérez,
Roya Aliakbarisani,
Abdorasoul Ghasemi,
M. Ángeles Serrano
Abstract:
Predicting missing links in real networks is an important problem in network science to which considerable efforts have been devoted, resulting in a plethora of link prediction methods in the literature. In this work, we take a different point of view on the problem and study the theoretical limitations to the predictability of missing links. In particular, we hypothesise that there is an irreducible uncertainty in link prediction on real networks as a consequence of the random nature of their formation process. By considering ensembles defined by well-known network models, we prove analytically that even the best possible link prediction method for an ensemble, given by the ranking of the ensemble connection probabilities, yields a limited precision. This result suggests a theoretical limitation to the predictability of links in real complex networks. Finally, we show that connection probabilities inferred by fitting network models to real networks allow us to estimate an upper bound to the predictability of missing links, and we further propose a method to approximate such bound from incomplete instances of real-world networks.
Submitted 31 January, 2019;
originally announced February 2019.
-
Navigability of temporal networks in hyperbolic space
Authors:
Elisenda Ortiz,
Michele Starnini,
M. Ángeles Serrano
Abstract:
Information routing is one of the main tasks in many complex networks with a communication function. Maps produced by embedding the networks in hyperbolic space can assist this task by enabling the implementation of efficient navigation strategies. However, only static maps have been considered so far, while navigation in more realistic situations, where the network structure may vary in time, remains largely unexplored. Here, we analyze the navigability of real networks by using greedy routing in hyperbolic space, where the nodes are subject to a stochastic activation-inactivation dynamics. We find that such dynamics enhances navigability with respect to the static case. Interestingly, there exists an optimal intermediate activation value, which ensures the best trade-off between the increase in the number of successful paths and a limited growth of their length. Contrary to expectations, the enhanced navigability is robust even when the most connected nodes inactivate with very high probability. Finally, our results indicate that some real networks are ultranavigable and remain highly navigable even if the network structure is extremely unsteady. These findings have important implications for the design and evaluation of efficient routing protocols that account for the temporal nature of real complex networks.
Submitted 8 September, 2017;
originally announced September 2017.
-
Geometric correlations mitigate the extreme vulnerability of multiplex networks against targeted attacks
Authors:
Kaj-Kolja Kleineberg,
Lubos Buzna,
Fragkiskos Papadopoulos,
Marián Boguñá,
M. Ángeles Serrano
Abstract:
We show that real multiplex networks are unexpectedly robust against targeted attacks on high degree nodes, and that hidden interlayer geometric correlations predict this robustness. Without geometric correlations, multiplexes exhibit an abrupt breakdown of mutual connectivity, even with interlayer degree correlations. With geometric correlations, we instead observe a multistep cascading process leading into a continuous transition, which apparently becomes fully continuous in the thermodynamic limit. Our results are important for the design of efficient protection strategies and of robust interacting networks in many domains.
Submitted 5 April, 2017; v1 submitted 7 February, 2017;
originally announced February 2017.
-
Hidden geometric correlations in real multiplex networks
Authors:
Kaj-Kolja Kleineberg,
Marian Boguna,
M. Angeles Serrano,
Fragkiskos Papadopoulos
Abstract:
Real networks often form interacting parts of larger and more complex systems. Examples can be found in different domains, ranging from the Internet to structural and functional brain networks. Here, we show that these multiplex systems are not random combinations of single network layers. Instead, they are organized in specific ways dictated by hidden geometric correlations between the individual layers. We find that these correlations are strong in different real multiplexes, and form a key framework for answering many important questions. Specifically, we show that these geometric correlations facilitate: (i) the definition and detection of multidimensional communities, which are sets of nodes that are simultaneously similar in multiple layers; (ii) accurate trans-layer link prediction, where connections in one layer can be predicted by observing the hidden geometric space of another layer; and (iii) efficient targeted navigation in the multilayer system using only local knowledge, which outperforms navigation in the single layers only if the geometric correlations are sufficiently strong. Our findings uncover fundamental organizing principles behind real multiplexes and can have important applications in diverse domains.
Submitted 8 February, 2017; v1 submitted 15 January, 2016;
originally announced January 2016.
-
Deciphering the global organization of clustering in real complex networks
Authors:
Pol Colomer-de-Simon,
M. Angeles Serrano,
Mariano G. Beiro,
J. Ignacio Alvarez-Hamelin,
Marian Boguna
Abstract:
We uncover the global organization of clustering in real complex networks. As happens with other fundamental properties of networks such as the degree distribution, we find that real networks are neither completely random nor ordered with respect to clustering, although they tend to be closer to maximally random architectures. We reach this conclusion by comparing the global structure of clustering in real networks with that in maximally random and in maximally ordered clustered graphs. The former are produced with an exponential random graph model that maintains correlations among adjacent edges at the minimum needed to conform with the expected clustering spectrum; the latter with a random model that arranges triangles in cliques inducing highly ordered structures. To compare the global organization of clustering in real and model networks, we compute $m$-core landscapes, where the $m$-core is defined, akin to the $k$-core, as the maximal subgraph with edges participating in at least $m$ triangles. This property defines a set of nested subgraphs that, contrary to $k$-cores, is able to distinguish between hierarchical and modular architectures. To visualize the $m$-core decomposition we developed the LaNet-vi 3.0 tool.
Submitted 1 June, 2013;
originally announced June 2013.
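The $m$-core definition given in the abstract above (the maximal subgraph whose edges each participate in at least $m$ triangles) can be illustrated with a simple peeling procedure. The following is only a minimal sketch under stated assumptions, not the authors' implementation or the LaNet-vi tool: the function name `m_core` and the edge-list input format are hypothetical choices made for illustration, and the graph is assumed to be undirected and unweighted.

```python
def m_core(edges, m):
    """Return the edges of the m-core as a set of frozensets.

    Illustrative sketch (hypothetical helper): repeatedly delete every edge
    that currently takes part in fewer than m triangles, until no such edge
    remains. What survives is the maximal subgraph in which each edge
    participates in at least m triangles.
    """
    # Build adjacency sets from the undirected edge list, skipping self-loops.
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    while True:
        # The number of triangles through edge (u, v) equals the number of
        # common neighbours of u and v in the current graph.
        weak = [(u, v) for u in adj for v in adj[u]
                if len(adj[u] & adj[v]) < m]
        if not weak:
            break
        for u, v in weak:
            adj[u].discard(v)
            adj[v].discard(u)

    return {frozenset((u, v)) for u in adj for v in adj[u]}


# A 4-clique on nodes 1-4 with an extra triangle (4, 5, 6) attached to node 4.
example_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
                 (4, 5), (5, 6), (4, 6)]
print(len(m_core(example_edges, 1)))  # every edge lies in at least one triangle
print(len(m_core(example_edges, 2)))  # only the 4-clique edges survive
```

Because each edge of the attached triangle lies in a single triangle while each clique edge lies in two, raising $m$ from 1 to 2 strips the triangle and keeps the clique, which shows how the nested subgraphs arise.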
-
Epidemic spreading on interconnected networks
Authors:
Anna Saumell-Mendiola,
M. Ángeles Serrano,
Marián Boguñá
Abstract:
Many real networks are not isolated from each other but form networks of networks, often interrelated in nontrivial ways. Here, we analyze an epidemic spreading process taking place on top of two interconnected complex networks. We develop a heterogeneous mean field approach that allows us to calculate the conditions for the emergence of an endemic state. Interestingly, a global endemic state may arise in the coupled system even though the epidemic is not able to propagate on each network separately, and even when the number of coupling connections is small. Our analytic results are successfully validated against large-scale numerical simulations.
Submitted 18 February, 2012;
originally announced February 2012.
-
Popularity versus Similarity in Growing Networks
Authors:
Fragkiskos Papadopoulos,
Maksim Kitsak,
M. Angeles Serrano,
Marian Boguna,
Dmitri Krioukov
Abstract:
Popularity is attractive -- this is the formula underlying preferential attachment, a popular explanation for the emergence of scaling in growing networks. If new connections are made preferentially to more popular nodes, then the resulting distribution of the number of connections that nodes have follows power laws observed in many real networks. Preferential attachment has been directly validated for some real networks, including the Internet. Preferential attachment can also be a consequence of different underlying processes based on node fitness, ranking, optimization, random walks, or duplication. Here we show that popularity is just one dimension of attractiveness. Another dimension is similarity. We develop a framework where new connections, instead of preferring popular nodes, optimize certain trade-offs between popularity and similarity. The framework admits a geometric interpretation, in which popularity preference emerges from local optimization. As opposed to preferential attachment, the optimization framework accurately describes large-scale evolution of technological (Internet), social (web of trust), and biological (E.coli metabolic) networks, predicting the probability of new links in them with a remarkable precision. The developed framework can thus be used for predicting new links in evolving networks, and provides a different perspective on preferential attachment as an emergent phenomenon.
Submitted 17 April, 2013; v1 submitted 1 June, 2011;
originally announced June 2011.
-
Percolation in self-similar networks
Authors:
M. Angeles Serrano,
Dmitri Krioukov,
Marian Boguna
Abstract:
We provide a simple proof that graphs in a general class of self-similar networks have zero percolation threshold. The considered self-similar networks include random scale-free graphs with given expected node degrees and zero clustering, scale-free graphs with finite clustering and metric structure, growing scale-free networks, and many real networks. The proof and the derivation of the giant component size do not require the assumption that networks are treelike. Our results rely only on the observation that self-similar networks possess a hierarchy of nested subgraphs whose average degree grows with their depth in the hierarchy. We conjecture that this property is pivotal for percolation in networks.
Submitted 27 January, 2011; v1 submitted 27 October, 2010;
originally announced October 2010.
-
Extracting the multiscale backbone of complex weighted networks
Authors:
M. Angeles Serrano,
Marian Boguna,
Alessandro Vespignani
Abstract:
A large number of complex systems find a natural abstraction in the form of weighted networks whose nodes represent the elements of the system and the weighted edges identify the presence of an interaction and its relative strength. In recent years, the study of an increasing number of large scale networks has highlighted the statistical heterogeneity of their interaction pattern, with degree and weight distributions which vary over many orders of magnitude. These features, along with the large number of elements and links, make the extraction of the truly relevant connections forming the network's backbone a very challenging problem. More specifically, coarse-graining approaches and filtering techniques struggle with the multiscale nature of large scale systems. Here we define a filtering method that offers a practical procedure to extract the relevant connection backbone in complex multiscale networks, preserving the edges that represent statistically significant deviations with respect to a null model for the local assignment of weights to edges. An important aspect of the method is that it does not belittle small-scale interactions and operates at all scales defined by the weight distribution. We apply our method to real world network instances and compare the obtained results with alternative backbone extraction techniques.
Submitted 15 April, 2009;
originally announced April 2009.
-
Beyond Zipf's law: Modeling the structure of human language
Authors:
M. Angeles Serrano,
Alessandro Flammini,
Filippo Menczer
Abstract:
Human language, the most powerful communication system in history, is closely associated with cognition. Written text is one of the fundamental manifestations of language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Still, only classical patterns such as Zipf's law have been explored in depth. In contrast, other basic properties like the existence of bursts of rare words in specific documents, the topical organization of collections, or the sublinear growth of vocabulary size with the length of a document, have only been studied one by one, mainly by applying heuristic methodologies rather than basic principles and general mechanisms. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on Heaps' law, burstiness, and the topicality of document collections, which encode correlations within and across documents absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts and identify dynamic word ranking and memory across documents as key mechanisms explaining the non-trivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science, and linguistics.
Submitted 3 February, 2009;
originally announced February 2009.
-
On Cycles in AS Relationships
Authors:
Xenofontas Dimitropoulos,
M. Angeles Serrano,
Dmitri Krioukov
Abstract:
Several users of our AS relationship inference data (http://www.caida.org/data/active/as-relationships/), released with cs/0604017, asked us why it contained AS relationship cycles, e.g., cases where AS A is a provider of AS B, B is a provider of C, and C is a provider of A, or other cycle types. Having been answering these questions in private communications, we have eventually decided to write down our answers here for future reference.
Submitted 6 July, 2008;
originally announced July 2008.
-
Self-similarity of complex networks and hidden metric spaces
Authors:
M. Angeles Serrano,
Dmitri Krioukov,
Marian Boguna
Abstract:
We demonstrate that the self-similarity of some scale-free networks with respect to a simple degree-thresholding renormalization scheme finds a natural interpretation in the assumption that network nodes exist in hidden metric spaces. Clustering, i.e., cycles of length three, plays a crucial role in this framework as a topological reflection of the triangle inequality in the hidden geometry. We prove that a class of hidden variable models with underlying metric spaces is able to accurately reproduce the self-similarity properties that we measured in the real networks. Our findings indicate that hidden geometries underlying these real networks are a plausible explanation for their observed topologies and, in particular, for their self-similarity with respect to the degree-based renormalization.
Submitted 20 February, 2008; v1 submitted 10 October, 2007;
originally announced October 2007.
-
Decoding the structure of the WWW: facts versus sampling biases
Authors:
M. Angeles Serrano,
Ana Maguitman,
Marian Boguna,
Santo Fortunato,
Alessandro Vespignani
Abstract:
The understanding of the immense and intricate topological structure of the World Wide Web (WWW) is a major scientific and technological challenge. This has been tackled recently by characterizing the properties of its representative graphs in which vertices and directed edges are identified with web-pages and hyperlinks, respectively. Data gathered in large-scale crawls have been analyzed by several groups resulting in a general picture of the WWW that encompasses many of the complex properties typical of rapidly evolving networks. In this paper, we report a detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers. We find that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data. This raises the issue of the presence of sampling biases and structural differences of Web crawls that might induce properties not representative of the actual global underlying graph. In order to provide a more accurate characterization of the Web graph and identify observables which are clearly discriminating with respect to the sampling process, we study the behavior of degree-degree correlation functions and the statistics of reciprocal connections. The latter appears to enclose the relevant correlations of the WWW graph and carry most of the topological information of the Web. The analysis of this quantity is also of major interest in relation to the navigability and searchability of the Web.
Submitted 14 February, 2006; v1 submitted 8 November, 2005;
originally announced November 2005.
-
Competition and adaptation in an Internet evolution model
Authors:
M. Angeles Serrano,
Marian Boguna,
Albert Diaz-Guilera
Abstract:
We model the evolution of the Internet at the Autonomous System level as a process of competition for users and adaptation of bandwidth capability. We find the exponent of the degree distribution as a simple function of the growth rates of the number of autonomous systems and the total number of connections in the Internet, both empirically measurable quantities. This fact sets our model apart from others in which this exponent depends on parameters that need to be adjusted in a model-dependent way. Our approach also accounts for a high level of clustering as well as degree-degree correlations, both with the same hierarchical structure present in the real Internet. Further, it also highlights the interplay between bandwidth, connectivity and traffic of the network.
Submitted 12 July, 2004; v1 submitted 30 June, 2004;
originally announced June 2004.