-
Who talks about what? Comparing the information treatment in traditional media with online discussions
Authors:
Hendrik Schawe,
Mariano Gastón Beiró,
J. Ignacio Alvarez-Hamelin,
Dimitris Kotzinos,
Laura Hernández
Abstract:
We study the dynamics of interactions between a traditional medium, the New York Times journal, and its followers in Twitter, using a massive dataset. It consists of the metadata of the articles published by the journal during the first year of the COVID-19 pandemic, and the posts published in Twitter by a large set of followers of the @nytimes account along with those published by a set of followers of several other media of different kind. The dynamics of discussions held in Twitter by exclusive followers of a medium show a strong dependence on the medium they follow: the followers of @FoxNews show the highest similarity to each other and a strong differentiation of interests with the general group. Our results also reveal the difference in the attention paid to U.S. presidential elections by the journal and by its followers, and show that the topic related to the "Black Lives Matter" movement started in Twitter, and was addressed later by the journal.
Submitted 30 August, 2022;
originally announced August 2022.
-
Designing weighted and multiplex networks for deep learning user geolocation in Twitter
Authors:
Federico M. Funes,
José Ignacio Alvarez-Hamelin,
Mariano G. Beiró
Abstract:
Predicting the geographical location of users of social media like Twitter has found several applications in health surveillance, emergency monitoring, content personalization, and social studies in general. In this work we contribute to the research in this area by designing and evaluating new methods based on the literature of weighted multigraphs combined with state-of-the-art deep learning techniques. The explored methods depart from a similar underlying structure (that of an extended mention and/or follower network) but use different information processing strategies, e.g., information diffusion through transductive and inductive algorithms -- RGCNs and GraphSAGE, respectively -- and node embeddings with Node2vec+. These graphs are then combined with attention mechanisms to incorporate the users' text view into the models. We assess the performance of each of these methods and compare them to baseline models in the publicly available Twitter-US dataset; we also make a new dataset available based on a large Twitter capture in Latin America. Finally, our work discusses the limitations and validity of the comparisons among methods in the context of different label definitions and metrics.
Submitted 13 December, 2021;
originally announced December 2021.
-
Evolution of the political opinion landscape during electoral periods
Authors:
Tomás Mussi Reyero,
Mariano G. Beiró,
J. Ignacio Alvarez-Hamelin,
Laura Hernández,
Dimitris Kotzinos
Abstract:
We present a study of the evolution of the political landscape during the 2015 and 2019 presidential elections in Argentina, based on the data obtained from the micro-blogging platform Twitter. We build a semantic network based on the hashtags used by all the users following at least one of the main candidates. With this network we can detect the topics that are discussed in the society. Unlike most studies of opinion on social media, we do not choose the topics a priori; instead, they emerge naturally from the community structure of the semantic network. We assign to each user a dynamical topic vector which measures the evolution of her/his opinion in this space and allows us to monitor the similarities and differences among groups of supporters of different candidates. Our results show that the method is able to detect the dynamics of formation of opinion on different topics and, in particular, it can capture the reshaping of the political opinion landscape which has led to the inversion of result between the two rounds of the 2015 election.
Submitted 18 November, 2020;
originally announced November 2020.
-
Learning language variations in news corpora through differential embeddings
Authors:
Carlos Selmo,
Julian F. Martinez,
Mariano G. Beiró,
J. Ignacio Alvarez-Hamelin
Abstract:
There is an increasing interest in the NLP community in capturing variations in the usage of language, either through time (i.e., semantic drift), across regions (as dialects or variants) or in different social contexts (i.e., professional or media technolects). Several successful dynamical embeddings have been proposed that can track semantic change through time. Here we show that a model with a central word representation and a slice-dependent contribution can learn word embeddings from different corpora simultaneously. This model is based on a star-like representation of the slices. We apply it to The New York Times and The Guardian newspapers, and we show that it can capture both temporal dynamics in the yearly slices of each corpus, and language variations between US and UK English in a curated multi-source corpus. We provide an extensive evaluation of this methodology.
Submitted 13 November, 2020;
originally announced November 2020.
-
Social Events in a Time-Varying Mobile Phone Graph
Authors:
Carlos Sarraute,
Jorge Brea,
Javier Burroni,
Klaus Wehmuth,
Artur Ziviani,
J. I. Alvarez-Hamelin
Abstract:
The large-scale study of human mobility has been significantly enhanced over the last decade by the massive use of mobile phones in urban populations. Studying the activity of mobile phones allows us, not only to infer social networks between individuals, but also to observe the movements of these individuals in space and time. In this work, we investigate how these two related sources of information can be integrated within the context of detecting and analyzing large social events. We show that large social events can be characterized not only by an anomalous increase in activity of the antennas in the neighborhood of the event, but also by an increase in social relationships of the attendants present in the event. Moreover, having detected a large social event via increased antenna activity, we can use the network connections to infer whether an unobserved user was present at the event. More precisely, we address the following three challenges: (i) automatically detecting large social events via increased antenna activity; (ii) characterizing the social cohesion of the detected event; and (iii) analyzing the feasibility of inferring whether unobserved users were in the event.
Submitted 19 June, 2017;
originally announced June 2017.
-
Socioeconomic correlations and stratification in social-communication networks
Authors:
Yannick Leo,
Eric Fleury,
J. Ignacio Alvarez-Hamelin,
Carlos Sarraute,
Márton Karsai
Abstract:
The uneven distribution of wealth and individual economic capacities are among the main forces which shape modern societies and arguably bias the emerging social structures. However, the study of correlations between the social network and economic status of individuals is difficult due to the lack of large-scale multimodal data disclosing both the social ties and economic indicators of the same population. Here, we close this gap through the analysis of coupled datasets recording the mobile phone communications and bank transaction history of one million anonymised individuals living in a Latin American country. We show that wealth and debt are unevenly distributed among people in agreement with the Pareto principle; the observed social structure is strongly stratified, with people being better connected to others of their own socioeconomic class rather than to others of different classes; the social network appears with assortative socioeconomic correlations and tightly connected "rich clubs"; and that egos from the same class live closer to each other but commute further if they are wealthier. These results are based on a representative, society-large population, and empirically demonstrate some long-lasting hypotheses on socioeconomic correlations which potentially lie behind social segregation, and induce differences in human mobility.
Submitted 14 December, 2016;
originally announced December 2016.
-
A new intrinsic way to measure IXP performance: an experience in Bolivia
Authors:
Esteban Carisimo,
Hernan Galperin,
José Ignacio Alvarez-Hamelin
Abstract:
Bolivia, a landlocked emerging country in South America, has one of the smallest networks in the whole Internet. Before the IXP implementation, packets exchanged between national ISPs had to be sent through international transit links. Being aware of this situation and seeking to increase the number of users, the Bolivian government enacted a law to gather all national ISPs on a single IXP in 2013.
Although several articles have examined this topic, none has focused on measuring the evolution of end-user parameters in a South American developing country, particularly after a significant change in the topology. In the current work, we have mainly studied hop, latency, traffic and route variation over seven months. Topology has not been studied because Bolivian ISPs are legally obliged to be connected to each other.
To achieve our measurement goals, and in the absence of global-scale measuring projects in this country, we have developed our own active-measurement platform among local ASes. During the platform development we had to deal with the fears of local ISPs, governmental agencies and regulatory pressures.
We also survey the main previous papers on IXP analysis, and we classify them according to the data obtained and their sources.
Submitted 4 May, 2015;
originally announced May 2015.
-
Router-level community structure of the Internet Autonomous Systems
Authors:
Mariano G. Beiró,
Sebastián P. Grynberg,
J. Ignacio Alvarez-Hamelin
Abstract:
The Internet is composed of routing devices connected to each other and organized into independent administrative entities: the Autonomous Systems. The existence of different types of Autonomous Systems (like large connectivity providers, Internet Service Providers or universities), together with geographical and economic constraints, turns the Internet into a complex modular and hierarchical network. This organization is reflected in many properties of the Internet topology, like its high degree of clustering and its robustness.
In this work, we study the modular structure of the Internet router-level graph in order to assess to what extent the Autonomous Systems satisfy some of the known notions of community structure. We show that the modular structure of the Internet is much richer than what can be captured by the current community detection methods, which are severely affected by resolution limits and by the heterogeneity of the Autonomous Systems. Here we overcome this issue by using a multiresolution detection algorithm combined with a small sample of nodes. We also discuss recent work on community structure in the light of our results.
Submitted 27 March, 2015; v1 submitted 25 March, 2015;
originally announced March 2015.
-
Deciphering the global organization of clustering in real complex networks
Authors:
Pol Colomer-de-Simon,
M. Angeles Serrano,
Mariano G. Beiro,
J. Ignacio Alvarez-Hamelin,
Marian Boguna
Abstract:
We uncover the global organization of clustering in real complex networks. As happens with other fundamental properties of networks such as the degree distribution, we find that real networks are neither completely random nor ordered with respect to clustering, although they tend to be closer to maximally random architectures. We reach this conclusion by comparing the global structure of clustering in real networks with that in maximally random and in maximally ordered clustered graphs. The former are produced with an exponential random graph model that maintains correlations among adjacent edges at the minimum needed to conform with the expected clustering spectrum; the latter with a random model that arranges triangles in cliques inducing highly ordered structures. To compare the global organization of clustering in real and model networks, we compute $m$-core landscapes, where the $m$-core is defined, akin to the $k$-core, as the maximal subgraph with edges participating in at least $m$ triangles. This property defines a set of nested subgraphs that, contrary to $k$-cores, is able to distinguish between hierarchical and modular architectures. To visualize the $m$-core decomposition we developed the LaNet-vi 3.0 tool.
Submitted 1 June, 2013;
originally announced June 2013.
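The $m$-core definition quoted in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm and not the LaNet-vi implementation; it is a naive fixed-point pruning written only from the definition (repeatedly drop every edge that lies in fewer than $m$ triangles of the surviving subgraph). The function name `m_core` and the edge-list input format are assumptions made for this example.

```python
from collections import defaultdict


def m_core(edges, m):
    """Return the edges of the m-core as a set of frozensets.

    Naive illustration (hypothetical helper, not from the paper): an edge
    survives only if it takes part in at least m triangles of the subgraph
    formed by the surviving edges.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    changed = True
    while changed:  # repeat until no edge is removed in a full pass
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # Triangles through edge (u, v) = common neighbours of u and v.
                if v in adj[u] and len(adj[u] & adj[v]) < m:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True

    return {frozenset((u, v)) for u in adj for v in adj[u]}


# A 4-clique on vertices 1..4 with one pendant edge (4, 5).
example_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
two_core_edges = m_core(example_edges, 2)
```

In the example, every edge of the 4-clique lies in two triangles while the pendant edge lies in none, so the 2-core keeps exactly the six clique edges.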
-
Obtaining Communities with a Fitness Growth Process
Authors:
Mariano G. Beiró,
Jorge R. Busch,
Sebastian P. Grynberg,
J. Ignacio Alvarez-Hamelin
Abstract:
The study of community structure has been a hot topic of research over the last years. But, while successfully applied in several areas, the concept lacks a general and precise notion. Facts like the hierarchical structure and heterogeneity of complex networks make it difficult to unify the idea of community and its evaluation. The global functional known as modularity is probably the most used technique in this area. Nevertheless, its limits have been deeply studied. Local techniques such as the ones by Lancichinetti et al. and Palla et al. arose as an answer to the resolution limit and degeneracies that modularity has.
Here we start from the algorithm by Lancichinetti et al. and propose a unique growth process for a fitness function that, while being local, finds a community partition that covers the whole network, updating the scale parameter dynamically. We test the quality of our results by using a set of benchmarks of heterogeneous graphs. We discuss alternative measures for evaluating the community structure and, in the light of them, infer possible explanations for the better performance of local methods compared to global ones in these cases.
Submitted 6 June, 2012;
originally announced June 2012.
-
Is it possible to find the maximum clique in general graphs?
Authors:
José Ignacio Alvarez-Hamelin
Abstract:
Finding the maximum clique is a known NP-Complete problem and it is also hard to approximate. This work proposes two efficient algorithms to obtain it. Nevertheless, the first one is able to find the maximum for some special cases, while the second one has its execution time bounded by the number of cliques that each vertex belongs to.
Submitted 17 February, 2012; v1 submitted 24 October, 2011;
originally announced October 2011.
-
On weakly optimal partitions in modular networks
Authors:
José Ignacio Alvarez-Hamelin,
Beiró Mariano Gastón,
Jorge Rodolfo Busch
Abstract:
Modularity was introduced as a measure of goodness for the community structure induced by a partition of the set of vertices in a graph. Then, it also became an objective function used to find good partitions, with high success. Nevertheless, some works have shown a scaling limit and certain instabilities when finding communities with this criterion. Modularity has been studied through several formalisms, such as Hamiltonians in a Potts model or Laplacians in spectral partitioning. In this paper we present a new probabilistic formalism to analyze modularity, and from it we derive an algorithm based on weakly optimal partitions. This algorithm obtains good quality partitions and also scales to large graphs.
Submitted 20 August, 2010;
originally announced August 2010.
-
Point-to-point and Point-to-multipoint CDMA Access Network with Enhanced Security
Authors:
Ortega A. Alfredo,
Victor A. Bettachini,
José Ignacio Alvarez-Hamelin,
Diego F. Grosz
Abstract:
We propose a network implementation with enhanced security at the physical layer by means of time-hopping CDMA, supporting cryptographically secure point-to-point and point-to-multipoint communication. In particular, we analyze an active star topology optical network implementation capable of supporting 128 simultaneous users up to 20 km apart. The feasibility of the proposed scheme is demonstrated through numerical simulation.
Submitted 3 February, 2011; v1 submitted 29 December, 2009;
originally announced December 2009.
-
Understanding edge-connectivity in the Internet through core-decomposition
Authors:
José Ignacio Alvarez-Hamelin,
Beiró Mariano Gastón,
Jorge Rodolfo Busch
Abstract:
The Internet is a complex network composed of several networks: the Autonomous Systems, each one designed to transport information efficiently. Routing protocols aim to find paths between nodes whenever it is possible (i.e., the network is not partitioned), or to find paths verifying specific constraints (e.g., a certain QoS is required). As connectivity is a measure related to both of them (partitions and selected paths), this work provides a formal lower bound to it based on core-decomposition, under certain conditions, and low complexity algorithms to find it. We apply them to analyze maps obtained from the prominent Internet mapping projects, using the LaNet-vi open-source software for its visualization.
Submitted 8 December, 2009;
originally announced December 2009.
-
Faceted Ranking of Egos in Collaborative Tagging Systems
Authors:
Jose Ignacio Orlicki,
Pablo Ignacio Fierens,
José Ignacio Alvarez-Hamelin
Abstract:
Multimedia uploaded content is tagged and recommended by users of collaborative systems, resulting in informal classifications also known as folksonomies. Faceted web ranking has been proved a reasonable alternative to a single ranking which does not take into account a personalized context. In this paper we analyze the online computation of rankings of users associated to facets made up of multiple tags. Possible applications are user reputation evaluation (ego-ranking) and improvement of content quality in case of retrieval. We propose a solution based on PageRank as centrality measure: (i) a ranking for each tag is computed offline on the basis of the corresponding tag-dependent subgraph; (ii) a faceted order is generated by merging rankings corresponding to all the tags in the facet. The fundamental assumption, validated by empirical observations, is that step (i) is scalable. We also present algorithms for part (ii) having time complexity O(k), where k is the number of tags in the facet, well suited to online computation.
Submitted 26 September, 2008;
originally announced September 2008.
-
K-core decomposition of Internet graphs: hierarchies, self-similarity and measurement biases
Authors:
José Ignacio Alvarez-Hamelin,
Luca Dall'Asta,
Alain Barrat,
Alessandro Vespignani
Abstract:
We consider the $k$-core decomposition of network models and Internet graphs at the autonomous system (AS) level. The $k$-core analysis allows one to characterize networks beyond the degree distribution and uncover structural properties and hierarchies due to the specific architecture of the system. We compare the $k$-core structure obtained for AS graphs with those of several network models and discuss the differences and similarities with the real Internet architecture. The presence of biases and the incompleteness of the real maps are discussed and their effect on the $k$-core analysis is assessed with numerical experiments simulating biased exploration on a wide range of network models. We find that the $k$-core analysis provides an interesting characterization of the fluctuations and incompleteness of maps as well as information helping to discriminate the original underlying structure.
Submitted 16 April, 2008; v1 submitted 2 November, 2005;
originally announced November 2005.
-
Architectural Considerations for a Self-Configuring Routing Scheme for Spontaneous Networks
Authors:
José Ignacio Alvarez-Hamelin,
Aline Carneiro Viana,
Marcelo Dias De Amorim
Abstract:
Decoupling the permanent identifier of a node from the node's topology-dependent address is a promising approach toward completely scalable self-organizing networks. A group of proposals that have adopted such an approach use the same structure to address nodes, perform routing, and implement location service. In this way, the consistency of the routing protocol relies on the coherent sharing of the addressing space among all nodes in the network. Such proposals use a logical tree-like structure where routes in this space correspond to routes in the physical level. The advantage of tree-like spaces is that they allow for simple address assignment and management. Nevertheless, they have low route selection flexibility, which results in low routing performance and poor resilience to failures. In this paper, we propose to increase the number of paths using incomplete hypercubes. The design of more complex structures, like multi-dimensional Cartesian spaces, improves the resilience and routing performance due to the flexibility in route selection. We present a framework for using hypercubes to implement indirect routing. This framework makes it possible to give a solution adapted to the dynamics of the network, providing proactive and reactive routing protocols, our major contributions. We show that, contrary to traditional approaches, our proposal supports more dynamic networks and is more robust to node failures.
Submitted 26 October, 2005;
originally announced October 2005.
-
k-core decomposition: a tool for the visualization of large scale networks
Authors:
José Ignacio Alvarez-Hamelin,
Luca Dall'Asta,
Alain Barrat,
Alessandro Vespignani
Abstract:
We use the k-core decomposition to visualize large scale complex networks in two dimensions. This decomposition, based on a recursive pruning of the least connected vertices, makes it possible to disentangle the hierarchical structure of networks by progressively focusing on their central cores. By using this strategy we develop a general visualization algorithm that can be used to compare the structural properties of various networks and highlight their hierarchical structure. The low computational complexity of the algorithm, O(n+e), where 'n' is the size of the network, and 'e' is the number of edges, makes it suitable for the visualization of very large sparse networks. We apply the proposed visualization tool to several real and synthetic graphs, showing its utility in finding specific structural fingerprints of computer generated and real world networks.
Submitted 12 October, 2005; v1 submitted 28 April, 2005;
originally announced April 2005.
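The recursive pruning described in the abstract above can be sketched as follows. This is an illustrative bucket-based peeling in the spirit of the well-known linear-time core decomposition, not the paper's visualization algorithm or the LaNet-vi code; the function name `core_numbers` and the edge-list input are assumptions made for this example. It computes only the core number (shell index) of each vertex, which is the quantity a k-core layout would start from.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected simple graph.

    Hypothetical helper for illustration: vertices are peeled in order of
    current degree; the degree at which a vertex is removed is its core number.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)

    # buckets[d] holds the not-yet-removed vertices whose current degree is d.
    buckets = [set() for _ in range(max_deg + 1)]
    for v, d in degree.items():
        buckets[d].add(v)

    core = {}
    d = 0
    remaining = len(degree)
    while remaining:
        if not buckets[d]:
            d += 1  # nothing left at this level, move to the next shell
            continue
        v = buckets[d].pop()
        core[v] = d
        remaining -= 1
        for w in adj[v]:
            # Only neighbours above the current level lose a unit of degree,
            # so no vertex ever drops below the level being peeled.
            if w not in core and degree[w] > d:
                buckets[degree[w]].remove(w)
                degree[w] -= 1
                buckets[degree[w]].add(w)
    return core


# A triangle (a, b, c) with a pendant vertex d attached to a.
example_cores = core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
```

In the example, the pendant vertex is pruned first and gets core number 1, while the three triangle vertices remain together in the 2-core.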
-
Exploring networks with traceroute-like probes: theory and simulations
Authors:
Luca Dall'Asta,
Ignacio Alvarez-Hamelin,
Alain Barrat,
Alexei Vazquez,
Alessandro Vespignani
Abstract:
Mapping the Internet generally consists in sampling the network from a limited set of sources by using traceroute-like probes. This methodology, akin to the merging of different spanning trees to a set of destinations, has been argued to introduce uncontrolled sampling biases that might produce statistical properties of the sampled graph which sharply differ from the original ones. In this paper we explore these biases and provide a statistical analysis of their origin. We derive an analytical approximation for the probability of edge and vertex detection that exploits the role of the number of sources and targets and allows us to relate the global topological properties of the underlying network with the statistical accuracy of the sampled graph. In particular, we find that the edge and vertex detection probability depends on the betweenness centrality of each element. This allows us to show that shortest path routed sampling provides a better characterization of underlying graphs with broad distributions of connectivity. We complement the analytical discussion with a thorough numerical investigation of simulated mapping strategies in network models with different topologies. We show that sampled graphs provide a fair qualitative characterization of the statistical properties of the original networks in a fair range of different strategies and exploration parameters. Moreover, we characterize the level of redundancy and completeness of the exploration process as a function of the topological properties of the network. Finally, we study numerically how the fraction of vertices and edges discovered in the sampled graph depends on the particular deployments of probing sources. The results might hint at the steps toward more efficient mapping strategies.
Submitted 2 December, 2004;
originally announced December 2004.
-
A statistical approach to the traceroute-like exploration of networks: theory and simulations
Authors:
Luca Dall'Asta,
Ignacio Alvarez-Hamelin,
Alain Barrat,
Alexei Vazquez,
Alessandro Vespignani
Abstract:
Mapping the Internet generally consists in sampling the network from a limited set of sources by using "traceroute"-like probes. This methodology, akin to the merging of different spanning trees to a set of destinations, has been argued to introduce uncontrolled sampling biases that might produce statistical properties of the sampled graph which sharply differ from the original ones. Here we explore these biases and provide a statistical analysis of their origin. We derive a mean-field analytical approximation for the probability of edge and vertex detection that exploits the role of the number of sources and targets and allows us to relate the global topological properties of the underlying network with the statistical accuracy of the sampled graph. In particular we find that the edge and vertex detection probability depends on the betweenness centrality of each element. This allows us to show that shortest path routed sampling provides a better characterization of underlying graphs with scale-free topology. We complement the analytical discussion with a thorough numerical investigation of simulated mapping strategies in different network models. We show that sampled graphs provide a fair qualitative characterization of the statistical properties of the original networks in a fair range of different strategies and exploration parameters. The numerical study also allows the identification of intervals of the exploration parameters that optimize the fraction of nodes and edges discovered in the sampled graph. This finding might hint at the steps toward more efficient mapping strategies.
Submitted 22 June, 2004; v1 submitted 17 June, 2004;
originally announced June 2004.