Search | arXiv e-print repository

Human-AI Coevolution

Authors: Dino Pedreschi, Luca Pappalardo, Emanuele Ferragina, Ricardo Baeza-Yates, Albert-Laszlo Barabasi, Frank Dignum, Virginia Dignum, Tina Eliassi-Rad, Fosca Giannotti, Janos Kertesz, Alistair Knott, Yannis Ioannidis, Paul Lukowicz, Andrea Passarella, Alex Sandy Pentland, John Shawe-Taylor, Alessandro Vespignani

Abstract: Human-AI coevolution, defined as a process in which humans and AI algorithms continuously influence each other, increasingly characterises our society, but is understudied in artificial intelligence and complexity science literature. Recommender systems and assistants play a prominent role in human-AI coevolution, as they permeate many facets of daily life and influence human choices on online platforms. The interaction between users and AI results in a potentially endless feedback loop, wherein users' choices generate data to train AI models, which, in turn, shape subsequent user preferences. This human-AI feedback loop has peculiar characteristics compared to traditional human-machine interaction and gives rise to complex and often "unintended" social outcomes. This paper introduces Coevolution AI as the cornerstone for a new field of study at the intersection between AI and complexity science focused on the theoretical, empirical, and mathematical investigation of the human-AI feedback loop. In doing so, we: (i) outline the pros and cons of existing methodologies and highlight shortcomings and potential ways for capturing feedback loop mechanisms; (ii) propose a reflection at the intersection between complexity science, AI and society; (iii) provide real-world examples for different human-AI ecosystems; and (iv) illustrate challenges to the creation of such a field of study, conceptualising them at increasing levels of abstraction, i.e., technical, epistemological, legal and socio-political.

Submitted 3 May, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

arXiv:2206.04872 [pdf, other]

doi 10.1145/3534678.3539364

Multi-fidelity Hierarchical Neural Processes

Authors: Dongxia Wu, Matteo Chinazzi, Alessandro Vespignani, Yi-An Ma, Rose Yu

Abstract: Science and engineering fields use computer simulation extensively. These simulations are often run at multiple levels of sophistication to balance accuracy and efficiency. Multi-fidelity surrogate modeling reduces the computational cost by fusing different simulation outputs. Cheap data generated from low-fidelity simulators can be combined with limited high-quality data generated by an expensive high-fidelity simulator. Existing methods based on Gaussian processes rely on strong assumptions of the kernel functions and can hardly scale to high-dimensional settings. We propose Multi-fidelity Hierarchical Neural Processes (MF-HNP), a unified neural latent variable model for multi-fidelity surrogate modeling. MF-HNP inherits the flexibility and scalability of Neural Processes. The latent variables transform the correlations among different fidelity levels from observations to latent space. The predictions across fidelities are conditionally independent given the latent states. It helps alleviate the error propagation issue in existing methods. MF-HNP is flexible enough to handle non-nested high dimensional data at different fidelity levels with varying input and output dimensions. We evaluate MF-HNP on epidemiology and climate modeling tasks, achieving competitive performance in terms of accuracy and uncertainty estimation. In contrast to deep Gaussian Processes with only low-dimensional (< 10) tasks, our method shows great promise for speeding up high-dimensional complex simulations (over 7000 for epidemiology modeling and 45000 for climate modeling).

Submitted 10 June, 2022; originally announced June 2022.

arXiv:2106.02770 [pdf, other]

Deep Bayesian Active Learning for Accelerating Stochastic Simulation

Authors: Dongxia Wu, Ruijia Niu, Matteo Chinazzi, Alessandro Vespignani, Yi-An Ma, Rose Yu

Abstract: Stochastic simulations such as large-scale, spatiotemporal, age-structured epidemic models are computationally expensive at fine-grained resolution. While deep surrogate models can speed up the simulations, doing so for stochastic simulations and with active learning approaches is an underexplored area. We propose Interactive Neural Process (INP), a deep Bayesian active learning framework for learning deep surrogate models to accelerate stochastic simulations. INP consists of two components, a spatiotemporal surrogate model built upon Neural Process (NP) family and an acquisition function for active learning. For surrogate modeling, we develop Spatiotemporal Neural Process (STNP) to mimic the simulator dynamics. For active learning, we propose a novel acquisition function, Latent Information Gain (LIG), calculated in the latent space of NP based models. We perform a theoretical analysis and demonstrate that LIG reduces sample complexity compared with random sampling in high dimensions. We also conduct empirical studies on three complex spatiotemporal simulators for reaction diffusion, heat flow, and infectious disease. The results demonstrate that STNP outperforms the baselines in the offline learning setting and LIG achieves the state-of-the-art for Bayesian active learning.

Submitted 4 June, 2023; v1 submitted 4 June, 2021; originally announced June 2021.

arXiv:2105.11982 [pdf, other]

Quantifying Uncertainty in Deep Spatiotemporal Forecasting

Authors: Dongxia Wu, Liyao Gao, Xinyue Xiong, Matteo Chinazzi, Alessandro Vespignani, Yi-An Ma, Rose Yu

Abstract: Deep learning is gaining increasing popularity for spatiotemporal forecasting. However, prior works have mostly focused on point estimates without quantifying the uncertainty of the predictions. In high stakes domains, being able to generate probabilistic forecasts with confidence intervals is critical to risk assessment and decision making. Hence, a systematic study of uncertainty quantification (UQ) methods for spatiotemporal forecasting is missing in the community. In this paper, we describe two types of spatiotemporal forecasting problems: regular grid-based and graph-based. Then we analyze UQ methods from both the Bayesian and the frequentist point of view, casting in a unified framework via statistical decision theory. Through extensive experiments on real-world road network traffic, epidemics, and air quality forecasting tasks, we reveal the statistical and computational trade-offs for different UQ methods: Bayesian methods are typically more robust in mean prediction, while confidence levels obtained from frequentist methods provide more extensive coverage over data variations. Computationally, quantile regression type methods are cheaper for a single confidence interval but require re-training for different intervals. Sampling based methods generate samples that can form multiple confidence intervals, albeit at a higher computational cost.

Submitted 12 June, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: arXiv admin note: text overlap with arXiv:2102.06684

arXiv:2102.06684 [pdf, other]

DeepGLEAM: A hybrid mechanistic and deep learning model for COVID-19 forecasting

Authors: Dongxia Wu, Liyao Gao, Xinyue Xiong, Matteo Chinazzi, Alessandro Vespignani, Yi-An Ma, Rose Yu

Abstract: We introduce DeepGLEAM, a hybrid model for COVID-19 forecasting. DeepGLEAM combines a mechanistic stochastic simulation model GLEAM with deep learning. It uses deep learning to learn the correction terms from GLEAM, which leads to improved performance. We further integrate various uncertainty quantification methods to generate confidence intervals. We demonstrate DeepGLEAM on real-world COVID-19 mortality forecasting tasks.

Submitted 23 March, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

arXiv:2012.04651 [pdf, other]

doi 10.1371/journal.pcbi.1009087

Predicting seasonal influenza using supermarket retail records

Authors: Ioanna Miliou, Xinyue Xiong, Salvatore Rinzivillo, Qian Zhang, Giulio Rossetti, Fosca Giannotti, Dino Pedreschi, Alessandro Vespignani

Abstract: Increased availability of epidemiological data, novel digital data streams, and the rise of powerful machine learning approaches have generated a surge of research activity on real-time epidemic forecast systems. In this paper, we propose the use of a novel data source, namely retail market data to improve seasonal influenza forecasting. Specifically, we consider supermarket retail data as a proxy signal for influenza, through the identification of sentinel baskets, i.e., products bought together by a population of selected customers. We develop a nowcasting and forecasting framework that provides estimates for influenza incidence in Italy up to 4 weeks ahead. We make use of the Support Vector Regression (SVR) model to produce the predictions of seasonal flu incidence. Our predictions outperform both a baseline autoregressive model and a second baseline based on product purchases. The results show quantitatively the value of incorporating retail market data in forecasting models, acting as a proxy that can be used for the real-time analysis of epidemics.

Submitted 17 December, 2020; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: 17 pages, 2 figures, 4 tables (1 in appendix), 1 algorithm, submitted to PLOS Computational Biology

arXiv:2006.11913 [pdf, other]

Finding Patient Zero: Learning Contagion Source with Graph Neural Networks

Authors: Chintan Shah, Nima Dehmamy, Nicola Perra, Matteo Chinazzi, Albert-László Barabási, Alessandro Vespignani, Rose Yu

Abstract: Locating the source of an epidemic, or patient zero (P0), can provide critical insights into the infection's transmission course and allow efficient resource allocation. Existing methods use graph-theoretic centrality measures and expensive message-passing algorithms, requiring knowledge of the underlying dynamics and its parameters. In this paper, we revisit this problem using graph neural networks (GNNs) to learn P0. We establish a theoretical limit for the identification of P0 in a class of epidemic models. We evaluate our method against different epidemic models on both synthetic and a real-world contact network considering a disease with history and characteristics of COVID-19. We observe that GNNs can identify P0 close to the theoretical bound on accuracy, without explicit input of dynamics or its parameters. In addition, GNN is over 100 times faster than classic methods for inference on arbitrary graph topologies. Our theoretical bound also shows that the epidemic is like a ticking clock, emphasizing the importance of early contact-tracing. We find a maximum time after which accurate recovery of the source becomes impossible, regardless of the algorithm used.

Submitted 27 June, 2020; v1 submitted 21 June, 2020; originally announced June 2020.

arXiv:2004.05222 [pdf]

Give more data, awareness and control to individual citizens, and they will help COVID-19 containment

Authors: Mirco Nanni, Gennady Andrienko, Albert-László Barabási, Chiara Boldrini, Francesco Bonchi, Ciro Cattuto, Francesca Chiaromonte, Giovanni Comandé, Marco Conti, Mark Coté, Frank Dignum, Virginia Dignum, Josep Domingo-Ferrer, Paolo Ferragina, Fosca Giannotti, Riccardo Guidotti, Dirk Helbing, Kimmo Kaski, Janos Kertesz, Sune Lehmann, Bruno Lepri, Paul Lukowicz, Stan Matwin, David Megías Jiménez, Anna Monreale, et al. (14 additional authors not shown)

Abstract: The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoid location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively, voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn, this enables both contact tracing and the early detection of outbreak hotspots on a more finely-granulated geographic scale. Our recommendation is two-fold. First, to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates - if and when they want, for specific aims - with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.

Submitted 16 April, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

Comments: Revised text. Additional authors

Journal ref: Transactions on Data Privacy 13(1): 61-66 (2020), http://www.tdp.cat/issues16/abs.a389a20.php

arXiv:2004.04019 [pdf, other]

A machine learning methodology for real-time forecasting of the 2019-2020 COVID-19 outbreak using Internet searches, news alerts, and estimates from mechanistic models

Authors: Dianbo Liu, Leonardo Clemente, Canelle Poirier, Xiyu Ding, Matteo Chinazzi, Jessica T Davis, Alessandro Vespignani, Mauricio Santillana

Abstract: We present a timely and novel methodology that combines disease estimates from mechanistic models with digital traces, via interpretable machine-learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real-time. Specifically, our method is able to produce stable and accurate forecasts 2 days ahead of current time, and uses as inputs (a) official health reports from Chinese Center for Disease Control and Prevention (China CDC), (b) COVID-19-related internet search activity from Baidu, (c) news media activity reported by Media Cloud, and (d) daily forecasts of COVID-19 activity from GLEAM, an agent-based mechanistic model. Our machine-learning methodology uses a clustering technique that enables the exploitation of geo-spatial synchronicities of COVID-19 activity across Chinese provinces, and a data augmentation technique to deal with the small number of historical disease activity observations, characteristic of emerging outbreaks. Our model's predictive power outperforms a collection of baseline models in 27 out of the 32 Chinese provinces, and could be easily extended to other geographies currently affected by the COVID-19 outbreak to help decision makers.

Submitted 8 April, 2020; originally announced April 2020.

arXiv:1802.05337 [pdf, other]

Link transmission centrality in large-scale social networks

Authors: Qian Zhang, Márton Karsai, Alessandro Vespignani

Abstract: Understanding the importance of links in transmitting information in a network can provide ways to hinder or postpone ongoing dynamical phenomena like the spreading of epidemic or the diffusion of information. In this work, we propose a new measure based on stochastic diffusion processes, the transmission centrality, that captures the importance of links by estimating the average number of nodes to whom they transfer information during a global spreading diffusion process. We propose a simple algorithmic solution to compute transmission centrality and to approximate it in very large networks at low computational cost. Finally we apply transmission centrality in the identification of weak ties in three large empirical social networks, showing that this metric outperforms other centrality measures in identifying links that drive spreading processes in a social network.

Submitted 14 February, 2018; originally announced February 2018.

Comments: 19 pages, 5 figures

arXiv:1509.08295 [pdf, ps, other]

doi 10.1093/comnet/cnv022

Detecting global bridges in networks

Authors: Pablo Jensen, Matteo Morini, Marton Karsai, Tommaso Venturini, Alessandro Vespignani, Mathieu Jacomy, Jean-Philippe Cointet, Pierre Merckle, Eric Fleury

Abstract: The identification of nodes occupying important positions in a network structure is crucial for the understanding of the associated real-world system. Usually, betweenness centrality is used to evaluate a node's capacity to connect different graph regions. However, we argue here that this measure is not adapted for that task, as it gives equal weight to "local" centers (i.e. nodes of high degree central to a single region) and to "global" bridges, which connect different communities. This distinction is important as the roles of such nodes are different in terms of the local and global organisation of the network structure. In this paper we propose a decomposition of betweenness centrality into two terms, one highlighting the local contributions and the other the global ones. We call the latter bridgeness centrality and show that it is capable of specifically spotting global bridges. In addition, we introduce an effective algorithmic implementation of this measure and demonstrate its capability to identify global bridges in air transportation and scientific collaboration networks.

Submitted 29 September, 2015; v1 submitted 28 September, 2015; originally announced September 2015.

Comments: Journal of Complex Networks Preprint; 14 pages; 6 figures

arXiv:1507.06106 [pdf, other]

doi 10.1126/sciadv.1501158

The dynamic of information-driven coordination phenomena: a transfer entropy analysis

Authors: Javier Borge-Holthoefer, Nicola Perra, Bruno Gonçalves, Sandra González-Bailón, Alex Arenas, Yamir Moreno, Alessandro Vespignani

Abstract: Data from social media are providing unprecedented opportunities to investigate the processes that rule the dynamics of collective social phenomena. Here, we consider an information theoretical approach to define and measure the temporal and structural signatures typical of collective social events as they arise and gain prominence. We use the symbolic transfer entropy analysis of micro-blogging time series to extract directed networks of influence among geolocalized sub-units in social systems. This methodology captures the emergence of system-level dynamics close to the onset of socially relevant collective phenomena. The framework is validated against a detailed empirical analysis of five case studies. In particular, we identify a change in the characteristic time-scale of the information transfer that flags the onset of information-driven collective phenomena. Furthermore, our approach identifies an order-disorder transition in the directed network of influence between social sub-units. In the absence of a clear exogenous driving, social collective phenomena can be represented as endogenously-driven structural transitions of the information transfer network. This study provides results that can help define models and predictive algorithms for the analysis of societal events based on open source data.

Submitted 22 July, 2015; originally announced July 2015.

Comments: 46 pages (main text: 16; SI: 30)

Journal ref: Science Advances 2(4) e1501158 (2016)

arXiv:1408.2701 [pdf, other]

doi 10.1103/RevModPhys.87.925

Epidemic processes in complex networks

Authors: Romualdo Pastor-Satorras, Claudio Castellano, Piet Van Mieghem, Alessandro Vespignani

Abstract: In recent years the research community has accumulated overwhelming evidence for the emergence of complex and heterogeneous connectivity patterns in a wide range of biological and sociotechnical systems. The complex properties of real-world networks have a profound impact on the behavior of equilibrium and nonequilibrium phenomena occurring in various systems, and the study of epidemic spreading is central to our understanding of the unfolding of dynamical processes in complex networks. The theoretical analysis of epidemic spreading in heterogeneous networks requires the development of novel analytical frameworks, and it has produced results of conceptual and practical relevance. A coherent and comprehensive review of the vast research activity concerning epidemic processes is presented, detailing the successful theoretical approaches as well as making their limits and assumptions clear. Physicists, mathematicians, epidemiologists, computer, and social scientists share a common interest in studying epidemic spreading and rely on similar models for the description of the diffusion of pathogens, knowledge, and innovation. For this reason, while focusing on the main results and the paradigmatic models in infectious disease modeling, the major results concerning generalized social contagion processes are also presented. Finally, the research activity at the forefront in the study of epidemic spreading in coevolving, coupled, and time-varying networks is reported.

Submitted 18 September, 2015; v1 submitted 12 August, 2014; originally announced August 2014.

Comments: 62 pages, 15 figures, final version

Journal ref: Rev. Mod. Phys. 87, 925 (2015)

arXiv:1309.7031 [pdf, other]

doi 10.1103/PhysRevLett.112.118702

Controlling Contagion Processes in Time-Varying Networks

Authors: Suyu Liu, Nicola Perra, Marton Karsai, Alessandro Vespignani

Abstract: The vast majority of strategies aimed at controlling contagion processes on networks considers the connectivity pattern of the system as either quenched or annealed. However, in the real world many networks are highly dynamical and evolve in time concurrently to the contagion process. Here, we derive an analytical framework for the study of control strategies specifically devised for time-varying networks. We consider the removal/immunization of individual nodes according to their activity in the network and develop a block variable mean-field approach that allows the derivation of the equations describing the evolution of the contagion process concurrently to the network dynamic. We derive the critical immunization threshold and assess the effectiveness of the control strategies. Finally, we validate the theoretical picture by simulating numerically the information spreading process and control strategies in both synthetic networks and a large-scale, real-world mobile telephone call dataset.

Submitted 26 September, 2013; originally announced September 2013.

arXiv:1303.5966 [pdf, other]

doi 10.1038/srep04001

Time varying networks and the weakness of strong ties

Authors: Márton Karsai, Nicola Perra, Alessandro Vespignani

Abstract: In most social and information systems the activity of agents generates rapidly evolving time-varying networks. The temporal variation in networks' connectivity patterns and the ongoing dynamic processes are usually coupled in ways that still challenge our mathematical or computational modelling. Here we analyse a mobile call dataset and find a simple statistical law that characterizes the temporal evolution of users' egocentric networks. We encode this observation in a reinforcement process defining a time-varying network model that exhibits the emergence of strong and weak ties. We study the effect of time-varying and heterogeneous interactions on the classic rumour spreading model in both synthetic and real-world networks. We observe that strong ties severely inhibit information diffusion by confining the spreading process among agents with recurrent communication patterns. This provides the counterintuitive evidence that strong ties may have a negative role in the spreading of information across networks.

Submitted 17 February, 2014; v1 submitted 24 March, 2013; originally announced March 2013.

Comments: 22 pages, 15 figures

Journal ref: Scientific Reports 4, 4001 (2014)

arXiv:1302.6569 [pdf, other]

doi 10.1038/srep01640

Characterizing scientific production and consumption in Physics

Authors: Qian Zhang, Nicola Perra, Bruno Goncalves, Fabio Ciulla, Alessandro Vespignani

Abstract: We analyze the entire publication database of the American Physical Society generating longitudinal (50 years) citation networks geolocalized at the level of single urban areas. We define the knowledge diffusion proxy, and scientific production ranking algorithms to capture the spatio-temporal dynamics of Physics knowledge worldwide. By using the knowledge diffusion proxy we identify the key cities in the production and consumption of knowledge in Physics as a function of time. The results from the scientific production ranking algorithm allow us to characterize the top cities for scholarly research in Physics. Although we focus on a single dataset concerning a specific field, the methodology presented here opens the path to comparative studies of the dynamics of knowledge across disciplines and research areas.

Submitted 26 February, 2013; originally announced February 2013.

Journal ref: Nature Scientific Reports 3, 1640 (2013)

arXiv:1212.5238 [pdf, other]

doi 10.1371/journal.pone.0061981

The Twitter of Babel: Mapping World Languages through Microblogging Platforms

Authors: Delia Mocanu, Andrea Baronchelli, Bruno Gonçalves, Nicola Perra, Alessandro Vespignani

Abstract: Large scale analysis and statistics of socio-technical systems that just a few short years ago would have required the use of consistent economic and human resources can nowadays be conveniently performed by mining the enormous amount of digital data produced by human activities. Although a characterization of several aspects of our societies is emerging from the data revolution, a number of questions concerning the reliability and the biases inherent to the big data "proxies" of social life are still open. Here, we survey worldwide linguistic indicators and trends through the analysis of a large-scale dataset of microblogging posts. We show that available data allow for the study of language geography at scales ranging from country-level aggregation to specific city neighborhoods. The high resolution and coverage of the data allow us to investigate different indicators such as the linguistic homogeneity of different countries, the touristic seasonal patterns within countries and the geographical distribution of different languages in multilingual regions. This work highlights the potential of geolocalized studies of open data sources to improve current analysis and develop indicators for major social phenomena in specific communities.

Submitted 20 December, 2012; originally announced December 2012.

Journal ref: PLoS One 8, E61981 (2013)

arXiv:1205.4467 [pdf, other]

Beating the news using Social Media: the case study of American Idol

Authors: Fabio Ciulla, Delia Mocanu, Andrea Baronchelli, Bruno Gonçalves, Nicola Perra, Alessandro Vespignani

Abstract: We present a contribution to the debate on the predictability of social events using big data analytics. We focus on the elimination of contestants in the American Idol TV shows as an example of a well defined electoral phenomenon that each week draws millions of votes in the USA. We provide evidence that Twitter activity during the time span defined by the TV show airing and the voting period following it correlates with the contestants' ranking and allows the anticipation of the voting outcome. Furthermore, the fraction of Tweets that contain geolocation information allows us to map the fanbase of each contestant, both within the US and abroad, showing that strong regional polarizations occur. Although American Idol voting is just a minimal and simplified version of complex societal phenomena such as political elections, this work shows that the volume of information available in online systems permits the real time gathering of quantitative indicators anticipating the future unfolding of opinion formation events.

Submitted 23 May, 2012; v1 submitted 20 May, 2012; originally announced May 2012.

Comments: 6 pages, 4 figures, 2 tables

arXiv:1203.5351 [pdf, other]

doi 10.1038/srep00469

Activity driven modeling of time varying networks

Authors: Nicola Perra, Bruno Gonçalves, Romualdo Pastor-Satorras, Alessandro Vespignani

Abstract: Network modeling plays a critical role in identifying statistical regularities and structural principles common to many systems. The large majority of recent modeling approaches are connectivity driven. The structural patterns of the network are at the basis of the mechanisms ruling the network formation. Connectivity driven models necessarily provide a time-aggregated representation that may fail to describe the instantaneous and fluctuating dynamics of many networks. We address this challenge by defining the activity potential, a time invariant function characterizing the agents' interactions and constructing an activity driven model capable of encoding the instantaneous time description of the network dynamics. The model provides an explanation of structural features such as the presence of hubs, which simply originate from the heterogeneous activity of agents. Within this framework, highly dynamical networks can be described analytically, allowing a quantitative discussion of the biases induced by the time-aggregated representations in the analysis of dynamical processes.

Submitted 26 June, 2012; v1 submitted 23 March, 2012; originally announced March 2012.

Comments: 10 pages, 4 figures

Journal ref: Nature Scientific Reports 2, 469 (2012)

arXiv:1105.5170 [pdf, other]

doi 10.1371/journal.pone.0022656

Validation of Dunbar's number in Twitter conversations

Authors: Bruno Goncalves, Nicola Perra, Alessandro Vespignani

Abstract: Modern society's increasing dependency on online tools for both work and recreation opens up unique opportunities for the study of social interactions. A large survey of online exchanges or conversations on Twitter, collected across six months and involving 1.7 million individuals, is presented here. We test the theoretical cognitive limit on the number of stable social relationships known as Dunbar's number. We find that users can entertain a maximum of 100-200 stable relationships, in support of Dunbar's prediction. The "economy of attention" is limited in the online world by cognitive and biological constraints as predicted by Dunbar's theory. Inspired by this empirical evidence we propose a simple dynamical mechanism, based on finite priority queuing and time resources, that reproduces the observed social behavior.

Submitted 28 May, 2011; v1 submitted 25 May, 2011; originally announced May 2011.

Comments: 8 pages, 6 figures

Journal ref: PLoS ONE 6(8): e22656 (2011)

arXiv:1007.3680 [pdf]

doi 10.1371/journal.pone.0011596

Dynamics of person-to-person interactions from distributed RFID sensor networks

Authors: Ciro Cattuto, Wouter Van den Broeck, Alain Barrat, Vittoria Colizza, Jean-François Pinton, Alessandro Vespignani

Abstract: Digital networks, mobile devices, and the possibility of mining the ever-increasing amount of digital traces that we leave behind in our daily activities are changing the way we can approach the study of human and social interactions. Large-scale datasets, however, are mostly available for collective and statistical behaviors, at coarse granularities, while high-resolution data on person-to-person interactions are generally limited to relatively small groups of individuals. Here we present a scalable experimental framework for gathering real-time data resolving face-to-face social interactions with tunable spatial and temporal granularities. We use active Radio Frequency Identification (RFID) devices that assess mutual proximity in a distributed fashion by exchanging low-power radio packets. We analyze the dynamics of person-to-person interaction networks obtained in three high-resolution experiments carried out at different orders of magnitude in community size. The data sets exhibit common statistical properties and lack of a characteristic time scale from 20 seconds to several hours. The association between the number of connections and their duration shows an interesting super-linear behavior, which indicates the possibility of defining super-connectors both in the number and intensity of connections.
Taking advantage of scalability and resolution, this experimental framework allows the monitoring of social interactions, uncovering similarities in the way individuals interact in different contexts, and identifying patterns of super-connector behavior in the community. These results could impact our understanding of all phenomena driven by face-to-face interactions, such as the spreading of transmissible infectious diseases and information.

Submitted 21 July, 2010; originally announced July 2010.

Comments: see also http://www.sociopatterns.org

Journal ref: PLoS ONE 5(7): e11596 (2010)

arXiv:1005.2704 [pdf, other]

doi 10.1103/PhysRevLett.105.158701

Characterizing and modeling the dynamics of online popularity

Authors: Jacob Ratkiewicz, Filippo Menczer, Santo Fortunato, Alessandro Flammini, Alessandro Vespignani

Abstract: Online popularity has enormous impact on opinions, culture, policy, and profits. We provide a quantitative, large scale, temporal analysis of the dynamics of online content popularity in two massive model systems, the Wikipedia and an entire country's Web space. We find that the dynamics of popularity are characterized by bursts, displaying characteristic features of critical systems such as fat-tailed distributions of magnitude and inter-event time. We propose a minimal model combining the classic preferential popularity increase mechanism with the occurrence of random popularity shifts due to exogenous factors. The model recovers the critical features observed in the empirical analysis of the systems analyzed here, highlighting the key factors needed in the description of popularity dynamics.

Submitted 10 October, 2010; v1 submitted 15 May, 2010; originally announced May 2010.

Comments: 5 pages, 4 figures. Modeling part detailed. Final version published in Physical Review Letters

Journal ref: Physical Review Letters 105, 158701 (2010)

arXiv:0907.1050 [pdf, ps, other]

doi 10.1103/PhysRevE.80.056103

Diffusion of scientific credits and the ranking of scientists

Authors: Filippo Radicchi, Santo Fortunato, Benjamin Markines, Alessandro Vespignani

Abstract: Recently, the abundance of digital data enabled the implementation of graph based ranking algorithms that provide system level analysis for ranking publications and authors. Here we take advantage of the entire Physical Review publication archive (1893-2006) to construct authors' networks where weighted edges, as measured from opportunely normalized citation counts, define a proxy for the mechanism of scientific credit transfer. On this network we define a ranking method based on a diffusion algorithm that mimics the spreading of scientific credits on the network. We compare the results obtained with our algorithm with those obtained by local measures such as the citation count and provide a statistical analysis of the assignment of major career awards in the area of Physics. A web site where the algorithm is made available to perform customized rank analysis can be found at the address http://www.physauthorsrank.org

Submitted 23 September, 2009; v1 submitted 6 July, 2009; originally announced July 2009.

Comments: Revised version. 11 pages, 10 figures, 1 table. The portal to compute the rankings of scientists is at http://www.physauthorsrank.org

Journal ref: Phys. Rev. E 80, 056103 (2009)

arXiv:0904.2389 [pdf, other]

doi 10.1073/pnas.0808904106

Extracting the multiscale backbone of complex weighted networks

Authors: M. Angeles Serrano, Marian Boguna, Alessandro Vespignani

Abstract: A large number of complex systems find a natural abstraction in the form of weighted networks whose nodes represent the elements of the system and the weighted edges identify the presence of an interaction and its relative strength. In recent years, the study of an increasing number of large scale networks has highlighted the statistical heterogeneity of their interaction pattern, with degree and weight distributions which vary over many orders of magnitude. These features, along with the large number of elements and links, make the extraction of the truly relevant connections forming the network's backbone a very challenging problem. More specifically, coarse-graining approaches and filtering techniques struggle with the multiscale nature of large scale systems. Here we define a filtering method that offers a practical procedure to extract the relevant connection backbone in complex multiscale networks, preserving the edges that represent statistically significant deviations with respect to a null model for the local assignment of weights to edges. An important aspect of the method is that it does not belittle small-scale interactions and operates at all scales defined by the weight distribution. We apply our method to real world network instances and compare the obtained results with alternative backbone extraction techniques.

Submitted 15 April, 2009; originally announced April 2009.

Journal ref: Proc. Natl. Acad. Sci. USA 106, 6483-6488 (2009)

arXiv:0811.4170 [pdf, other]

doi 10.1371/journal.pone.0011596

High resolution dynamical mapping of social interactions with active RFID

Authors: Alain Barrat, Ciro Cattuto, Vittoria Colizza, Jean-Francois Pinton, Wouter Van den Broeck, Alessandro Vespignani

Abstract: In this paper we present an experimental framework to gather data on face-to-face social interactions between individuals, with a high spatial and temporal resolution. We use active Radio Frequency Identification (RFID) devices that assess contacts with one another by exchanging low-power radio packets. When individuals wear the beacons as a badge, a persistent radio contact between the RFID devices can be used as a proxy for a social interaction between individuals. We present the results of a pilot study recently performed during a conference, and a subsequent preliminary data analysis, that provides an assessment of our method and highlights its versatility and applicability in many areas concerned with human dynamics.

Submitted 25 November, 2008; v1 submitted 25 November, 2008; originally announced November 2008.

Journal ref: PLoS ONE 5(7): e11596 (2010)

arXiv:0706.3146 [pdf, other]

doi 10.1073/pnas.0811973106

WiFi Epidemiology: Can Your Neighbors' Router Make Yours Sick?

Authors: Hao Hu, Steven Myers, Vittoria Colizza, Alessandro Vespignani

Abstract: In densely populated urban areas WiFi routers form a tightly interconnected proximity network that can be exploited as a substrate for the spreading of malware able to launch massive fraudulent attacks and affect the WiFi networks of entire urban areas. In this paper we consider several scenarios for the deployment of malware that spreads solely over the wireless channel of major urban areas in the US. We develop an epidemiological model that takes into consideration prevalent security flaws on these routers. The spread of such a contagion is simulated on real-world data for geo-referenced wireless routers. We uncover a major weakness of WiFi networks in that most of the simulated scenarios show tens of thousands of routers infected in as little time as two weeks, with the majority of the infections occurring in the first 24 to 48 hours. We indicate possible containment and prevention measures to limit the eventual harm of such an attack.

Submitted 21 June, 2007; originally announced June 2007.

Comments: 22 pages, 1 table, 4 figures

Journal ref: Proceedings of the National Academy of Sciences, vol. 106, no. 5, 1318-1323 (2009)

arXiv:cs/0612040 [pdf, ps, other]

doi 10.1145/1198255.1198267

The Workshop on Internet Topology (WIT) Report

Authors: Dmitri Krioukov, Fan Chung, kc claffy, Marina Fomenkov, Alessandro Vespignani, Walter Willinger

Abstract: Internet topology analysis has recently experienced a surge of interest in computer science, physics, and the mathematical sciences. However, researchers from these different disciplines tend to approach the same problem from different angles. As a result, the field of Internet topology analysis and modeling must untangle sets of inconsistent findings, conflicting claims, and contradicting statements. On May 10-12, 2006, CAIDA hosted the Workshop on Internet topology (WIT). By bringing together a group of researchers spanning the areas of computer science, physics, and the mathematical sciences, the workshop aimed to improve communication across these scientific disciplines, enable interdisciplinary crossfertilization, identify commonalities in the different approaches, promote synergy where it exists, and utilize the richness that results from exploring similar problems from multiple perspectives. This report describes the findings of the workshop, outlines a set of relevant open research problems identified by participants, and concludes with recommendations that can benefit all scientific communities interested in Internet topology research.

Submitted 7 December, 2006; originally announced December 2006.

ACM Class: C.2.5; C.2.1

Journal ref: ACM SIGCOMM Computer Communication Review (CCR), v.37, n.1, p.69-73, 2007

arXiv:cs/0511035 [pdf, ps, other]

Decoding the structure of the WWW: facts versus sampling biases

Authors: M. Angeles Serrano, Ana Maguitman, Marian Boguna, Santo Fortunato, Alessandro Vespignani

Abstract: The understanding of the immense and intricate topological structure of the World Wide Web (WWW) is a major scientific and technological challenge. This has been tackled recently by characterizing the properties of its representative graphs in which vertices and directed edges are identified with web-pages and hyperlinks, respectively. Data gathered in large scale crawls have been analyzed by several groups resulting in a general picture of the WWW that encompasses many of the complex properties typical of rapidly evolving networks. In this paper, we report a detailed statistical analysis of the topological properties of four different WWW graphs obtained with different crawlers. We find that, despite the very large size of the samples, the statistical measures characterizing these graphs differ quantitatively, and in some cases qualitatively, depending on the domain analyzed and the crawl used for gathering the data. This spurs the issue of the presence of sampling biases and structural differences of Web crawls that might induce properties not representative of the actual global underlying graph. In order to provide a more accurate characterization of the Web graph and identify observables which are clearly discriminating with respect to the sampling process, we study the behavior of degree-degree correlation functions and the statistics of reciprocal connections. The latter appears to enclose the relevant correlations of the WWW graph and carry most of the topological information of the Web.
The analysis of this quantity is also of major interest in relation to the navigability and searchability of the Web.

Submitted 14 February, 2006; v1 submitted 8 November, 2005; originally announced November 2005.

Comments: 10 pages 19 figures. Values in Table 2 and Figure 1 corrected. Figure 7 updated. Minor changes in the text

ACM Class: H.4.m; G.3

Journal ref: ACM Transactions on the Web (TWEB) 1, 10 (2007)

arXiv:cs/0511007 [pdf, ps, other]

K-core decomposition of Internet graphs: hierarchies, self-similarity and measurement biases

Authors: José Ignacio Alvarez-Hamelin, Luca Dall'Asta, Alain Barrat, Alessandro Vespignani

Abstract: We consider the $k$-core decomposition of network models and Internet graphs at the autonomous system (AS) level. The $k$-core analysis allows us to characterize networks beyond the degree distribution and uncover structural properties and hierarchies due to the specific architecture of the system. We compare the $k$-core structure obtained for AS graphs with those of several network models and discuss the differences and similarities with the real Internet architecture. The presence of biases and the incompleteness of the real maps are discussed and their effect on the $k$-core analysis is assessed with numerical experiments simulating biased exploration on a wide range of network models. We find that the $k$-core analysis provides an interesting characterization of the fluctuations and incompleteness of maps as well as information helping to discriminate the original underlying structure.

Submitted 16 April, 2008; v1 submitted 2 November, 2005; originally announced November 2005.

Journal ref: Networks and Heterogeneous Media 3 (2008) 371

arXiv:cs/0511005 [pdf, ps, other]

doi 10.1073/pnas.0605525103

The egalitarian effect of search engines

Authors: Santo Fortunato, Alessandro Flammini, Filippo Menczer, Alessandro Vespignani

Abstract: Search engines have become key media for our scientific, economic, and social activities by enabling people to access information on the Web in spite of its size and complexity. On the down side, search engines bias the traffic of users according to their page-ranking strategies, and some have argued that they create a vicious cycle that amplifies the dominance of established and already popular sites. We show that, contrary to these prior claims and our own intuition, the use of search engines actually has an egalitarian effect. We reconcile theoretical arguments with empirical evidence showing that the combination of retrieval by search engines and search behavior by users mitigates the attraction of popular pages, directing more traffic toward less popular sites, even in comparison to what would be expected from users randomly surfing the Web.

Submitted 23 August, 2006; v1 submitted 1 November, 2005; originally announced November 2005.

Comments: 9 pages, 8 figures, 2 appendices. The final version of this e-print has been published on the Proc. Natl. Acad. Sci. USA 103(34), 12684-12689 (2006), http://www.pnas.org/cgi/content/abstract/103/34/12684

ACM Class: H.3.3; H.3.4; H.3.5; H.5.4; K.4.m

arXiv:cs/0504107 [pdf, ps, other]

k-core decomposition: a tool for the visualization of large scale networks

Authors: José Ignacio Alvarez-Hamelin, Luca Dall'Asta, Alain Barrat, Alessandro Vespignani

Abstract: We use the k-core decomposition to visualize large scale complex networks in two dimensions. This decomposition, based on a recursive pruning of the least connected vertices, allows us to disentangle the hierarchical structure of networks by progressively focusing on their central cores. By using this strategy we develop a general visualization algorithm that can be used to compare the structural properties of various networks and highlight their hierarchical structure. The low computational complexity of the algorithm, O(n+e), where 'n' is the size of the network, and 'e' is the number of edges, makes it suitable for the visualization of very large sparse networks. We apply the proposed visualization tool to several real and synthetic graphs, showing its utility in finding specific structural fingerprints of computer generated and real world networks.

Submitted 12 October, 2005; v1 submitted 28 April, 2005; originally announced April 2005.

Journal ref: Advances in Neural Information Processing Systems 18, Canada (2006) 41
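
Illustrative note (not part of the arXiv record): the "recursive pruning of the least connected vertices" described in the abstract above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name core_numbers and the edge-list input format are hypothetical, the graph is assumed undirected and simple, and the degree buckets are one common way to keep the running time close to linear in vertices plus edges.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected simple graph.

    Hypothetical helper: repeatedly removes a vertex of minimum remaining
    degree; the largest minimum degree seen so far is its core number.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    # degree[v] counts neighbours of v that have not been removed yet
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    buckets = defaultdict(set)  # remaining degree -> vertices
    for v, d in degree.items():
        buckets[d].add(v)

    core = {}
    k = 0  # current core level
    d = 0  # scan pointer over the degree buckets
    while len(core) < len(degree):
        while not buckets[d]:  # find the lowest non-empty bucket
            d += 1
        k = max(k, d)
        v = buckets[d].pop()
        core[v] = k
        for w in adj[v]:
            if w not in core:  # shift each remaining neighbour down one bucket
                buckets[degree[w]].discard(w)
                degree[w] -= 1
                buckets[degree[w]].add(w)
        d = max(d - 1, 0)  # a neighbour may now sit one bucket lower
    return core


if __name__ == "__main__":
    # A triangle (1, 2, 3) with a pendant vertex 4 attached to vertex 3.
    print(core_numbers([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

In this toy graph the pendant vertex is pruned first at level 1, and the triangle survives as the 2-core.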

arXiv:cs/0412007 [pdf, ps, other]

doi 10.1016/j.tcs.2005.12.009

Exploring networks with traceroute-like probes: theory and simulations

Authors: Luca Dall'Asta, Ignacio Alvarez-Hamelin, Alain Barrat, Alexei Vazquez, Alessandro Vespignani

Abstract: Mapping the Internet generally consists in sampling the network from a limited set of sources by using traceroute-like probes. This methodology, akin to the merging of different spanning trees to a set of destinations, has been argued to introduce uncontrolled sampling biases that might produce statistical properties of the sampled graph which sharply differ from the original ones. In this paper we explore these biases and provide a statistical analysis of their origin. We derive an analytical approximation for the probability of edge and vertex detection that exploits the role of the number of sources and targets and allows us to relate the global topological properties of the underlying network with the statistical accuracy of the sampled graph. In particular, we find that the edge and vertex detection probability depends on the betweenness centrality of each element. This allows us to show that shortest path routed sampling provides a better characterization of underlying graphs with broad distributions of connectivity. We complement the analytical discussion with a thorough numerical investigation of simulated mapping strategies in network models with different topologies. We show that sampled graphs provide a fair qualitative characterization of the statistical properties of the original networks in a fair range of different strategies and exploration parameters. Moreover, we characterize the level of redundancy and completeness of the exploration process as a function of the topological properties of the network.
Finally, we study numerically how the fraction of vertices and edges discovered in the sampled graph depends on the particular deployments of probing sources. The results might hint at the steps toward more efficient mapping strategies.

Submitted 2 December, 2004; originally announced December 2004.

Comments: This paper is related to cond-mat/0406404, with explorations of different networks and complementary discussions

Journal ref: Theoretical Computer Science 355 (2006) 6

arXiv:cond-mat/0406404 [pdf, ps, other]

A statistical approach to the traceroute-like exploration of networks: theory and simulations

Authors: Luca Dall'Asta, Ignacio Alvarez-Hamelin, Alain Barrat, Alexei Vazquez, Alessandro Vespignani

Abstract: Mapping the Internet generally consists in sampling the network from a limited set of sources by using "traceroute"-like probes. This methodology, akin to the merging of different spanning trees to a set of destinations, has been argued to introduce uncontrolled sampling biases that might produce statistical properties of the sampled graph which sharply differ from the original ones. Here we explore these biases and provide a statistical analysis of their origin. We derive a mean-field analytical approximation for the probability of edge and vertex detection that exploits the role of the number of sources and targets and allows us to relate the global topological properties of the underlying network with the statistical accuracy of the sampled graph. In particular we find that the edge and vertex detection probability depends on the betweenness centrality of each element. This allows us to show that shortest path routed sampling provides a better characterization of underlying graphs with scale-free topology. We complement the analytical discussion with a thorough numerical investigation of simulated mapping strategies in different network models. We show that sampled graphs provide a fair qualitative characterization of the statistical properties of the original networks in a fair range of different strategies and exploration parameters. The numerical study also allows the identification of intervals of the exploration parameters that optimize the fraction of nodes and edges discovered in the sampled graph.
This finding might hint at the steps toward more efficient mapping strategies.

Submitted 22 June, 2004; v1 submitted 17 June, 2004; originally announced June 2004.

Journal ref: CAAN 2004, LNCS 3405, p. 140 (2005)

arXiv:cs/0405070 [pdf, ps, other]

Traffic-driven model of the World Wide Web graph

Authors: Alain Barrat, Marc Barthelemy, Alessandro Vespignani

Abstract: We propose a model for the World Wide Web graph that couples the topological growth with the traffic's dynamical evolution. The model is based on a simple traffic-driven dynamics and generates weighted directed graphs exhibiting the statistical properties observed in the Web. In particular, the model yields a non-trivial time evolution of vertices and heavy-tail distributions for the topological and traffic properties. The generated graphs exhibit a complex architecture with a hierarchy of cohesiveness levels similar to those observed in the analysis of real data.

Submitted 24 May, 2004; v1 submitted 20 May, 2004; originally announced May 2004.

Journal ref: LNCS 3243, 56 (2004)

arXiv:cond-mat/0206084 [pdf, ps, other]

Internet topology at the router and autonomous system level

Authors: A. Vazquez, R. Pastor-Satorras, A. Vespignani

Abstract: We present a statistical analysis of different metrics characterizing the topological properties of Internet maps, collected at two different resolution scales: the router and the autonomous system level. The metrics we consider allow us to confirm the presence of scale-free signatures in several statistical distributions, as well as to show in a quantitative way the hierarchical nature of the Internet. Our findings are relevant for the development of more accurate Internet topology generators, which should include, along with the scale-free properties of the connectivity distribution, the hierarchical signatures unveiled in the present work.

Submitted 6 June, 2002; originally announced June 2002.

Comments: 8 pages, 6 figures, ACM style package

Showing 1–35 of 35 results for author: Vespignani, A