-
Fast unfolding of communities in large networks: 15 years later
Authors:
Vincent Blondel,
Jean-Loup Guillaume,
Renaud Lambiotte
Abstract:
The Louvain method was proposed 15 years ago as a heuristic method for the fast detection of communities in large networks. During this period, it has emerged as one of the most popular methods for community detection, the task of partitioning vertices of a network into dense groups, usually called communities or clusters. Here, after a short introduction to the method, we give an overview of the different generalizations and modifications that have been proposed in the literature, and also survey the quality functions, beyond modularity, for which it has been implemented.
Submitted 10 November, 2023;
originally announced November 2023.
-
Clean up or mess up: the effect of sampling biases on measurements of degree distributions in mobile phone datasets
Authors:
Adeline Decuyper,
Arnaud Browet,
Vincent Traag,
Vincent D. Blondel,
Jean-Charles Delvenne
Abstract:
Mobile phone data have been extensively used in recent years to study social behavior. However, most of these studies are based on only partial data whose coverage is limited both in space and time. In this paper, we point to an observation that the bias due to the limited coverage in time may have an important influence on the results of the analyses performed. In particular, we observe significant differences, both qualitatively and quantitatively, in the degree distribution of the network, depending on the way the dataset is pre-processed, and we present a possible explanation for the emergence of Double Pareto LogNormal (DPLN) degree distributions in temporal data.
Submitted 29 September, 2016;
originally announced September 2016.
-
Modelling influence and opinion evolution in online collective behaviour
Authors:
Corentin Vande Kerckhove,
Samuel Martin,
Pascal Gend,
Peter J. Rentfrow,
Julien M. Hendrickx,
Vincent D. Blondel
Abstract:
Opinion evolution and judgment revision are mediated through social influence. Based on a large crowdsourced in vitro experiment (n=861), it is shown how a consensus model can be used to predict opinion evolution in online collective behaviour. It is the first time the predictive power of a quantitative model of opinion dynamics is tested against a real dataset. Unlike previous research on the topic, the model was validated on data which did not serve to calibrate it. This avoids favoring more complex models over simpler ones and prevents overfitting. The model is parametrized by the influenceability of each individual, a factor representing to what extent individuals incorporate external judgments. The prediction accuracy depends on prior knowledge of the participants' past behaviour. Several situations reflecting data availability are compared. When the data is scarce, the data from previous participants is used to predict how a new participant will behave. Judgment revision includes unpredictable variations which limit the potential for prediction. A first measure of unpredictability is proposed. The measure is based on a specific control experiment. More than two thirds of the prediction errors are found to occur due to unpredictability of the human judgment revision process rather than to model imperfection.
Submitted 3 June, 2016; v1 submitted 9 November, 2015;
originally announced November 2015.
-
Markov modeling of online inter-arrival times
Authors:
Corentin Vande Kerckhove,
Balázs Gerencsér,
Julien M. Hendrickx,
Vincent D. Blondel
Abstract:
In this paper, we investigate the arising communication patterns on social media, and in particular the series of events happening for a single user. While the distribution of inter-event times is often assimilated to power-law density functions, a debate persists on the nature of an underlying model that explains the observed distribution. In the present work, we propose an intuitive explanation for the observed dependence of subsequent waiting times. Our contribution is twofold. The first idea consists of separating the short waiting times -- out of scope for power-law distributions -- from the long ones. The model is further enhanced by introducing a two-state Markovian process to incorporate memory.
Submitted 7 December, 2018; v1 submitted 16 September, 2015;
originally announced September 2015.
-
Sensitivity analysis of a branching process evolving on a network with application in epidemiology
Authors:
Sophie Hautphenne,
Gautier Krings,
Jean-Charles Delvenne,
Vincent D. Blondel
Abstract:
We perform an analytical sensitivity analysis for a model of a continuous-time branching process evolving on a fixed network. This allows us to determine the relative importance of the model parameters to the growth of the population on the network. We then apply our results to the early stages of an influenza-like epidemic spreading among a set of cities connected by air routes in the United States. We also consider vaccination and analyze the sensitivity of the total size of the epidemic with respect to the fraction of vaccinated people. Our analysis shows that the epidemic growth is more sensitive with respect to transmission rates within cities than travel rates between cities. More generally, we highlight the fact that branching processes offer a powerful stochastic modeling tool with analytical formulas for sensitivity which are easy to use in practice.
Submitted 6 September, 2015;
originally announced September 2015.
-
A survey of results on mobile phone datasets analysis
Authors:
Vincent D. Blondel,
Adeline Decuyper,
Gautier Krings
Abstract:
In this paper, we review some advances made recently in the study of mobile phone datasets. This area of research emerged a decade ago, with the increasing availability of large-scale anonymized datasets, and has grown into a stand-alone topic. We survey the contributions made so far on the social networks that can be constructed with such data, the study of personal mobility, geographical partitioning, urban planning, and help towards development as well as security and privacy issues.
Submitted 11 February, 2015;
originally announced February 2015.
-
Estimating Food Consumption and Poverty Indices with Mobile Phone Data
Authors:
Adeline Decuyper,
Alex Rutherford,
Amit Wadhwa,
Jean-Martin Bauer,
Gautier Krings,
Thoralf Gutierrez,
Vincent D. Blondel,
Miguel A. Luengo-Oroz
Abstract:
Recent studies have shown the value of mobile phone data to tackle problems related to economic development and humanitarian action. In this research, we assess the suitability of indicators derived from mobile phone data as a proxy for food security indicators. We compare the measures extracted from call detail records and airtime credit purchases to the results of a nationwide household survey conducted at the same time. Results show high correlations (> .8) between mobile phone data derived indicators and several relevant food security variables such as expenditure on food or vegetable consumption. This correspondence suggests that, in the future, proxies derived from mobile phone data could be used to provide valuable up-to-date operational information on food security throughout low and middle income countries.
Submitted 22 November, 2014;
originally announced December 2014.
-
Group colocation behavior in technological social networks
Authors:
Chloë Brown,
Neal Lathia,
Anastasios Noulas,
Cecilia Mascolo,
Vincent Blondel
Abstract:
We analyze two large datasets from technological networks with location and social data: user location records from an online location-based social networking service, and anonymized telecommunications data from a European cellphone operator, in order to investigate the differences between individual and group behavior with respect to physical location. We discover agreements between the two datasets: firstly, that individuals are more likely to meet with one friend at a place they have not visited before, but tend to meet at familiar locations when with a larger group. We also find that groups of individuals are more likely to meet at places that their other friends have visited, and that the type of a place strongly affects the propensity for groups to meet there. These differences between group and solo mobility have potential technological applications, for example, in venue recommendation in location-based social networks.
Submitted 8 August, 2014; v1 submitted 7 August, 2014;
originally announced August 2014.
-
D4D-Senegal: The Second Mobile Phone Data for Development Challenge
Authors:
Yves-Alexandre de Montjoye,
Zbigniew Smoreda,
Romain Trinquart,
Cezary Ziemlicki,
Vincent D. Blondel
Abstract:
The D4D-Senegal challenge is an open innovation data challenge on anonymous call patterns of Orange's mobile phone users in Senegal. The goal of the challenge is to help address society development questions in novel ways by contributing to the socio-economic development and well-being of the Senegalese population. Participants in the challenge are given access to three mobile phone datasets. This paper describes the three datasets. The datasets are based on Call Detail Records (CDR) of phone calls and text exchanges between more than 9 million of Orange's customers in Senegal from January 1, 2013 to December 31, 2013. The datasets are: (1) antenna-to-antenna traffic for 1666 antennas on an hourly basis, (2) fine-grained mobility data on a rolling 2-week basis for a year with bandicoot behavioral indicators at individual level for about 300,000 randomly sampled users, (3) one year of coarse-grained mobility data at arrondissement level with bandicoot behavioral indicators at individual level for about 150,000 randomly sampled users.
Submitted 30 July, 2014; v1 submitted 18 July, 2014;
originally announced July 2014.
-
Career on the Move: Geography, Stratification, and Scientific Impact
Authors:
Pierre Deville,
Dashun Wang,
Roberta Sinatra,
Chaoming Song,
Vincent D. Blondel,
Albert-Laszlo Barabasi
Abstract:
Changing institutions is an integral part of academic life. Yet little is known about the mobility patterns of scientists at an institutional level and how these career choices affect scientific outcomes. Here, we examine over 420,000 papers to track the affiliation information of individual scientists, allowing us to reconstruct their career trajectories over decades. We find that career movements are not only temporally and spatially localized, but also characterized by a high degree of stratification in institutional ranking. When cross-group movement occurs, we find that while going from elite to lower-rank institutions is on average associated with a modest decrease in scientific performance, transitioning into elite institutions does not result in a subsequent performance gain. These results offer empirical evidence on institutional level career choices and movements and have potential implications for science policy.
Submitted 24 April, 2014;
originally announced April 2014.
-
On the use of human mobility proxy for the modeling of epidemics
Authors:
Michele Tizzoni,
Paolo Bajardi,
Adeline Decuyper,
Guillaume Kon Kam King,
Christian M. Schneider,
Vincent Blondel,
Zbigniew Smoreda,
Marta C. González,
Vittoria Colizza
Abstract:
Human mobility is a key component of large-scale spatial-transmission models of infectious diseases. Correctly modeling and quantifying human mobility is critical for improving epidemic control policies, but may be hindered by incomplete data in some regions of the world. Here we explore the opportunity of using proxy data or models for individual mobility to describe commuting movements and predict the diffusion of infectious disease. We consider three European countries and the corresponding commuting networks at different resolution scales obtained from official census surveys, from proxy data for human mobility extracted from mobile phone call records, and from the radiation model calibrated with census data. Metapopulation models defined on the three countries and integrating the different mobility layers are compared in terms of epidemic observables. We show that commuting networks from mobile phone data capture the empirical commuting patterns well, accounting for more than 87% of the total fluxes. The distributions of commuting fluxes per link from both sources of data - mobile phones and census - are similar and highly correlated; however, a systematic overestimation of commuting traffic in the mobile phone data is observed. This leads to epidemics that spread faster than on census commuting networks, while preserving the order of infection of newly infected locations. The match in the epidemic invasion pattern is sensitive to initial conditions: the radiation model shows higher accuracy with respect to mobile phone data when the seed is central in the network, while the mobile phone proxy performs better for epidemics seeded in peripheral locations. Results suggest that different proxies can be used to approximate commuting patterns across different resolution scales in spatial epidemic simulations, in light of the desired accuracy in the epidemic outcome under study.
Submitted 27 May, 2014; v1 submitted 27 September, 2013;
originally announced September 2013.
-
Partition-Merge: Distributed Inference and Modularity Optimization
Authors:
Vincent Blondel,
Kyomin Jung,
Pushmeet Kohli,
Devavrat Shah
Abstract:
This paper presents a novel meta algorithm, Partition-Merge (PM), which takes existing centralized algorithms for graph computation and makes them distributed and faster. In a nutshell, PM divides the graph into small subgraphs using our novel randomized partitioning scheme, runs the centralized algorithm on each partition separately, and then stitches the resulting solutions to produce a global solution. We demonstrate the efficiency of the PM algorithm on two popular problems: computation of Maximum A Posteriori (MAP) assignment in an arbitrary pairwise Markov Random Field (MRF), and modularity optimization for community detection. We show that the resulting distributed algorithms for these problems essentially run in time linear in the number of nodes in the graph, and perform as well as -- or even better than -- the original centralized algorithm as long as the graph has geometric structures. Here we say a graph has geometric structures, or polynomial growth property, when the number of nodes within distance r of any given node grows no faster than a polynomial function of r. More precisely, if the centralized algorithm is a C-factor approximation with constant C ≥ 1, the resulting distributed algorithm is a (C+δ)-factor approximation for any small δ>0; but if the centralized algorithm is a non-constant (e.g. logarithmic) factor approximation, then the resulting distributed algorithm becomes a constant factor approximation. For general graphs, we compute explicit bounds on the loss of performance of the resulting distributed algorithm with respect to the centralized algorithm.
Submitted 24 September, 2013;
originally announced September 2013.
-
Evaluating socio-economic state of a country analyzing airtime credit and mobile phone datasets
Authors:
Thoralf Gutierrez,
Gautier Krings,
Vincent D. Blondel
Abstract:
Reliable statistical information is important to make political decisions on a sound basis and to help measure the impact of policies. Unfortunately, statistics offices in developing countries have scarce resources and statistical censuses are therefore conducted sporadically. Based on mobile phone communications and history of airtime credit purchases, we estimate the relative income of individuals, the diversity and inequality of income, and an indicator for socioeconomic segregation for fine-grained regions of an African country. Our study shows how to use mobile phone datasets as a starting point to understand the socio-economic state of a country, which can be especially useful in countries with few resources to conduct large surveys.
Submitted 17 September, 2013;
originally announced September 2013.
-
A place-focused model for social networks in cities
Authors:
Chloë Brown,
Anastasios Noulas,
Cecilia Mascolo,
Vincent Blondel
Abstract:
The focused organization theory of social ties proposes that the structure of human social networks can be arranged around extra-network foci, which can include shared physical spaces such as homes, workplaces, restaurants, and so on. Until now, this has been difficult to investigate on a large scale, but the huge volume of data available from online location-based social services now makes it possible to examine the friendships and mobility of many thousands of people, and to investigate the relationship between meetings at places and the structure of the social network. In this paper, we analyze a large dataset from Foursquare, the most popular online location-based social network. We examine the properties of city-based social networks, finding that they have common structural properties, and that the category of place where two people meet has a very strong influence on the likelihood of their being friends. Inspired by these observations in combination with the focused organization theory, we then present a model to generate city-level social networks, and show that it produces networks with the structural properties seen in empirical data.
Submitted 12 August, 2013;
originally announced August 2013.
-
On Primitivity of Sets of Matrices
Authors:
Vincent D. Blondel,
Raphael M. Jungers,
Alex Olshevsky
Abstract:
A nonnegative matrix $A$ is called primitive if $A^k$ is positive for some integer $k>0$. A generalization of this concept to finite sets of matrices is as follows: a set of matrices $\mathcal M = \{A_1, A_2, \ldots, A_m \}$ is primitive if $A_{i_1} A_{i_2} \ldots A_{i_k}$ is positive for some indices $i_1, i_2, ..., i_k$. The concept of primitive sets of matrices comes up in a number of problems within the study of discrete-time switched systems. In this paper, we analyze the computational complexity of deciding if a given set of matrices is primitive and we derive bounds on the length of the shortest positive product.
We show that while primitivity is algorithmically decidable, unless $P=NP$ it is not possible to decide primitivity of a matrix set in polynomial time. Moreover, we show that the length of the shortest positive sequence can be superpolynomial in the dimension of the matrices. On the other hand, defining ${\mathcal P}$ to be the set of matrices with no zero rows or columns, we give a simple combinatorial proof of a previously-known characterization of primitivity for matrices in ${\mathcal P}$ which can be tested in polynomial time. This latter observation is related to the well-known 1964 conjecture of Cerny on synchronizing automata; in fact, any bound on the minimal length of a synchronizing word for synchronizing automata immediately translates into a bound on the length of the shortest positive product of a primitive set of matrices in ${\mathcal P}$. In particular, any primitive set of $n \times n$ matrices in ${\mathcal P}$ has a positive product of length $O(n^3)$.
Submitted 15 April, 2015; v1 submitted 4 June, 2013;
originally announced June 2013.
-
Flow Motifs Reveal Limitations of the Static Framework to Represent Human interactions
Authors:
Luis Enrique Correa Rocha,
Vincent D Blondel
Abstract:
Networks are commonly used to define underlying interaction structures where infections, information, or other quantities may spread. Although the standard approach has been to aggregate all links into a static structure, some studies suggest that the time order in which the links are established may alter the dynamics of spreading. In this paper, we study the impact of the time ordering in the limits of flow on various empirical temporal networks. By using a random walk dynamics, we estimate the flow on links and convert the original undirected network (temporal and static) into a directed flow network. We then introduce the concept of flow motifs and quantify the divergence in the representativity of motifs when using the temporal and static frameworks. We find that the regularity of contacts and persistence of vertices (common in email communication and face-to-face interactions) result in little difference in the limits of flow for both frameworks. On the other hand, in the case of communication within a dating site (and of a sexual network), the flow between vertices changes significantly in the temporal framework such that the static approximation poorly represents the structure of contacts. We have also observed that cliques with 3 and 4 vertices containing only low-flow links are more represented than the same cliques with all high-flow links. The representativity of these low-flow cliques is higher in the temporal framework. Our results suggest that the flow between vertices connected in cliques depends on the topological context in which they are placed and on the time sequence in which the links are established. The structure of the clique alone does not completely characterize the potential of flow between the vertices.
Submitted 13 March, 2013;
originally announced March 2013.
-
Exploring the Mobility of Mobile Phone Users
Authors:
Balázs Cs. Csáji,
Arnaud Browet,
V. A. Traag,
Jean-Charles Delvenne,
Etienne Huens,
Paul Van Dooren,
Zbigniew Smoreda,
Vincent D. Blondel
Abstract:
Mobile phone datasets allow for the analysis of human behavior on an unprecedented scale. The social network, temporal dynamics and mobile behavior of mobile phone users have often been analyzed independently from each other using mobile phone datasets. In this article, we explore the connections between various features of human behavior extracted from a large mobile phone dataset. Our observations are based on the analysis of communication data of 100000 anonymized and randomly chosen individuals in a dataset of communications in Portugal. We show that clustering and principal component analysis allow for a significant dimension reduction with limited loss of information. The most important features are related to geographical location. In particular, we observe that most people spend most of their time at only a few locations. With the help of clustering methods, we then robustly identify home and office locations and compare the results with official census data. Finally, we analyze the geographic spread of users' frequent locations and show that commuting distances can be reasonably well explained by a gravity model.
Submitted 26 November, 2012;
originally announced November 2012.
-
Cramér-Rao bounds for synchronization of rotations
Authors:
Nicolas Boumal,
Amit Singer,
P. -A. Absil,
Vincent D. Blondel
Abstract:
Synchronization of rotations is the problem of estimating a set of rotations R_i in SO(n), i = 1, ..., N, based on noisy measurements of relative rotations R_i R_j^T. This fundamental problem has found many recent applications, most importantly in structural biology. We provide a framework to study synchronization as estimation on Riemannian manifolds for arbitrary n under a large family of noise models. The noise models we address encompass zero-mean isotropic noise, and we develop tools for Gaussian-like as well as heavy-tail types of noise in particular. As a main contribution, we derive the Cramér-Rao bounds of synchronization, that is, lower-bounds on the variance of unbiased estimators. We find that these bounds are structured by the pseudoinverse of the measurement graph Laplacian, where edge weights are proportional to measurement quality. We leverage this to provide interpretation in terms of random walks and visualization tools for these bounds in both the anchored and anchor-free scenarios. Similar bounds previously established were limited to rotations in the plane and Gaussian-like noise.
Submitted 4 July, 2013; v1 submitted 7 November, 2012;
originally announced November 2012.
-
Data for Development: the D4D Challenge on Mobile Phone Data
Authors:
Vincent D. Blondel,
Markus Esch,
Connie Chan,
Fabrice Clerot,
Pierre Deville,
Etienne Huens,
Frédéric Morlot,
Zbigniew Smoreda,
Cezary Ziemlicki
Abstract:
The Orange "Data for Development" (D4D) challenge is an open data challenge on anonymous call patterns of Orange's mobile phone users in Ivory Coast. The goal of the challenge is to help address societal development questions in novel ways by contributing to the socio-economic development and well-being of the Ivory Coast population. Participants in the challenge are given access to four mobile phone datasets, and the purpose of this paper is to describe these datasets. The website http://www.d4d.orange.com contains more information about the participation rules. The datasets are based on anonymized Call Detail Records (CDR) of phone calls and SMS exchanges between five million of Orange's customers in Ivory Coast between December 1, 2011 and April 28, 2012. The datasets are: (a) antenna-to-antenna traffic on an hourly basis, (b) individual trajectories for 50,000 customers for two-week time windows with antenna location information, (c) individual trajectories for 500,000 customers over the entire observation period with sub-prefecture location information, and (d) a sample of communication graphs for 5,000 customers.
Submitted 28 January, 2013; v1 submitted 29 September, 2012;
originally announced October 2012.
-
Temporal Heterogeneities Increase the Prevalence of Epidemics on Evolving Networks
Authors:
Luis Enrique Correa Rocha,
Vincent D. Blondel
Abstract:
Empirical studies suggest that contact patterns follow heterogeneous inter-event times, meaning that intervals of high activity are followed by periods of inactivity. Combined with birth and death of individuals, these temporal constraints affect the spread of infections in a non-trivial way and are dependent on the particular contact dynamics. We propose a stochastic model to generate temporal networks where vertices make instantaneous contacts following heterogeneous inter-event times, and leave and enter the system at fixed rates. We study how these temporal properties affect the prevalence of an infection and estimate R0, the number of secondary infections, by modeling simulated infections (SIR, SI and SIS) co-evolving with the network structure. We find that heterogeneous contact patterns cause earlier and larger epidemics in the SIR model in comparison to homogeneous scenarios. In the case of SI and SIS, with heterogeneous patterns the epidemic is faster in the early stages (up to 90% of prevalence), followed by a slowdown in the asymptotic limit. In the presence of birth and death, heterogeneous patterns always cause higher prevalence in comparison to homogeneous scenarios with the same average inter-event times. Our results suggest that R0 may be underestimated if temporal heterogeneities are not taken into account in the modeling of epidemics.
Submitted 26 June, 2012;
originally announced June 2012.
-
Epidemics on a stochastic model of temporal network
Authors:
Luis Enrique Correa Rocha,
Adeline Decuyper,
Vincent D Blondel
Abstract:
Contacts between individuals serve as pathways where infections may propagate. These contact patterns can be represented by network structures. Static structures have been the common modeling paradigm but recent results suggest that temporal structures play different roles to regulate the spread of infections or infection-like dynamics. On temporal networks a vertex is active only at certain moments and inactive otherwise such that a contact is not continuously available. In several empirical networks, the time between two consecutive vertex-activation events typically follows heterogeneous activity (e.g. bursts). In this chapter, we present a simple and intuitive stochastic model of a temporal network and investigate how epidemics co-evolve with the temporal structures, focusing on the growth dynamics of the epidemic. The model assumes no underlying topological structure and is only constrained by the time between two consecutive events of vertex activation. The main observation is that the speed of the infection spread differs between heterogeneous and homogeneous temporal patterns, but the differences depend on the stage of the epidemic. In comparison to the homogeneous scenario, the power law case results in a faster growth in the beginning but turns out to be slower after a certain time, taking several time steps to reach the whole network.
Submitted 24 April, 2012;
originally announced April 2012.
-
How to decide consensus? A combinatorial necessary and sufficient condition and a proof that consensus is decidable but NP-hard
Authors:
Vincent Blondel,
Alex Olshevsky
Abstract:
A set of stochastic matrices ${\cal P}$ is a consensus set if for every sequence of matrices $P(1), P(2), \ldots$ whose elements belong to ${\cal P}$ and every initial state $x(0)$, the sequence of states defined by $x(t) = P(t) P(t-1) \cdots P(1) x(0)$ converges to a vector whose entries are all identical. In this paper, we introduce an "avoiding set condition" for compact sets of matrices and prove in our main theorem that this explicit combinatorial condition is both necessary and sufficient for consensus. We show that several of the conditions for consensus proposed in the literature can be directly derived from the avoiding set condition. The avoiding set condition is easy to check with an elementary algorithm, and so our result also establishes that consensus is algorithmically decidable. Direct verification of the avoiding set condition may require more than a polynomial number of operations. This is, however, likely to be the case for any consensus checking algorithm since we also prove in this paper that unless $P=NP$, consensus cannot be decided in polynomial time.
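As a rough illustration of the definition above (not of the avoiding set condition or of the decision procedure from the paper), the following Python sketch applies one randomly chosen sequence of matrices from a small set and reports the spread max(x) - min(x) of the resulting state. The two example matrices, the helper names and the seed are assumptions made for illustration. A shrinking spread along one sequence is consistent with consensus but does not establish it, since the definition quantifies over every sequence.

```python
import random

def apply_matrix(matrix, vector):
    """Return the matrix-vector product for a row-stochastic matrix given as nested lists."""
    return [sum(p * x for p, x in zip(row, vector)) for row in matrix]

def spread_after(matrices, x0, steps, rng):
    """Apply `steps` matrices drawn at random from `matrices` and return max(x) - min(x).

    Iterating x <- P(t) x reproduces x(t) = P(t) P(t-1) ... P(1) x(0).
    """
    x = list(x0)
    for _ in range(steps):
        x = apply_matrix(rng.choice(matrices), x)
    return max(x) - min(x)

# Hypothetical example set: each matrix moves one agent to the midpoint of both values.
P_A = [[0.5, 0.5], [0.0, 1.0]]
P_B = [[1.0, 0.0], [0.5, 0.5]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    rng = random.Random(1)
    print(spread_after([P_A, P_B], [0.0, 1.0], 60, rng))   # spread halves at every step
    print(spread_after([IDENTITY], [0.0, 1.0], 60, rng))   # spread never shrinks
```

In this toy set every product halves the spread, whereas a set containing the identity matrix cannot be a consensus set because the constant sequence of identities leaves any initial disagreement untouched.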
Submitted 31 May, 2014; v1 submitted 14 February, 2012;
originally announced February 2012.
-
Effects of time window size and placement on the structure of aggregated networks
Authors:
Gautier Krings,
Márton Karsai,
Sebastian Bernharsson,
Vincent D Blondel,
Jari Saramäki
Abstract:
Complex networks are often constructed by aggregating empirical data over time, such that a link represents the existence of interactions between the endpoint nodes and the link weight represents the intensity of such interactions within the aggregation time window. The resulting networks are then often considered static. More often than not, the aggregation time window is dictated by the availability of data, and the effects of its length on the resulting networks are rarely considered. Here, we address this question by studying the structural features of networks emerging from aggregating empirical data over different time intervals, focussing on networks derived from time-stamped, anonymized mobile telephone call records. Our results show that short aggregation intervals yield networks where strong links associated with dense clusters dominate; the seeds of such clusters or communities already become visible for intervals of around one week. The degree and weight distributions are seen to become stationary around a few days and a few weeks, respectively. An aggregation interval of around 30 days results in the stablest similar networks when consecutive windows are compared. For longer intervals, the effects of weak or random links become increasingly strong, and the average degree of the network keeps growing even for intervals up to 180 days. The placement of the time window is also seen to affect the outcome: for short windows, different behavioural patterns play a role during weekends and weekdays, and for longer windows it is seen that networks aggregated during holiday periods are significantly different.
Submitted 6 February, 2012;
originally announced February 2012.
-
Extracting spatial information from networks with low-order eigenvectors
Authors:
Mihai Cucuringu,
Vincent D. Blondel,
Paul Van Dooren
Abstract:
We consider the problem of inferring meaningful spatial information in networks from incomplete information on the connection intensity between the nodes of the network. We consider two spatially distributed networks: a population migration flow network within the US, and a network of mobile phone calls between cities in Belgium. For both networks we use the eigenvectors of the Laplacian matrix constructed from the link intensities to obtain informative visualizations and capture natural geographical subdivisions. We observe that some low order eigenvectors localize very well and seem to reveal small geographically cohesive regions that match remarkably well with political and administrative boundaries. We discuss possible explanations for this observation by describing diffusion maps and localized eigenfunctions. In addition, we discuss a possible connection with the weighted graph cut problem, and provide numerical evidence supporting the idea that lower order eigenvectors point out local cuts in the network. However, we do not provide a formal and rigorous justification for our observations.
Submitted 3 November, 2011;
originally announced November 2011.
-
An upper bound on community size in scalable community detection
Authors:
Gautier Krings,
Vincent D. Blondel
Abstract:
It is well known that community detection methods based on modularity optimization often fail to discover small communities. Several objective functions used for community detection therefore involve a resolution parameter that allows the detection of communities at different scales. We provide an explicit upper bound on the size of communities resulting from the optimization of several of these functions. We also show with a simple example that the use of the resolution parameter may artificially force the complete disaggregation of large and densely connected communities.
Submitted 29 March, 2011;
originally announced March 2011.
-
Interplay between telecommunications and face-to-face interactions - a study using mobile phone data
Authors:
Francesco Calabrese,
Zbigniew Smoreda,
Vincent D. Blondel,
Carlo Ratti
Abstract:
In this study we analyze one year of anonymized telecommunications data for over one million customers from a large European cellphone operator, and we investigate the relationship between people's calls and their physical location. We discover that more than 90% of users who have called each other have also shared the same space (cell tower), even if they live far apart. Moreover, we find that close to 70% of users who call each other frequently (at least once per month on average) have shared the same space at the same time, an instance that we call co-location. Co-locations appear indicative of coordination calls, which occur just before face-to-face meetings. Their number is highly predictable based on the number of calls between two users and the distance between their home locations, suggesting a new way to quantify the interplay between telecommunications and face-to-face interactions.
Submitted 21 July, 2011; v1 submitted 24 January, 2011;
originally announced January 2011.
-
Uncovering space-independent communities in spatial networks
Authors:
Paul Expert,
Tim Evans,
Vincent D. Blondel,
Renaud Lambiotte
Abstract:
Many complex systems are organized in the form of a network embedded in space. Important examples include the physical Internet infrastructure, road networks, flight connections, brain functional networks and social networks. The effect of space on network topology has recently come under the spotlight because of the emergence of pervasive technologies based on geo-localization, which constantly fill databases with people's movements and thus reveal their trajectories and spatial behaviour. Extracting patterns and regularities from the resulting massive amount of human mobility data requires the development of appropriate tools for uncovering information in spatially-embedded networks. In contrast with most works that tend to apply standard network metrics to any type of network, we argue in this paper for a careful treatment of the constraints imposed by space on network topology. In particular, we focus on the problem of community detection and propose a modularity function adapted to spatial networks. We show that it is possible to factor out the effect of space in order to reveal more clearly hidden structural similarities between the nodes. Methods are tested on a large mobile phone network and computer-generated benchmarks where the effect of space has been incorporated.
Submitted 3 January, 2012; v1 submitted 15 December, 2010;
originally announced December 2010.
-
The set of realizations of a max-plus linear sequence is semi-polyhedral
Authors:
Vincent Blondel,
Stéphane Gaubert,
Natacha Portier
Abstract:
We show that the set of realizations of a given dimension of a max-plus linear sequence is a finite union of polyhedral sets, which can be computed from any realization of the sequence. This yields an (expensive) algorithm to solve the max-plus minimal realization problem. These results are derived from general facts on rational expressions over idempotent commutative semirings: we show more generally that the set of values of the coefficients of a commutative rational expression in one letter that yield a given max-plus linear sequence is a semi-algebraic set in the max-plus sense. In particular, it is a finite union of polyhedral sets.
Submitted 18 October, 2010;
originally announced October 2010.
-
PageRank Optimization by Edge Selection
Authors:
Balázs Csanád Csáji,
Raphaël M. Jungers,
Vincent D. Blondel
Abstract:
The importance of a node in a directed graph can be measured by its PageRank. The PageRank of a node is used in a number of application contexts - including ranking websites - and can be interpreted as the average portion of time spent at the node by an infinite random walk. We consider the problem of maximizing the PageRank of a node by selecting some of the edges from a set of edges that are under our control. By applying results from Markov decision theory, we show that an optimal solution to this problem can be found in polynomial time. Our core solution results in a linear programming formulation, but we also provide an alternative greedy algorithm, a variant of policy iteration, which runs in polynomial time, as well. Finally, we show that, under the slight modification for which we are given mutually exclusive pairs of edges, the problem of PageRank optimization becomes NP-hard.
Submitted 18 January, 2012; v1 submitted 12 November, 2009;
originally announced November 2009.
-
Continuous-time average-preserving opinion dynamics with opinion-dependent communications
Authors:
Vincent D. Blondel,
Julien M. Hendrickx,
John N. Tsitsiklis
Abstract:
We study a simple continuous-time multi-agent system related to Krause's model of opinion dynamics: each agent holds a real value, and this value is continuously attracted by every other value differing from it by less than 1, with an intensity proportional to the difference.
We prove convergence to a set of clusters, with the agents in each cluster sharing a common value, and provide a lower bound on the distance between clusters at a stable equilibrium, under a suitable notion of multi-agent system stability.
To better understand the behavior of the system for a large number of agents, we introduce a variant involving a continuum of agents. We prove, under some conditions, the existence of a solution to the system dynamics, convergence to clusters, and a non-trivial lower bound on the distance between clusters. Finally, we establish that the continuum model accurately represents the asymptotic behavior of a system with a finite but large number of agents.
Submitted 27 July, 2009;
originally announced July 2009.
-
Urban Gravity: a Model for Intercity Telecommunication Flows
Authors:
G. Krings,
F. Calabrese,
C. Ratti,
V. D. Blondel
Abstract:
We analyze the anonymous communication patterns of 2.5 million customers of a Belgian mobile phone operator. Grouping customers by billing address, we build a social network of cities that consists of communications between 571 cities in Belgium. We show that inter-city communication intensity is characterized by a gravity model: the communication intensity between two cities is proportional to the product of their sizes divided by the square of their distance.
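The gravity law stated in the abstract can be written as a one-line function. The sketch below is only an illustration: the function name, the `scale` proportionality constant and the example city sizes are assumptions, and nothing here reproduces the fitted values from the paper.

```python
def gravity_intensity(size_a, size_b, distance, scale=1.0):
    """Predicted communication intensity between two cities under a gravity model.

    Intensity is proportional to size_a * size_b / distance**2.
    `scale` is a hypothetical proportionality constant (not taken from the paper).
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return scale * size_a * size_b / distance ** 2

if __name__ == "__main__":
    near = gravity_intensity(100, 200, 10)
    far = gravity_intensity(100, 200, 20)
    # Doubling the distance divides the predicted intensity by four.
    print(near, far, near / far)
```

Because the exponent on the distance is fixed at 2, the ratio of predicted intensities for the same pair of sizes depends only on the ratio of the distances.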
Submitted 15 June, 2009; v1 submitted 5 May, 2009;
originally announced May 2009.
-
Dynamics of latent voters
Authors:
Renaud Lambiotte,
Jari Saramaki,
Vincent D. Blondel
Abstract:
We study the effect of latency on binary-choice opinion formation models. Latency is introduced into the models as an additional dynamic rule: after a voter changes its opinion, it enters a waiting period of stochastic length where no further changes take place. We first focus on the voter model and show that as a result of introducing latency, the average magnetization is not conserved, and the system is driven toward zero magnetization, independently of initial conditions. The model is studied analytically in the mean-field case and by simulations in one dimension. We also address the behavior of the Majority Rule model with added latency, and show that the competition between imitation and latency leads to a rich phenomenology.
Submitted 10 November, 2008;
originally announced November 2008.
-
The Continuous Skolem-Pisot Problem: On the Complexity of Reachability for Linear Ordinary Differential Equations
Authors:
Paul Bell,
Jean-Charles Delvenne,
Raphael Jungers,
Vincent D. Blondel
Abstract:
We study decidability and complexity questions related to a continuous analogue of the Skolem-Pisot problem concerning the zeros and nonnegativity of a linear recurrent sequence. In particular, we show that the continuous version of the nonnegativity problem is NP-hard in general and we show that the presence of a zero is decidable for several subcases, including instances of depth two or less, although the decidability in general is left open. The problems may also be stated as reachability problems related to real zeros of exponential polynomials or solutions to initial value problems of linear differential equations, which are interesting problems in their own right.
Submitted 23 April, 2009; v1 submitted 12 September, 2008;
originally announced September 2008.
-
On Krause's multi-agent consensus model with state-dependent connectivity (Extended version)
Authors:
Vincent D. Blondel,
Julien M. Hendrickx,
John N. Tsitsiklis
Abstract:
We study a model of opinion dynamics introduced by Krause: each agent has an opinion represented by a real number, and updates its opinion by averaging all agent opinions that differ from its own by less than 1. We give a new proof of convergence into clusters of agents, with all agents in the same cluster holding the same opinion. We then introduce a particular notion of equilibrium stability and provide lower bounds on the inter-cluster distances at a stable equilibrium. To better understand the behavior of the system when the number of agents is large, we also introduce and study a variant involving a continuum of agents, obtaining partial convergence results and lower bounds on inter-cluster distances, under some mild assumptions.
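The update rule described in the abstract (each agent averages all opinions that differ from its own by less than 1) is short enough to sketch directly. The Python below is a minimal illustration for a finite number of agents with synchronous updates; the function names, the stopping tolerance and the starting opinions are assumptions, and the sketch does not cover the continuum variant or the stability analysis.

```python
def krause_step(opinions):
    """One synchronous update: every agent averages all opinions within distance < 1 of its own."""
    updated = []
    for x in opinions:
        # The agent's own opinion is always included, so the list is never empty.
        neighbours = [y for y in opinions if abs(y - x) < 1]
        updated.append(sum(neighbours) / len(neighbours))
    return updated

def run_krause(opinions, max_steps=1000, tol=1e-12):
    """Iterate krause_step until no opinion moves by more than `tol` (hypothetical stopping rule)."""
    for _ in range(max_steps):
        nxt = krause_step(opinions)
        if max(abs(a - b) for a, b in zip(nxt, opinions)) < tol:
            return nxt
        opinions = nxt
    return opinions

if __name__ == "__main__":
    # Hypothetical starting opinions: two groups separated by more than 1.
    print(run_krause([0.0, 0.5, 1.0, 3.0, 3.4]))
```

With these starting values the three leftmost agents merge into one cluster and the two rightmost agents into another, and the two clusters end up more than 1 apart, so they no longer influence each other.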
Submitted 12 March, 2009; v1 submitted 13 July, 2008;
originally announced July 2008.
-
The Role of Second Trials in Cascades of Information over Networks
Authors:
C. de Kerchove,
G. Krings,
R. Lambiotte,
V. D. Blondel,
P. Van Dooren
Abstract:
We study the propagation of information in social networks. To do so, we focus on a cascade model where nodes are infected with probability $p_1$ after their first contact with the information and with probability $p_2$ at all subsequent contacts. The diffusion starts from one random node and leads to a cascade of infection. It is shown that first and subsequent trials play different roles in the propagation and that the size of the cascade depends in a non-trivial way on $p_1$, $p_2$ and on the network structure. Second trials are shown to amplify the propagation in dense parts of the network while first trials are dominant for the exploration of new parts of the network and launching new seeds of infection.
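One way to read the cascade rule in the abstract is sketched below in Python. The details are assumptions rather than the authors' exact protocol: each newly infected node contacts every neighbour exactly once, contacts are processed in breadth-first order, and the graph is given as an adjacency dictionary. The names `simulate_cascade`, `adjacency` and `rng` are hypothetical.

```python
import random
from collections import deque

def simulate_cascade(adjacency, p1, p2, seed_node, rng):
    """Return the set of infected nodes for one cascade started at `seed_node`.

    A susceptible node is infected with probability p1 on its first contact
    with the information and with probability p2 on every later contact.
    """
    infected = {seed_node}
    contacts = {node: 0 for node in adjacency}   # contacts received so far
    queue = deque([seed_node])
    while queue:
        source = queue.popleft()
        for neighbour in adjacency[source]:
            if neighbour in infected:
                continue
            contacts[neighbour] += 1
            probability = p1 if contacts[neighbour] == 1 else p2
            if rng.random() < probability:
                infected.add(neighbour)
                queue.append(neighbour)
    return infected

if __name__ == "__main__":
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    rng = random.Random(0)
    start = rng.choice(sorted(ring))             # diffusion starts from one random node
    print(len(simulate_cascade(ring, 0.5, 0.9, start, rng)))
```

Averaging the cascade size over many seeds and random draws for different values of `p1` and `p2` would be the natural next step for exploring how first and subsequent trials interact with the network structure.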
Submitted 11 January, 2009; v1 submitted 30 June, 2008;
originally announced June 2008.
-
Fast unfolding of communities in large networks
Authors:
Vincent D. Blondel,
Jean-Loup Guillaume,
Renaud Lambiotte,
Etienne Lefebvre
Abstract:
We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks.
Submitted 25 July, 2008; v1 submitted 4 March, 2008;
originally announced March 2008.
-
Geographical dispersal of mobile communication networks
Authors:
Renaud Lambiotte,
Vincent D. Blondel,
Cristobald de Kerchove,
Etienne Huens,
Christophe Prieur,
Zbigniew Smoreda,
Paul Van Dooren
Abstract:
In this paper, we analyze statistical properties of a communication network constructed from the records of a mobile phone company. The network consists of 2.5 million customers that have placed 810 million communications (phone calls and text messages) over a period of 6 months and for whom we have geographical home localization information. It is shown that the degree distribution in this network follows a power law $k^{-5}$ and that the probability that two customers are connected by a link follows a gravity model, i.e. decreases like $d^{-2}$, where $d$ is the distance between the customers. We also consider the geographical extension of communication triangles and we show that communication triangles are not only composed of geographically adjacent nodes but that they may extend over large distances. This last property is not captured by the existing models of geographical networks and in a last section we propose a new model that reproduces the observed property. Our model, which is based on the migration and on the local adaptation of agents, is then studied analytically and the resulting predictions are confirmed by computer simulations.
Submitted 1 May, 2008; v1 submitted 15 February, 2008;
originally announced February 2008.
-
Descent methods for Nonnegative Matrix Factorization
Authors:
Ngoc-Diep Ho,
Paul Van Dooren,
Vincent D. Blondel
Abstract:
In this paper, we present several descent methods that can be applied to nonnegative matrix factorization and we analyze a recently developed fast block coordinate method called Rank-one Residue Iteration (RRI). We also give a comparison of these different methods and show that the new block coordinate method has better properties in terms of approximation error and complexity. By interpreting this method as a rank-one approximation of the residue matrix, we prove that it converges. We also extend it to nonnegative tensor factorization and introduce some variants of the method by imposing additional controllable constraints such as sparsity, discreteness and smoothness.
Submitted 24 August, 2009; v1 submitted 21 January, 2008;
originally announced January 2008.
-
Overlap-free words and spectra of matrices
Authors:
Raphael M. Jungers,
Vladimir Y. Protasov,
Vincent D. Blondel
Abstract:
Overlap-free words are words over the binary alphabet $A=\{a, b\}$ that do not contain factors of the form $xvxvx$, where $x \in A$ and $v \in A^*$. We analyze the asymptotic growth of the number $u_n$ of overlap-free words of length $n$ as $n \to \infty$. We obtain explicit formulas for the minimal and maximal rates of growth of $u_n$ in terms of spectral characteristics (the lower spectral radius and the joint spectral radius) of certain sets of matrices of dimension $20 \times 20$. Using these descriptions we provide new estimates of the rates of growth that are within 0.4% and 0.03% of their exact values. The best previously known bounds were within 11% and 3% respectively. We then prove that the value of $u_n$ actually has the same rate of growth for "almost all" natural numbers $n$. This "average" growth is distinct from the maximal and minimal rates and can also be expressed in terms of a spectral quantity (the Lyapunov exponent). We use this expression to estimate it. In order to obtain our estimates, we introduce new algorithms to compute spectral characteristics of sets of matrices. These algorithms can be used in other contexts and are of independent interest.
Submitted 12 September, 2007;
originally announced September 2007.
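The definition of overlap-free words in the abstract above can be checked by brute force for small $n$. The following is a minimal sketch, not the spectral method of the paper: the function names `has_overlap` and `count_overlap_free` are hypothetical, and the enumeration takes time exponential in $n$.

```python
from itertools import product


def has_overlap(word: str) -> bool:
    """Return True if word contains a factor x v x v x (x a letter, v possibly empty)."""
    n = len(word)
    for period in range(1, n):  # period = len(x v)
        for start in range(n - 2 * period):
            # The factor word[start : start + 2*period + 1] is an overlap
            # exactly when it repeats with the given period.
            if all(word[j] == word[j + period]
                   for j in range(start, start + period + 1)):
                return True
    return False


def count_overlap_free(n: int) -> int:
    """Count overlap-free words of length n over the alphabet {a, b}."""
    return sum(1 for letters in product("ab", repeat=n)
               if not has_overlap("".join(letters)))


if __name__ == "__main__":
    print([count_overlap_free(n) for n in range(1, 7)])
```

Such a count is only practical for short words; it is useful mainly as a sanity check on the definition.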
-
Local Leaders in Random Networks
Authors:
V. D. Blondel,
J. -L. Guillaume,
J. M. Hendrickx,
C. de Kerchove,
R. Lambiotte
Abstract:
We consider local leaders in random uncorrelated networks, i.e. nodes whose degree is greater than or equal to the degree of all of their neighbors. An analytical expression is found for the probability of a node of degree $k$ to be a local leader. This quantity is shown to exhibit a transition from a situation where high degree nodes are local leaders to a situation where they are not when the tail of the degree distribution behaves like the power-law $\sim k^{-\gamma_c}$ with $\gamma_c=3$. Theoretical results are verified by computer simulations and the importance of finite-size effects is discussed.
Submitted 27 July, 2007;
originally announced July 2007.
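The definition of a local leader given in the abstract above translates directly into code. This is a small illustrative sketch; the adjacency-dictionary representation and the name `local_leaders` are assumptions made here, not taken from the paper.

```python
from typing import Dict, Set


def local_leaders(adjacency: Dict[int, Set[int]]) -> Set[int]:
    """Return the nodes whose degree is >= the degree of every neighbor."""
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    return {
        node
        for node, neighbors in adjacency.items()
        # A node with no neighbors satisfies the condition vacuously.
        if all(degree[node] >= degree[other] for other in neighbors)
    }


if __name__ == "__main__":
    # Star graph: center 0 connected to leaves 1, 2, 3.
    star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
    print(local_leaders(star))
```

Because the inequality is not strict, neighboring nodes of equal degree can both be local leaders, as in a triangle.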
-
Distance distribution in random graphs and application to networks exploration
Authors:
Vincent D. Blondel,
Jean-Loup Guillaume,
Julien M. Hendrickx,
Raphael M. Jungers
Abstract:
We consider the problem of determining the proportion of edges that are discovered in an Erdos-Renyi graph when one constructs all shortest paths from a given source node to all other nodes. This problem is equivalent to the one of determining the proportion of edges connecting nodes that are at identical distance from the source node. The evolution of this quantity with the probability of existence of the edges exhibits intriguing oscillatory behavior. In order to perform our analysis, we introduce a new way of computing the distribution of distances between nodes. Our method outperforms previous similar analyses and leads to estimates that coincide remarkably well with numerical simulations. It allows us to characterize the phase transitions appearing when the connectivity probability varies.
Submitted 19 November, 2007; v1 submitted 22 June, 2007;
originally announced June 2007.
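The quantity studied in the abstract above can be estimated numerically. The sketch below assumes that an edge is discovered exactly when its endpoints are both reachable from the source and lie at distances differing by one, so edges between nodes at identical distance are missed. The function names and the $G(n, p)$ generator are illustrative choices, not the analytical method of the paper.

```python
import random
from collections import deque
from typing import Dict, Set


def erdos_renyi(n: int, p: float, rng: random.Random) -> Dict[int, Set[int]]:
    """Sample an undirected G(n, p) graph as an adjacency dictionary."""
    adjacency: Dict[int, Set[int]] = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adjacency[u].add(v)
                adjacency[v].add(u)
    return adjacency


def bfs_distances(adjacency: Dict[int, Set[int]], source: int) -> Dict[int, int]:
    """Breadth-first search distances from source (reachable nodes only)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return distance


def discovered_fraction(adjacency: Dict[int, Set[int]], source: int) -> float:
    """Fraction of edges lying on at least one shortest path from source."""
    distance = bfs_distances(adjacency, source)
    total = discovered = 0
    for u, neighbors in adjacency.items():
        for v in neighbors:
            if u < v:  # count each undirected edge once
                total += 1
                if u in distance and v in distance and abs(distance[u] - distance[v]) == 1:
                    discovered += 1
    return discovered / total if total else 0.0


if __name__ == "__main__":
    graph = erdos_renyi(200, 0.05, random.Random(0))
    print(discovered_fraction(graph, source=0))
```

Averaging `discovered_fraction` over many samples and several values of `p` is one way to look for the oscillatory behavior mentioned in the abstract.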
-
Linear time algorithms for Clobber
Authors:
Vincent D. Blondel,
Julien M. Hendrickx,
Raphael M. Jungers
Abstract:
We prove that the single-player game clobber is solvable in linear time when played on a line or on a cycle. For this purpose, we show that this game is equivalent to an optimization problem on a set of words defined by seven classes of forbidden patterns. We also prove that, playing on the cycle, it is always possible to remove at least 2n/3 pawns, and we give a conformation for which it is not possible to do better, answering questions recently asked by Faria et al.
Submitted 12 March, 2007;
originally announced March 2007.
-
On the Finiteness Property for Rational Matrices
Authors:
Raphael M. Jungers,
Vincent D. Blondel
Abstract:
We analyze the periodicity of optimal long products of matrices. A set of matrices is said to have the finiteness property if the maximal rate of growth of long products of matrices taken from the set can be obtained by a periodic product. It was conjectured a decade ago that all finite sets of real matrices have the finiteness property. This conjecture, known as the ``finiteness conjecture'', is now known to be false but no explicit counterexample to the conjecture is available and in particular it is unclear if a counterexample is possible whose matrices have rational or binary entries. In this paper, we prove that finite sets of nonnegative rational matrices have the finiteness property if and only if \emph{pairs} of \emph{binary} matrices do. We also show that all pairs of $2 \times 2$ binary matrices have the finiteness property. These results have direct implications for the stability problem for sets of matrices. Stability is algorithmically decidable for sets of matrices that have the finiteness property and so it follows from our results that if all pairs of binary matrices have the finiteness property then stability is decidable for sets of nonnegative rational matrices. This would be in sharp contrast with the fact that the related problem of boundedness is known to be undecidable for sets of nonnegative rational matrices.
Submitted 16 February, 2007;
originally announced February 2007.
-
Observable Graphs
Authors:
Raphael M. Jungers,
Vincent D. Blondel
Abstract:
An edge-colored directed graph is \emph{observable} if an agent that moves along its edges is able to determine his position in the graph after a sufficiently long observation of the edge colors. When the agent is able to determine his position only from time to time, the graph is said to be \emph{partly observable}. Observability in graphs is desirable in situations where autonomous agents are moving on a network and one wants to localize them (or the agent wants to localize himself) with limited information. In this paper, we completely characterize observable and partly observable graphs and show how these concepts relate to observable discrete event systems and to local automata. Based on these characterizations, we provide polynomial time algorithms to decide observability, to decide partial observability, and to compute the minimal number of observations necessary for finding the position of an agent. In particular we prove that in the worst case this minimal number of observations increases quadratically with the number of nodes in the graph.
From this it follows that it may be necessary for an agent to pass through the same node several times before he is finally able to determine his position in the graph. We then consider the more difficult question of assigning colors to a graph so as to make it observable and we prove that two different versions of this problem are NP-complete.
Submitted 16 February, 2007;
originally announced February 2007.
-
Primitive operations for the construction and reorganization of minimally persistent formations
Authors:
Julien M. Hendrickx,
Baris Fidan,
Changbin Yu,
Brian D. O. Anderson,
Vincent D. Blondel
Abstract:
In this paper, we study the construction and transformation of two-dimensional persistent graphs. Persistence is a generalization to directed graphs of the undirected notion of rigidity. In the context of moving autonomous agent formations, persistence characterizes the efficacy of a directed structure of unilateral distance constraints seeking to preserve a formation shape. Analogously to the powerful results about Henneberg sequences in minimal rigidity theory, we propose different types of directed graph operations allowing one to sequentially build any minimally persistent graph (i.e. persistent graph with a minimal number of edges for a given number of vertices), each intermediate graph being also minimally persistent. We also consider the more generic problem of obtaining one minimally persistent graph from another, which corresponds to the on-line reorganization of an autonomous agent formation. We prove that we can obtain any minimally persistent formation from any other one by a sequence of elementary local operations such that minimal persistence is preserved throughout the reorganization process.
Submitted 8 September, 2006;
originally announced September 2006.
-
Efficient algorithms for deciding the type of growth of products of integer matrices
Authors:
Raphaël Jungers,
Vladimir Protasov,
Vincent D. Blondel
Abstract:
For a given finite set $\Sigma$ of matrices with nonnegative integer entries we study the growth of $$ \max_t(\Sigma) = \max\{\|A_{1} \cdots A_{t}\| : A_i \in \Sigma\}. $$ We show how to determine in polynomial time whether the growth with $t$ is bounded, polynomial, or exponential, and we characterize precisely all possible behaviors.
Submitted 11 April, 2006;
originally announced April 2006.
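For small sets and short products, the quantity $\max_t$ defined in the abstract above can be tabulated by enumerating all products of length $t$. This is a brute-force sketch with cost exponential in $t$, not the polynomial-time procedure of the paper. The choice of norm (maximum absolute row sum) and the function names are assumptions made for illustration.

```python
from itertools import product
from typing import List, Sequence

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two compatible integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def norm_inf(a: Matrix) -> int:
    """Induced infinity norm: the maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)


def max_product_norm(matrices: Sequence[Matrix], t: int) -> int:
    """Largest norm over all products A_1 ... A_t with each A_i in the set."""
    best = 0
    for choice in product(matrices, repeat=t):
        current = choice[0]
        for factor in choice[1:]:
            current = matmul(current, factor)
        best = max(best, norm_inf(current))
    return best


if __name__ == "__main__":
    shear = [[1, 1], [0, 1]]  # powers grow linearly in t
    ones = [[1, 1], [1, 1]]   # powers grow like 2**t
    print([max_product_norm([shear], t) for t in range(1, 5)])
    print([max_product_norm([ones], t) for t in range(1, 5)])
```

The two example matrices illustrate polynomial and exponential growth respectively; the identity matrix would give a bounded sequence.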
-
On the complexity of computing the capacity of codes that avoid forbidden difference patterns
Authors:
Vincent D. Blondel,
Raphael Jungers,
Vladimir Protasov
Abstract:
We consider questions related to the computation of the capacity of codes that avoid forbidden difference patterns. The maximal number of $n$-bit sequences whose pairwise differences do not contain some given forbidden difference patterns increases exponentially with $n$. The exponent is the capacity of the forbidden patterns, which is given by the logarithm of the joint spectral radius of a set of matrices constructed from the forbidden difference patterns. We provide a new family of bounds that allows for the approximation, in exponential time, of the capacity with an arbitrarily high degree of accuracy. We also provide a polynomial time algorithm for the problem of determining if the capacity of a set is positive, but we prove that the same problem becomes NP-hard when the sets of forbidden patterns are defined over an extended set of symbols. Finally, we prove the existence of extremal norms for the sets of matrices arising in the capacity computation. This result makes it possible to apply a specific (even though non-polynomial) approximation algorithm. We illustrate this fact by computing exactly the capacity of codes that were only known approximately.
Submitted 10 January, 2006;
originally announced January 2006.
-
Computationally efficient approximations of the joint spectral radius
Authors:
Vincent Blondel,
Yurii Nesterov
Abstract:
The joint spectral radius of a set of matrices is a measure of the maximal asymptotic growth rate that can be obtained by forming long products of matrices taken from the set. This quantity appears in a number of application contexts but is notoriously difficult to compute and to approximate. We introduce in this paper a procedure for approximating the joint spectral radius of a finite set of matrices with arbitrarily high accuracy. Our approximation procedure is polynomial in the size of the matrices once the number of matrices and the desired accuracy are fixed.
Submitted 28 July, 2004;
originally announced July 2004.
-
A measure of similarity between graph vertices
Authors:
Vincent Blondel,
Anahi Gajardo,
Maureen Heymans,
Pierre Senellart,
Paul Van Dooren
Abstract:
We introduce a concept of similarity between vertices of directed graphs. Let G_A and G_B be two directed graphs. We define a similarity matrix whose (i, j)-th real entry expresses how similar vertex j (in G_A) is to vertex i (in G_B). The similarity matrix can be obtained as the limit of the normalized even iterates of a linear transformation. In the special case where G_A=G_B=G, the matrix is square and the (i, j)-th entry is the similarity score between the vertices i and j of G. We point out that Kleinberg's "hub and authority" method to identify web-pages relevant to a given query can be viewed as a special case of our definition in the case where one of the graphs has two vertices and a unique directed edge between them. In analogy to Kleinberg, we show that our similarity scores are given by the components of a dominant eigenvector of a non-negative matrix. Potential applications of our similarity concept are numerous. We illustrate an application for the automatic extraction of synonyms in a monolingual dictionary.
Submitted 28 July, 2004;
originally announced July 2004.
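The abstract above describes the similarity matrix as the limit of normalized even iterates of a linear transformation without stating the transformation. The sketch below assumes the update $S \leftarrow B S A^T + B^T S A$ followed by Frobenius normalization, starting from the all-ones matrix, where $A$ and $B$ are the adjacency matrices of G_A and G_B. This update rule, the function names, and the iteration count are assumptions made for illustration and should be checked against the paper itself.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two compatible matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(column) for column in zip(*a)]


def similarity_matrix(adj_a: Matrix, adj_b: Matrix, half_steps: int = 50) -> Matrix:
    """Iterate S <- B S A^T + B^T S A (assumed update), normalizing each step.

    Returns an n_B x n_A matrix after an even number of steps.
    """
    n_a, n_b = len(adj_a), len(adj_b)
    a_t, b_t = transpose(adj_a), transpose(adj_b)
    s: Matrix = [[1.0] * n_a for _ in range(n_b)]
    for _ in range(2 * half_steps):  # even number of iterations
        forward = matmul(matmul(adj_b, s), a_t)
        backward = matmul(matmul(b_t, s), adj_a)
        s = [[f + g for f, g in zip(row_f, row_g)]
             for row_f, row_g in zip(forward, backward)]
        norm = sqrt(sum(x * x for row in s for x in row))
        if norm == 0.0:  # e.g. a graph without edges: nothing to normalize
            return s
        s = [[x / norm for x in row] for row in s]
    return s


if __name__ == "__main__":
    edge = [[0.0, 1.0], [0.0, 0.0]]  # two vertices, one directed edge 0 -> 1
    for row in similarity_matrix(edge, edge):
        print(row)
```

Comparing the single-edge graph with itself gives a diagonal matrix under this update: each vertex is similar only to itself.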
-
Decidability and Universality in Symbolic Dynamical Systems
Authors:
Jean-Charles Delvenne,
Petr Kurka,
Vincent Blondel
Abstract:
Many different definitions of computational universality for various types of dynamical systems have flourished since Turing's work. We propose a general definition of universality that applies to arbitrary discrete time symbolic dynamical systems. Universality of a system is defined as undecidability of a model-checking problem. For Turing machines, counter machines and tag systems, our definition coincides with the classical one. It yields, however, a new definition for cellular automata and subshifts. Our definition is robust with respect to initial condition, which is a desirable feature for physical realizability.
We derive necessary conditions for undecidability and universality. For instance, a universal system must have a sensitive point and a proper subsystem. We conjecture that universal systems have an infinite number of subsystems. We also discuss the thesis according to which computation should occur at the `edge of chaos' and we exhibit a universal chaotic system.
Submitted 8 July, 2005; v1 submitted 7 April, 2004;
originally announced April 2004.