-
Human mobility is well described by closed-form gravity-like models learned automatically from data
Authors:
Oriol Cabanas-Tirapu,
Lluís Danús,
Esteban Moro,
Marta Sales-Pardo,
Roger Guimerà
Abstract:
Modeling of human mobility is critical to address questions in urban planning and transportation, as well as global challenges in sustainability, public health, and economic development. However, our understanding and ability to model mobility flows within and between urban areas are still incomplete. At one end of the modeling spectrum we have simple so-called gravity models, which are easy to interpret and provide modestly accurate predictions of mobility flows. At the other end, we have complex machine learning and deep learning models, with tens of features and thousands of parameters, which predict mobility more accurately than gravity models at the cost of not being interpretable and not providing insight into human behavior. Here, we show that simple machine-learned, closed-form models of mobility are able to predict mobility flows more accurately, overall, than either gravity or complex machine and deep learning models. At the same time, these models are simple and gravity-like, and can be interpreted in terms similar to standard gravity models. Furthermore, these models work for different datasets and at different scales, suggesting that they may capture the fundamental universal features of human mobility.
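For reference, the classic gravity model that this abstract uses as its baseline predicts the flow T_ij between locations i and j as proportional to their "masses" (for example, populations) and inversely proportional to a power of the distance between them. The sketch below assumes the simplest form, T_ij = k * m_i * m_j / d_ij**gamma, and fits k and gamma by least squares on the log-linearized equation. It is not the machine-learned closed-form model described in the paper, and the function names are hypothetical.

```python
import math


def gravity_flow(k, m_i, m_j, d_ij, gamma):
    """Flow predicted by a basic gravity model: k * m_i * m_j / d_ij**gamma."""
    return k * m_i * m_j / d_ij ** gamma


def fit_gravity(pop, dist, flows):
    """Fit (k, gamma) by ordinary least squares on the linearized model
    log(T_ij / (m_i * m_j)) = log(k) - gamma * log(d_ij).

    pop:   dict node -> mass (e.g. population)
    dist:  dict (i, j) -> distance between i and j
    flows: dict (i, j) -> observed flow (must be > 0 to take logs)
    """
    xs, ys = [], []
    for (i, j), t in flows.items():
        xs.append(math.log(dist[(i, j)]))
        ys.append(math.log(t / (pop[i] * pop[j])))
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), -slope  # (k, gamma)
```

Fitting flows generated from the same formula recovers k and gamma up to floating-point error; with real flows, the residuals of this fit are what more expressive models try to reduce.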
Submitted 18 December, 2023;
originally announced December 2023.
-
Bayesian estimation of information-theoretic metrics for sparsely sampled distributions
Authors:
Angelo Piga,
Lluc Font-Pomarol,
Marta Sales-Pardo,
Roger Guimerà
Abstract:
Estimating the Shannon entropy of a discrete distribution from which we have only observed a small sample is challenging. Estimating other information-theoretic metrics, such as the Kullback-Leibler divergence between two sparsely sampled discrete distributions, is even harder. Existing approaches to address these problems have shortcomings: they are biased, heuristic, work only for some distributions, and/or cannot be applied to all information-theoretic metrics. Here, we propose a fast, semi-analytical estimator for sparsely sampled distributions that is efficient, precise, and general. Its derivation is grounded in probabilistic considerations and uses a hierarchical Bayesian approach to extract as much information as possible from the few observations available. Our approach provides estimates of the Shannon entropy with precision at least comparable to the state of the art, and most often better. It can also be used to obtain accurate estimates of any other information-theoretic metric, including the notoriously challenging Kullback-Leibler divergence. Here, again, our approach performs consistently better than existing estimators.
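To make the difficulty concrete, the sketch below implements the naive plug-in (maximum-likelihood) estimators of Shannon entropy and Kullback-Leibler divergence, plus the classical Miller-Madow bias correction. These are simple baselines of the kind the abstract calls biased, not the hierarchical Bayesian estimator proposed in the paper; the function names are hypothetical.

```python
import math


def plugin_entropy(counts):
    """Plug-in (maximum-likelihood) Shannon entropy, in nats."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def miller_madow_entropy(counts):
    """Plug-in entropy plus the Miller-Madow correction (K_obs - 1) / (2N),
    where K_obs is the number of categories observed at least once."""
    n = sum(counts)
    k_obs = sum(1 for c in counts if c > 0)
    return plugin_entropy(counts) + (k_obs - 1) / (2 * n)


def plugin_kl(counts_p, counts_q):
    """Plug-in KL divergence D(p || q) over aligned categories, in nats.

    Returns inf when a category is observed in the sample from p but not
    in the sample from q, which happens routinely with sparse samples.
    """
    n_p, n_q = sum(counts_p), sum(counts_q)
    total = 0.0
    for cp, cq in zip(counts_p, counts_q):
        if cp == 0:
            continue  # 0 * log(0 / q) is taken as 0
        if cq == 0:
            return math.inf
        p, q = cp / n_p, cq / n_q
        total += p * math.log(p / q)
    return total
```

For example, with ten draws from a uniform distribution over 1,000 symbols, the plug-in entropy is at most ln 10 (about 2.30 nats, reached when all draws are distinct), far below the true value ln 1000 (about 6.91 nats).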
Submitted 22 February, 2023; v1 submitted 31 January, 2023;
originally announced January 2023.
-
Differences in collaboration structures and impact among prominent researchers in Europe and North America
Authors:
Lluis Danus,
Carles Muntaner,
Alexander Krauss,
Marta Sales-Pardo,
Roger Guimera
Abstract:
Scientists collaborate through intricate networks, which impact the quality and scope of their research. At the same time, funding and institutional arrangements, as well as scientific and political cultures, affect the structure of collaboration networks. Since such arrangements and cultures differ across regions in the world in systematic ways, we surmise that collaboration networks and impact should also differ systematically across regions. To test this, we compare the structure of collaboration networks among prominent researchers in North America and Europe. We find that prominent researchers in Europe establish denser collaboration networks, whereas those in North America establish more decentralized networks. We also find that the impact of the publications of prominent researchers in North America is significantly higher than for those in Europe, both when they collaborate with other prominent researchers and when they do not. Although Europeans collaborate with other prominent researchers more often, which increases their impact, we also find that repeated collaboration among prominent researchers decreases the synergistic effect of collaborating.
Submitted 3 November, 2022;
originally announced November 2022.
-
Fundamental limits to learning closed-form mathematical models from data
Authors:
Oscar Fajardo-Fontiveros,
Ignasi Reichardt,
Harry R. De Los Rios,
Jordi Duch,
Marta Sales-Pardo,
Roger Guimera
Abstract:
Given a finite and noisy dataset generated with a closed-form mathematical model, when is it possible to learn the true generating model from the data alone? This is the question we investigate here. We show that this model-learning problem displays a transition from a low-noise phase in which the true model can be learned, to a phase in which the observation noise is too high for the true model to be learned by any method. Both in the low-noise phase and in the high-noise phase, probabilistic model selection leads to optimal generalization to unseen data. This is in contrast to standard machine learning approaches, including artificial neural networks, which in this particular problem are limited, in the low-noise phase, by their ability to interpolate. In the transition region between the learnable and unlearnable phases, generalization is hard for all approaches including probabilistic model selection.
Submitted 16 December, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Node metadata can produce predictability transitions in network inference problems
Authors:
Oscar Fajardo-Fontiveros,
Marta Sales-Pardo,
Roger Guimera
Abstract:
Network inference is the process of learning the properties of complex networks from data. Besides using information about known links in the network, node attributes and other forms of network metadata can help to solve network inference problems. Indeed, several approaches have been proposed to introduce metadata into probabilistic network models and to use them to make better inferences. However, we know little about the effect of such metadata in the inference process. Here, we investigate this issue. We find that, rather than affecting inference gradually, adding metadata causes abrupt transitions in the inference process and in our ability to make accurate predictions, from a situation in which metadata does not play any role to a situation in which metadata completely dominates the inference process. When network data and metadata are partly correlated, metadata optimally contributes to the inference process at the transition between data-dominated and metadata-dominated regimes.
Submitted 26 March, 2021;
originally announced March 2021.
-
Complex decision-making strategies in a stock market experiment explained as the combination of few simple strategies
Authors:
Gael Poux-Medard,
Sergio Cobo-Lopez,
Jordi Duch,
Roger Guimera,
Marta Sales-Pardo
Abstract:
Many studies have shown that there are regularities in the way human beings make decisions. However, our ability to obtain models that capture such regularities and can accurately predict unobserved decisions is still limited. We tackle this problem in the context of individuals who are given information about the evolution of market prices and asked to guess the direction of the market. We use a network inference approach with stochastic block models (SBMs) to find the model and network representation that is most predictive of unobserved decisions. Our results suggest that users mostly use recent information (about the market and about their previous decisions) to make their guesses. Furthermore, the analysis of SBM groups reveals a set of strategies used by players to process information and make decisions that is analogous to behaviors observed in other contexts. Our study provides an example of how to quantitatively explore human behavioral strategies by representing decisions as networks and using rigorous inference and model-selection approaches.
Submitted 9 March, 2021;
originally announced March 2021.
-
A Bayesian machine scientist to aid in the solution of challenging scientific problems
Authors:
Roger Guimera,
Ignasi Reichardt,
Antoni Aguilar-Mogas,
Francesco A Massucci,
Manuel Miranda,
Jordi Pallares,
Marta Sales-Pardo
Abstract:
Closed-form, interpretable mathematical models have been instrumental for advancing our understanding of the world; with the data revolution, we may now be in a position to uncover new such models for many systems from physics to the social sciences. However, to deal with increasing amounts of data, we need "machine scientists" that are able to extract these models automatically from data. Here, we introduce a Bayesian machine scientist, which establishes the plausibility of models using explicit approximations to the exact marginal posterior over models and establishes its prior expectations about models by learning from a large empirical corpus of mathematical expressions. It explores the space of models using Markov chain Monte Carlo. We show that this approach uncovers accurate models for synthetic and real data and provides out-of-sample predictions that are more accurate than those of existing approaches and of other nonparametric methods.
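As a toy illustration of the two ingredients named in the abstract (an approximation to the marginal posterior over models, and Markov chain Monte Carlo over model space), the sketch below runs Metropolis sampling over four fixed one-parameter expressions, scoring each with the Bayesian information criterion as a stand-in for the marginal likelihood and using a uniform prior. The paper's machine scientist explores arbitrary expression trees with a prior learned from a corpus of expressions; the fixed model list, the BIC choice, and all names here are assumptions for illustration only.

```python
import math
import random

# Hypothetical toy model space: y = a * g(x) for a few fixed functions g.
MODELS = {
    "a*x": lambda x: x,
    "a*x**2": lambda x: x * x,
    "a*exp(x)": math.exp,
    "a*sin(x)": math.sin,
}


def bic(g, xs, ys):
    """Fit a by least squares, then return BIC = n ln(RSS/n) + k ln(n), k = 1."""
    gx = [g(x) for x in xs]
    a = sum(gi * y for gi, y in zip(gx, ys)) / sum(gi * gi for gi in gx)
    rss = sum((y - a * gi) ** 2 for gi, y in zip(gx, ys))
    n = len(xs)
    return n * math.log(rss / n) + 1 * math.log(n)


def mcmc_over_models(xs, ys, steps=2000, seed=0):
    """Metropolis sampling over model names with p(model | D) ~ exp(-BIC / 2),
    a uniform prior, and a symmetric proposal (uniform over other models)."""
    rng = random.Random(seed)
    names = list(MODELS)
    energy = {m: bic(MODELS[m], xs, ys) / 2 for m in names}
    current = names[0]
    visits = {m: 0 for m in names}
    for _ in range(steps):
        proposal = rng.choice([m for m in names if m != current])
        # Accept with probability min(1, exp(E_current - E_proposal)).
        if rng.random() < math.exp(min(0.0, energy[current] - energy[proposal])):
            current = proposal
        visits[current] += 1
    return visits
```

With data generated from y = 3x**2 plus small noise, the chain spends almost all of its time in the "a*x**2" state, because its BIC is far lower than that of the alternatives.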
Submitted 25 April, 2020;
originally announced April 2020.
-
Optimal prediction of decisions and model selection in social dilemmas using block models
Authors:
Sergio Cobo-Lopez,
Antonia Godoy-Lorite,
Jordi Duch,
Marta Sales-Pardo,
Roger Guimera
Abstract:
Advancing our understanding of human behavior hinges on the ability of theories to unveil the mechanisms underlying such behaviors. Measuring the ability of theories and models to predict unobserved behaviors provides a principled method to evaluate their merit and, thus, to help establish which mechanisms are most plausible. Here, we propose models and develop rigorous inference approaches to predict strategic decisions in dyadic social dilemmas. In particular, we use bipartite stochastic block models that incorporate information about the dilemmas faced by individuals. We show, combining these models with empirical data on strategic decisions in dyadic social dilemmas, that individual strategic decisions are to a large extent predictable, despite not being "rational." The analysis of these models also allows us to conclude that: (i) individuals do not perceive games according to their game-theoretical structure; and (ii) individuals make decisions using combinations of multiple simple strategies, which our approach reveals naturally.
Submitted 25 April, 2020;
originally announced April 2020.
-
Network-based models for social recommender systems
Authors:
Antonia Godoy-Lorite,
Roger Guimera,
Marta Sales-Pardo
Abstract:
With the overwhelming number of products available online in recent years, there is an increasing need to filter and deliver relevant personalized advice to users. Recommender systems solve this problem by modeling and predicting individual preferences for a great variety of items such as movies, books or research articles. In this chapter, we explore rigorous network-based models that outperform leading approaches for recommendation. The network models we consider are based on the explicit assumption that there are groups of individuals and of items, and that the preferences of an individual for an item are determined only by their group memberships. The accurate prediction of individual user preferences over items can be accomplished by different methodologies, such as Monte Carlo sampling or Expectation-Maximization methods, the latter resulting in a scalable algorithm that is suitable for large datasets.
Submitted 10 February, 2020;
originally announced February 2020.
-
Tensorial and bipartite block models for link prediction in layered networks and temporal networks
Authors:
Marc Tarres-Deulofeu,
Antonia Godoy-Lorite,
Roger Guimera,
Marta Sales-Pardo
Abstract:
Many real-world complex systems are well represented as multilayer networks; predicting interactions in those systems is one of the most pressing problems in predictive network science. To address this challenge, we introduce two stochastic block models for multilayer and temporal networks; one of them uses nodes as its fundamental unit, whereas the other focuses on links. We also develop scalable algorithms for inferring the parameters of these models. Because our models describe all layers simultaneously, our approach takes full advantage of the information contained in the whole network when making predictions about any particular layer. We illustrate the potential of our approach by analyzing two empirical datasets: a temporal network of email communications, and a network of drug interactions for treating different cancer types. We find that modeling all layers simultaneously does, in general, result in more accurate link prediction. However, the most predictive model depends on the dataset under consideration: whereas the node-based model is more appropriate for predicting drug interactions, the link-based model is more appropriate for predicting email communication.
Submitted 5 March, 2018;
originally announced March 2018.
-
Inferring propagation paths for sparsely observed perturbations on complex networks
Authors:
Francesco Alessandro Massucci,
Jonathan Wheeler,
Raul Beltran-Debon,
Jorge Joven,
Marta Sales-Pardo,
Roger Guimera
Abstract:
In a complex system, perturbations propagate by following paths on the network of interactions among the system's units. In contrast to what happens with the spreading of epidemics, observations of general perturbations are often very sparse in time (there is a single observation of the perturbed system) and in "space" (only a few perturbed and unperturbed units are observed). A major challenge in many areas, from biology to the social sciences, is to infer the propagation paths from observations of the effects of a perturbation under these sparsity conditions. We address this problem and show that it is possible to go beyond the usual approach of using the shortest paths connecting the known perturbed nodes. Specifically, we show that a simple and general probabilistic model, which we solve using belief propagation, provides fast and accurate estimates of the probabilities of nodes being perturbed.
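The "usual approach" mentioned above, marking the nodes on shortest paths between known perturbed nodes, can be sketched in a few lines with breadth-first search. It returns a set of nodes rather than probabilities of being perturbed, which is the kind of output the paper's belief-propagation model provides instead. This is a hedged sketch of the baseline only: it follows a single shortest path per pair (ties broken by BFS order, an assumption), and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_parents(adj, source):
    """Breadth-first search from source on an unweighted graph.

    adj: dict node -> iterable of neighbors
    Returns dict node -> predecessor on one shortest path from source.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def shortest_path_baseline(adj, perturbed):
    """Mark every node on one shortest path between each pair of observed
    perturbed nodes; the perturbed nodes themselves are always marked."""
    marked = set(perturbed)
    for s, t in combinations(perturbed, 2):
        parent = bfs_parents(adj, s)
        if t not in parent:
            continue  # s and t lie in different connected components
        node = t
        while node is not None:  # walk back from t to s
            marked.add(node)
            node = parent[node]
    return marked
```

On a path 0-1-2-3-4 with a side branch 2-5, observing nodes 0 and 4 as perturbed marks {0, 1, 2, 3, 4} and leaves node 5 unmarked, with no notion of how likely node 5 is to be affected.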
Submitted 4 December, 2017;
originally announced January 2018.
-
Journal of Open Source Software (JOSS): design and first-year review
Authors:
Arfon M Smith,
Kyle E Niemeyer,
Daniel S Katz,
Lorena A Barba,
George Githinji,
Melissa Gymrek,
Kathryn D Huff,
Christopher R Madan,
Abigail Cabunoc Mayes,
Kevin M Moerman,
Pjotr Prins,
Karthik Ram,
Ariel Rokem,
Tracy K Teal,
Roman Valls Guimera,
Jacob T Vanderplas
Abstract:
This article describes the motivation, design, and progress of the Journal of Open Source Software (JOSS). JOSS is a free and open-access journal that publishes articles describing research software. It has the dual goals of improving the quality of the software submitted and providing a mechanism for research software developers to receive credit. While designed to work within the current merit system of science, JOSS addresses the dearth of rewards for key contributions to science made in the form of software. JOSS publishes articles that encapsulate scholarship contained in the software itself, and its rigorous peer review targets the software components: functionality, documentation, tests, continuous integration, and the license. A JOSS article contains an abstract describing the purpose and functionality of the software, references, and a link to the software archive. The article is the entry point of a JOSS submission, which encompasses the full set of software artifacts. Submission and review proceed in the open, on GitHub. Editors, reviewers, and authors work collaboratively and openly. Unlike other journals, JOSS does not reject articles requiring major revision; while not yet accepted, articles remain visible and under review until the authors make adequate changes (or withdraw, if unable to meet requirements). Once an article is accepted, JOSS gives it a DOI, deposits its metadata in Crossref, and the article can begin collecting citations on indexers like Google Scholar and other services. Authors retain copyright of their JOSS article, releasing it under a Creative Commons Attribution 4.0 International License. In its first year, starting in May 2016, JOSS published 111 articles, with more than 40 additional articles under review. JOSS is a sponsored project of the nonprofit organization NumFOCUS and is an affiliate of the Open Source Initiative.
Submitted 24 January, 2018; v1 submitted 7 July, 2017;
originally announced July 2017.
-
Accurate and scalable social recommendation using mixed-membership stochastic block models
Authors:
Antonia Godoy-Lorite,
Roger Guimera,
Cristopher Moore,
Marta Sales-Pardo
Abstract:
With ever-increasing amounts of online information available, modeling and predicting individual preferences (for books or articles, for example) is becoming more and more important. Good predictions enable us to improve advice to users and to obtain a better understanding of the socio-psychological processes that determine those preferences. We have developed a collaborative filtering model, with an associated scalable algorithm, that makes accurate predictions of individuals' preferences. Our approach is based on the explicit assumption that there are groups of individuals and of items, and that the preferences of an individual for an item are determined only by their group memberships. Importantly, we allow each individual and each item to belong simultaneously to mixtures of different groups and, unlike many popular approaches, such as matrix factorization, we do not assume implicitly or explicitly that individuals in each group prefer items in a single group of items. The resulting overlapping groups and the predicted preferences can be inferred with an expectation-maximization algorithm whose running time scales linearly (per iteration). Our approach enables us to predict individual preferences in large datasets, and is considerably more accurate than the current algorithms for such large datasets.
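Under the assumptions stated in this abstract (mixed group memberships for users and items, and preferences that depend only on those memberships), a prediction can be formed by mixing the rating distributions of every (user group, item group) pair, weighted by the memberships. The sketch below shows only that prediction step, with hypothetical names theta_u, eta_i and p; it does not implement the expectation-maximization inference of those quantities described in the paper.

```python
def mmsbm_rating_distribution(theta_u, eta_i, p):
    """Predicted distribution over ratings for one user-item pair.

    theta_u: list of K membership weights of the user over user groups (sum to 1)
    eta_i:   list of L membership weights of the item over item groups (sum to 1)
    p:       p[k][l][r] = probability that a user in group k gives rating r
             to an item in group l (each p[k][l] sums to 1 over r)
    Returns [sum_{k,l} theta_u[k] * eta_i[l] * p[k][l][r] for each rating r].
    """
    n_ratings = len(p[0][0])
    dist = [0.0] * n_ratings
    for k, t in enumerate(theta_u):
        for l, e in enumerate(eta_i):
            for r in range(n_ratings):
                dist[r] += t * e * p[k][l][r]
    return dist
```

Because p is a full K-by-L table, users in one group can have distinct preferences over every item group, which is the contrast with single-group assumptions that the abstract draws. The most likely rating is simply the index of the largest entry of the returned list.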
Submitted 6 April, 2016; v1 submitted 5 April, 2016;
originally announced April 2016.
-
Long-term evolution of email networks: Statistical regularities, predictability and stability of social behaviors
Authors:
Antonia Godoy-Lorite,
Roger Guimera,
Marta Sales-Pardo
Abstract:
In social networks, individuals constantly drop ties and replace them with new ones in a highly unpredictable fashion. This highly dynamical nature of social ties has important implications for processes such as the spread of information or of epidemics. Several studies have demonstrated the influence of a number of factors on the intricate microscopic process of tie replacement, but the macroscopic long-term effects of such changes remain largely unexplored. Here we investigate whether, despite the inherent randomness at the microscopic level, there are macroscopic statistical regularities in the long-term evolution of social networks. In particular, we analyze the email network of a large organization with over 1,000 individuals over four consecutive years. We find that, although the evolution of individual ties is highly unpredictable, the macro-evolution of social communication networks follows well-defined statistical patterns, characterized by exponentially decaying log-variations of the weight of social ties and of individuals' social strength. At the same time, we find that individuals have social signatures and communication strategies that are remarkably stable over the scale of several years.
Submitted 21 January, 2016; v1 submitted 4 June, 2015;
originally announced June 2015.
-
Multilayer stochastic block models reveal the multilayer structure of complex networks
Authors:
Toni Valles-Catala,
Francesco A. Massucci,
Roger Guimera,
Marta Sales-Pardo
Abstract:
In complex systems, the network of interactions we observe between a system's components is the aggregate of the interactions that occur through different mechanisms or layers. Recent studies reveal that the existence of multiple interaction layers can have a dramatic impact on the dynamical processes occurring on these systems. However, these studies assume that the interactions between system components in each of the layers are known, whereas for real-world systems we typically do not have that information. Here, we address the issue of uncovering the different interaction layers from aggregate data by introducing multilayer stochastic block models (SBMs), a generalization of single-layer SBMs that considers different mechanisms of layer aggregation. First, we find the complete probabilistic solution to the problem of finding the optimal multilayer SBM for a given aggregate observed network. Because this solution is computationally intractable, we propose an approximation that enables us to verify that multilayer SBMs are more predictive of network structure in real-world complex systems.
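As background for the single-layer SBM that this abstract generalizes, the sketch below scores one fixed partition of an undirected network: every pair of groups (r, s) gets a link probability set to its maximum-likelihood value l_rs / n_rs (observed links over possible node pairs), and the log-likelihood sums the Bernoulli terms. This is a minimal illustration with hypothetical names, not the multilayer model or the inference procedure of the paper.

```python
import math
from itertools import combinations


def sbm_log_likelihood(nodes, edges, groups):
    """Maximum log-likelihood of an undirected Bernoulli SBM for a fixed partition.

    nodes:  list of node ids
    edges:  set of frozensets {i, j} (no self-loops)
    groups: dict node -> sortable group label
    """
    links, pairs = {}, {}
    for i, j in combinations(nodes, 2):
        key = tuple(sorted((groups[i], groups[j])))
        pairs[key] = pairs.get(key, 0) + 1
        if frozenset((i, j)) in edges:
            links[key] = links.get(key, 0) + 1
    logl = 0.0
    for key, n_rs in pairs.items():
        l_rs = links.get(key, 0)
        p = l_rs / n_rs  # maximum-likelihood link probability for this block pair
        # Blocks with p == 0 or p == 1 contribute 0 (using 0 * log 0 = 0).
        if 0 < p < 1:
            logl += l_rs * math.log(p) + (n_rs - l_rs) * math.log(1 - p)
    return logl
```

For two disconnected triangles, the partition that places each triangle in its own group reaches the maximum possible value of 0, while a partition that mixes the triangles scores strictly lower.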
Submitted 4 November, 2014;
originally announced November 2014.
-
Predicting human preferences using the block structure of complex social networks
Authors:
Roger Guimera,
Alejandro Llorente,
Esteban Moro,
Marta Sales-Pardo
Abstract:
With ever-increasing available data, predicting individuals' preferences and helping them locate the most relevant information has become a pressing need. Understanding and predicting preferences is also important from a fundamental point of view, as part of what has been called a "new" computational social science. Here, we propose a novel approach based on stochastic block models, which have been developed by sociologists as plausible models of complex networks of social interactions. Our model is in the spirit of predicting individuals' preferences based on the preferences of others but, rather than fitting a particular model, we rely on a Bayesian approach that samples over the ensemble of all possible models. We show that our approach is considerably more accurate than leading recommender algorithms, with major relative improvements between 38% and 99% over industry-level algorithms. Moreover, our approach sheds light on decision-making processes by identifying groups of individuals that have consistently similar preferences, and enabling the analysis of the characteristics of those groups.
Submitted 3 October, 2012;
originally announced October 2012.
-
Optimal network topologies for local search with congestion
Authors:
R. Guimera,
A. Arenas,
A. Diaz-Guilera,
F. Vega-Redondo,
A. Cabrales
Abstract:
The problem of searchability in decentralized complex networks is of great importance in computer science, economics and sociology. We present a formalism that is able to cope simultaneously with the problem of search and the congestion effects that arise when parallel searches are performed, and we obtain expressions for the average search cost (written in terms of the search algorithm and the topological properties of the network) both in the presence and in the absence of congestion. This formalism is used to obtain optimal network structures for a system using a local search algorithm. We find that only two classes of networks can be optimal: star-like configurations, when the number of parallel searches is small, and homogeneous-isotropic configurations, when the number of parallel searches is large.
Submitted 21 October, 2002; v1 submitted 21 June, 2002;
originally announced June 2002.