-
Human mobility is well described by closed-form gravity-like models learned automatically from data
Authors:
Oriol Cabanas-Tirapu,
Lluís Danús,
Esteban Moro,
Marta Sales-Pardo,
Roger Guimerà
Abstract:
Modeling of human mobility is critical to address questions in urban planning and transportation, as well as global challenges in sustainability, public health, and economic development. However, our understanding and ability to model mobility flows within and between urban areas are still incomplete. At one end of the modeling spectrum we have simple so-called gravity models, which are easy to interpret and provide modestly accurate predictions of mobility flows. At the other end, we have complex machine learning and deep learning models, with tens of features and thousands of parameters, which predict mobility more accurately than gravity models at the cost of not being interpretable and not providing insight into human behavior. Here, we show that simple machine-learned, closed-form models of mobility are able to predict mobility flows more accurately, overall, than either gravity or complex machine and deep learning models. At the same time, these models are simple and gravity-like, and can be interpreted in terms similar to standard gravity models. Furthermore, these models work for different datasets and at different scales, suggesting that they may capture the fundamental universal features of human mobility.
Submitted 18 December, 2023;
originally announced December 2023.
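For readers unfamiliar with the baseline named in the abstract above, a standard gravity model predicts the flow between two locations from their populations and the distance between them. The sketch below assumes the common textbook power-law form; it is not one of the closed-form models learned in the paper, and the function name, parameter names, and default exponents are illustrative assumptions.

```python
def gravity_flow(m_i, m_j, d_ij, k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Textbook gravity model (assumed form, not the paper's learned model):

        T_ij = k * m_i**alpha * m_j**beta / d_ij**gamma

    m_i, m_j : populations (or other "masses") of origin and destination
    d_ij     : distance between the two locations, must be positive
    k, alpha, beta, gamma : free parameters, normally fitted to observed flows
    """
    if d_ij <= 0:
        raise ValueError("distance must be positive")
    return k * (m_i ** alpha) * (m_j ** beta) / (d_ij ** gamma)


# With the default exponents, doubling the distance divides the flow by four.
near = gravity_flow(100, 200, 10)
far = gravity_flow(100, 200, 20)
print(near, far)
```

In practice the parameters would be fitted to observed flows rather than fixed at these defaults.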
-
Bayesian estimation of information-theoretic metrics for sparsely sampled distributions
Authors:
Angelo Piga,
Lluc Font-Pomarol,
Marta Sales-Pardo,
Roger Guimerà
Abstract:
Estimating the Shannon entropy of a discrete distribution from which we have only observed a small sample is challenging. Estimating other information-theoretic metrics, such as the Kullback-Leibler divergence between two sparsely sampled discrete distributions, is even harder. Existing approaches to address these problems have shortcomings: they are biased, heuristic, work only for some distributions, and/or cannot be applied to all information-theoretic metrics. Here, we propose a fast, semi-analytical estimator for sparsely sampled distributions that is efficient, precise, and general. Its derivation is grounded in probabilistic considerations and uses a hierarchical Bayesian approach to extract as much information as possible from the few observations available. Our approach provides estimates of the Shannon entropy with precision at least comparable to the state of the art, and most often better. It can also be used to obtain accurate estimates of any other information-theoretic metric, including the notoriously challenging Kullback-Leibler divergence. Here, again, our approach performs consistently better than existing estimators.
Submitted 22 February, 2023; v1 submitted 31 January, 2023;
originally announced January 2023.
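To illustrate why small samples make the problem in the abstract above hard, the sketch below computes the naive "plug-in" entropy estimate, which uses observed frequencies in place of the true probabilities. This is a simple baseline, not the hierarchical Bayesian estimator proposed in the paper; the function name is an illustrative assumption.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Naive plug-in estimate of Shannon entropy, in nats.

    Uses the empirical frequencies c/n as if they were the true
    probabilities. With few samples this tends to underestimate the
    entropy, because unobserved outcomes contribute nothing.
    """
    counts = Counter(samples)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Two samples can show at most two distinct outcomes, so the estimate
# cannot exceed log(2), even if the true distribution is uniform over
# four outcomes (true entropy log(4)).
estimate = plugin_entropy(["a", "b"])
print(estimate, math.log(4))
```

The gap between the estimate and the true value in this example is the kind of bias that estimators for sparsely sampled distributions aim to correct.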
-
Differences in collaboration structures and impact among prominent researchers in Europe and North America
Authors:
Lluis Danus,
Carles Muntaner,
Alexander Krauss,
Marta Sales-Pardo,
Roger Guimera
Abstract:
Scientists collaborate through intricate networks, which impact the quality and scope of their research. At the same time, funding and institutional arrangements, as well as scientific and political cultures, affect the structure of collaboration networks. Since such arrangements and cultures differ across regions in the world in systematic ways, we surmise that collaboration networks and impact should also differ systematically across regions. To test this, we compare the structure of collaboration networks among prominent researchers in North America and Europe. We find that prominent researchers in Europe establish denser collaboration networks, whereas those in North America establish more decentralized networks. We also find that the impact of the publications of prominent researchers in North America is significantly higher than for those in Europe, both when they collaborate with other prominent researchers and when they do not. Although Europeans collaborate with other prominent researchers more often, which increases their impact, we also find that repeated collaboration among prominent researchers decreases the synergistic effect of collaborating.
Submitted 3 November, 2022;
originally announced November 2022.
-
Fundamental limits to learning closed-form mathematical models from data
Authors:
Oscar Fajardo-Fontiveros,
Ignasi Reichardt,
Harry R. De Los Rios,
Jordi Duch,
Marta Sales-Pardo,
Roger Guimera
Abstract:
Given a finite and noisy dataset generated with a closed-form mathematical model, when is it possible to learn the true generating model from the data alone? This is the question we investigate here. We show that this model-learning problem displays a transition from a low-noise phase in which the true model can be learned, to a phase in which the observation noise is too high for the true model to be learned by any method. Both in the low-noise phase and in the high-noise phase, probabilistic model selection leads to optimal generalization to unseen data. This is in contrast to standard machine learning approaches, including artificial neural networks, which in this particular problem are limited, in the low-noise phase, by their ability to interpolate. In the transition region between the learnable and unlearnable phases, generalization is hard for all approaches including probabilistic model selection.
Submitted 16 December, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Node metadata can produce predictability transitions in network inference problems
Authors:
Oscar Fajardo-Fontiveros,
Marta Sales-Pardo,
Roger Guimera
Abstract:
Network inference is the process of learning the properties of complex networks from data. Besides using information about known links in the network, node attributes and other forms of network metadata can help to solve network inference problems. Indeed, several approaches have been proposed to introduce metadata into probabilistic network models and to use them to make better inferences. However, we know little about the effect of such metadata in the inference process. Here, we investigate this issue. We find that, rather than affecting inference gradually, adding metadata causes abrupt transitions in the inference process and in our ability to make accurate predictions, from a situation in which metadata does not play any role to a situation in which metadata completely dominates the inference process. When network data and metadata are partly correlated, metadata optimally contributes to the inference process at the transition between data-dominated and metadata-dominated regimes.
Submitted 26 March, 2021;
originally announced March 2021.
-
Complex decision-making strategies in a stock market experiment explained as the combination of few simple strategies
Authors:
Gael Poux-Medard,
Sergio Cobo-Lopez,
Jordi Duch,
Roger Guimera,
Marta Sales-Pardo
Abstract:
Many studies have shown that there are regularities in the way human beings make decisions. However, our ability to obtain models that capture such regularities and can accurately predict unobserved decisions is still limited. We tackle this problem in the context of individuals who are given information relative to the evolution of market prices and asked to guess the direction of the market. We use a network inference approach with stochastic block models (SBM) to find the model and network representation that is most predictive of unobserved decisions. Our results suggest that users mostly use recent information (about the market and about their previous decisions) to guess. Furthermore, the analysis of SBM groups reveals a set of strategies used by players to process information and make decisions that is analogous to behaviors observed in other contexts. Our study provides an example of how to quantitatively explore human behavior strategies by representing decisions as networks and using rigorous inference and model-selection approaches.
Submitted 9 March, 2021;
originally announced March 2021.
-
Bayesian machine scientist to compare data collapses for the Nikuradse dataset
Authors:
Ignasi Reichardt,
Jordi Pallares,
Marta Sales-Pardo,
Roger Guimera
Abstract:
Ever since Nikuradse's experiments on turbulent friction in 1933, there have been theoretical attempts to describe his measurements by collapsing the data into single-variable functions. However, this approach, which is common in other areas of physics and in other fields, is limited by the lack of rigorous quantitative methods to compare alternative data collapses. Here, we address this limitation by using an unsupervised method to find analytic functions that optimally describe each of the data collapses for the Nikuradse dataset. By descaling these analytic functions, we show that a low dispersion of the scaled data does not guarantee that a data collapse is a good description of the original data. In fact, we find that, out of all the proposed data collapses, the original one proposed by Prandtl and Nikuradse over 80 years ago provides the best description of the data so far, and that it also agrees well with recent experimental data, provided that some model parameters are allowed to vary across experiments.
Submitted 25 April, 2020;
originally announced April 2020.
-
A Bayesian machine scientist to aid in the solution of challenging scientific problems
Authors:
Roger Guimera,
Ignasi Reichardt,
Antoni Aguilar-Mogas,
Francesco A Massucci,
Manuel Miranda,
Jordi Pallares,
Marta Sales-Pardo
Abstract:
Closed-form, interpretable mathematical models have been instrumental for advancing our understanding of the world; with the data revolution, we may now be in a position to uncover new such models for many systems from physics to the social sciences. However, to deal with increasing amounts of data, we need "machine scientists" that are able to extract these models automatically from data. Here, we introduce a Bayesian machine scientist, which establishes the plausibility of models using explicit approximations to the exact marginal posterior over models and establishes its prior expectations about models by learning from a large empirical corpus of mathematical expressions. It explores the space of models using Markov chain Monte Carlo. We show that this approach uncovers accurate models for synthetic and real data and provides out-of-sample predictions that are more accurate than those of existing approaches and of other nonparametric methods.
Submitted 25 April, 2020;
originally announced April 2020.
-
Optimal prediction of decisions and model selection in social dilemmas using block models
Authors:
Sergio Cobo-Lopez,
Antonia Godoy-Lorite,
Jordi Duch,
Marta Sales-Pardo,
Roger Guimera
Abstract:
Advancing our understanding of human behavior hinges on the ability of theories to unveil the mechanisms underlying such behaviors. Measuring the ability of theories and models to predict unobserved behaviors provides a principled method to evaluate their merit and, thus, to help establish which mechanisms are most plausible. Here, we propose models and develop rigorous inference approaches to predict strategic decisions in dyadic social dilemmas. In particular, we use bipartite stochastic block models that incorporate information about the dilemmas faced by individuals. We show, combining these models with empirical data on strategic decisions in dyadic social dilemmas, that individual strategic decisions are to a large extent predictable, despite not being "rational." The analysis of these models also allows us to conclude that: (i) individuals do not perceive games according to their game-theoretical structure; (ii) individuals make decisions using combinations of multiple simple strategies, which our approach reveals naturally.
Submitted 25 April, 2020;
originally announced April 2020.
-
Network-based models for social recommender systems
Authors:
Antonia Godoy-Lorite,
Roger Guimera,
Marta Sales-Pardo
Abstract:
With the overwhelming number of online products available in recent years, there is an increasing need to filter and deliver relevant personalized advice for users. Recommender systems solve this problem by modeling and predicting individual preferences for a great variety of items such as movies, books or research articles. In this chapter, we explore rigorous network-based models that outperform leading approaches for recommendation. The network models we consider are based on the explicit assumption that there are groups of individuals and of items, and that the preferences of an individual for an item are determined only by their group memberships. The accurate prediction of individual user preferences over items can be accomplished by different methodologies, such as Monte Carlo sampling or Expectation-Maximization methods, the latter resulting in a scalable algorithm which is suitable for large datasets.
Submitted 10 February, 2020;
originally announced February 2020.
-
Tensorial and bipartite block models for link prediction in layered networks and temporal networks
Authors:
Marc Tarres-Deulofeu,
Antonia Godoy-Lorite,
Roger Guimera,
Marta Sales-Pardo
Abstract:
Many real-world complex systems are well represented as multilayer networks; predicting interactions in those systems is one of the most pressing problems in predictive network science. To address this challenge, we introduce two stochastic block models for multilayer and temporal networks; one of them uses nodes as its fundamental unit, whereas the other focuses on links. We also develop scalable algorithms for inferring the parameters of these models. Because our models describe all layers simultaneously, our approach takes full advantage of the information contained in the whole network when making predictions about any particular layer. We illustrate the potential of our approach by analyzing two empirical datasets: a temporal network of email communications, and a network of drug interactions for treating different cancer types. We find that modeling all layers simultaneously does result, in general, in more accurate link prediction. However, the most predictive model depends on the dataset under consideration; whereas the node-based model is more appropriate for predicting drug interactions, the link-based model is more appropriate for predicting email communication.
Submitted 5 March, 2018;
originally announced March 2018.
-
Inferring propagation paths for sparsely observed perturbations on complex networks
Authors:
Francesco Alessandro Massucci,
Jonathan Wheeler,
Raul Beltran-Debon,
Jorge Joven,
Marta Sales-Pardo,
Roger Guimera
Abstract:
In a complex system, perturbations propagate by following paths on the network of interactions among the system's units. In contrast to what happens with the spreading of epidemics, observations of general perturbations are often very sparse in time (there is a single observation of the perturbed system) and in "space" (only a few perturbed and unperturbed units are observed). A major challenge in many areas, from biology to the social sciences, is to infer the propagation paths from observations of the effects of perturbation under these sparsity conditions. We address this problem and show that it is possible to go beyond the usual approach of using the shortest paths connecting the known perturbed nodes. Specifically, we show that a simple and general probabilistic model, which we solved using belief propagation, provides fast and accurate estimates of the probabilities of nodes being perturbed.
Submitted 4 December, 2017;
originally announced January 2018.
-
Consistencies and inconsistencies between model selection and link prediction in networks
Authors:
Toni Vallès-Català,
Tiago P. Peixoto,
Roger Guimerà,
Marta Sales-Pardo
Abstract:
A principled approach to understand network structures is to formulate generative models. Given a collection of models, however, an outstanding key task is to determine which one provides a more accurate description of the network at hand, discounting statistical fluctuations. This problem can be approached using two principled criteria that at first may seem equivalent: selecting the most plausible model in terms of its posterior probability; or selecting the model with the highest predictive performance in terms of identifying missing links. Here we show that while these two approaches yield consistent results in most cases, there are also notable instances where they do not, that is, where the most plausible model is not the most predictive. We show that in the latter case the improvement of predictive performance can in fact lead to overfitting both in artificial and empirical settings. Furthermore, we show that, in general, the predictive performance is higher when we average over collections of models that are individually less plausible, than when we consider only the single most plausible model.
Submitted 28 June, 2018; v1 submitted 22 May, 2017;
originally announced May 2017.
-
Bone fusion in normal and pathological development is constrained by the network architecture of the human skull
Authors:
Borja Esteve-Altava,
Toni Valles-Catala,
Roger Guimera,
Marta Sales-Pardo,
Diego Rasskin-Gutman
Abstract:
The premature fusion of cranial bones, craniosynostosis, affects the correct development of the skull producing morphological malformations in newborns. To assess the susceptibility of each craniofacial articulation to close prematurely, we used a network model of the skull to quantify the link reliability (an index based on stochastic block modeling and Bayesian inference) of each articulation. We show that, of the 93 human skull articulations at birth, the few articulations that are associated with nonsyndromic craniosynostosis conditions have statistically significantly lower reliability scores than the others. In a similar way, articulations that close during the normal postnatal development of the skull also have lower reliability scores than those articulations that persist through adult life. These results indicate a relationship between the architecture of the skull network and the specific articulations that close during normal development and in pathological conditions. Our findings suggest that the topological arrangement of skull bones might act as an epigenetic factor, predisposing some articulations to closure, both in normal and pathological development, and also affecting the long-term evolution of the skull.
Submitted 27 December, 2016; v1 submitted 27 July, 2016;
originally announced September 2016.
-
iMet: A computational tool for structural annotation of unknown metabolites from tandem mass spectra
Authors:
Antoni Aguilar-Mogas,
Marta Sales-Pardo,
Miriam Navarro,
Ralf Tautenhahn,
Roger Guimerà,
Oscar Yanes
Abstract:
Untargeted metabolomic studies are revealing large numbers of naturally occurring metabolites that cannot be characterized because their chemical structures and MS/MS spectra are not available in databases. Here we present iMet, a computational tool based on experimental tandem mass spectrometry that could potentially allow the annotation of metabolites not discovered previously. iMet uses MS/MS spectra to identify metabolites structurally similar to an unknown metabolite, and gives a net atomic addition or removal that converts the known metabolite into the unknown one. We validate the algorithm with 148 metabolites, and show that for 89% of them at least one of the top four matches identified by iMet enables the proper annotation of the unknown metabolite. iMet is freely available at http://imet.seeslab.net.
Submitted 14 July, 2016;
originally announced July 2016.
-
Accurate and scalable social recommendation using mixed-membership stochastic block models
Authors:
Antonia Godoy-Lorite,
Roger Guimera,
Cristopher Moore,
Marta Sales-Pardo
Abstract:
With ever-increasing amounts of online information available, modeling and predicting individual preferences (for books or articles, for example) is becoming more and more important. Good predictions enable us to improve advice to users, and obtain a better understanding of the socio-psychological processes that determine those preferences. We have developed a collaborative filtering model, with an associated scalable algorithm, that makes accurate predictions of individuals' preferences. Our approach is based on the explicit assumption that there are groups of individuals and of items, and that the preferences of an individual for an item are determined only by their group memberships. Importantly, we allow each individual and each item to belong simultaneously to mixtures of different groups and, unlike many popular approaches, such as matrix factorization, we do not assume implicitly or explicitly that individuals in each group prefer items in a single group of items. The resulting overlapping groups and the predicted preferences can be inferred with an expectation-maximization algorithm whose running time scales linearly (per iteration). Our approach enables us to predict individual preferences in large datasets, and is considerably more accurate than the current algorithms for such large datasets.
Submitted 6 April, 2016; v1 submitted 5 April, 2016;
originally announced April 2016.
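The modeling assumption in the abstract above (preferences depend only on group memberships, with users and items each belonging to mixtures of groups) can be sketched as a weighted sum over pairs of groups. The code below is one plausible reading of that assumption, not the authors' implementation; the function name, argument names, and the example numbers are all illustrative.

```python
def rating_probability(theta_u, eta_i, p_r):
    """Probability that a user gives an item a particular rating r.

    theta_u : user's membership weights over user groups (sums to 1)
    eta_i   : item's membership weights over item groups (sums to 1)
    p_r     : p_r[k][l] is the probability that a user in group k
              gives rating r to an item in group l

    Assumed mixed-membership form: sum over k, l of
    theta_u[k] * eta_i[l] * p_r[k][l].
    """
    return sum(
        theta_u[k] * eta_i[l] * p_r[k][l]
        for k in range(len(theta_u))
        for l in range(len(eta_i))
    )


# Hypothetical example with two user groups and two item groups.
p_like = [[0.1, 0.9],
          [0.5, 0.5]]
print(rating_probability([1.0, 0.0], [0.0, 1.0], p_like))  # pure memberships
print(rating_probability([0.5, 0.5], [0.5, 0.5], p_like))  # mixed memberships
```

With pure memberships the prediction reduces to a single entry of `p_r`; with mixed memberships it is an average weighted by both sets of memberships.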
-
Scaling and optimal synergy: Two principles determining microbial growth in complex media
Authors:
Francesco Alessandro Massucci,
Roger Guimerà,
Luís A. Nunes Amaral,
Marta Sales-Pardo
Abstract:
High-throughput experimental techniques and bioinformatics tools make it possible to obtain reconstructions of the metabolism of microbial species. Combined with mathematical frameworks such as flux balance analysis, which assumes that nutrients are used so as to maximize growth, these reconstructions enable us to predict microbial growth.
Although such predictions are generally accurate, these approaches do not give insights on how different nutrients are used to produce growth, and thus are difficult to generalize to new media or to different organisms.
Here, we propose a systems-level phenomenological model of metabolism inspired by the virial expansion. Our model predicts biomass production given the nutrient uptakes and a reduced set of parameters, which can be easily determined experimentally. To validate our model, we test it against in silico simulations and experimental measurements of growth, and find good agreement. From a biological point of view, our model uncovers the impact that individual nutrients and the synergistic interaction between nutrient pairs have on growth, and suggests that we can understand the growth maximization principle as the optimization of nutrient synergies.
Submitted 6 July, 2015;
originally announced July 2015.
-
Long-term evolution of email networks: Statistical regularities, predictability and stability of social behaviors
Authors:
Antonia Godoy-Lorite,
Roger Guimera,
Marta Sales-Pardo
Abstract:
In social networks, individuals constantly drop ties and replace them by new ones in a highly unpredictable fashion. This highly dynamical nature of social ties has important implications for processes such as the spread of information or of epidemics. Several studies have demonstrated the influence of a number of factors on the intricate microscopic process of tie replacement, but the macroscopic long-term effects of such changes remain largely unexplored. Here we investigate whether, despite the inherent randomness at the microscopic level, there are macroscopic statistical regularities in the long-term evolution of social networks. In particular, we analyze the email network of a large organization with over 1,000 individuals throughout four consecutive years. We find that, although the evolution of individual ties is highly unpredictable, the macro-evolution of social communication networks follows well-defined statistical patterns, characterized by exponentially decaying log-variations of the weight of social ties and of individuals' social strength. At the same time, we find that individuals have social signatures and communication strategies that are remarkably stable over the scale of several years.
Submitted 21 January, 2016; v1 submitted 4 June, 2015;
originally announced June 2015.
-
Predicting future conflict between team-members with parameter-free models of social networks
Authors:
Nuria Rovira-Asenjo,
Tania Gumi,
Marta Sales-Pardo,
Roger Guimera
Abstract:
Despite the well-documented benefits of working in teams, teamwork also results in communication, coordination and management costs, and may lead to personal conflict between team members. In a context where teams play an increasingly important role, it is of major importance to understand conflict and to develop diagnostic tools to avert it. Here, we investigate empirically whether it is possible to quantitatively predict future conflict in small teams using parameter-free models of social network structure. We analyze data of conflict appearance and resolution between 86 team members in 16 small teams, all working in a real project for nine consecutive months. We find that group-based models of complex networks successfully anticipate conflict in small teams whereas micro-based models of structural balance, which have been traditionally used to model conflict, do not.
Submitted 6 November, 2014;
originally announced November 2014.
-
A network inference method for large-scale unsupervised identification of novel drug-drug interactions
Authors:
Roger Guimera,
Marta Sales-Pardo
Abstract:
Characterizing interactions between drugs is important to avoid potentially harmful combinations, to reduce off-target effects of treatments and to fight antibiotic resistant pathogens, among others. Here we present a network inference algorithm to predict uncharacterized drug-drug interactions. Our algorithm takes, as its only input, sets of previously reported interactions, and does not require any pharmacological or biochemical information about the drugs, their targets or their mechanisms of action. Because the models we use are abstract, our approach can deal with adverse interactions, synergistic/antagonistic/suppressing interactions, or any other type of drug interaction. We show that our method is able to accurately predict interactions, both in exhaustive pairwise interaction data between small sets of drugs, and in large-scale databases. We also demonstrate that our algorithm can be used efficiently to discover interactions of new drugs as part of the drug discovery process.
Submitted 6 November, 2014;
originally announced November 2014.
-
Multilayer stochastic block models reveal the multilayer structure of complex networks
Authors:
Toni Valles-Catala,
Francesco A. Massucci,
Roger Guimera,
Marta Sales-Pardo
Abstract:
In complex systems, the network of interactions we observe between a system's components is the aggregate of the interactions that occur through different mechanisms or layers. Recent studies reveal that the existence of multiple interaction layers can have a dramatic impact on the dynamical processes occurring on these systems. However, these studies assume that the interactions between system components in each one of the layers are known, while typically for real-world systems we do not have that information. Here, we address the issue of uncovering the different interaction layers from aggregate data by introducing multilayer stochastic block models (SBMs), a generalization of single-layer SBMs that considers different mechanisms of layer aggregation. First, we find the complete probabilistic solution to the problem of finding the optimal multilayer SBM for a given aggregate observed network. Because this solution is computationally intractable, we propose an approximation that enables us to verify that multilayer SBMs are more predictive of network structure in real-world complex systems.
Submitted 4 November, 2014;
originally announced November 2014.
-
The Possible Role of Resource Requirements and Academic Career-Choice Risk on Gender Differences in Publication Rate and Impact
Authors:
Jordi Duch,
Xiao Han T. Zeng,
Marta Sales-Pardo,
Filippo Radicchi,
Shayna Otis,
Teresa K. Woodruff,
Luis A. Nunes Amaral
Abstract:
Many studies demonstrate that there is still a significant gender bias, especially at higher career levels, in many areas including science, technology, engineering, and mathematics (STEM). We investigated field-dependent, gender-specific effects of the selective pressures individuals experience as they pursue a career in academia within seven STEM disciplines. We built a unique database that comprises 437,787 publications authored by 4,292 faculty members at top United States research universities. Our analyses reveal that gender differences in publication rate and impact are discipline-specific. Our results also support two hypotheses. First, the widely-reported lower publication rates of female faculty are correlated with the amount of research resources typically needed in the discipline considered, and thus may be explained by the lower level of institutional support historically received by females. Second, in disciplines where pursuing an academic position incurs greater career risk, female faculty tend to have a greater fraction of higher impact publications than males. Our findings have significant, field-specific, policy implications for achieving diversity at the faculty level within the STEM disciplines.
Submitted 13 December, 2012;
originally announced December 2012.
-
Justice blocks and predictability of US Supreme Court votes
Authors:
Roger Guimera,
Marta Sales-Pardo
Abstract:
Successful attempts to predict judges' votes shed light on how legal decisions are made and, ultimately, on the behavior and evolution of the judiciary. Here, we investigate to what extent it is possible to make predictions of a justice's vote based on the other justices' votes in the same case. For our predictions, we use models and methods that have been developed to uncover hidden associations between actors in complex social networks. We show that these methods are more accurate at predicting justices' votes than forecasts made by legal experts and by algorithms that take into consideration the content of the cases. We argue that, within our framework, high predictability is a quantitative proxy for stable justice (and case) blocks, which probably reflect stable a priori attitudes toward the law. We find that U.S. Supreme Court justice votes are more predictable than one would expect from an ideal court composed of perfectly independent justices. Deviations from ideal behavior are most apparent in divided 5-4 decisions, where justice blocks seem to be most stable. Moreover, we find evidence that justice predictability decreased during the 50-year period spanning from the Warren Court to the Rehnquist Court, and that aggregate court predictability has been significantly lower during Democratic presidencies. More broadly, our results show that it is possible to use methods developed for the analysis of complex social networks to quantitatively investigate historical questions related to political decision-making.
Submitted 17 October, 2012;
originally announced October 2012.
-
Predicting human preferences using the block structure of complex social networks
Authors:
Roger Guimera,
Alejandro Llorente,
Esteban Moro,
Marta Sales-Pardo
Abstract:
With ever-increasing available data, predicting individuals' preferences and helping them locate the most relevant information has become a pressing need. Understanding and predicting preferences is also important from a fundamental point of view, as part of what has been called a "new" computational social science. Here, we propose a novel approach based on stochastic block models, which have been developed by sociologists as plausible models of complex networks of social interactions. Our model is in the spirit of predicting individuals' preferences based on the preferences of others but, rather than fitting a particular model, we rely on a Bayesian approach that samples over the ensemble of all possible models. We show that our approach is considerably more accurate than leading recommender algorithms, with major relative improvements between 38% and 99% over industry-level algorithms. Besides, our approach sheds light on decision-making processes by identifying groups of individuals that have consistently similar preferences, and enabling the analysis of the characteristics of those groups.
Submitted 3 October, 2012;
originally announced October 2012.
-
Missing and spurious interactions and the reconstruction of complex networks
Authors:
R. Guimera,
M. Sales-Pardo
Abstract:
Network analysis is currently used in a myriad of contexts: from identifying potential drug targets to predicting the spread of epidemics and designing vaccination strategies, and from finding friends to uncovering criminal activity. Despite the promise of the network approach, the reliability of network data is a source of great concern in all fields where complex networks are studied. Here, we present a general mathematical and computational framework to deal with the problem of data reliability in complex networks. In particular, we are able to reliably identify both missing and spurious interactions in noisy network observations. Remarkably, our approach also enables us to obtain, from those noisy observations, network reconstructions that yield estimates of the true network properties that are more accurate than those provided by the observations themselves. Our approach has the potential to guide experiments, to better characterize network data sets, and to drive new discoveries.
Submitted 27 April, 2010;
originally announced April 2010.
-
Micro-bias and macro-performance
Authors:
S. M. D. Seaver,
A. A. Moreira,
M. Sales-Pardo,
R. D. Malmgren,
D. Diermeier,
L. A. N. Amaral
Abstract:
We use agent-based modeling to investigate the effect of conservatism and partisanship on the efficiency with which large populations solve the density classification task--a paradigmatic problem for information aggregation and consensus building. We find that conservative agents enhance the populations' ability to efficiently solve the density classification task despite large levels of noise in the system. In contrast, we find that the presence of even a small fraction of partisans holding the minority position will result in deadlock or a consensus on an incorrect answer. Our results provide a possible explanation for the emergence of conservatism and suggest that even low levels of partisanship can lead to significant social costs.
Submitted 28 August, 2009;
originally announced August 2009.
-
Detection of node group membership in networks with group overlap
Authors:
Erin N. Sawardecker,
Marta Sales-Pardo,
Luís A. Nunes Amaral
Abstract:
Most networks found in social and biochemical systems have modular structures. An important question prompted by the modularity of these networks is whether nodes can be said to belong to a single group. If they cannot, we would need to consider the role of "overlapping communities." Despite some efforts in this direction, the problem of detecting overlapping groups remains unsolved because there is neither a formal definition of overlapping community, nor an ensemble of networks with which to test the performance of group detection algorithms when nodes can belong to more than one group. Here, we introduce an ensemble of networks with overlapping groups. We then apply three group identification methods--modularity maximization, k-clique percolation, and modularity-landscape surveying--to these networks. We find that the modularity-landscape surveying method is the only one able to detect heterogeneities in node memberships, and that those heterogeneities are only detectable when the overlap is small. Surprisingly, we find that the k-clique percolation method is unable to detect node membership for the overlapping case.
Submitted 5 December, 2008;
originally announced December 2008.
-
Extracting the hierarchical organization of complex systems
Authors:
M. Sales-Pardo,
R. Guimera,
A. Moreira,
L. Amaral
Abstract:
Extracting understanding from the growing "sea" of biological and socio-economic data is one of the most pressing scientific challenges facing us. Here, we introduce and validate an unsupervised method that is able to accurately extract the hierarchical organization of complex biological, social, and technological networks. We define an ensemble of hierarchically nested random graphs, which we use to validate the method. We then apply our method to real-world networks, including the air-transportation network, an electronic circuit, an email exchange network, and metabolic networks. We find that our method enables us to obtain accurate multi-scale descriptions of a complex system.
Submitted 11 May, 2007;
originally announced May 2007.
-
Module identification in bipartite and directed networks
Authors:
R. Guimera,
M. Sales-Pardo,
L. A. N. Amaral
Abstract:
Modularity is one of the most prominent properties of real-world complex networks. Here, we address the issue of module identification in two important classes of networks: bipartite networks and directed unipartite networks. Nodes in bipartite networks are divided into two non-overlapping sets, and the links must have one end node from each set. Directed unipartite networks only have one type of node, but links have an origin and an end. We show that directed unipartite networks can be conveniently represented as bipartite networks for module identification purposes. We report a novel approach especially suited for module detection in bipartite networks, and define a set of random networks that enable us to validate the new approach.
Submitted 6 September, 2007; v1 submitted 12 January, 2007;
originally announced January 2007.
-
Classes of complex networks defined by role-to-role connectivity profiles
Authors:
R. Guimera,
M. Sales-Pardo,
L. A. N. Amaral
Abstract:
Interactions between units in physical, biological, technological, and social systems usually give rise to intricate networks with non-trivial structure, which critically affects the dynamics and properties of the system. The focus of most current research on complex networks is on global network properties. A caveat of this approach is that the relevance of global properties hinges on the premise that networks are homogeneous, whereas most real-world networks have a markedly modular structure. Here, we report that networks with different functions, including the Internet, metabolic, air transportation, and protein interaction networks, have distinct patterns of connections among nodes with different roles, and that, as a consequence, complex networks can be classified into two distinct functional classes based on their link type frequency. Importantly, we demonstrate that the above structural features cannot be captured by means of often studied global properties.
Submitted 12 January, 2007;
originally announced January 2007.
-
Mesoscopic modeling for nucleic acid chain dynamics
Authors:
M. Sales-Pardo,
R. Guimera,
A. A. Moreira,
J. Widom,
L. A. N. Amaral
Abstract:
To gain a deeper insight into cellular processes such as transcription and translation, one needs to uncover the mechanisms controlling the configurational changes of nucleic acids. As a step toward this aim, we present here a novel mesoscopic-level computational model that provides a new window into nucleic acid dynamics. We model a single-stranded nucleic acid as a polymer chain whose monomers are the nucleosides. Each monomer comprises a bead representing the sugar molecule and a pin representing the base. The bead-pin complex can rotate about the backbone of the chain. We consider pairwise stacking and hydrogen-bonding interactions. We use a modified Monte Carlo dynamics that splits the dynamics into translational bead motion and rotational pin motion. By performing a number of tests we first show that our model is physically sound. We then focus on the study of the kinetics of a DNA hairpin--a single-stranded molecule comprising two complementary segments joined by a non-complementary loop--studied experimentally. We find that results from our simulations agree with experimental observations, demonstrating that our model is a suitable tool for the investigation of the hybridization of single strands.
Submitted 1 June, 2005;
originally announced June 2005.
-
Modularity from Fluctuations in Random Graphs and Complex Networks
Authors:
Roger Guimera,
Marta Sales-Pardo,
Luis A. N. Amaral
Abstract:
The mechanisms by which modularity emerges in complex networks are not well understood but recent reports have suggested that modularity may arise from evolutionary selection. We show that finding the modularity of a network is analogous to finding the ground-state energy of a spin system. Moreover, we demonstrate that, due to fluctuations, stochastic network models give rise to modular networks. Specifically, we show both numerically and analytically that random graphs and scale-free networks have modularity. We argue that this fact must be taken into consideration to define statistically-significant modularity in complex networks.
Submitted 24 August, 2004; v1 submitted 26 March, 2004;
originally announced March 2004.