-
Integrating behavioral experimental findings into dynamical models to inform social change interventions
Authors:
Radu Tanase,
René Algesheimer,
Manuel S. Mariani
Abstract:
Addressing global challenges -- from public health to climate change -- often involves stimulating the large-scale adoption of new products or behaviors. Research traditions that focus on individual decision making suggest that achieving this objective requires better identifying the drivers of individual adoption choices. On the other hand, computational approaches rooted in complexity science focus on maximizing the propagation of a given product or behavior throughout social networks of interconnected adopters. The integration of these two perspectives -- although advocated by several research communities -- has remained elusive so far. Here we show how achieving this integration could inform seeding policies to facilitate the large-scale adoption of a given behavior or product. Drawing on complex contagion and discrete choice theories, we propose a method to estimate individual-level thresholds to adoption, and validate its predictive power in two choice experiments. By integrating the estimated thresholds into computational simulations, we show that state-of-the-art seeding methods for social influence maximization might be suboptimal if they neglect individual-level behavioral drivers, which can be corrected through the proposed experimental method.
Submitted 21 May, 2024;
originally announced May 2024.
-
Ranking species in complex ecosystems through nestedness maximization
Authors:
Manuel Sebastian Mariani,
Dario Mazzilli,
Aurelio Patelli,
Flaviano Morone
Abstract:
Identifying the rank of species in a social or ecological network is a difficult task, since the rank of each species is invariably determined by complex interactions stipulated with other species. Simply put, the rank of a species is a function of the ranks of all other species through the adjacency matrix of the network. A common system of ranking is to order species in such a way that their neighbours form maximally nested sets, a problem called the nested maximization problem (NMP). Here we show that the NMP can be formulated as an instance of the Quadratic Assignment Problem, one of the most important combinatorial optimization problems, widely studied in computer science, economics, and operations research. We tackle the problem by Statistical Physics techniques: we derive a set of self-consistent nonlinear equations whose fixed point represents the optimal rankings of species in an arbitrary bipartite mutualistic network, which generalize the Fitness-Complexity equations widely used in the field of economic complexity. Furthermore, we present an efficient algorithm to solve the NMP that outperforms state-of-the-art network-based metrics and genetic algorithms. Finally, our theoretical framework may be easily generalized to study the relationship between ranking and network structure beyond pairwise interactions, e.g. in higher-order networks.
Submitted 2 August, 2023;
originally announced August 2023.
-
Equivalence between the Fitness-Complexity and the Sinkhorn-Knopp algorithms
Authors:
Dario Mazzilli,
Manuel Sebastian Mariani,
Flaviano Morone,
Aurelio Patelli
Abstract:
We uncover the connection between the Fitness-Complexity algorithm, developed in the economic complexity field, and the Sinkhorn-Knopp algorithm, widely used in diverse domains ranging from computer science and mathematics to economics. Despite minor formal differences between the two methods, both converge to the same fixed-point solution up to normalization. The discovered connection allows us to derive a rigorous interpretation of the Fitness and the Complexity metrics as the potentials of a suitable energy function. Under this interpretation, high-energy products are unfeasible for low-fitness countries, which explains why the algorithm is effective at displaying nested patterns in bipartite networks. We also show that the proposed interpretation reveals the scale invariance of the Fitness-Complexity algorithm, which has practical implications for the algorithm's implementation in different datasets. Further, analysis of empirical trade data under the new perspective reveals three categories of countries that might benefit from different development strategies.
Submitted 20 March, 2024; v1 submitted 23 December, 2022;
originally announced December 2022.
-
Locating the eigenshield of a network via perturbation theory
Authors:
Ming-Yang Zhou,
Manuel Sebastian Mariani,
Hao Liao,
Rui Mao,
Yi-Cheng Zhang
Abstract:
The functions of complex networks are usually determined by a small set of vital nodes. Finding the best set of vital nodes (eigenshield nodes) is critical to the network's robustness against rumor spreading and cascading failures, which makes it one of the fundamental problems in network science. The problem is challenging as it requires maximizing the influence of nodes in the set while simultaneously minimizing the redundancies between the set's nodes. However, the redundancy mechanism has rarely been investigated by previous studies. Here we introduce the matrix perturbation framework to find a small "eigenshield" set of nodes that, when removed, lead to the largest drop in the network's spectral radius. We show that finding the "eigenshield" nodes can be translated into the optimization of an objective function that simultaneously accounts for the individual influence of each node and redundancy between different nodes.
We analytically quantify the influence redundancy that explains why an important node might play an insignificant role in the "eigenshield" node set. Extensive experiments under diverse influence maximization problems, ranging from network dismantling to spreading maximization, demonstrate that the eigenshield detection tends to significantly outperform state-of-the-art methods across most problems. Our findings shed light on the mechanisms that may lie at the core of the function of vital nodes in complex networks.
Submitted 28 October, 2022;
originally announced October 2022.
-
Forecasting countries' gross domestic product from patent data
Authors:
Yucheng Ye,
Shuqi Xu,
Manuel Sebastian Mariani,
Linyuan Lü
Abstract:
Recent strides in economic complexity have shown that the future economic development of nations can be predicted with a single "economic fitness" variable, which captures countries' competitiveness in international trade. The predictions by this low-dimensional approach could match or even outperform predictions based on much more sophisticated methods, such as those by the International Monetary Fund (IMF). However, all prior works in economic complexity aimed to quantify countries' fitness from World Trade export data, without considering the possibility to infer countries' potential for growth from alternative sources of data. Here, motivated by the long-standing relationship between technological development and economic growth, we aim to forecast countries' growth from patent data. Specifically, we construct a citation network between countries from the European Patent Office (EPO) dataset. Initial results suggest that the H-index centrality in this network is a potential candidate to gauge national economic performance. To validate this conjecture, we construct a two-dimensional plane defined by the H-index and GDP per capita, and use a forecasting method based on dynamical systems to test the predicting accuracy of the H-index. We find that the predictions based on the H-index-GDP plane outperform the predictions by IMF by approximately 35%, and they marginally outperform those by the economic fitness extracted from trade data. Our results could inspire further attempts to identify predictors of national growth from different sources of data related to scientific and technological innovation.
Submitted 27 May, 2022;
originally announced May 2022.
-
The different structure of economic ecosystems at the scales of companies and countries
Authors:
Dario Laudati,
Manuel S. Mariani,
Luciano Pietronero,
Andrea Zaccaria
Abstract:
A key element to understand complex systems is the relationship between the spatial scale of investigation and the structure of the interrelation among its elements. When it comes to economic systems, it is now well-known that the country-product bipartite network exhibits a nested structure, which is the foundation of different algorithms that have been used to scientifically investigate countries' development and forecast national economic growth. Changing the subject from countries to companies, a significantly different scenario emerges. Through the analysis of a unique dataset of Italian firms' exports and a worldwide dataset comprising countries' exports, here we find that, while a globally nested structure is observed at the country level, a local, in-block nested structure emerges at the level of firms. Remarkably, this in-block nestedness is statistically significant with respect to suitable null models and the algorithmic partitions of products into blocks have a high correspondence with exogenous product classifications. These findings lay a solid foundation for developing a scientific approach based on the physics of complex systems to the analysis of companies, which has been lacking until now.
Submitted 3 February, 2022;
originally announced February 2022.
-
Beyond network centrality: Individual-level behavioral traits for predicting information superspreaders in social media
Authors:
Fang Zhou,
Linyuan Lü,
Jianguo Liu,
Manuel Sebastian Mariani
Abstract:
Understanding the heterogeneous role of individuals in large-scale information spreading is essential to manage online behavior as well as its potential offline consequences. To this end, most existing studies from diverse research domains focus on the disproportionate role played by highly-connected "hub" individuals. However, we demonstrate here that information superspreaders in online social media are best understood and predicted by simultaneously considering two individual-level behavioral traits: influence and susceptibility. Specifically, we derive a nonlinear network-based algorithm to quantify individuals' influence and susceptibility from multiple spreading event data. By applying the algorithm to large-scale data from Twitter and Weibo, we demonstrate that individuals' estimated influence and susceptibility scores enable predictions of future superspreaders above and beyond network centrality, and reveal new insights on the network position of the superspreaders.
Submitted 17 March, 2024; v1 submitted 7 December, 2021;
originally announced December 2021.
-
Citations or dollars? Early signals of a firm's research success
Authors:
Shuqi Xu,
Manuel S. Mariani,
Linyuan Lü,
Lorenzo Napolitano,
Emanuele Pugliese,
Andrea Zaccaria
Abstract:
Scientific and technological progress is largely driven by firms in many domains, including artificial intelligence and vaccine development. However, we do not know yet whether the success of firms' research activities exhibits dynamic regularities and some degree of predictability. By inspecting the research lifecycles of 7,440 firms, we find that the economic value of a firm's early patents is an accurate predictor of various dimensions of a firm's future research success. At the same time, a smaller set of future top-performers do not generate early patents of high economic value, but they are detectable via the technological value of their early patents. Importantly, the observed predictability cannot be explained by a cumulative advantage mechanism, and the observed heterogeneity of the firms' temporal success patterns markedly differs from patterns previously observed for individuals' research careers. Our results uncover the dynamical regularities of the research success of firms, and they could inform managerial strategies as well as policies to promote entrepreneurship and accelerate human progress.
Submitted 31 July, 2021;
originally announced August 2021.
-
Detecting new edge types in a temporal network model
Authors:
Wenjie Jia,
Manuel S. Mariani,
Linyuan Lü,
Tao Jiang
Abstract:
Networks representing complex systems in nature and society usually involve multiple interaction types. These types suggest essential information on the interactions between components, but not all of the existing types are usually discovered. Therefore, detecting the undiscovered edge types is crucial for deepening our understanding of the network structure. Although previous studies have discussed the edge label detection problem, we still lack effective methods for uncovering previously-undetected edge types. Here, we develop an effective technique to detect undiscovered new edge types in networks by leveraging a novel temporal network model. Both analytical and numerical results show that the prediction accuracy of our method is perfect when the model networks' time parameter approaches infinity. Furthermore, we find that when time is finite, our method is still significantly more accurate than the baseline.
Submitted 26 April, 2021;
originally announced April 2021.
-
The fragility of opinion formation in a complex world
Authors:
Matúš Medo,
Manuel S. Mariani,
Linyuan Lü
Abstract:
With vast amounts of high-quality information at our fingertips, how is it possible that many people believe that the Earth is flat and vaccination harmful? Motivated by this question, we quantify the implications of an opinion formation mechanism whereby an uninformed observer gradually forms opinions about a world composed of subjects interrelated by a signed network of mutual trust and distrust. We show numerically and analytically that the observer's resulting opinions are highly inconsistent (they tend to be independent of the observer's initial opinions) and unstable (they exhibit wide stochastic variations). Opinion inconsistency and instability increase with the world complexity represented by the number of subjects, which can be prevented by suitably expanding the observer's initial amount of information. Our findings imply that even an individual who initially trusts credible information sources may end up trusting the deceptive ones if at least a small number of trust relations exist between the credible and deceptive sources.
Submitted 23 October, 2020;
originally announced October 2020.
-
Network-based ranking in social systems: three challenges
Authors:
Manuel S. Mariani,
Linyuan Lü
Abstract:
Ranking algorithms are pervasive in our increasingly digitized societies, with important real-world applications including recommender systems, search engines, and influencer marketing practices. From a network science perspective, network-based ranking algorithms solve fundamental problems related to the identification of vital nodes for the stability and dynamics of a complex system. Despite the ubiquitous and successful applications of these algorithms, we argue that our understanding of their performance and their applications to real-world problems face three fundamental challenges: (1) rankings might be biased by various factors; (2) their effectiveness might be limited to specific problems; and (3) agents' decisions driven by rankings might result in potentially vicious feedback mechanisms and unhealthy systemic consequences. Methods rooted in network science and agent-based modeling can help us to understand and overcome these challenges.
Submitted 29 May, 2020;
originally announced May 2020.
-
Absence of a resolution limit in in-block nestedness
Authors:
Manuel S. Mariani,
María J. Palazzi,
Albert Solé-Ribalta,
Javier Borge-Holthoefer,
Claudio J. Tessone
Abstract:
Originally a speculative pattern in ecological networks, the hybrid or compound nested-modular pattern has been confirmed, during the last decade, as a relevant structural arrangement that emerges in a variety of contexts -- in ecological mutualistic systems and beyond. This implies shifting the focus from the measurement of nestedness as a global property (macro level), to the detection of blocks (meso level) that internally exhibit a high degree of nestedness. Unfortunately, the availability and understanding of the methods to properly detect in-block nested partitions lag behind the empirical findings: while a precise quality function of in-block nestedness has been proposed, we lack an understanding of its possible inherent constraints. Specifically, while it is well known that Newman-Girvan's modularity, and related quality functions, notoriously suffer from a resolution limit that impairs their ability to detect small blocks, the potential existence of resolution limits for in-block nestedness is unexplored. Here, we provide empirical, numerical and analytical evidence that the in-block nestedness function lacks a resolution limit, and thus our capacity to detect correct partitions in networks via its maximization depends solely on the accuracy of the optimization algorithms.
Submitted 19 February, 2020;
originally announced February 2020.
-
Simple regularities in the dynamics of online news impact
Authors:
Matúš Medo,
Manuel S. Mariani,
Linyuan Lü
Abstract:
Online news can quickly reach and affect millions of people, yet we do not know yet whether there exist potential dynamical regularities that govern their impact on the public. We use data from two major news outlets, BBC and New York Times, where the number of user comments can be used as a proxy of news impact. We find that the impact dynamics of online news articles does not exhibit popularity patterns found in many other social and information systems. In particular, we find that a simple exponential distribution yields a better fit to the empirical news impact distributions than a power-law distribution. This observation is explained by the lack or limited influence of the otherwise omnipresent rich-get-richer mechanism in the analyzed data. The temporal dynamics of the news impact exhibits a universal exponential decay which allows us to collapse individual news trajectories into an elementary single curve. We also show how daily variations of user activity directly influence the dynamics of the article impact. Our findings challenge the universal applicability of popularity dynamics patterns found in other social contexts.
Submitted 22 January, 2021; v1 submitted 16 January, 2020;
originally announced January 2020.
-
Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data
Authors:
Shuqi Xu,
Manuel Sebastian Mariani,
Linyuan Lü,
Matúš Medo
Abstract:
Despite the increasing use of citation-based metrics for research evaluation purposes, we do not know yet which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and age biases of the evaluated metrics. Outcomes of these metrics are therefore not representative of the metrics' ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and allows us to reveal metrics' performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores that they produce, whereas other popular metrics, including citation count, HITS and Collective Influence, produce significantly worse ranking results.
Submitted 15 January, 2020;
originally announced January 2020.
-
The wisdom of the few: Predicting collective success from individual behavior
Authors:
Manuel S. Mariani,
Yanina Gimenez,
Jorge Brea,
Martin Minnoni,
René Algesheimer,
Claudio J. Tessone
Abstract:
Can we predict top-performing products, services, or businesses by only monitoring the behavior of a small set of individuals? Although most previous studies focused on the predictive power of "hub" individuals with many social contacts, which sources of customer behavioral data are needed to address this question remains unclear, mostly due to the scarcity of available datasets that simultaneously capture individuals' purchasing patterns and social interactions. Here, we address this question in a unique, large-scale dataset that combines individuals' credit-card purchasing history with their social and mobility traits across an entire nation. Surprisingly, we find that the purchasing history alone enables the detection of small sets of "discoverers" whose early purchases offer reliable success predictions for the brick-and-mortar stores they visit. In contrast with the assumptions by most existing studies on word-of-mouth processes, the hubs selected by social network centrality are not consistently predictive of success. Our findings show that companies and organizations with access to large-scale purchasing data can detect the discoverers and leverage their behavior to anticipate market trends, without the need for social network data.
Submitted 9 June, 2020; v1 submitted 14 January, 2020;
originally announced January 2020.
-
Recommending investors for new startups by integrating network diffusion and investors' domain preference
Authors:
Shuqi Xu,
Qianming Zhang,
Linyuan Lv,
Manuel Sebastian Mariani
Abstract:
Over the past decade, many startups have sprung up, creating a huge demand for financial support from venture investors. However, due to the information asymmetry between investors and companies, the financing process is usually challenging and time-consuming, especially for the startups that have not yet obtained any investment. Because of this, effective data-driven techniques to automatically match startups with potentially relevant investors would be highly desirable. Here, we analyze 34,469 valid investment events collected from www.itjuzi.com and consider the cold-start problem of recommending investors for new startups. We address this problem by constructing different tripartite network representations of the data where nodes represent investors, companies, and companies' domains. First, we find that investors have strong domain preferences when investing, which motivates us to introduce virtual links between investors and investment domains in the tripartite network construction. Our analysis of the recommendation performance of diffusion-based algorithms applied to various network representations indicates that prospective investors for new startups are effectively revealed by integrating network diffusion processes with investors' domain preference.
Submitted 16 January, 2020; v1 submitted 5 December, 2019;
originally announced December 2019.
-
Nestedness in complex networks: Observation, emergence, and implications
Authors:
Manuel Sebastian Mariani,
Zhuo-Ming Ren,
Jordi Bascompte,
Claudio Juan Tessone
Abstract:
The observed architecture of ecological and socio-economic networks differs significantly from that of random networks. From a network science standpoint, non-random structural patterns observed in real networks call for an explanation of their emergence and an understanding of their potential systemic consequences. This article focuses on one of these patterns: nestedness. Given a network of interacting nodes, nestedness can be described as the tendency for nodes to interact with subsets of the interaction partners of better-connected nodes. Known for more than 80 years in biogeography, nestedness has been found in systems as diverse as ecological mutualistic organizations, world trade, and inter-organizational relations, among many others. This review article focuses on three main pillars: the existing methodologies to observe nestedness in networks; the main theoretical mechanisms conceived to explain the emergence of nestedness in ecological and socio-economic networks; the implications of a nested topology of interactions for the stability and feasibility of a given interacting system. We survey results from variegated disciplines, including statistical physics, graph theory, ecology, and theoretical economics. Nestedness was found to emerge both in bipartite networks and, more recently, in unipartite ones; this review is the first comprehensive attempt to unify both streams of studies, usually disconnected from each other. We believe that the truly interdisciplinary endeavour -- while rooted in a complex systems perspective -- may inspire new models and algorithms whose realm of application will undoubtedly transcend disciplinary boundaries.
Submitted 18 May, 2019;
originally announced May 2019.
-
Temporal similarity metrics for latent network reconstruction: The role of time-lag decay
Authors:
Hao Liao,
Ming-Kai Liu,
Manuel Sebastian Mariani,
Mingyang Zhou,
Xingtong Wu
Abstract:
When investigating the spreading of a piece of information or the diffusion of an innovation, we often lack information on the underlying propagation network. Reconstructing the hidden propagation paths based on the observed diffusion process is a challenging problem which has recently attracted attention from diverse research fields. To address this reconstruction problem, based on static similarity metrics commonly used in the link prediction literature, we introduce new node-node temporal similarity metrics. The new metrics take as input the time-series of multiple independent spreading processes, based on the hypothesis that two nodes are more likely to be connected if they were often infected at similar points in time. This hypothesis is implemented by introducing a time-lag function which penalizes distant infection times. We find that the choice of this time-lag function strongly affects the metrics' reconstruction accuracy, depending on the network's clustering coefficient, and we provide an extensive comparative analysis of static and temporal similarity metrics for network reconstruction. Our findings shed new light on the notion of similarity between pairs of nodes in complex networks.
Submitted 4 April, 2019;
originally announced April 2019.
-
Fast influencers in complex networks
Authors:
Fang Zhou,
Linyuan Lü,
Manuel Sebastian Mariani
Abstract:
Influential nodes in complex networks are typically defined as those nodes that maximize the asymptotic reach of a spreading process of interest. However, for practical applications such as viral marketing and online information spreading, one is often interested in maximizing the reach of the process in a short amount of time. The traditional definition of influencers in network-related studies from diverse research fields narrows down the focus to the late-time state of the spreading processes, leaving the following question unsolved: which nodes are able to initiate large-scale spreading processes, in a limited amount of time? Here, we find that there is a fundamental difference between the nodes -- which we call "fast influencers" -- that initiate the largest-reach processes in a short amount of time, and the traditional, "late-time" influencers. Stimulated by this observation, we provide an extensive benchmarking of centrality metrics with respect to their ability to identify both the fast and late-time influencers. We find that local network properties can be used to uncover the fast influencers. In particular, a parsimonious, local centrality metric (which we call social capital) achieves optimal or nearly-optimal performance in the fast influencer identification for all the analyzed empirical networks. Local metrics tend to be also competitive in the traditional, late-time influencer identification task.
Submitted 15 March, 2019;
originally announced March 2019.
-
Optimal timescale for community detection in growing networks
Authors:
Matus Medo,
An Zeng,
Yi-Cheng Zhang,
Manuel S. Mariani
Abstract:
Time-stamped data are increasingly available for many social, economic, and information systems that can be represented as networks growing with time. The World Wide Web, social contact networks, and citation networks of scientific papers and online news articles, for example, are of this kind. Static methods can be inadequate for the analysis of growing networks as they miss essential information on the system's dynamics. At the same time, time-aware methods require the choice of an observation timescale, yet we lack principled ways to determine it. We focus on the popular community detection problem which aims to partition a network's nodes into meaningful groups. We use a multi-layer quality function to show, on both synthetic and real datasets, that the observation timescale that leads to optimal communities is tightly related to the system's intrinsic aging timescale that can be inferred from the time-stamped network data. The use of temporal information leads to drastically different conclusions on the community structure of real information networks, which challenges the current understanding of the large-scale organization of growing networks. Our findings indicate that before attempting to assess structural patterns of evolving networks, it is vital to uncover the timescales of the dynamical processes that generated them.
Submitted 1 August, 2019; v1 submitted 13 September, 2018;
originally announced September 2018.
-
The long-term impact of ranking algorithms in growing networks
Authors:
Shilun Zhang,
Matúš Medo,
Linyuan Lü,
Manuel Sebastian Mariani
Abstract:
When we search online for content, we are constantly exposed to rankings. For example, web search results are presented as a ranking, and online bookstores often show us lists of best-selling books. While popularity-based ranking algorithms (like Google's PageRank) have been extensively studied in previous works, we still lack a clear understanding of their potential systemic consequences. In this work, we fill this gap by introducing a new model of network growth that allows us to compare the properties of the networks generated under the influence of different ranking algorithms. We show that by correcting for the omnipresent age bias of popularity-based ranking algorithms, the resulting networks exhibit a significantly larger agreement between the nodes' inherent quality and their long-term popularity, and a less concentrated popularity distribution. To further promote popularity diversity, we introduce and validate a perturbation of the original rankings where a small number of randomly-selected nodes are promoted to the top of the ranking. Our findings move the first steps toward a model-based understanding of the long-term impact of popularity-based ranking algorithms, and could be used as an informative tool for the design of improved information filtering tools.
Submitted 19 November, 2018; v1 submitted 31 May, 2018;
originally announced May 2018.
-
Influencers identification in complex networks through reaction-diffusion dynamics
Authors:
Flavio Iannelli,
Manuel Sebastian Mariani,
Igor M. Sokolov
Abstract:
A pivotal idea in network science, marketing research and innovation diffusion theories is that a small group of nodes -- called influencers -- have the largest impact on social contagion and epidemic processes in networks. Despite the long-standing interest in the influencers identification problem in socio-economic and biological networks, there is not yet agreement on which is the best identification strategy. State-of-the-art strategies are typically based either on heuristic centrality metrics or on analytic arguments that only hold for specific network topologies or peculiar dynamical regimes. Here, we leverage the recently introduced random-walk effective distance -- a topological metric that estimates almost perfectly the arrival time of diffusive spreading processes on networks -- to introduce a new centrality metric which quantifies how close a node is to the other nodes. We show that the new centrality metric significantly outperforms state-of-the-art metrics in detecting the influencers for global contagion processes. Our findings reveal the essential role of the network effective distance for the influencers identification and lead us closer to the optimal solution of the problem.
Submitted 14 November, 2018; v1 submitted 3 March, 2018;
originally announced March 2018.
-
Revealing In-Block Nestedness: detection and benchmarking
Authors:
Albert Solé-Ribalta,
Claudio J. Tessone,
Manuel S. Mariani,
Javier Borge-Holthoefer
Abstract:
As new instances of nested organization -- beyond ecological networks -- are discovered, scholars are debating the co-existence of two apparently incompatible macroscale architectures: nestedness and modularity. The discussion is far from settled, mainly for two reasons. First, nestedness and modularity appear to emerge from two contradictory dynamics, cooperation and competition. Second, existing methods to assess the presence of nestedness and modularity are flawed when it comes to the evaluation of concurrently nested and modular structures. In this work, we tackle the latter problem, presenting the concept of in-block nestedness, a structural property determining to what extent a network is composed of blocks whose internal connectivity exhibits nestedness. We then put forward a set of optimization methods that allow us to identify such organization successfully, both in synthetic and in a large number of real networks. These findings challenge our understanding of the topology of ecological and social systems, calling for new models to explain how such patterns emerge.
Submitted 17 January, 2018;
originally announced January 2018.
-
Early identification of important patents through network centrality
Authors:
Manuel Sebastian Mariani,
Matus Medo,
François Lafond
Abstract:
One of the most challenging problems in technological forecasting is to identify as early as possible those technologies that have the potential to lead to radical changes in our society. In this paper, we use the US patent citation network (1926-2010) to test our ability to early identify a list of historically significant patents through citation network analysis. We show that in order to effectively uncover these patents shortly after they are issued, we need to go beyond raw citation counts and take into account both the citation network topology and temporal information. In particular, an age-normalized measure of patent centrality, called rescaled PageRank, allows us to identify the significant patents earlier than citation count and PageRank score. In addition, we find that while high-impact patents tend to rely on other high-impact patents in a similar way as scientific papers, the patents' citation dynamics is significantly slower than that of papers, which makes the early identification of significant patents more challenging than that of significant papers.
Submitted 25 October, 2017;
originally announced October 2017.
-
Ranking in evolving complex networks
Authors:
Hao Liao,
Manuel Sebastian Mariani,
Matus Medo,
Yi-Cheng Zhang,
Ming-Yang Zhou
Abstract:
Complex networks have emerged as a simple yet powerful framework to represent and analyze a wide range of complex systems. The problem of ranking the nodes and the edges in complex networks is critical for a broad range of real-world problems because it affects how we access online information and products, how success and talent are evaluated in human activities, and how scarce resources are allocated by companies and policymakers, among others. This calls for a deep understanding of how existing ranking algorithms perform, and of the possible biases that may impair their effectiveness. Well-established ranking algorithms (such as Google's popular PageRank) are static in nature and, as a consequence, they exhibit important shortcomings when applied to real networks that rapidly evolve in time. The recent advances in the understanding and modeling of evolving networks have enabled the development of a wide and diverse range of ranking algorithms that take the temporal dimension into account. The aim of this review is to survey the existing ranking algorithms, both static and time-aware, and their applications to evolving networks. We emphasize both the impact of network evolution on well-established static algorithms and the benefits from including the temporal dimension for tasks such as prediction of real network traffic, prediction of future links, and identification of highly-significant nodes.
Submitted 26 April, 2017;
originally announced April 2017.
-
Quantifying and suppressing ranking bias in a large citation network
Authors:
Giacomo Vaccario,
Matus Medo,
Nicolas Wider,
Manuel Sebastian Mariani
Abstract:
It is widely recognized that citation counts for papers from different fields cannot be directly compared because different scientific fields adopt different citation practices. Citation counts are also strongly biased by paper age since older papers had more time to attract citations. Various procedures aim at suppressing these biases and give rise to new normalized indicators, such as the relative citation count. We use a large citation dataset from Microsoft Academic Graph and a new statistical framework based on the Mahalanobis distance to show that the rankings by well known indicators, including the relative citation count and Google's PageRank score, are significantly biased by paper field and age. We propose a general normalization procedure motivated by the $z$-score which produces much less biased rankings when applied to citation count and PageRank score.
Submitted 23 March, 2017;
originally announced March 2017.
-
Randomizing growing networks with a time-respecting null model
Authors:
Zhuo-Ming Ren,
Manuel Sebastian Mariani,
Yi-Cheng Zhang,
Matus Medo
Abstract:
Complex networks are often used to represent systems that are not static but grow with time: people make new friendships, new papers are published and refer to the existing ones, and so forth. To assess the statistical significance of measurements made on such networks, we propose a randomization methodology---a time-respecting null model---that preserves both the network's degree sequence and the time evolution of individual nodes' degree values. By preserving the temporal linking patterns of the analyzed system, the proposed model is able to factor out the effect of the system's temporal patterns on its structure. We apply the model to the citation network of Physical Review scholarly papers and the citation network of US movies. The model reveals that the two datasets are strikingly different with respect to their degree-degree correlations, and we discuss the important implications of this finding on the information provided by paradigmatic node centrality metrics such as indegree and Google's PageRank. The randomization methodology proposed here can be used to assess the significance of any structural property in growing networks, which could bring new insights into the problems where null models play a critical role, such as the detection of communities and network motifs.
Submitted 16 November, 2017; v1 submitted 22 March, 2017;
originally announced March 2017.
-
Identification of milestone papers through time-balanced network centrality
Authors:
Manuel Sebastian Mariani,
Matus Medo,
Yi-Cheng Zhang
Abstract:
Citations between scientific papers and related bibliometric indices, such as the $h$-index for authors and the impact factor for journals, are being increasingly used -- often in controversial ways -- as quantitative tools for research evaluation. Yet, a fundamental research question remains open: to what extent do quantitative metrics capture the significance of scientific works? We analyze the network of citations among the $449,935$ papers published by the American Physical Society (APS) journals between 1893 and 2009, and focus on the comparison of metrics built on the citation count with network-based metrics. We contrast five article-level metrics with respect to the rankings that they assign to a set of fundamental papers, called Milestone Letters, carefully selected by the APS editors for "making long-lived contributions to physics, either by announcing significant discoveries, or by initiating new areas of research". A new metric, which combines PageRank centrality with the explicit requirement that paper score is not biased by paper age, is the best-performing metric overall in identifying the Milestone Letters. The lack of time bias in the new metric also makes it possible to use it to compare papers of different age on the same scale. We find that network-based metrics identify the Milestone Letters better than metrics based on the citation count, which suggests that the structure of the citation network contains information that can be used to improve the ranking of scientific publications. The methods and results presented here are relevant for all evolving systems where network centrality metrics are applied, for example the World Wide Web and online social networks. An interactive Web platform where it is possible to view the ranking of the APS papers by rescaled PageRank is available at http://www.sciencenow.info.
Submitted 8 November, 2016; v1 submitted 30 August, 2016;
originally announced August 2016.
-
The mathematics of non-linear metrics for nested networks
Authors:
Rui-Jie Wu,
Gui-Yuan Shi,
Yi-Cheng Zhang,
Manuel Sebastian Mariani
Abstract:
Numerical analysis of data from international trade and ecological networks has shown that the non-linear fitness-complexity metric is the best candidate to rank nodes by importance in bipartite networks that exhibit a nested structure. Despite its relevance for real networks, the mathematical properties of the metric and its variants remain largely unexplored. Here, we perform an analytic and numeric study of the fitness-complexity metric and a new variant, called minimal extremal metric. We rigorously derive exact expressions for node scores for perfectly nested networks and show that these expressions explain the non-trivial convergence properties of the metrics. A comparison between the fitness-complexity metric and the minimal extremal metric on real data reveals that the latter can produce improved rankings if the input data are reliable.
Submitted 21 March, 2016;
originally announced March 2016.
-
Measuring economic complexity of countries and products: which metric to use?
Authors:
Manuel Sebastian Mariani,
Alexandre Vidmer,
Matus Medo,
Yi-Cheng Zhang
Abstract:
Evaluating the economies of countries and their relations with products in the global market is a central problem in economics, with far-reaching implications for our theoretical understanding of international trade as well as for practical applications, such as policy making and financial investment planning. The recent Economic Complexity approach aims to quantify the competitiveness of countries and the quality of the exported products based on the empirical observation that the most competitive countries have diversified exports, whereas developing countries export only a few low-quality products -- typically those exported by many other countries. Two different metrics, Fitness-Complexity and the Method of Reflections, have been proposed to measure country and product score in the Economic Complexity framework. We use international trade data and a recent ranking evaluation measure to quantitatively compare the ability of the two metrics to rank countries and products according to their importance in the network. The results show that the Fitness-Complexity metric outperforms the Method of Reflections in both the ranking of products and the ranking of countries. We also investigate a generalization of the Fitness-Complexity metric and show that it can produce improved rankings provided that the input data are reliable.
Submitted 4 September, 2015;
originally announced September 2015.
-
Identification and modeling of discoverers in online social systems
Authors:
Matus Medo,
Manuel S. Mariani,
An Zeng,
Yi-Cheng Zhang
Abstract:
The dynamics of individuals is of essential importance for understanding the evolution of social systems. Most existing models assume that individuals in diverse systems, ranging from social networks to e-commerce, all tend to what is already popular. We develop an analytical time-aware framework which shows that when individuals make choices -- which item to buy, for example -- in online social systems, a small fraction of them is consistently successful in discovering popular items long before they actually become popular. We argue that these users, whom we refer to as discoverers, are fundamentally different from the previously known opinion leaders, influentials, and innovators. We use the proposed framework to demonstrate that discoverers are present in a wide range of systems. Once identified, they can be used to predict the future success of items. We propose a network model which reproduces the discovery patterns observed in the real data. Furthermore, data produced by the model pose a fundamental challenge to classical ranking algorithms which neglect the time of link creation and thus fail to discriminate between discoverers and ordinary users in the data. Our results open the door to qualitative and quantitative study of fine temporal patterns in social systems and have far-reaching implications for network modeling and algorithm design.
Submitted 4 September, 2015;
originally announced September 2015.
-
Ranking nodes in growing networks: When PageRank fails
Authors:
Manuel Sebastian Mariani,
Matus Medo,
Yi-Cheng Zhang
Abstract:
PageRank is arguably the most popular ranking algorithm, applied in real systems ranging from information to biological and infrastructure networks. Despite its outstanding popularity and broad use in different areas of science, the relation between the algorithm's efficacy and properties of the network on which it acts has not yet been fully understood. We study here PageRank's performance on a network model supported by real data, and show that realistic temporal effects make PageRank fail to identify the most valuable nodes for a broad range of model parameters. Results on real data are in qualitative agreement with our model-based findings. This failure of PageRank reveals that the static approach to information filtering is inappropriate for a broad class of growing systems, and suggests that time-dependent algorithms that are based on the temporal linking patterns of these systems are needed to better rank the nodes.
Submitted 3 September, 2015;
originally announced September 2015.
-
Calorimetric glass transition in a mean field theory approach
Authors:
Manuel Sebastian Mariani,
Giorgio Parisi,
Corrado Rainone
Abstract:
The study of the properties of glass-forming liquids is difficult for many reasons. Analytic solutions of mean field models are usually available only for systems embedded in a space with an unphysically high number of spatial dimensions; on the experimental and numerical side, the study of the properties of metastable glassy states requires thermalizing the system in the supercooled liquid phase, where the thermalization time may be extremely large. We consider here a hard-sphere mean field model which is solvable in any number of spatial dimensions; moreover, we easily obtain thermalized configurations even in the glass phase. We study the three-dimensional version of this model and we perform Monte Carlo simulations which mimic heating and cooling experiments performed on ultra-stable glasses. The numerical findings are in good agreement with the analytical results and qualitatively capture the features of ultra-stable glasses observed in experiments.
Submitted 4 November, 2014;
originally announced November 2014.