-
Enhancing Bayesian model updating in structural health monitoring via learnable maps
Authors:
Matteo Torzoni,
Andrea Manzoni,
Stefano Mariani
Abstract:
In the context of structural health monitoring (SHM), the selection and extraction of damage-sensitive features from raw sensor recordings represent a critical step towards solving the inverse problem underlying the structural health identification. This work introduces a new way to enhance stochastic approaches to SHM through the use of deep neural networks. A learnable feature extractor and a feature-oriented surrogate model are synergistically exploited to evaluate a likelihood function within a Markov chain Monte Carlo sampling algorithm. The feature extractor undergoes a supervised pairwise training to map sensor recordings onto a low-dimensional metric space, which encapsulates the sensitivity to structural health parameters. The surrogate model maps the structural health parameters onto their feature description. The procedure enables the updating of beliefs about structural health parameters, effectively replacing the need for a computationally expensive numerical (finite element) model. A preliminary offline phase involves the generation of a labeled dataset to train both the feature extractor and the surrogate model. Within a simulation-based SHM framework, training vibration responses are cost-effectively generated by means of a multi-fidelity surrogate modeling strategy to approximate sensor recordings under varying damage and operational conditions. The multi-fidelity surrogate exploits model order reduction and artificial neural networks to speed up the data generation phase while ensuring the damage-sensitivity of the approximated signals. The proposed strategy is assessed through three synthetic case studies, demonstrating remarkable results in terms of accuracy of the estimated quantities and computational efficiency.
Submitted 22 May, 2024;
originally announced May 2024.
-
Integrating behavioral experimental findings into dynamical models to inform social change interventions
Authors:
Radu Tanase,
René Algesheimer,
Manuel S. Mariani
Abstract:
Addressing global challenges -- from public health to climate change -- often involves stimulating the large-scale adoption of new products or behaviors. Research traditions that focus on individual decision making suggest that achieving this objective requires better identifying the drivers of individual adoption choices. On the other hand, computational approaches rooted in complexity science focus on maximizing the propagation of a given product or behavior throughout social networks of interconnected adopters. The integration of these two perspectives -- although advocated by several research communities -- has remained elusive so far. Here we show how achieving this integration could inform seeding policies to facilitate the large-scale adoption of a given behavior or product. Drawing on complex contagion and discrete choice theories, we propose a method to estimate individual-level thresholds to adoption, and validate its predictive power in two choice experiments. By integrating the estimated thresholds into computational simulations, we show that state-of-the-art seeding methods for social influence maximization might be suboptimal if they neglect individual-level behavioral drivers, which can be corrected through the proposed experimental method.
Submitted 21 May, 2024;
originally announced May 2024.
-
A digital twin framework for civil engineering structures
Authors:
Matteo Torzoni,
Marco Tezzele,
Stefano Mariani,
Andrea Manzoni,
Karen E. Willcox
Abstract:
The digital twin concept represents an appealing opportunity to advance condition-based and predictive maintenance paradigms for civil engineering systems, thus allowing reduced lifecycle costs, increased system safety, and increased system availability. This work proposes a predictive digital twin approach to the health monitoring, maintenance, and management planning of civil engineering structures. The asset-twin coupled dynamical system is encoded employing a probabilistic graphical model, which allows all relevant sources of uncertainty to be taken into account. In particular, the time-repeating observations-to-decisions flow is modeled using a dynamic Bayesian network. Real-time structural health diagnostics are provided by assimilating sensed data with deep learning models. The digital twin state is continually updated in a sequential Bayesian inference fashion. This is then exploited to inform the optimal planning of maintenance and management actions within a dynamic decision-making framework. A preliminary offline phase involves the population of training datasets through a reduced-order numerical model and the computation of a health-dependent control policy. The strategy is assessed on two synthetic case studies, involving a cantilever beam and a railway bridge, demonstrating the dynamic decision-making capabilities of health-aware digital twins.
Submitted 31 October, 2023; v1 submitted 2 August, 2023;
originally announced August 2023.
-
Columbus: Android App Testing Through Systematic Callback Exploration
Authors:
Priyanka Bose,
Dipanjan Das,
Saastha Vasan,
Sebastiano Mariani,
Ilya Grishchenko,
Andrea Continella,
Antonio Bianchi,
Christopher Kruegel,
Giovanni Vigna
Abstract:
With the continuous rise in the popularity of Android mobile devices, automated testing of apps has become more important than ever. Android apps are event-driven programs. Unfortunately, generating all possible types of events by interacting with the app's interface is challenging for an automated testing approach. Callback-driven testing eliminates the need for event generation by directly invoking app callbacks. However, existing callback-driven testing techniques assume prior knowledge of Android callbacks, and they rely on a human expert, who is familiar with the Android API, to write stub code that prepares callback arguments before invocation. Since the Android API is huge and keeps evolving, prior techniques could only support a small fraction of callbacks present in the Android framework.
In this work, we introduce Columbus, a callback-driven testing technique that employs two strategies to eliminate the need for human involvement: (i) it automatically identifies callbacks by simultaneously analyzing both the Android framework and the app under test, and (ii) it uses a combination of under-constrained symbolic execution (primitive arguments) and type-guided dynamic heap introspection (object arguments) to generate valid and effective inputs. Lastly, Columbus integrates two novel feedback mechanisms -- data dependency and crash-guidance -- during testing to increase the likelihood of triggering crashes and to maximize coverage. In our evaluation, Columbus outperforms state-of-the-art model-driven, checkpoint-based, and callback-driven testing tools both in terms of crashes and coverage.
Submitted 17 February, 2023;
originally announced February 2023.
-
Space-Fluid Adaptive Sampling by Self-Organisation
Authors:
Roberto Casadei,
Stefano Mariani,
Danilo Pianini,
Mirko Viroli,
Franco Zambonelli
Abstract:
A recurrent task in coordinated systems is managing (estimating, predicting, or controlling) signals that vary in space, such as distributed sensed data or computation outcomes. Especially in large-scale settings, the problem can be addressed through decentralised and situated computing systems: nodes can locally sense, process, and act upon signals, and coordinate with neighbours to implement collective strategies. Accordingly, in this work we devise distributed coordination strategies for the estimation of a spatial phenomenon through collaborative adaptive sampling. Our design is based on the idea of dynamically partitioning space into regions that compete and grow/shrink to provide accurate aggregate sampling. Such regions hence define a sort of virtualised space that is "fluid", since its structure adapts in response to pressure forces exerted by the underlying phenomenon. We provide an adaptive sampling algorithm in the field-based coordination framework, and prove it is self-stabilising and locally optimal. Finally, we verify by simulation that the proposed algorithm effectively carries out a spatially adaptive sampling while maintaining a tuneable trade-off between accuracy and efficiency.
Submitted 15 December, 2023; v1 submitted 31 October, 2022;
originally announced October 2022.
-
About Digital Twins, agents, and multiagent systems: a cross-fertilisation journey
Authors:
Stefano Mariani,
Marco Picone,
Alessandro Ricci
Abstract:
Digital Twins (DTs) are rapidly emerging as a fundamental brick of engineering cyber-physical systems, but their notion is still mostly bound to specific business domains (e.g. manufacturing), goals (e.g. product design), or application domains (e.g. the Internet of Things). As such, their value as general purpose engineering abstractions is yet to be fully revealed. In this paper, we relate DTs with agents and multiagent systems, as the latter are arguably the most rich abstractions available for the engineering of complex socio-technical and cyber-physical systems, and the former could both fill in some gaps in agent-oriented engineering and benefit from an agent-oriented interpretation -- in a cross-fertilisation journey.
Submitted 7 June, 2022;
originally announced June 2022.
-
Forecasting countries' gross domestic product from patent data
Authors:
Yucheng Ye,
Shuqi Xu,
Manuel Sebastian Mariani,
Linyuan Lü
Abstract:
Recent strides in economic complexity have shown that the future economic development of nations can be predicted with a single "economic fitness" variable, which captures countries' competitiveness in international trade. The predictions by this low-dimensional approach could match or even outperform predictions based on much more sophisticated methods, such as those by the International Monetary Fund (IMF). However, all prior works in economic complexity aimed to quantify countries' fitness from World Trade export data, without considering the possibility to infer countries' potential for growth from alternative sources of data. Here, motivated by the long-standing relationship between technological development and economic growth, we aim to forecast countries' growth from patent data. Specifically, we construct a citation network between countries from the European Patent Office (EPO) dataset. Initial results suggest that the H-index centrality in this network is a potential candidate to gauge national economic performance. To validate this conjecture, we construct a two-dimensional plane defined by the H-index and GDP per capita, and use a forecasting method based on dynamical systems to test the predicting accuracy of the H-index. We find that the predictions based on the H-index-GDP plane outperform the predictions by IMF by approximately 35%, and they marginally outperform those by the economic fitness extracted from trade data. Our results could inspire further attempts to identify predictors of national growth from different sources of data related to scientific and technological innovation.
Submitted 27 May, 2022;
originally announced May 2022.
-
Beyond network centrality: Individual-level behavioral traits for predicting information superspreaders in social media
Authors:
Fang Zhou,
Linyuan Lü,
Jianguo Liu,
Manuel Sebastian Mariani
Abstract:
Understanding the heterogeneous role of individuals in large-scale information spreading is essential to manage online behavior as well as its potential offline consequences. To this end, most existing studies from diverse research domains focus on the disproportionate role played by highly-connected "hub" individuals. However, we demonstrate here that information superspreaders in online social media are best understood and predicted by simultaneously considering two individual-level behavioral traits: influence and susceptibility. Specifically, we derive a nonlinear network-based algorithm to quantify individuals' influence and susceptibility from multiple spreading event data. By applying the algorithm to large-scale data from Twitter and Weibo, we demonstrate that individuals' estimated influence and susceptibility scores enable predictions of future superspreaders above and beyond network centrality, and reveal new insights on the network position of the superspreaders.
Submitted 17 March, 2024; v1 submitted 7 December, 2021;
originally announced December 2021.
-
Individual and Collective Autonomous Development
Authors:
Marco Lippi,
Stefano Mariani,
Matteo Martinelli,
Franco Zambonelli
Abstract:
The increasing complexity and unpredictability of many ICT scenarios let us envision that future systems will have to dynamically learn how to act and adapt to face evolving situations with little or no a priori knowledge, both at the level of individual components and at the collective level. In other words, such systems should become able to autonomously develop models of themselves and of their environment. Autonomous development includes: learning models of own capabilities; learning how to act purposefully towards the achievement of specific goals; and learning how to act collectively, i.e., accounting for the presence of others. In this paper, we introduce the vision of autonomous development in ICT systems, by framing its key concepts and by illustrating suitable application domains. Then, we overview the many research areas that are contributing or can potentially contribute to the realization of the vision, and identify some key research challenges.
Submitted 3 October, 2021; v1 submitted 23 September, 2021;
originally announced September 2021.
-
Citations or dollars? Early signals of a firm's research success
Authors:
Shuqi Xu,
Manuel S. Mariani,
Linyuan Lü,
Lorenzo Napolitano,
Emanuele Pugliese,
Andrea Zaccaria
Abstract:
Scientific and technological progress is largely driven by firms in many domains, including artificial intelligence and vaccine development. However, we do not know yet whether the success of firms' research activities exhibits dynamic regularities and some degree of predictability. By inspecting the research lifecycles of 7,440 firms, we find that the economic value of a firm's early patents is an accurate predictor of various dimensions of a firm's future research success. At the same time, a smaller set of future top-performers do not generate early patents of high economic value, but they are detectable via the technological value of their early patents. Importantly, the observed predictability cannot be explained by a cumulative advantage mechanism, and the observed heterogeneity of the firms' temporal success patterns markedly differs from patterns previously observed for individuals' research careers. Our results uncover the dynamical regularities of the research success of firms, and they could inform managerial strategies as well as policies to promote entrepreneurship and accelerate human progress.
Submitted 31 July, 2021;
originally announced August 2021.
-
Online structural health monitoring by model order reduction and deep learning algorithms
Authors:
Luca Rosafalco,
Matteo Torzoni,
Andrea Manzoni,
Stefano Mariani,
Alberto Corigliano
Abstract:
Within a structural health monitoring (SHM) framework, we propose a simulation-based classification strategy to move towards online damage localization. The procedure combines parametric Model Order Reduction (MOR) techniques and Fully Convolutional Networks (FCNs) to analyze raw vibration measurements recorded on the monitored structure. First, a dataset of possible structural responses under varying operational conditions is built through a physics-based model, allowing for a finite set of predefined damage scenarios. Then, the dataset is used for the offline training of the FCN. Because of the extremely large number of model evaluations required by the dataset construction, MOR techniques are employed to reduce the computational burden. The trained classifier is shown to be able to map unseen vibrational recordings, e.g. collected on-the-fly from sensors placed on the structure, to the actual damage state, thus providing information concerning the presence and also the location of damage. The proposed strategy has been validated by means of two case studies, concerning a 2D portal frame and a 3D portal frame railway bridge; MOR techniques have allowed us to respectively speed up the analyses about 30 and 420 times. For both the case studies, after training the classifier has attained an accuracy greater than 85%.
Submitted 26 March, 2021;
originally announced March 2021.
-
Time-Fluid Field-Based Coordination through Programmable Distributed Schedulers
Authors:
Danilo Pianini,
Roberto Casadei,
Mirko Viroli,
Stefano Mariani,
Franco Zambonelli
Abstract:
Emerging application scenarios, such as cyber-physical systems (CPSs), the Internet of Things (IoT), and edge computing, call for coordination approaches addressing openness, self-adaptation, heterogeneity, and deployment agnosticism. Field-based coordination is one such approach, promoting the idea of programming system coordination declaratively from a global perspective, in terms of functional manipulation and evolution in "space and time" of distributed data structures called fields. More specifically regarding time, in field-based coordination (as in many other distributed approaches to coordination) it is assumed that local activities in each device are regulated by a fair and unsynchronised fixed clock working at the platform level. In this work, we challenge this assumption, and propose an alternative approach where scheduling is programmed in a natural way (along with usual field-based coordination) in terms of causality fields, each enacting a programmable distributed notion of a computation "cause" (why and when a field computation has to be locally computed) and how it should change across time and space. Starting from low-level platform triggers, such causality fields can be organised into multiple layers, up to high-level, collectively-computed time abstractions, to be used at the application level. This reinterpretation of time in terms of articulated causality relations allows us to express what we call "time-fluid" coordination, where scheduling can be finely tuned so as to select the triggers to react to, generally allowing to adaptively balance performance (system reactivity) and cost (resource usage) of computations. We formalise the proposed scheduling framework for field-based coordination in the context of the field calculus, discuss an implementation in the aggregate computing framework, and finally evaluate the approach via simulation on several case studies.
Submitted 24 November, 2021; v1 submitted 26 December, 2020;
originally announced December 2020.
-
The fragility of opinion formation in a complex world
Authors:
Matúš Medo,
Manuel S. Mariani,
Linyuan Lü
Abstract:
With vast amounts of high-quality information at our fingertips, how is it possible that many people believe that the Earth is flat and vaccination harmful? Motivated by this question, we quantify the implications of an opinion formation mechanism whereby an uninformed observer gradually forms opinions about a world composed of subjects interrelated by a signed network of mutual trust and distrust. We show numerically and analytically that the observer's resulting opinions are highly inconsistent (they tend to be independent of the observer's initial opinions) and unstable (they exhibit wide stochastic variations). Opinion inconsistency and instability increase with the world complexity represented by the number of subjects, which can be prevented by suitably expanding the observer's initial amount of information. Our findings imply that even an individual who initially trusts credible information sources may end up trusting the deceptive ones if at least a small number of trust relations exist between the credible and deceptive sources.
Submitted 23 October, 2020;
originally announced October 2020.
-
Network-based ranking in social systems: three challenges
Authors:
Manuel S. Mariani,
Linyuan Lü
Abstract:
Ranking algorithms are pervasive in our increasingly digitized societies, with important real-world applications including recommender systems, search engines, and influencer marketing practices. From a network science perspective, network-based ranking algorithms solve fundamental problems related to the identification of vital nodes for the stability and dynamics of a complex system. Despite the ubiquitous and successful applications of these algorithms, we argue that our understanding of their performance and their applications to real-world problems face three fundamental challenges: (1) rankings might be biased by various factors; (2) their effectiveness might be limited to specific problems; and (3) agents' decisions driven by rankings might result in potentially vicious feedback mechanisms and unhealthy systemic consequences. Methods rooted in network science and agent-based modeling can help us to understand and overcome these challenges.
Submitted 29 May, 2020;
originally announced May 2020.
-
Absence of a resolution limit in in-block nestedness
Authors:
Manuel S. Mariani,
María J. Palazzi,
Albert Solé-Ribalta,
Javier Borge-Holthoefer,
Claudio J. Tessone
Abstract:
Originally a speculative pattern in ecological networks, the hybrid or compound nested-modular pattern has been confirmed, during the last decade, as a relevant structural arrangement that emerges in a variety of contexts -- in ecological mutualistic systems and beyond. This implies shifting the focus from the measurement of nestedness as a global property (macro level), to the detection of blocks (meso level) that internally exhibit a high degree of nestedness. Unfortunately, the availability and understanding of the methods to properly detect in-block nested partitions lag behind the empirical findings: while a precise quality function of in-block nestedness has been proposed, we lack an understanding of its possible inherent constraints. Specifically, while it is well known that Newman-Girvan's modularity, and related quality functions, notoriously suffer from a resolution limit that impairs their ability to detect small blocks, the potential existence of resolution limits for in-block nestedness is unexplored. Here, we provide empirical, numerical and analytical evidence that the in-block nestedness function lacks a resolution limit, and thus our capacity to detect correct partitions in networks via its maximization depends solely on the accuracy of the optimization algorithms.
Submitted 19 February, 2020;
originally announced February 2020.
-
Fully convolutional networks for structural health monitoring through multivariate time series classification
Authors:
Luca Rosafalco,
Andrea Manzoni,
Stefano Mariani,
Alberto Corigliano
Abstract:
We propose a novel approach to Structural Health Monitoring (SHM), aiming at the automatic identification of damage-sensitive features from data acquired through pervasive sensor systems. Damage detection and localization are formulated as classification problems, and tackled through Fully Convolutional Networks (FCNs). A supervised training of the proposed network architecture is performed on data extracted from numerical simulations of a physics-based model (playing the role of digital twin of the structure to be monitored) accounting for different damage scenarios. By relying on this simplified model of the structure, several load conditions are considered during the training phase of the FCN, whose architecture has been designed to deal with time series of different length. The training of the neural network is done before the monitoring system starts operating, thus enabling a real-time damage classification. The numerical performances of the proposed strategy are assessed on a numerical benchmark case consisting of an eight-story shear building subjected to two load types, one of which models random vibrations due to low-energy seismicity. Measurement noise has been added to the responses of the structure to mimic the outputs of a real monitoring system. Extremely good classification capacities are shown: among the nine possible alternatives (represented by the healthy state and by a damage at any floor), damage is correctly classified in up to 95% of cases, thus showing the strong potential of the proposed approach in view of the application to real-life cases.
Submitted 12 February, 2020;
originally announced February 2020.
-
Simple regularities in the dynamics of online news impact
Authors:
Matúš Medo,
Manuel S. Mariani,
Linyuan Lü
Abstract:
Online news can quickly reach and affect millions of people, yet we do not know yet whether there exist potential dynamical regularities that govern their impact on the public. We use data from two major news outlets, BBC and New York Times, where the number of user comments can be used as a proxy of news impact. We find that the impact dynamics of online news articles does not exhibit popularity patterns found in many other social and information systems. In particular, we find that a simple exponential distribution yields a better fit to the empirical news impact distributions than a power-law distribution. This observation is explained by the lack or limited influence of the otherwise omnipresent rich-get-richer mechanism in the analyzed data. The temporal dynamics of the news impact exhibits a universal exponential decay which allows us to collapse individual news trajectories into an elementary single curve. We also show how daily variations of user activity directly influence the dynamics of the article impact. Our findings challenge the universal applicability of popularity dynamics patterns found in other social contexts.
Submitted 22 January, 2021; v1 submitted 16 January, 2020;
originally announced January 2020.
-
Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data
Authors:
Shuqi Xu,
Manuel Sebastian Mariani,
Linyuan Lü,
Matúš Medo
Abstract:
Despite the increasing use of citation-based metrics for research evaluation purposes, we do not know yet which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and age biases of the evaluated metrics. Outcomes of these metrics are therefore not representative of the metrics' ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and allows us to reveal metrics' performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores that they produce, whereas other popular metrics, including citation count, HITS and Collective Influence, produce significantly worse ranking results.
Submitted 15 January, 2020;
originally announced January 2020.
-
The wisdom of the few: Predicting collective success from individual behavior
Authors:
Manuel S. Mariani,
Yanina Gimenez,
Jorge Brea,
Martin Minnoni,
René Algesheimer,
Claudio J. Tessone
Abstract:
Can we predict top-performing products, services, or businesses by only monitoring the behavior of a small set of individuals? Although most previous studies focused on the predictive power of "hub" individuals with many social contacts, which sources of customer behavioral data are needed to address this question remains unclear, mostly due to the scarcity of available datasets that simultaneously capture individuals' purchasing patterns and social interactions. Here, we address this question in a unique, large-scale dataset that combines individuals' credit-card purchasing history with their social and mobility traits across an entire nation. Surprisingly, we find that the purchasing history alone enables the detection of small sets of "discoverers" whose early purchases offer reliable success predictions for the brick-and-mortar stores they visit. In contrast with the assumptions of most existing studies on word-of-mouth processes, the hubs selected by social network centrality are not consistently predictive of success. Our findings show that companies and organizations with access to large-scale purchasing data can detect the discoverers and leverage their behavior to anticipate market trends, without the need for social network data.
Submitted 9 June, 2020; v1 submitted 14 January, 2020;
originally announced January 2020.
-
Coordination of Autonomous Vehicles: Taxonomy and Survey
Authors:
Stefano Mariani,
Giacomo Cabri,
Franco Zambonelli
Abstract:
In the near future, our streets will be populated by myriads of autonomous self-driving vehicles to serve our diverse mobility needs. This will raise the need to coordinate their movements in order to properly handle both access to shared resources (e.g., intersections and parking slots) and the execution of mobility tasks (e.g., platooning and ramp merging). In this paper, we first introduce the general issues associated with the coordination of autonomous vehicles, by identifying and framing the key classes of coordination problems. Next, we survey the different approaches that can be adopted to manage such coordination problems, classifying them in terms of the degree of autonomy in decision making that is left to autonomous vehicles during coordination. Finally, we outline some further challenges that research will have to address before autonomously coordinated vehicles can safely hit our streets.
Submitted 8 January, 2020;
originally announced January 2020.
-
Recommending investors for new startups by integrating network diffusion and investors' domain preference
Authors:
Shuqi Xu,
Qianming Zhang,
Linyuan Lv,
Manuel Sebastian Mariani
Abstract:
Over the past decade, many startups have sprung up, creating a huge demand for financial support from venture investors. However, due to the information asymmetry between investors and companies, the financing process is usually challenging and time-consuming, especially for the startups that have not yet obtained any investment. Because of this, effective data-driven techniques to automatically match startups with potentially relevant investors would be highly desirable. Here, we analyze 34,469 valid investment events collected from www.itjuzi.com and consider the cold-start problem of recommending investors for new startups. We address this problem by constructing different tripartite network representations of the data where nodes represent investors, companies, and companies' domains. First, we find that investors have strong domain preferences when investing, which motivates us to introduce virtual links between investors and investment domains in the tripartite network construction. Our analysis of the recommendation performance of diffusion-based algorithms applied to various network representations indicates that prospective investors for new startups are effectively revealed by integrating network diffusion processes with investors' domain preference.
Submitted 16 January, 2020; v1 submitted 5 December, 2019;
originally announced December 2019.
-
Temporal similarity metrics for latent network reconstruction: The role of time-lag decay
Authors:
Hao Liao,
Ming-Kai Liu,
Manuel Sebastian Mariani,
Mingyang Zhou,
Xingtong Wu
Abstract:
When investigating the spreading of a piece of information or the diffusion of an innovation, we often lack information on the underlying propagation network. Reconstructing the hidden propagation paths based on the observed diffusion process is a challenging problem which has recently attracted attention from diverse research fields. To address this reconstruction problem, based on static similarity metrics commonly used in the link prediction literature, we introduce new node-node temporal similarity metrics. The new metrics take as input the time-series of multiple independent spreading processes, based on the hypothesis that two nodes are more likely to be connected if they were often infected at similar points in time. This hypothesis is implemented by introducing a time-lag function which penalizes distant infection times. We find that the choice of this time-lag strongly affects the metrics' reconstruction accuracy, depending on the network's clustering coefficient and we provide an extensive comparative analysis of static and temporal similarity metrics for network reconstruction. Our findings shed new light on the notion of similarity between pairs of nodes in complex networks.
Submitted 4 April, 2019;
originally announced April 2019.
-
Fast influencers in complex networks
Authors:
Fang Zhou,
Linyuan Lü,
Manuel Sebastian Mariani
Abstract:
Influential nodes in complex networks are typically defined as those nodes that maximize the asymptotic reach of a spreading process of interest. However, for practical applications such as viral marketing and online information spreading, one is often interested in maximizing the reach of the process in a short amount of time. The traditional definition of influencers in network-related studies from diverse research fields narrows down the focus to the late-time state of the spreading processes, leaving the following question unsolved: which nodes are able to initiate large-scale spreading processes, in a limited amount of time? Here, we find that there is a fundamental difference between the nodes -- which we call "fast influencers" -- that initiate the largest-reach processes in a short amount of time, and the traditional, "late-time" influencers. Stimulated by this observation, we provide an extensive benchmarking of centrality metrics with respect to their ability to identify both the fast and late-time influencers. We find that local network properties can be used to uncover the fast influencers. In particular, a parsimonious, local centrality metric (which we call social capital) achieves optimal or nearly-optimal performance in the fast influencer identification for all the analyzed empirical networks. Local metrics tend to be also competitive in the traditional, late-time influencer identification task.
Submitted 15 March, 2019;
originally announced March 2019.
-
Optimal timescale for community detection in growing networks
Authors:
Matus Medo,
An Zeng,
Yi-Cheng Zhang,
Manuel S. Mariani
Abstract:
Time-stamped data are increasingly available for many social, economic, and information systems that can be represented as networks growing with time. The World Wide Web, social contact networks, and citation networks of scientific papers and online news articles, for example, are of this kind. Static methods can be inadequate for the analysis of growing networks as they miss essential information on the system's dynamics. At the same time, time-aware methods require the choice of an observation timescale, yet we lack principled ways to determine it. We focus on the popular community detection problem which aims to partition a network's nodes into meaningful groups. We use a multi-layer quality function to show, on both synthetic and real datasets, that the observation timescale that leads to optimal communities is tightly related to the system's intrinsic aging timescale that can be inferred from the time-stamped network data. The use of temporal information leads to drastically different conclusions on the community structure of real information networks, which challenges the current understanding of the large-scale organization of growing networks. Our findings indicate that before attempting to assess structural patterns of evolving networks, it is vital to uncover the timescales of the dynamical processes that generated them.
Submitted 1 August, 2019; v1 submitted 13 September, 2018;
originally announced September 2018.
-
Logic Programming as a Service
Authors:
Roberta Calegari,
Enrico Denti,
Stefano Mariani,
Andrea Omicini
Abstract:
New generations of distributed systems are opening novel perspectives for logic programming (LP): on the one hand, service-oriented architectures represent nowadays the standard approach for distributed systems engineering; on the other hand, pervasive systems mandate for situated intelligence. In this paper we introduce the notion of Logic Programming as a Service (LPaaS) as a means to address the needs of pervasive intelligent systems through logic engines exploited as a distributed service. First we define the abstract architectural model by re-interpreting classical LP notions in the new context; then we elaborate on the nature of LP interpreted as a service by describing the basic LPaaS interface. Finally, we show how LPaaS works in practice by discussing its implementation in terms of distributed tuProlog engines, accounting for basic issues such as interoperability and configurability.
Submitted 25 September, 2018; v1 submitted 7 June, 2018;
originally announced June 2018.
-
The long-term impact of ranking algorithms in growing networks
Authors:
Shilun Zhang,
Matúš Medo,
Linyuan Lü,
Manuel Sebastian Mariani
Abstract:
When we search online for content, we are constantly exposed to rankings. For example, web search results are presented as a ranking, and online bookstores often show us lists of best-selling books. While popularity-based ranking algorithms (like Google's PageRank) have been extensively studied in previous works, we still lack a clear understanding of their potential systemic consequences. In this work, we fill this gap by introducing a new model of network growth that allows us to compare the properties of the networks generated under the influence of different ranking algorithms. We show that by correcting for the omnipresent age bias of popularity-based ranking algorithms, the resulting networks exhibit a significantly larger agreement between the nodes' inherent quality and their long-term popularity, and a less concentrated popularity distribution. To further promote popularity diversity, we introduce and validate a perturbation of the original rankings where a small number of randomly-selected nodes are promoted to the top of the ranking. Our findings move the first steps toward a model-based understanding of the long-term impact of popularity-based ranking algorithms, and could be used as an informative tool for the design of improved information filtering tools.
Submitted 19 November, 2018; v1 submitted 31 May, 2018;
originally announced May 2018.
-
Influencers identification in complex networks through reaction-diffusion dynamics
Authors:
Flavio Iannelli,
Manuel Sebastian Mariani,
Igor M. Sokolov
Abstract:
A pivotal idea in network science, marketing research and innovation diffusion theories is that a small group of nodes -- called influencers -- have the largest impact on social contagion and epidemic processes in networks. Despite the long-standing interest in the influencers identification problem in socio-economic and biological networks, there is not yet agreement on which is the best identification strategy. State-of-the-art strategies are typically based either on heuristic centrality metrics or on analytic arguments that only hold for specific network topologies or peculiar dynamical regimes. Here, we leverage the recently introduced random-walk effective distance -- a topological metric that estimates almost perfectly the arrival time of diffusive spreading processes on networks -- to introduce a new centrality metric which quantifies how close a node is to the other nodes. We show that the new centrality metric significantly outperforms state-of-the-art metrics in detecting the influencers for global contagion processes. Our findings reveal the essential role of the network effective distance for the influencers identification and lead us closer to the optimal solution of the problem.
Submitted 14 November, 2018; v1 submitted 3 March, 2018;
originally announced March 2018.
-
Revealing In-Block Nestedness: detection and benchmarking
Authors:
Albert Solé-Ribalta,
Claudio J. Tessone,
Manuel S. Mariani,
Javier Borge-Holthoefer
Abstract:
As new instances of nested organization (beyond ecological networks) are discovered, scholars are debating the co-existence of two apparently incompatible macroscale architectures: nestedness and modularity. The discussion is far from being resolved, mainly for two reasons. First, nestedness and modularity appear to emerge from two contradictory dynamics, cooperation and competition. Second, existing methods to assess the presence of nestedness and modularity are flawed when it comes to the evaluation of concurrently nested and modular structures. In this work, we tackle the latter problem, presenting the concept of in-block nestedness, a structural property determining to what extent a network is composed of blocks whose internal connectivity exhibits nestedness. We then put forward a set of optimization methods that allow us to identify such organization successfully, both in synthetic and in a large number of real networks. These findings challenge our understanding of the topology of ecological and social systems, calling for new models to explain how such patterns emerge.
Submitted 17 January, 2018;
originally announced January 2018.
-
Early identification of important patents through network centrality
Authors:
Manuel Sebastian Mariani,
Matus Medo,
François Lafond
Abstract:
One of the most challenging problems in technological forecasting is to identify as early as possible those technologies that have the potential to lead to radical changes in our society. In this paper, we use the US patent citation network (1926-2010) to test our ability to early identify a list of historically significant patents through citation network analysis. We show that in order to effectively uncover these patents shortly after they are issued, we need to go beyond raw citation counts and take into account both the citation network topology and temporal information. In particular, an age-normalized measure of patent centrality, called rescaled PageRank, allows us to identify the significant patents earlier than citation count and PageRank score. In addition, we find that while high-impact patents tend to rely on other high-impact patents in a similar way as scientific papers, the patents' citation dynamics is significantly slower than that of papers, which makes the early identification of significant patents more challenging than that of significant papers.
Submitted 25 October, 2017;
originally announced October 2017.
-
Ranking in evolving complex networks
Authors:
Hao Liao,
Manuel Sebastian Mariani,
Matus Medo,
Yi-Cheng Zhang,
Ming-Yang Zhou
Abstract:
Complex networks have emerged as a simple yet powerful framework to represent and analyze a wide range of complex systems. The problem of ranking the nodes and the edges in complex networks is critical for a broad range of real-world problems because it affects how we access online information and products, how success and talent are evaluated in human activities, and how scarce resources are allocated by companies and policymakers, among others. This calls for a deep understanding of how existing ranking algorithms perform, and which are their possible biases that may impair their effectiveness. Well-established ranking algorithms (such as the popular Google's PageRank) are static in nature and, as a consequence, they exhibit important shortcomings when applied to real networks that rapidly evolve in time. The recent advances in the understanding and modeling of evolving networks have enabled the development of a wide and diverse range of ranking algorithms that take the temporal dimension into account. The aim of this review is to survey the existing ranking algorithms, both static and time-aware, and their applications to evolving networks. We emphasize both the impact of network evolution on well-established static algorithms and the benefits from including the temporal dimension for tasks such as prediction of real network traffic, prediction of future links, and identification of highly-significant nodes.
Submitted 26 April, 2017;
originally announced April 2017.
-
Quantifying and suppressing ranking bias in a large citation network
Authors:
Giacomo Vaccario,
Matus Medo,
Nicolas Wider,
Manuel Sebastian Mariani
Abstract:
It is widely recognized that citation counts for papers from different fields cannot be directly compared because different scientific fields adopt different citation practices. Citation counts are also strongly biased by paper age since older papers had more time to attract citations. Various procedures aim at suppressing these biases and give rise to new normalized indicators, such as the relative citation count. We use a large citation dataset from Microsoft Academic Graph and a new statistical framework based on the Mahalanobis distance to show that the rankings by well known indicators, including the relative citation count and Google's PageRank score, are significantly biased by paper field and age. We propose a general normalization procedure motivated by the $z$-score which produces much less biased rankings when applied to citation count and PageRank score.
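As an illustration of the kind of $z$-score-motivated rescaling described above, here is a minimal Python sketch. The abstract gives no formula, so everything in it is an assumption: the function name `rescale_by_age`, the fixed-size window of papers with neighbouring publication dates, and the toy citation counts are all hypothetical, and normalization by field is left out.

```python
import numpy as np


def rescale_by_age(scores, window=3):
    """z-score each item against the `window` items closest to it in age.

    Assumes `scores` is already ordered by publication date (oldest first).
    Hypothetical sketch; not the exact procedure of the paper.
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    rescaled = np.zeros(n)
    for i in range(n):
        # Window of similarly aged items, shifted inward at the array ends.
        lo = max(0, min(i - window // 2, n - window))
        hi = min(n, lo + window)
        peers = scores[lo:hi]
        mu, sigma = peers.mean(), peers.std()
        # A flat window carries no ranking information; keep the score at 0.
        rescaled[i] = (scores[i] - mu) / sigma if sigma > 0 else 0.0
    return rescaled


if __name__ == "__main__":
    # Toy citation counts: older papers (left) have accumulated far more.
    citations = [100, 90, 80, 3, 2, 1]
    print(rescale_by_age(citations, window=3))
```

In this toy example the raw counts rank every old paper above every recent one, whereas the rescaled scores compare each paper only with its age peers, so old and recent papers end up on the same scale.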
Submitted 23 March, 2017;
originally announced March 2017.
-
Randomizing growing networks with a time-respecting null model
Authors:
Zhuo-Ming Ren,
Manuel Sebastian Mariani,
Yi-Cheng Zhang,
Matus Medo
Abstract:
Complex networks are often used to represent systems that are not static but grow with time: people make new friendships, new papers are published and refer to the existing ones, and so forth. To assess the statistical significance of measurements made on such networks, we propose a randomization methodology (a time-respecting null model) that preserves both the network's degree sequence and the time evolution of individual nodes' degree values. By preserving the temporal linking patterns of the analyzed system, the proposed model is able to factor out the effect of the system's temporal patterns on its structure. We apply the model to the citation network of Physical Review scholarly papers and the citation network of US movies. The model reveals that the two datasets are strikingly different with respect to their degree-degree correlations, and we discuss the important implications of this finding on the information provided by paradigmatic node centrality metrics such as indegree and Google's PageRank. The randomization methodology proposed here can be used to assess the significance of any structural property in growing networks, which could bring new insights into the problems where null models play a critical role, such as the detection of communities and network motifs.
Submitted 16 November, 2017; v1 submitted 22 March, 2017;
originally announced March 2017.
-
Identification of milestone papers through time-balanced network centrality
Authors:
Manuel Sebastian Mariani,
Matus Medo,
Yi-Cheng Zhang
Abstract:
Citations between scientific papers and related bibliometric indices, such as the $h$-index for authors and the impact factor for journals, are being increasingly used - often in controversial ways - as quantitative tools for research evaluation. Yet, a fundamental research question remains open: to what extent do quantitative metrics capture the significance of scientific works? We analyze the network of citations among the 449,935 papers published by the American Physical Society (APS) journals between 1893 and 2009, and focus on the comparison of metrics built on the citation count with network-based metrics. We contrast five article-level metrics with respect to the rankings that they assign to a set of fundamental papers, called Milestone Letters, carefully selected by the APS editors for "making long-lived contributions to physics, either by announcing significant discoveries, or by initiating new areas of research". A new metric, which combines PageRank centrality with the explicit requirement that paper score is not biased by paper age, is the best-performing metric overall in identifying the Milestone Letters. The lack of time bias in the new metric also makes it possible to compare papers of different age on the same scale. We find that network-based metrics identify the Milestone Letters better than metrics based on the citation count, which suggests that the structure of the citation network contains information that can be used to improve the ranking of scientific publications. The methods and results presented here are relevant for all evolving systems where network centrality metrics are applied, for example the World Wide Web and online social networks. An interactive Web platform where it is possible to view the ranking of the APS papers by rescaled PageRank is available at http://www.sciencenow.info.
Submitted 8 November, 2016; v1 submitted 30 August, 2016;
originally announced August 2016.
-
The mathematics of non-linear metrics for nested networks
Authors:
Rui-Jie Wu,
Gui-Yuan Shi,
Yi-Cheng Zhang,
Manuel Sebastian Mariani
Abstract:
Numerical analysis of data from international trade and ecological networks has shown that the non-linear fitness-complexity metric is the best candidate to rank nodes by importance in bipartite networks that exhibit a nested structure. Despite its relevance for real networks, the mathematical properties of the metric and its variants remain largely unexplored. Here, we perform an analytic and numeric study of the fitness-complexity metric and a new variant, called minimal extremal metric. We rigorously derive exact expressions for node scores for perfectly nested networks and show that these expressions explain the non-trivial convergence properties of the metrics. A comparison between the fitness-complexity metric and the minimal extremal metric on real data reveals that the latter can produce improved rankings if the input data are reliable.
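For readers unfamiliar with the fitness-complexity metric, a minimal Python sketch of the iteration follows. The abstract does not state the equations; the map below is the form commonly given in the economic-complexity literature (fitness as the sum of the complexities of a node's partners, complexity as the inverse of the summed inverse fitnesses, each normalized by its mean). The function name, the iteration count, and the toy nested matrix are hypothetical, and the minimal extremal metric is not covered.

```python
import numpy as np


def fitness_complexity(M, n_iter=50):
    """Iterate the non-linear fitness-complexity map on a binary matrix M.

    Rows play the role of countries (or active species), columns the role of
    products (or passive species). Illustrative sketch under the assumptions
    stated in the accompanying text.
    """
    M = np.asarray(M, dtype=float)
    F = np.ones(M.shape[0])  # row scores (fitness)
    Q = np.ones(M.shape[1])  # column scores (complexity)
    for _ in range(n_iter):
        # Both updates use the scores from the previous iteration.
        F_new = M @ Q
        Q_new = 1.0 / (M.T @ (1.0 / F))
        # Normalize by the mean so the scores stay on a fixed scale.
        F = F_new / F_new.mean()
        Q = Q_new / Q_new.mean()
    return F, Q


if __name__ == "__main__":
    # Perfectly nested toy matrix: each row's links are a subset of the
    # links of the row above it.
    M = [[1, 1, 1],
         [1, 1, 0],
         [1, 0, 0]]
    F, Q = fitness_complexity(M)
    print("fitness:", F)
    print("complexity:", Q)
```

On this toy matrix the most diversified row receives the highest fitness, and the column reached only by that row receives the highest complexity, which is the qualitative behavior expected on nested structures.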
Submitted 21 March, 2016;
originally announced March 2016.
-
Identification and modeling of discoverers in online social systems
Authors:
Matus Medo,
Manuel S. Mariani,
An Zeng,
Yi-Cheng Zhang
Abstract:
The dynamics of individuals is of essential importance for understanding the evolution of social systems. Most existing models assume that individuals in diverse systems, ranging from social networks to e-commerce, all tend toward what is already popular. We develop an analytical time-aware framework which shows that when individuals make choices -- which item to buy, for example -- in online social systems, a small fraction of them is consistently successful in discovering popular items long before they actually become popular. We argue that these users, whom we refer to as discoverers, are fundamentally different from the previously known opinion leaders, influentials, and innovators. We use the proposed framework to demonstrate that discoverers are present in a wide range of systems. Once identified, they can be used to predict the future success of items. We propose a network model which reproduces the discovery patterns observed in the real data. Furthermore, data produced by the model pose a fundamental challenge to classical ranking algorithms, which neglect the time of link creation and thus fail to discriminate between discoverers and ordinary users in the data. Our results open the door to qualitative and quantitative study of fine temporal patterns in social systems and have far-reaching implications for network modeling and algorithm design.
Submitted 4 September, 2015;
originally announced September 2015.
-
Ranking nodes in growing networks: When PageRank fails
Authors:
Manuel Sebastian Mariani,
Matus Medo,
Yi-Cheng Zhang
Abstract:
PageRank is arguably the most popular ranking algorithm, applied in real systems ranging from information to biological and infrastructure networks. Despite its outstanding popularity and broad use in different areas of science, the relation between the algorithm's efficacy and the properties of the network on which it acts has not yet been fully understood. We study here PageRank's performance on a network model supported by real data, and show that realistic temporal effects make PageRank fail to identify the most valuable nodes for a broad range of model parameters. Results on real data are in qualitative agreement with our model-based findings. This failure of PageRank reveals that the static approach to information filtering is inappropriate for a broad class of growing systems, and suggests that time-dependent algorithms based on the temporal linking patterns of these systems are needed to better rank the nodes.
Submitted 3 September, 2015;
originally announced September 2015.