-
A survey on the impact of AI-based recommenders on human behaviours: methodologies, outcomes and future directions
Authors:
Luca Pappalardo,
Emanuele Ferragina,
Salvatore Citraro,
Giuliano Cornacchia,
Mirco Nanni,
Giulio Rossetti,
Gizem Gezici,
Fosca Giannotti,
Margherita Lalli,
Daniele Gambetta,
Giovanni Mauro,
Virginia Morini,
Valentina Pansanella,
Dino Pedreschi
Abstract:
Recommendation systems and assistants (in short, recommenders) are ubiquitous in online platforms and influence most actions of our day-to-day lives, suggesting items or providing solutions based on users' preferences or requests. This survey analyses the impact of recommenders in four human-AI ecosystems: social media, online retail, urban map** and generative AI ecosystems. Its scope is to sys…
▽ More
Recommendation systems and assistants (in short, recommenders) are ubiquitous in online platforms and influence most actions of our day-to-day lives, suggesting items or providing solutions based on users' preferences or requests. This survey analyses the impact of recommenders in four human-AI ecosystems: social media, online retail, urban map** and generative AI ecosystems. Its scope is to systematise a fast-growing field in which terminologies employed to classify methodologies and outcomes are fragmented and unsystematic. We follow the customary steps of qualitative systematic review, gathering 144 articles from different disciplines to develop a parsimonious taxonomy of: methodologies employed (empirical, simulation, observational, controlled), outcomes observed (concentration, model collapse, diversity, echo chamber, filter bubble, inequality, polarisation, radicalisation, volume), and their level of analysis (individual, item, model, and systemic). We systematically discuss all findings of our survey substantively and methodologically, highlighting also potential avenues for future research. This survey is addressed to scholars and practitioners interested in different human-AI ecosystems, policymakers and institutional stakeholders who want to understand better the measurable outcomes of recommenders, and tech companies who wish to obtain a systematic view of the impact of their recommenders.
△ Less
Submitted 29 June, 2024;
originally announced July 2024.
-
AI, Meet Human: Learning Paradigms for Hybrid Decision Making Systems
Authors:
Clara Punzi,
Roberto Pellungrini,
Mattia Setzu,
Fosca Giannotti,
Dino Pedreschi
Abstract:
Everyday we increasingly rely on machine learning models to automate and support high-stake tasks and decisions. This growing presence means that humans are now constantly interacting with machine learning-based systems, training and using models everyday. Several different techniques in computer science literature account for the human interaction with machine learning systems, but their classifi…
▽ More
Everyday we increasingly rely on machine learning models to automate and support high-stake tasks and decisions. This growing presence means that humans are now constantly interacting with machine learning-based systems, training and using models everyday. Several different techniques in computer science literature account for the human interaction with machine learning systems, but their classification is sparse and the goals varied. This survey proposes a taxonomy of Hybrid Decision Making Systems, providing both a conceptual and technical framework for understanding how current computer science literature models interaction between humans and machines.
△ Less
Submitted 9 February, 2024;
originally announced February 2024.
-
Human-AI Coevolution
Authors:
Dino Pedreschi,
Luca Pappalardo,
Emanuele Ferragina,
Ricardo Baeza-Yates,
Albert-Laszlo Barabasi,
Frank Dignum,
Virginia Dignum,
Tina Eliassi-Rad,
Fosca Giannotti,
Janos Kertesz,
Alistair Knott,
Yannis Ioannidis,
Paul Lukowicz,
Andrea Passarella,
Alex Sandy Pentland,
John Shawe-Taylor,
Alessandro Vespignani
Abstract:
Human-AI coevolution, defined as a process in which humans and AI algorithms continuously influence each other, increasingly characterises our society, but is understudied in artificial intelligence and complexity science literature. Recommender systems and assistants play a prominent role in human-AI coevolution, as they permeate many facets of daily life and influence human choices on online pla…
▽ More
Human-AI coevolution, defined as a process in which humans and AI algorithms continuously influence each other, increasingly characterises our society, but is understudied in artificial intelligence and complexity science literature. Recommender systems and assistants play a prominent role in human-AI coevolution, as they permeate many facets of daily life and influence human choices on online platforms. The interaction between users and AI results in a potentially endless feedback loop, wherein users' choices generate data to train AI models, which, in turn, shape subsequent user preferences. This human-AI feedback loop has peculiar characteristics compared to traditional human-machine interaction and gives rise to complex and often ``unintended'' social outcomes. This paper introduces Coevolution AI as the cornerstone for a new field of study at the intersection between AI and complexity science focused on the theoretical, empirical, and mathematical investigation of the human-AI feedback loop. In doing so, we: (i) outline the pros and cons of existing methodologies and highlight shortcomings and potential ways for capturing feedback loop mechanisms; (ii) propose a reflection at the intersection between complexity science, AI and society; (iii) provide real-world examples for different human-AI ecosystems; and (iv) illustrate challenges to the creation of such a field of study, conceptualising them at increasing levels of abstraction, i.e., technical, epistemological, legal and socio-political.
△ Less
Submitted 3 May, 2024; v1 submitted 23 June, 2023;
originally announced June 2023.
-
Dense Hebbian neural networks: a replica symmetric picture of supervised learning
Authors:
Elena Agliari,
Linda Albanese,
Francesco Alemanno,
Andrea Alessandrelli,
Adriano Barra,
Fosca Giannotti,
Daniele Lotito,
Dino Pedreschi
Abstract:
We consider dense, associative neural-networks trained by a teacher (i.e., with supervision) and we investigate their computational capabilities analytically, via statistical-mechanics of spin glasses, and numerically, via Monte Carlo simulations. In particular, we obtain a phase diagram summarizing their performance as a function of the control parameters such as quality and quantity of the train…
▽ More
We consider dense, associative neural-networks trained by a teacher (i.e., with supervision) and we investigate their computational capabilities analytically, via statistical-mechanics of spin glasses, and numerically, via Monte Carlo simulations. In particular, we obtain a phase diagram summarizing their performance as a function of the control parameters such as quality and quantity of the training dataset, network storage and noise, that is valid in the limit of large network size and structureless datasets: these networks may work in a ultra-storage regime (where they can handle a huge amount of patterns, if compared with shallow neural networks) or in a ultra-detection regime (where they can perform pattern recognition at prohibitive signal-to-noise ratios, if compared with shallow neural networks). Guided by the random theory as a reference framework, we also test numerically learning, storing and retrieval capabilities shown by these networks on structured datasets as MNist and Fashion MNist. As technical remarks, from the analytic side, we implement large deviations and stability analysis within Guerra's interpolation to tackle the not-Gaussian distributions involved in the post-synaptic potentials while, from the computational counterpart, we insert Plefka approximation in the Monte Carlo scheme, to speed up the evaluation of the synaptic tensors, overall obtaining a novel and broad approach to investigate supervised learning in neural networks, beyond the shallow limit, in general.
△ Less
Submitted 2 July, 2023; v1 submitted 25 November, 2022;
originally announced December 2022.
-
Dense Hebbian neural networks: a replica symmetric picture of unsupervised learning
Authors:
Elena Agliari,
Linda Albanese,
Francesco Alemanno,
Andrea Alessandrelli,
Adriano Barra,
Fosca Giannotti,
Daniele Lotito,
Dino Pedreschi
Abstract:
We consider dense, associative neural-networks trained with no supervision and we investigate their computational capabilities analytically, via a statistical-mechanics approach, and numerically, via Monte Carlo simulations. In particular, we obtain a phase diagram summarizing their performance as a function of the control parameters such as the quality and quantity of the training dataset and the…
▽ More
We consider dense, associative neural-networks trained with no supervision and we investigate their computational capabilities analytically, via a statistical-mechanics approach, and numerically, via Monte Carlo simulations. In particular, we obtain a phase diagram summarizing their performance as a function of the control parameters such as the quality and quantity of the training dataset and the network storage, valid in the limit of large network size and structureless datasets. Moreover, we establish a bridge between macroscopic observables standardly used in statistical mechanics and loss functions typically used in the machine learning. As technical remarks, from the analytic side, we implement large deviations and stability analysis within Guerra's interpolation to tackle the not-Gaussian distributions involved in the post-synaptic potentials while, from the computational counterpart, we insert Plefka approximation in the Monte Carlo scheme, to speed up the evaluation of the synaptic tensors, overall obtaining a novel and broad approach to investigate neural networks in general.
△ Less
Submitted 2 July, 2023; v1 submitted 25 November, 2022;
originally announced November 2022.
-
How Routing Strategies Impact Urban Emissions
Authors:
Giuliano Cornacchia,
Matteo Böhm,
Giovanni Mauro,
Mirco Nanni,
Dino Pedreschi,
Luca Pappalardo
Abstract:
Navigation apps use routing algorithms to suggest the best path to reach a user's desired destination. Although undoubtedly useful, navigation apps' impact on the urban environment (e.g., carbon dioxide emissions and population exposure to pollution) is still largely unclear. In this work, we design a simulation framework to assess the impact of routing algorithms on carbon dioxide emissions withi…
▽ More
Navigation apps use routing algorithms to suggest the best path to reach a user's desired destination. Although undoubtedly useful, navigation apps' impact on the urban environment (e.g., carbon dioxide emissions and population exposure to pollution) is still largely unclear. In this work, we design a simulation framework to assess the impact of routing algorithms on carbon dioxide emissions within an urban environment. Using APIs from TomTom and OpenStreetMap, we find that settings in which either all vehicles or none of them follow a navigation app's suggestion lead to the worst impact in terms of CO2 emissions. In contrast, when just a portion (around half) of vehicles follow these suggestions, and some degree of randomness is added to the remaining vehicles' paths, we observe a reduction in the overall CO2 emissions over the road network. Our work is a first step towards designing next-generation routing principles that may increase urban well-being while satisfying individual needs.
△ Less
Submitted 4 July, 2022;
originally announced July 2022.
-
Measuring the Salad Bowl: Superdiversity on Twitter
Authors:
Laura Pollacci,
Alina Sirbu,
Fosca Giannotti,
Dino Pedreschi
Abstract:
Superdiversity refers to large cultural diversity in a population due to immigration. In this paper, we introduce a superdiversity index based on the changes in the emotional content of words used by a multi-cultural community, compared to the standard language. To compute our index we use Twitter data and we develop an algorithm to extend a dictionary for lexicon-based sentiment analysis. We vali…
▽ More
Superdiversity refers to large cultural diversity in a population due to immigration. In this paper, we introduce a superdiversity index based on the changes in the emotional content of words used by a multi-cultural community, compared to the standard language. To compute our index we use Twitter data and we develop an algorithm to extend a dictionary for lexicon-based sentiment analysis. We validate our index by comparing it with official immigration statistics available from the European Commission's Joint Research Center, through the D4I data challenge. We show that, in general, our measure correlates with immigration rates, at various geographical resolutions. Our method produces very good results across languages, being tested here both on English and Italian tweets. We argue that our index has predictive power in regions where exact data on immigration is not available, paving the way for a nowcasting model of immigration rates.
△ Less
Submitted 22 April, 2022;
originally announced April 2022.
-
Benchmarking and Survey of Explanation Methods for Black Box Models
Authors:
Francesco Bodria,
Fosca Giannotti,
Riccardo Guidotti,
Francesca Naretto,
Dino Pedreschi,
Salvatore Rinzivillo
Abstract:
The widespread adoption of black-box models in Artificial Intelligence has enhanced the need for explanation methods to reveal how these obscure models reach specific decisions. Retrieving explanations is fundamental to unveil possible biases and to resolve practical or ethical issues. Nowadays, the literature is full of methods with different explanations. We provide a categorization of explanati…
▽ More
The widespread adoption of black-box models in Artificial Intelligence has enhanced the need for explanation methods to reveal how these obscure models reach specific decisions. Retrieving explanations is fundamental to unveil possible biases and to resolve practical or ethical issues. Nowadays, the literature is full of methods with different explanations. We provide a categorization of explanation methods based on the type of explanation returned. We present the most recent and widely used explainers, and we show a visual comparison among explanations and a quantitative benchmarking.
△ Less
Submitted 25 February, 2021;
originally announced February 2021.
-
GLocalX -- From Local to Global Explanations of Black Box AI Models
Authors:
Mattia Setzu,
Riccardo Guidotti,
Anna Monreale,
Franco Turini,
Dino Pedreschi,
Fosca Giannotti
Abstract:
Artificial Intelligence (AI) has come to prominence as one of the major components of our society, with applications in most aspects of our lives. In this field, complex and highly nonlinear machine learning models such as ensemble models, deep neural networks, and Support Vector Machines have consistently shown remarkable accuracy in solving complex tasks. Although accurate, AI models often are "…
▽ More
Artificial Intelligence (AI) has come to prominence as one of the major components of our society, with applications in most aspects of our lives. In this field, complex and highly nonlinear machine learning models such as ensemble models, deep neural networks, and Support Vector Machines have consistently shown remarkable accuracy in solving complex tasks. Although accurate, AI models often are "black boxes" which we are not able to understand. Relying on these models has a multifaceted impact and raises significant concerns about their transparency. Applications in sensitive and critical domains are a strong motivational factor in trying to understand the behavior of black boxes. We propose to address this issue by providing an interpretable layer on top of black box models by aggregating "local" explanations. We present GLocalX, a "local-first" model agnostic explanation method. Starting from local explanations expressed in form of local decision rules, GLocalX iteratively generalizes them into global explanations by hierarchically aggregating them. Our goal is to learn accurate yet simple interpretable models to emulate the given black box, and, if possible, replace it entirely. We validate GLocalX in a set of experiments in standard and constrained settings with limited or no access to either data or local explanations. Experiments show that GLocalX is able to accurately emulate several models with simple and small models, reaching state-of-the-art performance against natively global solutions. Our findings show how it is often possible to achieve a high level of both accuracy and comprehensibility of classification models, even in complex domains with high-dimensional data, without necessarily trading one property for the other. This is a key requirement for a trustworthy AI, necessary for adoption in high-stakes decision making applications.
△ Less
Submitted 26 January, 2021; v1 submitted 19 January, 2021;
originally announced January 2021.
-
Predicting seasonal influenza using supermarket retail records
Authors:
Ioanna Miliou,
Xinyue Xiong,
Salvatore Rinzivillo,
Qian Zhang,
Giulio Rossetti,
Fosca Giannotti,
Dino Pedreschi,
Alessandro Vespignani
Abstract:
Increased availability of epidemiological data, novel digital data streams, and the rise of powerful machine learning approaches have generated a surge of research activity on real-time epidemic forecast systems. In this paper, we propose the use of a novel data source, namely retail market data to improve seasonal influenza forecasting. Specifically, we consider supermarket retail data as a proxy…
▽ More
Increased availability of epidemiological data, novel digital data streams, and the rise of powerful machine learning approaches have generated a surge of research activity on real-time epidemic forecast systems. In this paper, we propose the use of a novel data source, namely retail market data to improve seasonal influenza forecasting. Specifically, we consider supermarket retail data as a proxy signal for influenza, through the identification of sentinel baskets, i.e., products bought together by a population of selected customers. We develop a nowcasting and forecasting framework that provides estimates for influenza incidence in Italy up to 4 weeks ahead. We make use of the Support Vector Regression (SVR) model to produce the predictions of seasonal flu incidence. Our predictions outperform both a baseline autoregressive model and a second baseline based on product purchases. The results show quantitatively the value of incorporating retail market data in forecasting models, acting as a proxy that can be used for the real-time analysis of epidemics.
△ Less
Submitted 17 December, 2020; v1 submitted 8 December, 2020;
originally announced December 2020.
-
FairLens: Auditing Black-box Clinical Decision Support Systems
Authors:
Cecilia Panigutti,
Alan Perotti,
Andrè Panisson,
Paolo Bajardi,
Dino Pedreschi
Abstract:
The pervasive application of algorithmic decision-making is raising concerns on the risk of unintended bias in AI systems deployed in critical settings such as healthcare. The detection and mitigation of biased models is a very delicate task which should be tackled with care and involving domain experts in the loop. In this paper we introduce FairLens, a methodology for discovering and explaining…
▽ More
The pervasive application of algorithmic decision-making is raising concerns on the risk of unintended bias in AI systems deployed in critical settings such as healthcare. The detection and mitigation of biased models is a very delicate task which should be tackled with care and involving domain experts in the loop. In this paper we introduce FairLens, a methodology for discovering and explaining biases. We show how our tool can be used to audit a fictional commercial black-box model acting as a clinical decision support system. In this scenario, the healthcare facility experts can use FairLens on their own historical data to discover the model's biases before incorporating it into the clinical decision flow. FairLens first stratifies the available patient data according to attributes such as age, ethnicity, gender and insurance; it then assesses the model performance on such subgroups of patients identifying those in need of expert evaluation. Finally, building on recent state-of-the-art XAI (eXplainable Artificial Intelligence) techniques, FairLens explains which elements in patients' clinical history drive the model error in the selected subgroup. Therefore, FairLens allows experts to investigate whether to trust the model and to spotlight group-specific biases that might constitute potential fairness issues.
△ Less
Submitted 8 November, 2020;
originally announced November 2020.
-
The relationship between human mobility and viral transmissibility during the COVID-19 epidemics in Italy
Authors:
Paolo Cintia,
Luca Pappalardo,
Salvatore Rinzivillo,
Daniele Fadda,
Tobia Boschi,
Fosca Giannotti,
Francesca Chiaromonte,
Pietro Bonato,
Francesco Fabbri,
Francesco Penone,
Marcello Savarese,
Francesco Calabrese,
Giorgio Guzzetta,
Flavia Riccardo,
Valentina Marziano,
Piero Poletti,
Filippo Trentini,
Antonino Bella,
Xanthi Andrianou,
Martina Del Manso,
Massimo Fabiani,
Stefania Bellino,
Stefano Boros,
Alberto Mateo Urdiales,
Maria Fenicia Vescio
, et al. (7 additional authors not shown)
Abstract:
In 2020, countries affected by the COVID-19 pandemic implemented various non-pharmaceutical interventions to contrast the spread of the virus and its impact on their healthcare systems and economies. Using Italian data at different geographic scales, we investigate the relationship between human mobility, which subsumes many facets of the population's response to the changing situation, and the sp…
▽ More
In 2020, countries affected by the COVID-19 pandemic implemented various non-pharmaceutical interventions to contrast the spread of the virus and its impact on their healthcare systems and economies. Using Italian data at different geographic scales, we investigate the relationship between human mobility, which subsumes many facets of the population's response to the changing situation, and the spread of COVID-19. Leveraging mobile phone data from February through September 2020, we find a striking relationship between the decrease in mobility flows and the net reproduction number. We find that the time needed to switch off mobility and bring the net reproduction number below the critical threshold of 1 is about one week. Moreover, we observe a strong relationship between the number of days spent above such threshold before the lockdown-induced drop in mobility flows and the total number of infections per 100k inhabitants. Estimating the statistical effect of mobility flows on the net reproduction number over time, we document a 2-week lag positive association, strong in March and April, and weaker but still significant in June. Our study demonstrates the value of big mobility data to monitor the epidemic and inform control interventions during its unfolding.
△ Less
Submitted 1 April, 2021; v1 submitted 4 June, 2020;
originally announced June 2020.
-
Mobile phone data analytics against the COVID-19 epidemics in Italy: flow diversity and local job markets during the national lockdown
Authors:
Pietro Bonato,
Paolo Cintia,
Francesco Fabbri,
Daniele Fadda,
Fosca Giannotti,
Pier Luigi Lopalco,
Sara Mazzilli,
Mirco Nanni,
Luca Pappalardo,
Dino Pedreschi,
Francesco Penone,
Salvatore Rinzivillo,
Giulio Rossetti,
Marcello Savarese,
Lara Tavoschi
Abstract:
Understanding collective mobility patterns is crucial to plan the restart of production and economic activities, which are currently put in stand-by to fight the diffusion of the epidemics. In this report, we use mobile phone data to infer the movements of people between Italian provinces and municipalities, and we analyze the incoming, outcoming and internal mobility flows before and during the n…
▽ More
Understanding collective mobility patterns is crucial to plan the restart of production and economic activities, which are currently put in stand-by to fight the diffusion of the epidemics. In this report, we use mobile phone data to infer the movements of people between Italian provinces and municipalities, and we analyze the incoming, outcoming and internal mobility flows before and during the national lockdown (March 9th, 2020) and after the closure of non-necessary productive and economic activities (March 23th, 2020). The population flow across provinces and municipalities enable for the modelling of a risk index tailored for the mobility of each municipality or province. Such an index would be a useful indicator to drive counter-measures in reaction to a sudden reactivation of the epidemics. Mobile phone data, even when aggregated to preserve the privacy of individuals, are a useful data source to track the evolution in time of human mobility, hence allowing for monitoring the effectiveness of control measures such as physical distancing. We address the following analytical questions: How does the mobility structure of a territory change? Do incoming and outcoming flows become more predictable during the lockdown, and what are the differences between weekdays and weekends? Can we detect proper local job markets based on human mobility flows, to eventually shape the borders of a local outbreak?
△ Less
Submitted 23 April, 2020;
originally announced April 2020.
-
Give more data, awareness and control to individual citizens, and they will help COVID-19 containment
Authors:
Mirco Nanni,
Gennady Andrienko,
Albert-László Barabási,
Chiara Boldrini,
Francesco Bonchi,
Ciro Cattuto,
Francesca Chiaromonte,
Giovanni Comandé,
Marco Conti,
Mark Coté,
Frank Dignum,
Virginia Dignum,
Josep Domingo-Ferrer,
Paolo Ferragina,
Fosca Giannotti,
Riccardo Guidotti,
Dirk Helbing,
Kimmo Kaski,
Janos Kertesz,
Sune Lehmann,
Bruno Lepri,
Paul Lukowicz,
Stan Matwin,
David Megías Jiménez,
Anna Monreale
, et al. (14 additional authors not shown)
Abstract:
The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countri…
▽ More
The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoiding location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively, voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn this enables both contact tracing, and, the early detection of outbreak hotspots on more finely-granulated geographic scale. Our recommendation is two-fold. First to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates - if and when they want, for specific aims - with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.
△ Less
Submitted 16 April, 2020; v1 submitted 10 April, 2020;
originally announced April 2020.
-
Black Box Explanation by Learning Image Exemplars in the Latent Feature Space
Authors:
Riccardo Guidotti,
Anna Monreale,
Stan Matwin,
Dino Pedreschi
Abstract:
We present an approach to explain the decisions of black box models for image classification. While using the black box to label images, our explanation method exploits the latent feature space learned through an adversarial autoencoder. The proposed method first generates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes exemplars resp…
▽ More
We present an approach to explain the decisions of black box models for image classification. While using the black box to label images, our explanation method exploits the latent feature space learned through an adversarial autoencoder. The proposed method first generates exemplar images in the latent feature space and learns a decision tree classifier. Then, it selects and decodes exemplars respecting local decision rules. Finally, it visualizes them in a manner that shows to the user how the exemplars can be modified to either stay within their class, or to become counter-factuals by "morphing" into another class. Since we focus on black box decision systems for image classification, the explanation obtained from the exemplars also provides a saliency map highlighting the areas of the image that contribute to its classification, and areas of the image that push it into another class. We present the results of an experimental evaluation on three datasets and two black box models. Besides providing the most useful and interpretable explanations, we show that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and stability.
△ Less
Submitted 27 January, 2020;
originally announced February 2020.
-
Clusters of investors around Initial Public Offering
Authors:
Margarita Baltakienė,
Kęstutis Baltakys,
Juho Kanniainen,
Dino Pedreschi,
Fabrizio Lillo
Abstract:
The complex networks approach has been gaining popularity in analysing investor behaviour and stock markets, but within this approach, initial public offerings (IPO) have barely been explored. We fill this gap in the literature by analysing investor clusters in the first two years after the IPO filing in the Helsinki Stock Exchange by using a statistically validated network method to infer investo…
▽ More
The complex networks approach has been gaining popularity in analysing investor behaviour and stock markets, but within this approach, initial public offerings (IPO) have barely been explored. We fill this gap in the literature by analysing investor clusters in the first two years after the IPO filing in the Helsinki Stock Exchange by using a statistically validated network method to infer investor links based on the co-occurrences of investors' trade timing for 69 IPO stocks. Our findings show that a rather large part of statistically similar network structures form in different securities and persist in time for mature and IPO companies. We also find evidence of institutional herding.
△ Less
Submitted 6 November, 2019; v1 submitted 31 May, 2019;
originally announced May 2019.
-
Open the Black Box Data-Driven Explanation of Black Box Decision Systems
Authors:
Dino Pedreschi,
Fosca Giannotti,
Riccardo Guidotti,
Anna Monreale,
Luca Pappalardo,
Salvatore Ruggieri,
Franco Turini
Abstract:
Black box systems for automated decision making, often based on machine learning over (big) data, map a user's features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases hidden in the algorithms, due to human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong…
▽ More
Black box systems for automated decision making, often based on machine learning over (big) data, map a user's features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases hidden in the algorithms, due to human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong decisions. We introduce the local-to-global framework for black box explanation, a novel approach with promising early results, which paves the road for a wide spectrum of future developments along three dimensions: (i) the language for expressing explanations in terms of highly expressive logic-based rules, with a statistical and causal interpretation; (ii) the inference of local explanations aimed at revealing the logic of the decision adopted for a specific instance by querying and auditing the black box in the vicinity of the target instance; (iii), the bottom-up generalization of the many local explanations into simple global ones, with algorithms that optimize the quality and comprehensibility of explanations.
△ Less
Submitted 26 June, 2018;
originally announced June 2018.
-
Local Rule-Based Explanations of Black Box Decision Systems
Authors:
Riccardo Guidotti,
Anna Monreale,
Salvatore Ruggieri,
Dino Pedreschi,
Franco Turini,
Fosca Giannotti
Abstract:
The recent years have witnessed the rise of accurate but obscure decision systems which hide the logic of their internal decision processes to the users. The lack of explanations for the decisions of black box systems is a key ethical issue, and a limitation to the adoption of machine learning components in socially sensitive and safety-critical contexts. %Therefore, we need explanations that reve…
▽ More
The recent years have witnessed the rise of accurate but obscure decision systems which hide the logic of their internal decision processes to the users. The lack of explanations for the decisions of black box systems is a key ethical issue, and a limitation to the adoption of machine learning components in socially sensitive and safety-critical contexts. %Therefore, we need explanations that reveals the reasons why a predictor takes a certain decision. In this paper we focus on the problem of black box outcome explanation, i.e., explaining the reasons of the decision taken on a specific instance. We propose LORE, an agnostic method able to provide interpretable and faithful explanations. LORE first leans a local interpretable predictor on a synthetic neighborhood generated by a genetic algorithm. Then it derives from the logic of the local interpretable predictor a meaningful explanation consisting of: a decision rule, which explains the reasons of the decision; and a set of counterfactual rules, suggesting the changes in the instance's features that lead to a different outcome. Wide experiments show that LORE outperforms existing methods and baselines both in the quality of explanations and in the accuracy in mimicking the black box.
△ Less
Submitted 28 May, 2018;
originally announced May 2018.
-
Algorithmic bias amplifies opinion polarization: A bounded confidence model
Authors:
Alina Sîrbu,
Dino Pedreschi,
Fosca Giannotti,
János Kertész
Abstract:
The flow of information reaching us via the online media platforms is optimized not by the information content or relevance but by popularity and proximity to the target. This is typically performed in order to maximise platform usage. As a side effect, this introduces an algorithmic bias that is believed to enhance polarization of the societal debate. To study this phenomenon, we modify the well-…
▽ More
The flow of information reaching us via the online media platforms is optimized not by the information content or relevance but by popularity and proximity to the target. This is typically performed in order to maximise platform usage. As a side effect, this introduces an algorithmic bias that is believed to enhance polarization of the societal debate. To study this phenomenon, we modify the well-known continuous opinion dynamics model of bounded confidence in order to account for the algorithmic bias and investigate its consequences. In the simplest version of the original model the pairs of discussion participants are chosen at random and their opinions get closer to each other if they are within a fixed tolerance level. We modify the selection rule of the discussion partners: there is an enhanced probability to choose individuals whose opinions are already close to each other, thus mimicking the behavior of online media which suggest interaction with similar peers. As a result we observe: a) an increased tendency towards polarization, which emerges also in conditions where the original model would predict convergence, and b) a dramatic slowing down of the speed at which the convergence at the asymptotic state is reached, which makes the system highly unstable. Polarization is augmented by a fragmented initial population.
△ Less
Submitted 6 March, 2018;
originally announced March 2018.
-
PlayeRank: data-driven performance evaluation and player ranking in soccer via a machine learning approach
Authors:
Luca Pappalardo,
Paolo Cintia,
Paolo Ferragina,
Emanuele Massucco,
Dino Pedreschi,
Fosca Giannotti
Abstract:
The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots, etc.). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this pap…
▽ More
The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots, etc.). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this paper, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We build our framework by deploying a massive dataset of soccer-logs and consisting of millions of match events pertaining to four seasons of 18 prominent soccer competitions. By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of players' evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms the competitors. We also explore the ratings produced by {\sf PlayeRank} and discover interesting patterns about the nature of excellent performances and what distinguishes the top players from the others. At the end, we explore some applications of PlayeRank -- i.e. searching players and player versatility --- showing its flexibility and efficiency, which makes it worth to be used in the design of a scalable platform for soccer analytics.
△ Less
Submitted 25 January, 2019; v1 submitted 14 February, 2018;
originally announced February 2018.
-
A Survey Of Methods For Explaining Black Box Models
Authors:
Riccardo Guidotti,
Anna Monreale,
Salvatore Ruggieri,
Franco Turini,
Dino Pedreschi,
Fosca Giannotti
Abstract:
In the last years many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness sometimes at the cost of scarifying accuracy for interpretability. The applications i…
▽ More
In the last years many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness sometimes at the cost of scarifying accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, delineating explicitly or implicitly its own definition of interpretability and explanation. The aim of this paper is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation this survey should help the researcher to find the proposals more useful for his own work. The proposed classification of approaches to open black box models should also be useful for putting the many research open questions in perspective.
△ Less
Submitted 21 June, 2018; v1 submitted 6 February, 2018;
originally announced February 2018.
-
NDlib: a Python Library to Model and Analyze Diffusion Processes Over Complex Networks
Authors:
Giulio Rossetti,
Letizia Milli,
Salvatore Rinzivillo,
Alina Sirbu,
Fosca Giannotti,
Dino Pedreschi
Abstract:
Nowadays the analysis of dynamics of and on networks represents a hot topic in the Social Network Analysis playground. To support students, teachers, developers and researchers in this work we introduce a novel framework, namely NDlib, an environment designed to describe diffusion simulations. NDlib is designed to be a multi-level ecosystem that can be fruitfully used by different user segments. F…
▽ More
Nowadays the analysis of dynamics of and on networks represents a hot topic in the Social Network Analysis playground. To support students, teachers, developers and researchers in this work we introduce a novel framework, namely NDlib, an environment designed to describe diffusion simulations. NDlib is designed to be a multi-level ecosystem that can be fruitfully used by different user segments. For this reason, upon NDlib, we designed a simulation server that allows remote execution of experiments as well as an online visualization tool that abstracts its programmatic interface and makes available the simulation platform to non-technicians.
△ Less
Submitted 14 December, 2017;
originally announced January 2018.
-
Human Perception of Performance
Authors:
Luca Pappalardo,
Paolo Cintia,
Dino Pedreschi,
Fosca Giannotti,
Albert-Laszlo Barabasi
Abstract:
Humans are routinely asked to evaluate the performance of other individuals, separating success from failure and affecting outcomes from science to education and sports. Yet, in many contexts, the metrics driving the human evaluation process remain unclear. Here we analyse a massive dataset capturing players' evaluations by human judges to explore human perception of performance in soccer, the wor…
▽ More
Humans are routinely asked to evaluate the performance of other individuals, separating success from failure and affecting outcomes from science to education and sports. Yet, in many contexts, the metrics driving the human evaluation process remain unclear. Here we analyse a massive dataset capturing players' evaluations by human judges to explore human perception of performance in soccer, the world's most popular sport. We use machine learning to design an artificial judge which accurately reproduces human evaluation, allowing us to demonstrate how human observers are biased towards diverse contextual features. By investigating the structure of the artificial judge, we uncover the aspects of the players' behavior which attract the attention of human judges, demonstrating that human evaluation is based on a noticeability heuristic where only feature values far from the norm are considered to rate an individual's performance.
△ Less
Submitted 5 December, 2017;
originally announced December 2017.
-
Next Basket Prediction using Recurring Sequential Patterns
Authors:
Riccardo Guidotti,
Giulio Rossetti,
Luca Pappalardo,
Fosca Giannotti,
Dino Pedreschi
Abstract:
Nowadays, a hot challenge for supermarket chains is to offer personalized services for their customers. Next basket prediction, i.e., supplying the customer a shop** list for the next purchase according to her current needs, is one of these services. Current approaches are not capable to capture at the same time the different factors influencing the customer's decision process: co-occurrency, se…
▽ More
Nowadays, a hot challenge for supermarket chains is to offer personalized services for their customers. Next basket prediction, i.e., supplying the customer a shop** list for the next purchase according to her current needs, is one of these services. Current approaches are not capable to capture at the same time the different factors influencing the customer's decision process: co-occurrency, sequentuality, periodicity and recurrency of the purchased items. To this aim, we define a pattern Temporal Annotated Recurring Sequence (TARS) able to capture simultaneously and adaptively all these factors. We define the method to extract TARS and develop a predictor for next basket named TBP (TARS Based Predictor) that, on top of TARS, is able to to understand the level of the customer's stocks and recommend the set of most necessary items. By adopting the TBP the supermarket chains could crop tailored suggestions for each individual customer which in turn could effectively speed up their shop** sessions. A deep experimentation shows that TARS are able to explain the customer purchase behavior, and that TBP outperforms the state-of-the-art competitors.
△ Less
Submitted 23 February, 2017;
originally announced February 2017.
-
Causal Inference for Social Discrimination Reasoning
Authors:
Bilal Qureshi,
Faisal Kamiran,
Asim Karim,
Salvatore Ruggieri,
Dino Pedreschi
Abstract:
The discovery of discriminatory bias in human or automated decision making is a task of increasing importance and difficulty, exacerbated by the pervasive use of machine learning and data mining. Currently, discrimination discovery largely relies upon correlation analysis of decisions records, disregarding the impact of confounding biases. We present a method for causal discrimination discovery ba…
▽ More
The discovery of discriminatory bias in human or automated decision making is a task of increasing importance and difficulty, exacerbated by the pervasive use of machine learning and data mining. Currently, discrimination discovery largely relies upon correlation analysis of decisions records, disregarding the impact of confounding biases. We present a method for causal discrimination discovery based on propensity score analysis, a statistical tool for filtering out the effect of confounding variables. We introduce causal measures of discrimination which quantify the effect of group membership on the decisions, and highlight causal discrimination/favoritism patterns by learning regression trees over the novel measures. We validate our approach on two real world datasets. Our proposed framework for causal discrimination has the potential to enhance the transparency of machine learning with tools for detecting discriminatory bias both in the training data and in the learning algorithms.
△ Less
Submitted 4 November, 2019; v1 submitted 12 August, 2016;
originally announced August 2016.
-
An analytical framework to nowcast well-being using mobile phone data
Authors:
Luca Pappalardo,
Maarten Vanhoof,
Lorenzo Gabrielli,
Zbigniew Smoreda,
Dino Pedreschi,
Fosca Giannotti
Abstract:
An intriguing open question is whether measurements made on Big Data recording human activities can yield us high-fidelity proxies of socio-economic development and well-being. Can we monitor and predict the socio-economic development of a territory just by observing the behavior of its inhabitants through the lens of Big Data? In this paper, we design a data-driven analytical framework that uses…
▽ More
An intriguing open question is whether measurements made on Big Data recording human activities can yield us high-fidelity proxies of socio-economic development and well-being. Can we monitor and predict the socio-economic development of a territory just by observing the behavior of its inhabitants through the lens of Big Data? In this paper, we design a data-driven analytical framework that uses mobility measures and social measures extracted from mobile phone data to estimate indicators for socio-economic development and well-being. We discover that the diversity of mobility, defined in terms of entropy of the individual users' trajectories, exhibits (i) significant correlation with two different socio-economic indicators and (ii) the highest importance in predictive models built to predict the socio-economic indicators. Our analytical framework opens an interesting perspective to study human behavior through the lens of Big Data by means of new statistical indicators that quantify and possibly "nowcast" the well-being and the socio-economic development of a territory.
△ Less
Submitted 16 March, 2016;
originally announced June 2016.
-
The Inductive Constraint Programming Loop
Authors:
Christian Bessiere,
Luc De Raedt,
Tias Guns,
Lars Kotthoff,
Mirco Nanni,
Siegfried Nijssen,
Barry O'Sullivan,
Anastasia Paparrizou,
Dino Pedreschi,
Helmut Simonis
Abstract:
Constraint programming is used for a variety of real-world optimisation problems, such as planning, scheduling and resource allocation problems. At the same time, one continuously gathers vast amounts of data about these problems. Current constraint programming software does not exploit such data to update schedules, resources and plans. We propose a new framework, that we call the Inductive Const…
▽ More
Constraint programming is used for a variety of real-world optimisation problems, such as planning, scheduling and resource allocation problems. At the same time, one continuously gathers vast amounts of data about these problems. Current constraint programming software does not exploit such data to update schedules, resources and plans. We propose a new framework, that we call the Inductive Constraint Programming loop. In this approach data is gathered and analyzed systematically, in order to dynamically revise and adapt constraints and optimization criteria. Inductive Constraint Programming aims at bridging the gap between the areas of data mining and machine learning on the one hand, and constraint programming on the other hand.
△ Less
Submitted 12 October, 2015;
originally announced October 2015.
-
A planetary nervous system for social mining and collective awareness
Authors:
Fosca Giannotti,
Dino Pedreschi,
Alex,
Pentland,
Paul Lukowicz,
Donald Kossmann,
James Crowley,
Dirk Helbing
Abstract:
We present a research roadmap of a Planetary Nervous System (PNS), capable of sensing and mining the digital breadcrumbs of human activities and unveiling the knowledge hidden in the big data for addressing the big questions about social complexity. We envision the PNS as a globally distributed, self-organizing, techno-social system for answering analytical questions about the status of world-wide…
▽ More
We present a research roadmap of a Planetary Nervous System (PNS), capable of sensing and mining the digital breadcrumbs of human activities and unveiling the knowledge hidden in the big data for addressing the big questions about social complexity. We envision the PNS as a globally distributed, self-organizing, techno-social system for answering analytical questions about the status of world-wide society, based on three pillars: social sensing, social mining, and the idea of trust networks and privacy-aware social mining. We discuss the ingredients of a science and a technology necessary to build the PNS upon the three mentioned pillars, beyond the limitations of their respective state-of-art. Social sensing is aimed at develo** better methods for harvesting the big data from the techno-social ecosystem and make them available for mining, learning and analysis at a properly high abstraction level.Social mining is the problem of discovering patterns and models of human behaviour from the sensed data across the various social dimensions by data mining, machine learning and social network analysis. Trusted networks and privacy-aware social mining is aimed at creating a new deal around the questions of privacy and data ownership empowering individual persons with full awareness and control on own personal data, so that users may allow access and use of their data for their own good and the common good. The PNS will provide a goal-oriented knowledge discovery framework, made of technology and people, able to configure itself to the aim of answering questions about the pulse of global society. Given an analytical request, the PNS activates a process composed by a variety of interconnected tasks exploiting the social sensing and mining methods within the transparent ecosystem provided by the trusted network.
△ Less
Submitted 2 April, 2013;
originally announced April 2013.
-
FuturICT - The Road towards Ethical ICT
Authors:
Jeroen van den Hoven,
Dirk Helbing,
Dino Pedreschi,
Josep Domingo-Ferrer,
Fosca Gianotti,
Markus Christen
Abstract:
The pervasive use of information and communication technology (ICT) in modern societies enables countless opportunities for individuals, institutions, businesses and scientists, but also raises difficult ethical and social problems. In particular, ICT helped to make societies more complex and thus harder to understand, which impedes social and political interventions to avoid harm and to increase…
▽ More
The pervasive use of information and communication technology (ICT) in modern societies enables countless opportunities for individuals, institutions, businesses and scientists, but also raises difficult ethical and social problems. In particular, ICT helped to make societies more complex and thus harder to understand, which impedes social and political interventions to avoid harm and to increase the common good. To overcome this obstacle, the large-scale EU flagship proposal FuturICT intends to create a platform for accessing global human knowledge as a public good and instruments to increase our understanding of the information society by making use of ICT-based research. In this contribution, we outline the ethical justification for such an endeavor. We argue that the ethical issues raised by FuturICT research projects overlap substantially with many of the known ethical problems emerging from ICT use in general. By referring to the notion of Value Sensitive Design, we show for the example of privacy how this core value of responsible ICT can be protected in pursuing research in the framework of FuturICT. In addition, we discuss further ethical issues and outline the institutional design of FuturICT allowing to address them.
△ Less
Submitted 30 October, 2012;
originally announced October 2012.
-
A Classification for Community Discovery Methods in Complex Networks
Authors:
Michele Coscia,
Fosca Giannotti,
Dino Pedreschi
Abstract:
In the last few years many real-world networks have been found to show a so-called community structure organization. Much effort has been devoted in the literature to develop methods and algorithms that can efficiently highlight this hidden structure of the network, traditionally by partitioning the graph. Since network representation can be very complex and can contain different variants in the t…
▽ More
In the last few years many real-world networks have been found to show a so-called community structure organization. Much effort has been devoted in the literature to develop methods and algorithms that can efficiently highlight this hidden structure of the network, traditionally by partitioning the graph. Since network representation can be very complex and can contain different variants in the traditional graph model, each algorithm in the literature focuses on some of these properties and establishes, explicitly or implicitly, its own definition of community. According to this definition it then extracts the communities that are able to reflect only some of the features of real communities. The aim of this survey is to provide a manual for the community discovery problem. Given a meta definition of what a community in a social network is, our aim is to organize the main categories of community discovery based on their own definition of community. Given a desired definition of community and the features of a problem (size of network, direction of edges, multidimensionality, and so on) this review paper is designed to provide a set of approaches that researchers could focus on.
△ Less
Submitted 15 June, 2012;
originally announced June 2012.
-
DEMON: a Local-First Discovery Method for Overlap** Communities
Authors:
Michele Coscia,
Giulio Rossetti,
Fosca Giannotti,
Dino Pedreschi
Abstract:
Community discovery in complex networks is an interesting problem with a number of applications, especially in the knowledge extraction task in social and information networks. However, many large networks often lack a particular community organization at a global level. In these cases, traditional graph partitioning algorithms fail to let the latent knowledge embedded in modular structure emerge,…
▽ More
Community discovery in complex networks is an interesting problem with a number of applications, especially in the knowledge extraction task in social and information networks. However, many large networks often lack a particular community organization at a global level. In these cases, traditional graph partitioning algorithms fail to let the latent knowledge embedded in modular structure emerge, because they impose a top-down global view of a network. We propose here a simple local-first approach to community discovery, able to unveil the modular organization of real complex networks. This is achieved by democratically letting each node vote for the communities it sees surrounding it in its limited view of the global system, i.e. its ego neighborhood, using a label propagation algorithm; finally, the local communities are merged into a global collection. We tested this intuition against the state-of-the-art overlap** and non-overlap** community discovery methods, and found that our new method clearly outperforms the others in the quality of the obtained communities, evaluated by using the extracted communities to predict the metadata about the nodes of several real world networks. We also show how our method is deterministic, fully incremental, and has a limited time complexity, so that it can be used on web-scale real networks.
△ Less
Submitted 4 June, 2012;
originally announced June 2012.
-
Classes of Terminating Logic Programs
Authors:
Dino Pedreschi,
Salvatore Ruggieri,
Jan-Georg Smaus
Abstract:
Termination of logic programs depends critically on the selection rule, i.e. the rule that determines which atom is selected in each resolution step. In this article, we classify programs (and queries) according to the selection rules for which they terminate. This is a survey and unified view on different approaches in the literature. For each class, we present a sufficient, for most classes ev…
▽ More
Termination of logic programs depends critically on the selection rule, i.e. the rule that determines which atom is selected in each resolution step. In this article, we classify programs (and queries) according to the selection rules for which they terminate. This is a survey and unified view on different approaches in the literature. For each class, we present a sufficient, for most classes even necessary, criterion for determining that a program is in that class. We study six classes: a program strongly terminates if it terminates for all selection rules; a program input terminates if it terminates for selection rules which only select atoms that are sufficiently instantiated in their input positions, so that these arguments do not get instantiated any further by the unification; a program local delay terminates if it terminates for local selection rules which only select atoms that are bounded w.r.t. an appropriate level map**; a program left-terminates if it terminates for the usual left-to-right selection rule; a program exists-terminates if there exists a selection rule for which it terminates; finally, a program has bounded nondeterminism if it only has finitely many refutations. We propose a semantics-preserving transformation from programs with bounded nondeterminism into strongly terminating programs. Moreover, by unifying different formalisms and making appropriate assumptions, we are able to establish a formal hierarchy between the different classes.
△ Less
Submitted 22 July, 2002; v1 submitted 25 June, 2001;
originally announced June 2001.