Search | arXiv e-print repository

arXiv:2405.19212

Partial Information Decomposition for Data Interpretability and Feature Selection

Authors: Charles Westphal, Stephen Hailes, Mirco Musolesi

Abstract: In this paper, we introduce Partial Information Decomposition of Features (PIDF), a new paradigm for simultaneous data interpretability and feature selection. Contrary to traditional methods that assign a single importance value, our approach is based on three metrics per feature: the mutual information shared with the target variable, the feature's contribution to synergistic information, and the amount of this information that is redundant. In particular, we develop a novel procedure based on these three metrics, which reveals not only how features are correlated with the target but also the additional and overlapping information provided by considering them in combination with other features. We extensively evaluate PIDF using both synthetic and real-world data, demonstrating its potential applications and effectiveness, by considering case studies from genetics and neuroscience.

Submitted 7 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.
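PIDF scores each feature by its mutual information with the target, its contribution to synergy, and its redundancy. The sketch below is not the PIDF procedure; it is a minimal plug-in mutual-information estimator for discrete samples, applied to two textbook cases: an XOR target (purely synergistic: each input alone carries 0 bits, the pair carries 1 bit) and a duplicated feature (purely redundant: each copy carries 1 bit, the pair still carries only 1 bit). The function name and toy data are illustrative assumptions.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    c_xy = Counter(zip(xs, ys))
    c_x = Counter(xs)
    c_y = Counter(ys)
    mi = 0.0
    for (x, y), c in c_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (c_x[x] * c_y[y]))
    return mi


# Synergy: all four input pairs equally likely, target = x1 XOR x2.
x1, x2 = zip(*product([0, 1], repeat=2))
y = [a ^ b for a, b in zip(x1, x2)]
joint = list(zip(x1, x2))
print(mutual_information(x1, y))     # 0.0
print(mutual_information(x2, y))     # 0.0
print(mutual_information(joint, y))  # 1.0

# Redundancy: target = x1, and a second feature is an exact copy of x1.
y_red = list(x1)
copy_of_x1 = list(x1)
print(mutual_information(x1, y_red))                         # 1.0
print(mutual_information(copy_of_x1, y_red))                 # 1.0
print(mutual_information(list(zip(x1, copy_of_x1)), y_red))  # 1.0
```

A single per-feature importance score would rank the XOR inputs as useless and the duplicated features as equally useful, which is the kind of ambiguity a three-metric view is meant to expose.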

arXiv:2405.13551

Large Language Models are Effective Priors for Causal Graph Discovery

Authors: Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

Abstract: Causal structure discovery from observations can be improved by integrating background knowledge provided by an expert to reduce the hypothesis space. Recently, Large Language Models (LLMs) have begun to be considered as sources of prior information given the low cost of querying them relative to a human expert. In this work, firstly, we propose a set of metrics for assessing LLM judgments for causal graph discovery independently of the downstream algorithm. Secondly, we systematically study a set of prompting designs that allows the model to specify priors about the structure of the causal graph. Finally, we present a general methodology for the integration of LLM priors in graph discovery algorithms, finding that they help improve performance on common-sense benchmarks and especially when used for assessing edge directionality. Our work highlights the potential as well as the shortcomings of the use of LLMs in this problem space.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.00099

Creative Beam Search: LLM-as-a-Judge For Improving Response Generation

Authors: Giorgio Franceschelli, Mirco Musolesi

Abstract: Large language models are revolutionizing several areas, including artificial creativity. However, the process of generation in machines profoundly diverges from that observed in humans. In particular, machine generation is characterized by a lack of intentionality and an underlying creative process. We propose a method called Creative Beam Search that uses Diverse Beam Search and LLM-as-a-Judge to perform response generation and response validation. The results of a qualitative experiment show how our approach can provide better output than standard sampling techniques. We also show that the response validation step is a necessary complement to the response generation step.

Submitted 9 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

arXiv:2404.06492

Graph Reinforcement Learning for Combinatorial Optimization: A Survey and Unifying Perspective

Authors: Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

Abstract: Graphs are a natural representation for systems based on relations between connected entities. Combinatorial optimization problems, which arise when considering an objective function related to a process of interest on discrete structures, are often challenging due to the rapid growth of the solution space. The trial-and-error paradigm of Reinforcement Learning has recently emerged as a promising alternative to traditional methods, such as exact algorithms and (meta)heuristics, for discovering better decision-making strategies in a variety of disciplines including chemistry, computer science, and statistics. Despite the fact that they arose in markedly different fields, these techniques share significant commonalities. Therefore, we set out to synthesize this work in a unifying perspective that we term Graph Reinforcement Learning, interpreting it as a constructive decision-making method for graph problems. After covering the relevant technical background, we review works along the dividing line of whether the goal is to optimize graph structure given a process of interest, or to optimize the outcome of the process itself under fixed graph structure. Finally, we discuss the common challenges facing the field and open research questions. In contrast with other surveys, the present work focuses on non-canonical graph problems for which performant algorithms are typically not known and Reinforcement Learning is able to provide efficient and effective solutions.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2403.07979

Do Agents Dream of Electric Sheep?: Improving Generalization in Reinforcement Learning through Generative Learning

Authors: Giorgio Franceschelli, Mirco Musolesi

Abstract: The Overfitted Brain hypothesis suggests dreams happen to allow generalization in the human brain. Here, we ask if the same is true for reinforcement learning agents as well. Given limited experience in a real environment, we use imagination-based reinforcement learning to train a policy on dream-like episodes, where non-imaginative, predicted trajectories are modified through generative augmentations. Experiments on four ProcGen environments show that, compared to classic imagination and offline training on collected experience, our method can reach a higher level of generalization when dealing with sparsely rewarded environments.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.04202

Dynamics of Moral Behavior in Heterogeneous Populations of Learning Agents

Authors: Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

Abstract: Growing concerns about safety and alignment of AI systems highlight the importance of embedding moral capabilities in artificial agents. A promising solution is the use of learning from experience, i.e., Reinforcement Learning. In multi-agent (social) environments, complex population-level phenomena may emerge from interactions between individual learning agents. Many of the existing studies rely on simulated social dilemma environments to study the interactions of independent learning agents. However, they tend to ignore the moral heterogeneity that is likely to be present in societies of agents in practice. For example, at different points in time a single learning agent may face opponents who are consequentialist (i.e., caring about maximizing some outcome over time) or norm-based (i.e., focusing on conforming to a specific norm here and now). The extent to which agents' co-development may be impacted by such moral heterogeneity in populations is not well understood. In this paper, we present a study of the learning dynamics of morally heterogeneous populations interacting in a social dilemma setting. Using a Prisoner's Dilemma environment with a partner selection mechanism, we investigate the extent to which the prevalence of diverse moral agents in populations affects individual agents' learning behaviors and emergent population-level outcomes. We observe several types of non-trivial interactions between pro-social and anti-social agents, and find that certain classes of moral agents are able to steer selfish agents towards more cooperative behavior.

Submitted 26 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.09193

(Ir)rationality and Cognitive Biases in Large Language Models

Authors: Olivia Macmillan-Scott, Mirco Musolesi

Abstract: Do large language models (LLMs) display rational reasoning? LLMs have been shown to contain human biases due to the data they have been trained on; whether this is reflected in rational reasoning remains less clear. In this paper, we answer this question by evaluating seven language models using tasks from the cognitive psychology literature. We find that, like humans, LLMs display irrationality in these tasks. However, the way this irrationality is displayed does not reflect that shown by humans. When incorrect answers are given by LLMs to these tasks, they are often incorrect in ways that differ from human-like biases. On top of this, the LLMs reveal an additional layer of irrationality in the significant inconsistency of the responses. Aside from the experimental results, this paper seeks to make a methodological contribution by showing how we can assess and compare different capabilities of these types of models, in this case with respect to rational reasoning.

Submitted 15 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2401.11512

Information-Theoretic State Variable Selection for Reinforcement Learning

Authors: Charles Westphal, Stephen Hailes, Mirco Musolesi

Abstract: Identifying the most suitable variables to represent the state is a fundamental challenge in Reinforcement Learning (RL). These variables must efficiently capture the information necessary for making optimal decisions. In order to address this problem, in this paper, we introduce the Transfer Entropy Redundancy Criterion (TERC), an information-theoretic criterion, which determines if there is entropy transferred from state variables to actions during training. We define an algorithm based on TERC that provably excludes variables from the state that have no effect on the final performance of the agent, resulting in more sample efficient learning. Experimental results show that this speed-up is present across three different algorithm classes (represented by tabular Q-learning, Actor-Critic, and Proximal Policy Optimization (PPO)) in a variety of environments. Furthermore, to highlight the differences between the proposed methodology and the current state-of-the-art feature selection approaches, we present a series of controlled experiments on synthetic data, before generalizing to real-world decision-making tasks. We also introduce a representation of the problem that compactly captures the transfer of information from state variables to actions as Bayesian networks.

Submitted 21 January, 2024; originally announced January 2024.

Comments: 47 pages, 12 figures
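TERC asks whether entropy is transferred from state variables to actions. As background only, the sketch below is a minimal plug-in estimator of the textbook transfer entropy T(X to Y) = I(Y_{t+1}; X_t | Y_t) for discrete sequences with a history length of 1. It is not the TERC criterion or the paper's variable-exclusion algorithm; treating a state variable as the source and the action sequence as the target, and the synthetic data, are illustrative assumptions.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate, in bits, of I(Y_{t+1}; X_t | Y_t) with history length 1."""
    # Each sample is (next target value, previous target value, previous source value).
    samples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(samples)
    c_full = Counter(samples)
    c_hist = Counter((y_prev, x_prev) for _, y_prev, x_prev in samples)
    c_self = Counter((y_next, y_prev) for y_next, y_prev, _ in samples)
    c_prev = Counter(y_prev for _, y_prev, _ in samples)
    te = 0.0
    for (y_next, y_prev, x_prev), c in c_full.items():
        p_given_both = c / c_hist[(y_prev, x_prev)]               # p(y_next | y_prev, x_prev)
        p_given_self = c_self[(y_next, y_prev)] / c_prev[y_prev]  # p(y_next | y_prev)
        te += (c / n) * log2(p_given_both / p_given_self)
    return te


rng = random.Random(0)
state_var = [rng.randint(0, 1) for _ in range(10_000)]
# Hypothetical "actions": a one-step-delayed copy of the state variable,
# versus actions drawn independently of it.
copied_actions = [0] + state_var[:-1]
independent_actions = [rng.randint(0, 1) for _ in range(10_000)]

print(round(transfer_entropy(state_var, copied_actions), 3))       # close to 1 bit
print(round(transfer_entropy(state_var, independent_actions), 3))  # close to 0 bits
```

Plug-in estimates like this are biased upward for finite samples, so a practical criterion would compare against a threshold or a null distribution rather than test for exactly zero.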

arXiv:2312.01818

Learning Machine Morality through Experience and Interaction

Authors: Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

Abstract: Increasing interest in ensuring safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents. Traditionally, this has been done by imposing explicit top-down rules or hard constraints on systems, for example by filtering system outputs through pre-defined ethical rules. Recently, instead, entirely bottom-up methods for learning implicit preferences from human behavior have become increasingly popular, such as those for training and fine-tuning Large Language Models. In this paper, we provide a systematization of existing approaches to the problem of introducing morality in machines - modeled as a continuum, and argue that the majority of popular techniques lie at the extremes - either being fully hard-coded, or entirely learned, where no explicit statement of any moral principle is required. Given the relative strengths and weaknesses of each type of methodology, we argue that more hybrid solutions are needed to create adaptable and robust, yet more controllable and interpretable agents. In particular, we present three case studies of recent works which use learning from experience (i.e., Reinforcement Learning) to explicitly provide moral principles to learning agents - either as intrinsic rewards, moral logical constraints or textual principles for language models. For example, using intrinsic rewards in Social Dilemma games, we demonstrate how it is possible to represent classical moral frameworks for agents. We also present an overview of the existing work in this area in order to provide empirical evidence for the potential of this hybrid approach. We then discuss strategies for evaluating the effectiveness of moral learning agents. Finally, we present open research questions and implications for the future of AI safety and ethics which are emerging from this framework.

Submitted 19 April, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.17165

(Ir)rationality in AI: State of the Art, Research Challenges and Open Questions

Authors: Olivia Macmillan-Scott, Mirco Musolesi

Abstract: The concept of rationality is central to the field of artificial intelligence. Whether we are seeking to simulate human reasoning, or the goal is to achieve bounded optimality, we generally seek to make artificial agents as rational as possible. Despite the centrality of the concept within AI, there is no unified definition of what constitutes a rational agent. This article provides a survey of rationality and irrationality in artificial intelligence, and sets out the open questions in this area. The understanding of rationality in other fields has influenced its conception within artificial intelligence, in particular work in economics, philosophy and psychology. Focusing on the behaviour of artificial agents, we consider irrational behaviours that can prove to be optimal in certain scenarios. Some methods have been developed to deal with irrational agents, both in terms of identification and interaction, however work in this area remains limited. Methods that have up to now been developed for other purposes, namely adversarial scenarios, may be adapted to suit interactions with artificial agents. We further discuss the interplay between human and artificial agents, and the role that rationality plays within this interaction; many questions remain in this area, relating to potentially irrational behaviour of both humans and artificial agents.

Submitted 14 February, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.10026

Guaranteeing Control Requirements via Reward Shaping in Reinforcement Learning

Authors: Francesco De Lellis, Marco Coraggio, Giovanni Russo, Mirco Musolesi, Mario di Bernardo

Abstract: In addressing control problems such as regulation and tracking through reinforcement learning, it is often required to guarantee that the acquired policy meets essential performance and stability criteria such as a desired settling time and steady-state error prior to deployment. Motivated by this necessity, we present a set of results and a systematic reward shaping procedure that (i) ensures the optimal policy generates trajectories that align with specified control requirements and (ii) allows to assess whether any given policy satisfies them. We validate our approach through comprehensive numerical experiments conducted in two representative environments from OpenAI Gym: the Inverted Pendulum swing-up problem and the Lunar Lander. Utilizing both tabular and deep reinforcement learning methods, our experiments consistently affirm the efficacy of our proposed framework, highlighting its effectiveness in ensuring policy adherence to the prescribed control requirements.

Submitted 20 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.
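The two requirements named in the abstract, settling time and steady-state error, can be checked on a recorded trajectory of tracking errors. The sketch below uses common textbook definitions (settling time as the earliest time after which the error stays inside a tolerance band; steady-state error as the mean absolute error over the last few samples). It is not the paper's reward shaping procedure or its policy-assessment test; the function names, the band, and the tail length are illustrative assumptions.

```python
import math


def settling_time(errors, dt, band):
    """Earliest time after which |error| stays within `band` until the end of the trajectory.

    Returns math.inf if the last sample is still outside the band.
    """
    last_outside = -1
    for k, e in enumerate(errors):
        if abs(e) > band:
            last_outside = k
    if last_outside == len(errors) - 1:
        return math.inf
    return (last_outside + 1) * dt


def meets_requirements(errors, dt, band, max_settling_time, max_ss_error, tail=10):
    """Check one trajectory against a settling-time and a steady-state-error requirement."""
    ts = settling_time(errors, dt, band)
    window = errors[-tail:]
    ss_error = sum(abs(e) for e in window) / len(window)  # mean |error| over the final samples
    return ts <= max_settling_time and ss_error <= max_ss_error


dt = 0.1
decaying = [math.exp(-k * dt) for k in range(100)]   # error decays like e^{-t}
oscillating = [0.2 * (-1) ** k for k in range(100)]  # error never settles

print(settling_time(decaying, dt, band=0.05))  # about 3.0
print(meets_requirements(decaying, dt, 0.05, max_settling_time=4.0, max_ss_error=0.01))     # True
print(meets_requirements(oscillating, dt, 0.05, max_settling_time=4.0, max_ss_error=0.01))  # False
```

Assessing a stochastic policy this way would mean running the check over many rollouts; the paper's contribution is a shaping scheme under which the optimal policy is guaranteed to pass such requirements.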

arXiv:2310.13576

Tree Search in DAG Space with Model-based Reinforcement Learning for Causal Discovery

Authors: Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

Abstract: Identifying causal structure is central to many fields ranging from strategic decision-making to biology and economics. In this work, we propose CD-UCT, a model-based reinforcement learning method for causal discovery based on tree search that builds directed acyclic graphs incrementally. We also formalize and prove the correctness of an efficient algorithm for excluding edges that would introduce cycles, which enables deeper discrete search and sampling in DAG space. The proposed method can be applied broadly to causal Bayesian networks with both discrete and continuous random variables. We conduct a comprehensive evaluation on synthetic and real-world datasets, showing that CD-UCT substantially outperforms the state-of-the-art model-free reinforcement learning technique and greedy search, constituting a promising advancement for combinatorial methods.

Submitted 13 February, 2024; v1 submitted 20 October, 2023; originally announced October 2023.
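When a DAG is built one edge at a time, adding u to v closes a cycle exactly when u is already reachable from v. The sketch below uses that fact to list the edges that can still be added to a current graph, recomputing reachability with a depth-first search from every node. This naive approach is an illustrative assumption, not the paper's efficient edge-exclusion algorithm; the function names and the adjacency-dict representation are hypothetical.

```python
def reachable_from(adj, start):
    """Nodes reachable from `start` by following directed edges (including `start` itself)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def valid_edges(nodes, adj):
    """Edges u->v that can be added to the DAG `adj` without creating a cycle."""
    reach = {n: reachable_from(adj, n) for n in nodes}
    out = []
    for u in nodes:
        for v in nodes:
            # u->v closes a cycle iff u is reachable from v; since v reaches itself,
            # this also rules out self-loops (u == v).
            if v not in adj.get(u, set()) and u not in reach[v]:
                out.append((u, v))
    return out


nodes = ["A", "B", "C"]
dag = {"A": {"B"}, "B": {"C"}}  # A -> B -> C
print(valid_edges(nodes, dag))  # [('A', 'C')]
```

Each returned edge is safe to add on its own; after an edge is added the candidate set must be updated, which is the step an incremental algorithm can make cheaper than this full recomputation.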

arXiv:2309.05378

doi 10.1609/aaaiss.v2i1.27644

Steps Towards Satisficing Distributed Dynamic Team Trust

Authors: Edmund R. Hunt, Chris Baber, Mehdi Sobhani, Sanja Milivojevic, Sagir Yusuf, Mirco Musolesi, Patrick Waterson, Sally Maynard

Abstract: Defining and measuring trust in dynamic, multiagent teams is important in a range of contexts, particularly in defense and security domains. Team members should be trusted to work towards agreed goals and in accordance with shared values. In this paper, our concern is with the definition of goals and values such that it is possible to define 'trust' in a way that is interpretable, and hence usable, by both humans and robots. We argue that the outcome of team activity can be considered in terms of 'goal', 'individual/team values', and 'legal principles'. We question whether alignment is possible at the level of 'individual/team values', or only at the 'goal' and 'legal principles' levels. We argue for a set of metrics to define trust in human-robot teams that are interpretable by human or robot team members, and consider an experiment that could demonstrate the notion of 'satisficing trust' over the course of a simulated mission.

Submitted 4 November, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

arXiv:2308.10721

CoMIX: A Multi-agent Reinforcement Learning Training Architecture for Efficient Decentralized Coordination and Independent Decision-Making

Authors: Giovanni Minelli, Mirco Musolesi

Abstract: Robust coordination skills enable agents to operate cohesively in shared environments, together towards a common goal and, ideally, individually without hindering each other's progress. To this end, this paper presents Coordinated QMIX (CoMIX), a novel training framework for decentralized agents that enables emergent coordination through flexible policies, allowing at the same time independent decision-making at individual level. CoMIX models selfish and collaborative behavior as incremental steps in each agent's decision process. This allows agents to dynamically adapt their behavior to different situations balancing independence and collaboration. Experiments using a variety of simulation environments demonstrate that CoMIX outperforms baselines on collaborative tasks. The results validate our incremental approach as effective technique for improving coordination in multi-agent systems.

Submitted 9 June, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.00031

doi 10.1613/jair.1.15278

Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges

Authors: Giorgio Franceschelli, Mirco Musolesi

Abstract: Generative Artificial Intelligence (AI) is one of the most exciting developments in Computer Science of the last decade. At the same time, Reinforcement Learning (RL) has emerged as a very successful paradigm for a variety of machine learning tasks. In this survey, we discuss the state of the art, opportunities and open research questions in applying RL to generative AI. In particular, we will discuss three types of applications, namely, RL as an alternative way for generation without specified objectives; as a way for generating outputs while concurrently maximizing an objective function; and, finally, as a way of embedding desired characteristics, which cannot be easily captured by means of an objective function, into the generative process. We conclude the survey with an in-depth discussion of the opportunities and challenges in this fascinating emerging area.

Submitted 8 February, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

Comments: Published in JAIR at https://www.jair.org/index.php/jair/article/view/15278

Journal ref: JAIR 79 (2024) 417-446

arXiv:2306.01158

Heterogeneous Knowledge for Augmented Modular Reinforcement Learning

Authors: Lorenz Wolf, Mirco Musolesi

Abstract: Existing modular Reinforcement Learning (RL) architectures are generally based on reusable components, also allowing for "plug-and-play" integration. However, these modules are homogeneous in nature - in fact, they essentially provide policies obtained via RL through the maximization of individual reward functions. Consequently, such solutions still lack the ability to integrate and process multiple types of information (i.e., heterogeneous knowledge representations), such as rules, sub-goals, and skills from various sources. In this paper, we discuss several practical examples of heterogeneous knowledge and propose Augmented Modular Reinforcement Learning (AMRL) to address these limitations. Our framework uses a selector to combine heterogeneous modules and seamlessly incorporate different types of knowledge representations and processing mechanisms. Our results demonstrate the performance and efficiency improvements, also in terms of generalization, that can be achieved by augmenting traditional modular RL with heterogeneous knowledge sources and processing mechanisms.

Submitted 14 April, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: 16 pages, 4 figures

arXiv:2304.00008

On the Creativity of Large Language Models

Authors: Giorgio Franceschelli, Mirco Musolesi

Abstract: Large Language Models (LLMs) are revolutionizing several areas of Artificial Intelligence. One of the most remarkable applications is creative writing, e.g., poetry or storytelling: the generated outputs are often of astonishing quality. However, a natural question arises: can LLMs be really considered creative? In this article we firstly analyze the development of LLMs under the lens of creativity theories, investigating the key open questions and challenges. In particular, we focus our discussion around the dimensions of value, novelty and surprise as proposed by Margaret Boden in her work. Then, we consider different classic perspectives, namely product, process, press and person. We discuss a set of "easy" and "hard" problems in machine creativity, presenting them in relation to LLMs. Finally, we examine the societal impact of these technologies with a particular focus on the creative industries, analyzing the opportunities offered by them, the challenges arising by them and the potential associated risks, from both legal and ethical points of view.

Submitted 9 July, 2023; v1 submitted 27 March, 2023; originally announced April 2023.

arXiv:2301.08491

doi 10.24963/ijcai.2023/36

Modeling Moral Choices in Social Dilemmas with Multi-Agent Reinforcement Learning

Authors: Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

Abstract: Practical uses of Artificial Intelligence (AI) in the real world have demonstrated the importance of embedding moral choices into intelligent agents. They have also highlighted that defining top-down ethical constraints on AI according to any one type of morality is extremely challenging and can pose risks. A bottom-up learning approach may be more appropriate for studying and developing ethical behavior in AI agents. In particular, we believe that an interesting and insightful starting point is the analysis of emergent behavior of Reinforcement Learning (RL) agents that act according to a predefined set of moral rewards in social dilemmas. In this work, we present a systematic analysis of the choices made by intrinsically-motivated RL agents whose rewards are based on moral theories. We aim to design reward structures that are simplified yet representative of a set of key ethical systems. Therefore, we first define moral reward functions that distinguish between consequence- and norm-based agents, between morality based on societal norms or internal virtues, and between single- and mixed-virtue (e.g., multi-objective) methodologies. Then, we evaluate our approach by modeling repeated dyadic interactions between learning moral agents in three iterated social dilemma games (Prisoner's Dilemma, Volunteer's Dilemma and Stag Hunt). We analyze the impact of different types of morality on the emergence of cooperation, defection or exploitation, and the corresponding social outcomes. Finally, we discuss the implications of these findings for the development of moral agents in artificial and mixed human-AI societies.

Submitted 30 August, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

Comments: Accepted at IJCAI 2023 (32nd International Joint Conference on Artificial Intelligence - Macao, S.A.R.)

arXiv:2301.08278 [pdf, other]

Investigating the Impact of Direct Punishment on the Emergence of Cooperation in Multi-Agent Reinforcement Learning Systems

Authors: Nayana Dasgupta, Mirco Musolesi

Abstract: Solving the problem of cooperation is fundamentally important for the creation and maintenance of functional societies. Problems of cooperation are omnipresent within human society, with examples ranging from navigating busy road junctions to negotiating treaties. As the use of AI becomes more pervasive throughout society, the need for socially intelligent agents capable of navigating these complex cooperative dilemmas is becoming increasingly evident. Direct punishment is a ubiquitous social mechanism that has been shown to foster the emergence of cooperation in both humans and non-humans. In the natural world, direct punishment is often strongly coupled with partner selection and reputation and used in conjunction with third-party punishment. The interactions between these mechanisms could potentially enhance the emergence of cooperation within populations. However, no previous work has evaluated the learning dynamics and outcomes emerging from Multi-Agent Reinforcement Learning (MARL) populations that combine these mechanisms. This paper addresses this gap. It presents a comprehensive analysis and evaluation of the behaviors and learning dynamics associated with direct punishment, third-party punishment, partner selection, and reputation. Finally, we discuss the implications of using these mechanisms on the design of cooperative AI systems.

Submitted 17 June, 2024; v1 submitted 19 January, 2023; originally announced January 2023.

Comments: 50 pages, 19 figures

arXiv:2212.01343 [pdf, other]

CT-DQN: Control-Tutored Deep Reinforcement Learning

Authors: Francesco De Lellis, Marco Coraggio, Giovanni Russo, Mirco Musolesi, Mario di Bernardo

Abstract: One of the major challenges in Deep Reinforcement Learning for control is the need for extensive training to learn the policy. Motivated by this, we present the design of the Control-Tutored Deep Q-Networks (CT-DQN) algorithm, a Deep Reinforcement Learning algorithm that leverages a control tutor, i.e., an exogenous control law, to reduce learning time. The tutor can be designed using an approximate model of the system, without any assumption about the knowledge of the system's dynamics. There is no expectation that it will be able to achieve the control objective if used stand-alone. During learning, the tutor occasionally suggests an action, thus partially guiding exploration. We validate our approach on three scenarios from OpenAI Gym: the inverted pendulum, lunar lander, and car racing. We demonstrate that CT-DQN is able to achieve better or equivalent data efficiency with respect to the classic function approximation solutions.

Submitted 2 December, 2022; originally announced December 2022.

arXiv:2209.05208 [pdf, other]

Graph Neural Modeling of Network Flows

Authors: Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

Abstract: Network flow problems, which involve distributing traffic such that the underlying infrastructure is used effectively, are ubiquitous in transportation and logistics. Among them, the general Multi-Commodity Network Flow (MCNF) problem concerns the distribution of multiple flows of different sizes between several sources and sinks, while achieving effective utilization of the links. Due to the appeal of data-driven optimization, these problems have increasingly been approached using graph learning methods. In this paper, we propose a novel graph learning architecture for network flow problems called Per-Edge Weights (PEW). This method builds on a Graph Attention Network and uses distinctly parametrized message functions along each link. We extensively evaluate the proposed solution through an Internet flow routing case study using 17 Service Provider topologies and 2 routing schemes. We show that PEW yields substantial gains over architectures whose global message function constrains the routing unnecessarily. We also find that an MLP is competitive with other standard architectures. Furthermore, we analyze the relationship between graph structure and predictive performance for data-driven routing of flows, an aspect that has not been considered by existing work in the area.

Submitted 18 March, 2024; v1 submitted 12 September, 2022; originally announced September 2022.

arXiv:2205.13578 [pdf, other]

Dynamic Network Reconfiguration for Entropy Maximization using Deep Reinforcement Learning

Authors: Christoffel Doorman, Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

Abstract: A key problem in network theory is how to reconfigure a graph in order to optimize a quantifiable objective. Given the ubiquity of networked systems, such work has broad practical applications in a variety of situations, ranging from drug and material design to telecommunications. The large decision space of possible reconfigurations, however, makes this problem computationally intensive. In this paper, we cast the problem of network rewiring for optimizing a specified structural property as a Markov Decision Process (MDP), in which a decision-maker is given a budget of modifications that are performed sequentially. We then propose a general approach based on the Deep Q-Network (DQN) algorithm and graph neural networks (GNNs) that can efficiently learn strategies for rewiring networks. We then discuss a cybersecurity case study, i.e., an application to the computer network reconfiguration problem for intrusion protection. In a typical scenario, an attacker might have a (partial) map of the system they plan to penetrate; if the network is effectively "scrambled", they would not be able to navigate it since their prior knowledge would become obsolete. This can be viewed as an entropy maximization problem, in which the goal is to increase the surprise of the network. Indeed, entropy acts as a proxy measurement of the difficulty of navigating the network topology.
We demonstrate the general ability of the proposed method to obtain better entropy gains than random rewiring on synthetic and real-world graphs while being computationally inexpensive, as well as being able to generalize to larger graphs than those seen during training. Simulations of attack scenarios confirm the effectiveness of the learned rewiring strategies.

Submitted 27 January, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: 10 pages, 6 figures, 1 appendix

Journal ref: Proceedings of the First Learning on Graphs Conference (LoG 2022), PMLR 198:49:1-49:15

arXiv:2205.12880 [pdf, other]

Trust-based Consensus in Multi-Agent Reinforcement Learning Systems

Authors: Ho Long Fung, Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

Abstract: An often neglected issue in multi-agent reinforcement learning (MARL) is the potential presence of unreliable agents in the environment whose deviations from expected behavior can prevent a system from accomplishing its intended tasks. In particular, consensus is a fundamental underpinning problem of cooperative distributed multi-agent systems. Consensus requires different agents, situated in a decentralized communication network, to reach an agreement out of a set of initial proposals that they put forward. Learning-based agents should adopt a protocol that allows them to reach consensus despite having one or more unreliable agents in the system. This paper investigates the problem of unreliable agents in MARL, considering consensus as a case study. Echoing established results in the distributed systems literature, our experiments show that even a moderate fraction of such agents can greatly impact the ability to reach consensus in a networked environment. We propose Reinforcement Learning-based Trusted Consensus (RLTC), a decentralized trust mechanism, in which agents can independently decide which neighbors to communicate with. We empirically demonstrate that our trust mechanism is able to handle unreliable agents effectively, as evidenced by higher consensus success rates.

Submitted 30 May, 2024; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: Accepted for publication in proceedings of the first Reinforcement Learning Conference (RLC 2024)

arXiv:2201.06118 [pdf, other]

doi 10.3233/IA-220136

DeepCreativity: Measuring Creativity with Deep Learning Techniques

Authors: Giorgio Franceschelli, Mirco Musolesi

Abstract: Measuring machine creativity is one of the most fascinating challenges in Artificial Intelligence. This paper explores the possibility of using generative learning techniques for automatic assessment of creativity. The proposed solution does not involve human judgement, is modular, and is of general applicability. We introduce a new measure, namely DeepCreativity, based on Margaret Boden's definition of creativity as composed of value, novelty and surprise. We evaluate our methodology (and related measure) considering a case study, i.e., the generation of 19th-century American poetry, showing its effectiveness and expressiveness.

Submitted 16 January, 2022; originally announced January 2022.

Comments: 12 pages, 2 figures

Journal ref: Intelligenza Artificiale 16, 2 (2022), 151-163

arXiv:2112.06018 [pdf, other]

Control-Tutored Reinforcement Learning: Towards the Integration of Data-Driven and Model-Based Control

Authors: F. De Lellis, M. Coraggio, G. Russo, M. Musolesi, M. di Bernardo

Abstract: We present an architecture where a feedback controller derived from an approximate model of the environment assists the learning process to enhance its data efficiency. This architecture, which we term Control-Tutored Q-learning (CTQL), is presented in two alternative flavours. The former is based on defining the reward function so that a Boolean condition can be used to determine when the control tutor policy is adopted, while the latter, termed probabilistic CTQL (pCTQL), is instead based on executing calls to the tutor with a certain probability during learning. Both approaches are validated, and thoroughly benchmarked against Q-Learning, by considering the stabilization of an inverted pendulum as defined in OpenAI Gym as a representative problem.

Submitted 11 December, 2021; originally announced December 2021.
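The pCTQL variant above is described as calling the tutor "with a certain probability during learning". A minimal sketch of that idea, assuming tabular Q-learning, a fixed tutor probability `p_tutor`, and epsilon-greedy exploration whenever the tutor is not called, might look as follows. The names `select_action`, `q_update`, `p_tutor` and the toy tutor are hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
QTable = Dict[Tuple[State, int], float]


def select_action(q: QTable, state: State, actions: List[int],
                  tutor: Callable[[State], int], p_tutor: float,
                  epsilon: float, rng: random.Random) -> int:
    """Hypothetical pCTQL-style choice: defer to the tutor with probability p_tutor."""
    if rng.random() < p_tutor:
        return tutor(state)  # control tutor suggests the action
    if rng.random() < epsilon:
        return rng.choice(actions)  # uniform random exploration
    # Greedy w.r.t. current Q estimates; unseen (state, action) pairs default to 0.0.
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def q_update(q: QTable, state: State, action: int, reward: float,
             next_state: State, actions: List[int],
             alpha: float, gamma: float) -> None:
    """Standard one-step tabular Q-learning update, applied in place."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```

Because the Q-learning target bootstraps from the greedy maximum rather than from the behaviour policy, transitions produced by tutor actions still train the Q-table, which is one plausible reading of how an imperfect tutor can speed up learning without having to solve the task on its own.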

arXiv:2111.08078 [pdf, other]

Regional Topics in British Grocery Retail Transactions

Authors: Mariflor Vega Carrasco, Mirco Musolesi, Jason O'Sullivan, Rosie Prior, Ioanna Manolopoulou

Abstract: Understanding the customer behaviours behind transactional data has high commercial value in the grocery retail industry. Customers generate millions of transactions every day, choosing and buying products to satisfy specific shopping needs. Product availability may vary geographically due to local demand and local supply, thus driving the importance of analysing transactions within their corresponding store and regional context. Topic models provide a powerful tool in the analysis of transactional data, identifying topics that display frequently-bought-together products and summarising transactions as mixtures of topics. We use the Segmented Topic Model (STM) to capture customer behaviours that are nested within stores. STM not only provides topics and transaction summaries but also topical summaries at the store level that can be used to identify regional topics. We summarise the posterior distribution of STM by post-processing multiple posterior samples and selecting semantic modes represented as recurrent topics. We use linear Gaussian process regression to model topic prevalence across British territory while accounting for spatial autocorrelation. We implement our methods on a dataset of transactional data from a major UK grocery retailer and demonstrate that shopping behaviours may vary regionally and nearby stores tend to exhibit similar regional demand.

Submitted 15 November, 2021; originally announced November 2021.

Comments: 22 pages

arXiv:2109.08236 [pdf, other]

Reinforcement Learning on Encrypted Data

Authors: Alberto Jesu, Victor-Alexandru Darvariu, Alessandro Staffolani, Rebecca Montanari, Mirco Musolesi

Abstract: The growing number of applications of Reinforcement Learning (RL) in real-world domains has led to the development of privacy-preserving techniques due to the inherently sensitive nature of data. Most existing works focus on differential privacy, in which information is revealed in the clear to an agent whose learned model should be robust against information leakage to malicious third parties. Motivated by use cases in which only encrypted data might be shared, such as information from sensitive sites, in this work we consider scenarios in which the inputs themselves are sensitive and cannot be revealed. We develop a simple extension to the MDP framework which provides for the encryption of states. We present a preliminary, experimental study of how a DQN agent trained on encrypted states performs in environments with discrete and continuous state spaces. Our results highlight that the agent is still capable of learning in small state spaces even in the presence of non-deterministic encryption, but performance collapses in more complex environments.

Submitted 16 September, 2021; originally announced September 2021.

arXiv:2106.06768 [pdf, other]

Planning Spatial Networks with Monte Carlo Tree Search

Authors: Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

Abstract: We tackle the problem of goal-directed graph construction: given a starting graph, a budget of modifications, and a global objective function, the aim is to find a set of edges whose addition to the graph achieves the maximum improvement in the objective (e.g., communication efficiency). This problem emerges in many networks of great importance for society such as transportation and critical infrastructure networks. We identify two significant shortcomings with present methods. Firstly, they focus exclusively on network topology while ignoring spatial information; however, in many real-world networks, nodes are embedded in space, which yields different global objectives and governs the range and density of realizable connections. Secondly, existing RL methods scale poorly to large networks due to the high cost of training a model and the scaling factors of the action space and global objectives. In this work, we formulate this problem as a deterministic MDP. We adopt the Monte Carlo Tree Search framework for planning in this domain, prioritizing the optimality of final solutions over the speed of policy evaluation. We propose several improvements over the standard UCT algorithm for this family of problems, addressing their single-agent nature, the trade-off between the costs of edges and their contribution to the objective, and an action space linear in the number of nodes.
We demonstrate the suitability of this approach for improving the global efficiency and attack resilience of a variety of synthetic and real-world networks, including Internet backbone networks and metro systems. Our approach obtains a 24% improvement in these metrics compared to UCT on the largest networks tested and scalability superior to previous methods.

Submitted 16 February, 2022; v1 submitted 12 June, 2021; originally announced June 2021.
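The abstract above frames the task as adding edges under a budget to improve a global objective such as global efficiency. As a hedged illustration of that objective only, the sketch below computes the standard unweighted global efficiency (the mean of 1/d(i, j) over ordered node pairs, with unreachable pairs contributing 0) and brute-forces the single best edge to add. This is the purely topological version of the metric; the paper's spatial setting may measure distances differently, and it searches with MCTS rather than exhaustively. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Optional, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as an adjacency dict


def bfs_distances(g: Graph, src: int) -> Dict[int, int]:
    """Hop distances from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(g: Graph) -> float:
    """Mean of 1/d(i, j) over ordered pairs i != j; unreachable pairs add 0."""
    n = len(g)
    if n < 2:
        return 0.0
    total = 0.0
    for s in g:
        for t, d in bfs_distances(g, s).items():
            if t != s:
                total += 1.0 / d
    return total / (n * (n - 1))


def best_single_edge(g: Graph) -> Tuple[Optional[Tuple[int, int]], float]:
    """Exhaustively try every non-edge and return the largest efficiency gain."""
    base = global_efficiency(g)
    best, best_gain = None, float("-inf")
    for u, v in combinations(sorted(g), 2):
        if v in g[u]:
            continue
        g[u].add(v)  # tentatively add the edge
        g[v].add(u)
        gain = global_efficiency(g) - base
        g[u].discard(v)  # restore the original graph
        g[v].discard(u)
        if gain > best_gain:
            best, best_gain = (u, v), gain
    if best is None:
        return None, 0.0  # graph is already complete
    return best, best_gain
```

On the path 0-1-2-3, for instance, the efficiency is 13/18, and any of the three missing edges raises it to 5/6. Exhaustive search like this costs a full all-pairs recomputation per candidate edge, which is the kind of scaling problem the abstract cites as motivation for planning with MCTS.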

arXiv:2106.06762 [pdf, other]

Solving Graph-based Public Good Games with Tree Search and Imitation Learning

Authors: Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

Abstract: Public goods games represent insightful settings for studying incentives for individual agents to make contributions that, while costly for each of them, benefit the wider society. In this work, we adopt the perspective of a central planner with a global view of a network of self-interested agents and the goal of maximizing some desired property in the context of a best-shot public goods game. Existing algorithms for this known NP-complete problem find solutions that are sub-optimal and cannot optimize for criteria other than social welfare. In order to efficiently solve public goods games, our proposed method directly exploits the correspondence between equilibria and the Maximal Independent Set (mIS) structural property of graphs. In particular, we define a Markov Decision Process which incrementally generates an mIS, and adopt a planning method to search for equilibria, outperforming existing methods. Furthermore, we devise a graph imitation learning technique that uses demonstrations of the search to obtain a graph neural network parametrized policy which quickly generalizes to unseen game instances. Our evaluation results show that this policy is able to reach 99.5% of the performance of the planning method while being three orders of magnitude faster to evaluate on the largest graphs tested. The methods presented in this work can be applied to a large class of public goods games of potentially high societal impact and more broadly to other graph combinatorial optimization problems.

Submitted 26 October, 2021; v1 submitted 12 June, 2021; originally announced June 2021.

Comments: To appear in Proceedings of 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
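The abstract above relies on the correspondence between equilibria of the best-shot public goods game and maximal independent sets: at an equilibrium, contributors form an independent set and every non-contributor has a contributing neighbor. The sketch below illustrates the incremental MDP view under simple assumptions: the state is the set chosen so far, the valid actions are the nodes that keep it independent, and an episode terminates once no valid action remains, at which point the set is maximal. The uniform random policy and all names are hypothetical stand-ins for the paper's planning method and learned graph neural network policy.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as an adjacency dict


def valid_actions(g: Graph, chosen: Set[int]) -> List[int]:
    """Nodes that can join the set without breaking independence."""
    return sorted(v for v in g if v not in chosen and not (g[v] & chosen))


def rollout_mis(g: Graph, rng: random.Random) -> Set[int]:
    """Grow an independent set one node per step until no valid action is left."""
    chosen: Set[int] = set()
    actions = valid_actions(g, chosen)
    while actions:  # an empty action set means the state is terminal (set is maximal)
        chosen.add(rng.choice(actions))  # hypothetical uniform random policy
        actions = valid_actions(g, chosen)
    return chosen


def is_maximal_independent(g: Graph, s: Set[int]) -> bool:
    """Check independence (no two members adjacent) and maximality (all others covered)."""
    independent = all(not (g[v] & s) for v in s)
    maximal = all(v in s or bool(g[v] & s) for v in g)
    return independent and maximal
```

Every terminal state of this MDP is a maximal independent set, so every completed episode yields an equilibrium; a planner or learned policy then only has to choose which equilibrium best serves the planner's objective, which a random rollout like this one does not attempt.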

arXiv:2105.09266 [pdf, ps, other]

doi 10.1017/dap.2022.10

Copyright in Generative Deep Learning

Authors: Giorgio Franceschelli, Mirco Musolesi

Abstract: Machine-generated artworks are now part of the contemporary art scene: they are attracting significant investments and they are presented in exhibitions together with those created by human artists. These artworks are mainly based on generative deep learning techniques, which have seen formidable development and remarkable refinement in recent years. Given the inherent characteristics of these techniques, a series of novel legal problems arise. In this article, we consider a set of key questions in the area of generative deep learning for the arts, including the following: is it possible to use copyrighted works as a training set for generative models? How do we legally store their copies in order to perform the training process? Who (if anyone) will own the copyright on the generated data? We try to answer these questions considering the law in force in both the United States of America and the European Union, and potential future alternatives. We then extend our analysis to code generation, which is an emerging area of generative deep learning. Finally, we also formulate a set of practical guidelines for artists and developers working on deep learning generated art, as well as some policy suggestions for policymakers.

Submitted 21 September, 2021; v1 submitted 19 May, 2021; originally announced May 2021.

Comments: 16 pages. Second version contains updates after entry into force of EU's directive on copyright in the Digital Single Market, and corrections of typos. Third version contains a new section about GitHub Copilot and its copyright implications. Fourth version contains improvements in abstract, introduction and conclusions, and a general rearrangement of the central sections

Journal ref: Data & Policy. 2022;4:e17

arXiv:2104.02726 [pdf, other]

doi 10.1145/3664595

Creativity and Machine Learning: A Survey

Authors: Giorgio Franceschelli, Mirco Musolesi

Abstract: There is a growing interest in the area of machine learning and creativity. This survey presents an overview of the history and the state of the art of computational creativity theories, key machine learning techniques (including generative deep learning), and corresponding automatic evaluation methods. After presenting a critical discussion of the key contributions in this area, we outline the current research challenges and emerging opportunities in this field.

Submitted 13 May, 2024; v1 submitted 6 April, 2021; originally announced April 2021.

Comments: Just accepted in ACM Computing Surveys (see https://dl.acm.org/doi/10.1145/3664595)

arXiv:2102.07523 [pdf, other]

Cooperation and Reputation Dynamics with Reinforcement Learning

Authors: Nicolas Anastassacos, Julian García, Stephen Hailes, Mirco Musolesi

Abstract: Creating incentives for cooperation is a challenge in natural and artificial systems. One potential answer is reputation, whereby agents trade the immediate cost of cooperation for the future benefits of having a good reputation. Game theoretical models have shown that specific social norms can make cooperation stable, but how agents can independently learn to establish effective reputation mechanisms on their own is less understood. We use a simple model of reinforcement learning to show that reputation mechanisms generate two coordination problems: agents need to learn how to coordinate on the meaning of existing reputations and collectively agree on a social norm to assign reputations to others based on their behavior. These coordination problems exhibit multiple equilibria, some of which effectively establish cooperation. When we train agents with a standard Q-learning algorithm in an environment with reputation mechanisms, convergence to undesirable equilibria is widespread. We propose two mechanisms to alleviate this: (i) seeding a proportion of the system with fixed agents that steer others towards good equilibria; and (ii) intrinsic rewards based on the idea of introspection, i.e., augmenting agents' rewards by an amount proportionate to the performance of their own strategy against themselves. A combination of these simple mechanisms is successful in stabilizing cooperation, even in a fully decentralized version of the problem where agents learn to use and assign reputations simultaneously.
We show how our results relate to the literature in Evolutionary Game Theory, and discuss implications for artificial, human and hybrid systems, where reputations can be used as a way to establish trust and cooperation.

Submitted 15 February, 2021; originally announced February 2021.

Comments: Published in AAMAS'21, 9 pages
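The abstract above describes introspection as augmenting an agent's reward by an amount proportional to how its own strategy performs against itself. The sketch below illustrates that idea in a deliberately simplified setting, assuming a one-shot donation game with benefit 3 and cost 1 and a strategy collapsed to a single cooperation probability, with reputations ignored entirely. The payoff values, the `weight` parameter, and the function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

# Hypothetical donation-game payoffs (benefit b = 3, cost c = 1, assumed values):
# (my_action, other_action) -> my payoff, with "C" = cooperate and "D" = defect.
PAYOFF: Dict[Tuple[str, str], float] = {
    ("C", "C"): 2.0, ("C", "D"): -1.0, ("D", "C"): 3.0, ("D", "D"): 0.0,
}


def self_play_payoff(p_coop: float) -> float:
    """Expected payoff of a mixed strategy when it meets a copy of itself."""
    probs = {"C": p_coop, "D": 1.0 - p_coop}
    return sum(probs[a] * probs[b] * PAYOFF[(a, b)] for a in probs for b in probs)


def introspective_reward(extrinsic: float, p_coop: float, weight: float) -> float:
    """Extrinsic reward plus a bonus proportional to the strategy's self-play payoff."""
    return extrinsic + weight * self_play_payoff(p_coop)
```

With these payoffs, an unconditional cooperator earns 2 against itself while an unconditional defector earns 0, so the bonus favors cooperative strategies even though defection is individually dominant in each single round.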

arXiv:2005.10125 [pdf, other]

Modelling Grocery Retail Topic Distributions: Evaluation, Interpretability and Stability

Authors: Mariflor Vega-Carrasco, Jason O'sullivan, Rosie Prior, Ioanna Manolopoulou, Mirco Musolesi

Abstract: Understanding the shopping motivations behind market baskets has high commercial value in the grocery retail industry. Analyzing shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while keeping outcomes interpretable. Latent Dirichlet Allocation (LDA) provides a suitable framework to process grocery transactions and to discover a broad representation of customers' shopping motivations. However, summarizing the posterior distribution of an LDA model is challenging, while individual LDA draws may not be coherent and cannot capture topic uncertainty. Moreover, the evaluation of LDA models is dominated by model-fit measures which may not adequately capture qualitative aspects such as the interpretability and stability of topics. In this paper, we introduce a clustering methodology that post-processes posterior LDA draws to summarise the entire posterior distribution and identify semantic modes represented as recurrent topics. Our approach is an alternative to standard label-switching techniques and provides a single posterior summary set of topics, as well as associated measures of uncertainty. Furthermore, we establish a more holistic definition for model evaluation, which assesses topic models based not only on their likelihood but also on their coherence, distinctiveness and stability. By means of a survey, we set thresholds for the interpretation of topic coherence and topic similarity in the domain of grocery retail data.
We demonstrate that the selection of recurrent topics through our clustering methodology not only improves model likelihood but also outperforms standard LDA on qualitative aspects such as interpretability and stability. We illustrate our methods on an example from a large UK supermarket chain.

Submitted 24 February, 2021; v1 submitted 4 May, 2020; originally announced May 2020.

Comments: 20 pages, 9 figures

arXiv:2001.11279 [pdf, other]

doi 10.1098/rspa.2021.0168

Goal-directed graph construction using reinforcement learning

Authors: Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

Abstract: Graphs can be used to represent and reason about systems and a variety of metrics have been devised to quantify their global characteristics. However, little is currently known about how to construct a graph or improve an existing one given a target objective. In this work, we formulate the construction of a graph as a decision-making process in which a central agent creates topologies by trial and error and receives rewards proportional to the value of the target objective. By means of this conceptual framework, we propose an algorithm based on reinforcement learning and graph neural networks to learn graph construction and improvement strategies. Our core case study focuses on robustness to failures and attacks, a property relevant for the infrastructure and communication networks that power modern society. Experiments on synthetic and real-world graphs show that this approach can outperform existing methods while being cheaper to evaluate. It also allows generalization to out-of-sample graphs, as well as to larger out-of-distribution graphs in some cases. The approach is applicable to the optimization of other global structural properties of graphs.

Submitted 27 October, 2021; v1 submitted 30 January, 2020; originally announced January 2020.

Journal ref: Proceedings of the Royal Society A (2021)

arXiv:1912.07662 [pdf, other]

Graph Input Representations for Machine Learning Applications in Urban Network Analysis

Authors: Alessio Pagani, Abhinav Mehrotra, Mirco Musolesi

Abstract: Understanding and learning the characteristics of network paths has been of particular interest for decades and has led to several successful applications. Such analysis becomes challenging for urban networks as their size and complexity are significantly higher compared to other networks. State-of-the-art machine learning (ML) techniques allow us to detect hidden patterns and, thus, infer the features associated with them. However, very little is known about how the choice of input representation affects the performance of such predictive models. In this paper, we design and evaluate six different graph input representations (i.e., representations of the network paths), which take into account the network's topological and temporal characteristics, to be used as inputs for machine learning models that learn the behavior of urban network paths. The representations are validated and then tested on a real-world dataset of taxi journeys, predicting tips on the road network of New York. Our results demonstrate that the input representations that use temporal information help the model achieve the highest accuracy (RMSE of $1.42).
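The accuracy above is reported as a root mean squared error, which is expressed in the same units as the target (here, dollars of tip). A minimal sketch of the metric, with a hypothetical helper name:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values.

    The result is in the same units as the inputs (e.g. dollars of tip).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

Because errors are squared before averaging, a few large mispredictions raise the RMSE more than many small ones.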

Submitted 11 December, 2019; originally announced December 2019.

arXiv:1904.03004 [pdf, ps, other]

Quantifying Networked Resilience via Multi-Scale Feedback Loops in Water Distribution Networks

Authors: Alessio Pagani, Fanlin Meng, Guangtao Fu, Mirco Musolesi, Weisi Guo

Abstract: Water distribution networks (WDNs) are one of the most important man-made infrastructures. Resilience, the ability to respond to disturbances and recover to a desirable state, is of vital importance to our society. There is increasing evidence that the resilience of networked infrastructures with dynamic signals depends on their network topological properties. Whilst existing literature has examined a variety of complex network centrality and spectral properties, very little attention has been given to understanding multi-scale flow-based feedback loops and their impact on overall stability. Here, resilience arising from multi-scale feedback loops is inferred using a proxy measure called trophic coherence. This is hypothesised to have a strong impact on both the pressure deficit and the water age. Our results show that trophic coherence is positively correlated with the time to recovery (0.62 to 0.52), while it is negatively correlated with the diffusion of a disruption (-0.66 to -0.52). Finally, we apply random forest analysis to combine resilience measures, showing that the new resilience ensemble provides a more accurate measure for networked resilience.
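Trophic coherence can be sketched with the standard definition from the network-ecology literature. Nodes with no incoming edges (basal nodes, for instance reservoirs) get level s = 1. Every other node gets 1 plus the mean level of its in-neighbours. The incoherence parameter q is the standard deviation of s_v - s_u over all directed edges u -> v, and q = 0 means perfectly coherent. The sketch assumes edges follow the flow direction and that every node is reachable from a basal node, which keeps the linear system non-singular. It is not necessarily the paper's exact procedure, and the function names are hypothetical.

```python
import numpy as np


def trophic_levels(edges, n):
    """Solve for trophic levels given directed edges (u, v) meaning flow u -> v.

    Basal nodes (in-degree 0) get s = 1; every other node v satisfies
    k_in[v] * s[v] - sum_u a[v, u] * s[u] = k_in[v].
    """
    a = np.zeros((n, n))
    for u, v in edges:
        a[v, u] += 1.0                      # a[v, u] counts edges u -> v
    k_in = a.sum(axis=1)
    diag = np.where(k_in > 0, k_in, 1.0)    # basal rows reduce to s = 1
    lam = np.diag(diag) - a
    return np.linalg.solve(lam, diag)


def trophic_incoherence(edges, n):
    """Return q, the standard deviation of trophic distances over edges."""
    s = trophic_levels(edges, n)
    x = np.array([s[v] - s[u] for u, v in edges])
    return float(np.std(x))
```

A simple chain 0 -> 1 -> 2 is perfectly coherent (q = 0). Adding a shortcut 0 -> 2 places node 2 at level 2.5, so the edge distances become 1, 0.5 and 1.5, and q rises to the square root of 1/6.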

Submitted 16 May, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

arXiv:1902.03185 [pdf, other]

Partner Selection for the Emergence of Cooperation in Multi-Agent Systems Using Reinforcement Learning

Authors: Nicolas Anastassacos, Stephen Hailes, Mirco Musolesi

Abstract: Social dilemmas have been widely studied to explain how humans are able to cooperate in society. Considerable effort has been invested in designing artificial agents for social dilemmas that incorporate explicit agent motivations that are chosen to favor coordinated or cooperative responses. The prevalence of this general approach points towards the importance of achieving an understanding of both an agent's internal design and external environment dynamics that facilitate cooperative behavior. In this paper, we investigate how partner selection can promote cooperative behavior between agents who are trained to maximize a purely selfish objective function. Our experiments reveal that agents trained with this dynamic learn a strategy that retaliates against defectors while promoting cooperation with other agents resulting in a prosocial society.

Submitted 28 November, 2019; v1 submitted 8 February, 2019; originally announced February 2019.

Comments: 8

Report number: Published in AAAI'20

arXiv:1809.10007 [pdf, other]

Learning through Probing: a decentralized reinforcement learning architecture for social dilemmas

Authors: Nicolas Anastassacos, Mirco Musolesi

Abstract: Multi-agent reinforcement learning has received significant interest in recent years, notably due to advances in deep reinforcement learning that have allowed for the development of new architectures and learning algorithms. Using social dilemmas as the training ground, we present a novel learning architecture, Learning through Probing (LTP), where agents utilize a probing mechanism to incorporate how their opponent's behavior changes when an agent takes an action. We use distinct training phases and adjust rewards according to the overall outcome of the experiences, accounting for changes to the opponents' behavior. We introduce a parameter η to determine the significance of these future changes to opponent behavior. When applied to the Iterated Prisoner's Dilemma (IPD), LTP agents demonstrate that they can learn to cooperate with each other, achieving higher average cumulative rewards than other reinforcement learning methods while also maintaining good performance in playing against static agents that are present in Axelrod tournaments. We compare this method with traditional reinforcement learning algorithms and agent-tracking techniques to highlight key differences and potential applications. We also draw attention to the differences between solving games and societal-like interactions, and analyze the training of Q-learning agents in makeshift societies.
This is to emphasize how cooperation may emerge in societies, which we demonstrate using environments where interactions with opponents are determined through a random encounter format of the IPD.
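For readers unfamiliar with the setting, the sketch below plays the Iterated Prisoner's Dilemma between the kind of fixed ("static") strategies found in Axelrod tournaments, such as tit-for-tat and always-defect. The payoff values T=5, R=3, P=1, S=0 are the conventional textbook choice and an assumption here. The sketch shows only the game and cumulative reward. It does not implement LTP.

```python
# Payoffs (row player, column player) for Cooperate ("C") / Defect ("D"),
# using the conventional T=5, R=3, P=1, S=0 values (assumed, not from the paper).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(my_history, their_history):
    """Cooperate on the first move, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]


def always_defect(my_history, their_history):
    """Defect unconditionally."""
    return "D"


def play_ipd(strategy_a, strategy_b, rounds=10):
    """Play an iterated game and return the cumulative rewards (a, b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

Over ten rounds, two tit-for-tat players earn (30, 30) by cooperating throughout. Tit-for-tat against always-defect yields (9, 14): the cooperator is exploited once and then retaliates. Learning agents in this setting aim for the mutual-cooperation outcome without being exploited.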

Submitted 22 December, 2018; v1 submitted 26 September, 2018; originally announced September 2018.

Comments: 9 pages, 4 figures

arXiv:1806.01599 [pdf, other]

doi 10.1140/epjds/s13688-018-0142-z

Predicting the temporal activity patterns of new venues

Authors: Krittika D'Silva, Anastasios Noulas, Mirco Musolesi, Cecilia Mascolo, Max Sklar

Abstract: Estimating revenue and business demand of a newly opened venue is paramount as these early stages often involve critical decisions such as first rounds of staffing and resource allocation. Traditionally, this estimation has been performed through coarse-grained measures such as observing numbers in local venues or venues at similar places (e.g., coffee shops around another station in the same city). The advent of crowdsourced data from devices and services carried by individuals on a daily basis has opened up the possibility of performing better predictions of temporal visitation patterns for locations and venues. In this paper, using mobility data from Foursquare, a location-centric platform, we treat venue categories as proxies for urban activities and analyze how they become popular over time. The main contribution of this work is a prediction framework able to use characteristic temporal signatures of places together with k-nearest neighbor metrics capturing similarities among urban regions, to forecast weekly popularity dynamics of a new venue establishment in a city neighborhood. We further show how we are able to forecast the popularity of the new venue one month after its opening by using locality and temporal similarity as features. For the evaluation of our approach we focus on London. We show that temporally similar areas of the city can be successfully used as inputs for predicting the visit patterns of new venues, with an improvement of 41% compared to a random selection of wards as a training set for the prediction task.
We apply these concepts of temporally similar areas and locality to the real-time predictions related to new venues and show that these features can effectively be used to predict the future trends of a venue. Our findings have the potential to impact the design of location-based technologies and decisions made by new business owners. △ Less
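The k-nearest-neighbour idea can be sketched as follows. Each ward is summarised by a normalised temporal signature, for example visits per time slot of the week. The weekly profile of a new venue in a target ward is then predicted by averaging the observed profiles of comparable venues in the k wards whose signatures are closest. The Euclidean distance, the averaging rule and all names (`predict_profile`, `ward_signatures`, `venue_profiles`) are hypothetical choices for illustration and do not describe the paper's framework.

```python
import math


def normalize(profile):
    """Scale a visit-count profile so it sums to 1 (a temporal signature)."""
    total = sum(profile)
    return [c / total for c in profile] if total else list(profile)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_profile(target_signature, ward_signatures, venue_profiles, k=3):
    """Average the venue profiles of the k wards most similar to the target.

    ward_signatures: {ward: normalised temporal signature of the ward}
    venue_profiles:  {ward: normalised weekly profile of a comparable venue there}
    """
    ranked = sorted(
        ward_signatures,
        key=lambda w: euclidean(target_signature, ward_signatures[w]),
    )
    neighbours = ranked[:k]
    length = len(next(iter(venue_profiles.values())))
    return [
        sum(venue_profiles[w][i] for w in neighbours) / len(neighbours)
        for i in range(length)
    ]
```

With k = 1 the prediction simply copies the most similar ward's venue profile. Larger k trades sensitivity to local patterns for stability.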

Submitted 5 June, 2018; originally announced June 2018.

Journal ref: EPJ Data Sci. 7 (1) 13 (2018)

arXiv:1803.10133 [pdf, other]

You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information

Authors: Beatrice Perez, Mirco Musolesi, Gianluca Stringhini

Abstract: Metadata are associated with most of the information we produce in our daily interactions and communication in the digital world. Yet, surprisingly, metadata are often still categorized as non-sensitive. Indeed, in the past, researchers and practitioners have mainly focused on the problem of the identification of a user from the content of a message. In this paper, we use Twitter as a case study to quantify the uniqueness of the association between metadata and user identity and to understand the effectiveness of potential obfuscation strategies. More specifically, we analyze atomic fields in the metadata and systematically combine them in an effort to classify new tweets as belonging to an account using different machine learning algorithms of increasing complexity. We demonstrate that through the application of a supervised learning algorithm, we are able to identify any user in a group of 10,000 with approximately 96.7% accuracy. Moreover, if we broaden the scope of our search and consider the 10 most likely candidates we increase the accuracy of the model to 99.22%. We also found that data obfuscation is hard and ineffective for this type of data: even after perturbing 60% of the training data, it is still possible to classify users with an accuracy higher than 95%. These results have strong implications in terms of the design of metadata obfuscation strategies, for example for data set release, not only for Twitter, but, more generally, for most social media platforms.
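The two accuracy figures correspond to top-1 and top-10 accuracy: a prediction counts as correct if the true account is the single highest-scoring candidate, or is among the 10 highest-scoring candidates, respectively. A minimal sketch of the metric, assuming a classifier that outputs one score per candidate account (the function name is hypothetical):

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: one list per sample, where scores[i][c] is the score of class c
    labels: the true class index for each sample
    """
    hits = 0
    for row, label in zip(scores, labels):
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top
    return hits / len(labels)
```

Top-k accuracy can only stay the same or increase as k grows, which is why widening the candidate set raises the reported figure.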

Submitted 14 May, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

Comments: 11 pages, 13 figures. Published in the Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM 2018). June 2018. Stanford, CA, USA

arXiv:1711.10171 [pdf, other]

Intelligent Notification Systems: A Survey of the State of the Art and Research Challenges

Authors: Abhinav Mehrotra, Mirco Musolesi

Abstract: Notifications provide a unique mechanism for increasing the effectiveness of real-time information delivery systems. However, notifications that demand users' attention at inopportune moments are more likely to have adverse effects and might become a cause of potential disruption rather than proving beneficial to users. In order to address these challenges, a variety of intelligent notification mechanisms based on monitoring and learning users' behavior have been proposed. The goal of such mechanisms is maximizing users' receptivity to the delivered information by automatically inferring the right time and the right context for sending a certain type of information. This article provides an overview of the current state of the art in the area of intelligent notification mechanisms that rely on the awareness of users' context and preferences. More specifically, we first present a survey of studies focusing on understanding and modeling users' interruptibility and receptivity to notifications from desktops and mobile devices. Then, we discuss the existing challenges and opportunities in developing mechanisms for intelligent notification systems in a variety of application scenarios.

Submitted 2 January, 2018; v1 submitted 28 November, 2017; originally announced November 2017.

arXiv:1711.06350 [pdf, other]

Towards Deep Learning Models for Psychological State Prediction using Smartphone Data: Challenges and Opportunities

Authors: Gatis Mikelsons, Matthew Smith, Abhinav Mehrotra, Mirco Musolesi

Abstract: There is an increasing interest in exploiting mobile sensing technologies and machine learning techniques for mental health monitoring and intervention. Researchers have effectively used contextual information, such as mobility, communication and mobile phone usage patterns for quantifying individuals' mood and wellbeing. In this paper, we investigate the effectiveness of neural network models for predicting users' level of stress by using the location information collected by smartphones. We characterize the mobility patterns of individuals using the GPS metrics presented in the literature and employ these metrics as input to the network. We evaluate our approach on the open-source StudentLife dataset. Moreover, we discuss the challenges and trade-offs involved in building machine learning models for digital mental health and highlight potential future work in this direction.
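One mobility metric widely used in this literature is the radius of gyration: the root-mean-square distance of a person's GPS fixes from their centroid. The sketch below computes it with haversine distances. Which metrics the paper actually feeds to the network is not stated here, so this example is an assumption for illustration only. The plain-average centroid is an approximation that is reasonable only for points within a small area such as a city.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def radius_of_gyration_km(points):
    """Root-mean-square distance of GPS fixes (lat, lon) from their centroid.

    The centroid is a plain mean of latitudes and longitudes, which is
    only a reasonable approximation for points in a small area.
    """
    lat_c = sum(p[0] for p in points) / len(points)
    lon_c = sum(p[1] for p in points) / len(points)
    sq = [haversine_km(lat, lon, lat_c, lon_c) ** 2 for lat, lon in points]
    return math.sqrt(sum(sq) / len(sq))
```

A person who stays in one place has a radius of gyration of zero. Two fixes on the equator two degrees of longitude apart give about 111 km, one degree from the centroid in each direction.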

Submitted 16 November, 2017; originally announced November 2017.

Comments: 6 pages, 2 figures, In Proceedings of the NIPS Workshop on Machine Learning for Healthcare 2017 (ML4H 2017). Colocated with NIPS 2017

arXiv:1710.08464 [pdf]

Interpretable Machine Learning for Privacy-Preserving Pervasive Systems

Authors: Benjamin Baron, Mirco Musolesi

Abstract: Our everyday interactions with pervasive systems generate traces that capture various aspects of human behavior and enable machine learning algorithms to extract latent information about users. In this paper, we propose a machine learning interpretability framework that enables users to understand how these generated traces violate their privacy.

Submitted 4 June, 2019; v1 submitted 23 October, 2017; originally announced October 2017.

Journal ref: IEEE Pervasive Computing, 2019

arXiv:1709.06519 [pdf, other]

Linking Twitter Events With Stock Market Jitters

Authors: Fani Tsapeli, Nikolaos Bezirgiannidis, Peter Tino, Mirco Musolesi

Abstract: Predicting investors' reactions to financial and political news is important for the early detection of stock market jitters. Evidence from several recent studies suggests that online social media could improve prediction of stock market movements. However, utilizing such information to predict strong stock market fluctuations has not been explored so far. In this work, we propose a novel event detection method on Twitter, tailored to detect financial and political events that influence a specific stock market. The proposed approach applies a bursty topic detection method on a stream of tweets related to finance or politics, followed by a classification process which filters out events that do not influence the examined stock market. We train our classifier to recognise real events by using solely information about stock market volatility, without the need for manual labeling. We model Twitter events as feature vectors that encompass a rich variety of information, such as the geographical distribution of tweets, their polarity, information about their authors, as well as information about bursty words associated with the event. We show that using only information about tweet polarity, as most previous studies do, discards important information. We apply the proposed method on high-frequency intra-day data from the Greek and Spanish stock markets and we show that our financial event detector successfully predicts most of the stock market jitters.
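A bursty topic detector starts by finding words whose frequency in the current time window is far above their recent baseline. The sketch below uses a simple z-score against past windows. It is a generic stand-in chosen for illustration, not the detection method used in the paper. The threshold and minimum-count parameters, as well as the function name, are hypothetical. At least one past window is assumed.

```python
import math
from collections import Counter


def bursty_words(history, current, threshold=3.0, min_count=5):
    """Flag words whose count in `current` is unusually high relative to `history`.

    history: non-empty list of Counter objects, one per past time window
    current: Counter of word counts for the window being examined
    Returns (word, z_score) pairs sorted by decreasing z-score.
    """
    bursts = []
    for word, count in current.items():
        if count < min_count:
            continue  # ignore rare words, whose z-scores are unreliable
        past = [window.get(word, 0) for window in history]
        mean = sum(past) / len(past)
        var = sum((c - mean) ** 2 for c in past) / len(past)
        std = math.sqrt(var) or 1.0  # avoid division by zero for flat series
        z = (count - mean) / std
        if z >= threshold:
            bursts.append((word, z))
    return sorted(bursts, key=lambda pair: pair[1], reverse=True)
```

In the pipeline described above, the flagged words would then be grouped into candidate events and passed to a classifier that decides which events actually influence the market.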

Submitted 19 June, 2017; originally announced September 2017.

arXiv:1706.00690 [pdf, other]

A Comparison of Spatial-based Targeted Disease Containment Strategies using Mobile Phone Data

Authors: Stefania Rubrichi, Zbigniew Smoreda, Mirco Musolesi

Abstract: Epidemic outbreaks are an important healthcare challenge, especially in developing countries where they represent one of the major causes of mortality. Approaches that can rapidly target subpopulations for surveillance and control are critical for enhancing containment processes during epidemics. Using a real-world dataset from Ivory Coast, this work presents an attempt to unveil the socio-geographical heterogeneity of disease transmission dynamics. By employing a spatially explicit meta-population epidemic model derived from mobile phone Call Detail Records (CDRs), we investigate how the differences in mobility patterns may affect the course of a realistic infectious disease outbreak. We consider different existing measures of the spatial dimension of human mobility and interactions, and we analyse their relevance in identifying the highest-risk sub-population of individuals, as the best candidates for isolation countermeasures. The approaches presented in this paper provide further evidence that mobile phone data can be effectively exploited to facilitate our understanding of individuals' spatial behaviour and its relationship with the risk of infectious diseases' contagion. In particular, we show that CDRs-based indicators of individuals' spatial activities and interactions hold promise for gaining insight into contagion heterogeneity and thus for developing containment strategies to support decision-making during country-level pandemics.
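A spatially explicit meta-population model can be sketched as one SIR model per patch, coupled by a mobility matrix. Residents of patch i spend a fraction mobility[i][j] of each step in patch j and are exposed to the force of infection there. In a CDR-based model those fractions would be estimated from call records. The deterministic discrete-time update, the parameter names and the function name below are assumptions for illustration, not the paper's calibrated model.

```python
def step_metapopulation_sir(S, I, R, mobility, beta, gamma):
    """Advance a deterministic meta-population SIR model by one time step.

    S, I, R: per-patch compartment sizes (lists of floats)
    mobility[i][j]: fraction of patch i's residents present in patch j
                    during the step (each row sums to 1)
    beta, gamma: transmission and recovery rates per step
    """
    n = len(S)
    # Population and infectious individuals physically present in each patch j.
    present = [sum((S[i] + I[i] + R[i]) * mobility[i][j] for i in range(n))
               for j in range(n)]
    infectious = [sum(I[i] * mobility[i][j] for i in range(n)) for j in range(n)]
    force = [beta * infectious[j] / present[j] if present[j] else 0.0
             for j in range(n)]
    new_S, new_I, new_R = [], [], []
    for i in range(n):
        # Residents of i are exposed in every patch they visit, weighted by time spent.
        lam = sum(mobility[i][j] * force[j] for j in range(n))
        infections = min(S[i], lam * S[i])
        recoveries = gamma * I[i]
        new_S.append(S[i] - infections)
        new_I.append(I[i] + infections - recoveries)
        new_R.append(R[i] + recoveries)
    return new_S, new_I, new_R
```

With an identity mobility matrix, the patches evolve independently and an infection-free patch stays infection-free. Once residents move between patches, infections seed elsewhere. Comparing which patches are reached first under different mobility measures gives one way to rank sub-populations as candidates for targeted isolation.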

Submitted 3 July, 2018; v1 submitted 2 June, 2017; originally announced June 2017.

MSC Class: 91C99

arXiv:1703.04334 [pdf, other]

Probabilistic Matching: Causal Inference under Measurement Errors

Authors: Fani Tsapeli, Peter Tino, Mirco Musolesi

Abstract: The abundance of data produced daily from a large variety of sources has boosted the need for novel approaches to causal inference from observational data. Observational data often contain noisy or missing entries. Moreover, causal inference studies may require unobserved high-level information which needs to be inferred from other observed attributes. In such cases, inaccuracies of the applied inference methods will result in noisy outputs. In this study, we propose a novel approach for causal inference when one or more key variables are noisy. Our method utilizes the knowledge about the uncertainty of the real values of key variables in order to reduce the bias induced by noisy measurements. We evaluate our approach in comparison with existing methods both on simulated and real scenarios and we demonstrate that our method reduces the bias and avoids false causal inference conclusions in most cases.

Submitted 13 March, 2017; originally announced March 2017.

Comments: In Proceedings of the International Joint Conference on Neural Networks (IJCNN) 2017

arXiv:1703.01476 [pdf, ps, other]

Tracing Networks of Knowledge in the Digital Age

Authors: Mirco Musolesi

Abstract: The emergence of new digital technologies has allowed the study of human behaviour at a scale and at a level of granularity that were unthinkable just a decade ago. In particular, by analysing the digital traces left by people interacting in the online and offline worlds, we are able to trace the spreading of knowledge and ideas at both local and global scales. In this article we will discuss how these digital traces can be used to map knowledge across the world, outlining both the limitations and the challenges in performing this type of analysis. We will focus on data collected from social media platforms, large-scale digital repositories and mobile data. Finally, we will provide an overview of the tools that are available to scholars and practitioners for understanding these processes using these emerging forms of data.

Submitted 7 August, 2019; v1 submitted 4 March, 2017; originally announced March 2017.

Comments: 8 pages. In Proceedings of the British Academy. Accepted for Publication. To Appear. 2019

arXiv:1702.01181 [pdf, other]

Sensing and Modeling Human Behavior Using Social Media and Mobile Data

Authors: Abhinav Mehrotra, Mirco Musolesi

Abstract: In the past years we have witnessed the emergence of the new discipline of computational social science, which promotes a new data-driven and computation-based approach to social sciences. In this article we discuss how the availability of new technologies such as online social media and mobile smartphones has allowed researchers to passively collect human behavioral data at a scale and a level of granularity that were just unthinkable some years ago. We also discuss how these digital traces can then be used to prove (or disprove) existing theories and develop new models of human behavior.

Submitted 8 February, 2017; v1 submitted 3 February, 2017; originally announced February 2017.

Comments: 9 pages, 2 figures

arXiv:1701.08308 [pdf, other]

Anonymous or not? Understanding the Factors Affecting Personal Mobile Data Disclosure

Authors: Christos Perentis, Michele Vescovi, Chiara Leonardi, Corrado Moiso, Mirco Musolesi, Fabio Pianesi, Bruno Lepri

Abstract: The wide adoption of mobile devices and social media platforms has dramatically increased the collection and sharing of personal information. More and more frequently, users are called to make decisions concerning the disclosure of their personal information. In this study, we investigate the factors affecting users' choices toward the disclosure of their personal data, including not only their demographic and self-reported individual characteristics, but also their social interactions and their mobility patterns inferred from months of mobile phone data activity. We report the findings of a field study conducted with a community of 63 subjects provided with (i) a smartphone and (ii) a Personal Data Store (PDS) enabling them to control the disclosure of their data. We monitor the sharing behavior of our participants through the PDS, and evaluate the contribution of different factors affecting their disclosure choices for location and social interaction data. Our analysis shows that social interaction inferred by mobile phones is an important factor revealing willingness to share, regardless of the data type. In addition, we provide further insights on the individual traits relevant to the prediction of sharing behavior.

Submitted 28 January, 2017; originally announced January 2017.

Comments: 19 pages, 3 figures, 3 tables

ACM Class: D.4.6; E.3; H.1.2; H.5.0; H.5.2; H.5.3; H.5.4; K.0; K.2; K.3; K.4; K.5; K.7

arXiv:1610.08469 [pdf, other]

Kissing Cuisines: Exploring Worldwide Culinary Habits on the Web

Authors: Sina Sajadmanesh, Sina Jafarzadeh, Seyed Ali Osia, Hamid R. Rabiee, Hamed Haddadi, Yelena Mejova, Mirco Musolesi, Emiliano De Cristofaro, Gianluca Stringhini

Abstract: Food and nutrition occupy an increasingly prevalent space on the web, and dishes and recipes shared online provide an invaluable mirror into culinary cultures and attitudes around the world. More specifically, ingredients, flavors, and nutrition information become strong signals of the taste preferences of individuals and civilizations. However, there is little understanding of these palate varieties. In this paper, we present a large-scale study of recipes published on the web and their content, aiming to understand cuisines and culinary habits around the world. Using a database of more than 157K recipes from over 200 different cuisines, we analyze ingredients, flavors, and nutritional values which distinguish dishes from different regions, and use this knowledge to assess the predictability of recipes from different cuisines. We then use country health statistics to understand the relation between these factors and health indicators of different nations, such as obesity, diabetes, migration, and health expenditure. Our results confirm the strong effects of geographical and cultural similarities on recipes, health indicators, and culinary preferences across the globe.

Submitted 25 April, 2017; v1 submitted 26 October, 2016; originally announced October 2016.

Comments: In the Web Science Track of 26th International World Wide Web Conference (WWW 2017)
