-
Towards General Negotiation Strategies with End-to-End Reinforcement Learning
Authors:
Bram M. Renting,
Thomas M. Moerland,
Holger H. Hoos,
Catholijn M. Jonker
Abstract:
The research field of automated negotiation has a long history of designing agents that can negotiate with other agents. Such negotiation strategies are traditionally based on manual design and heuristics. More recently, reinforcement learning approaches have also been used to train agents to negotiate. However, negotiation problems are diverse, causing observation and action dimensions to change, which cannot be handled by default linear policy networks. Previous work on this topic has circumvented this issue either by fixing the negotiation problem, causing policies to be non-transferable between negotiation problems or by abstracting the observations and actions into fixed-size representations, causing loss of information and expressiveness due to feature design. We developed an end-to-end reinforcement learning method for diverse negotiation problems by representing observations and actions as a graph and applying graph neural networks in the policy. With empirical evaluations, we show that our method is effective and that we can learn to negotiate with other agents on never-before-seen negotiation problems. Our result opens up new opportunities for reinforcement learning in negotiation agents.
Submitted 21 June, 2024;
originally announced June 2024.
-
A Hybrid Intelligence Method for Argument Mining
Authors:
Michiel van der Meer,
Enrico Liscio,
Catholijn M. Jonker,
Aske Plaat,
Piek Vossen,
Pradeep K. Murukannaiah
Abstract:
Large-scale survey tools enable the collection of citizen feedback in opinion corpora. Extracting the key arguments from a large and noisy set of opinions helps in understanding the opinions quickly and accurately. Fully automated methods can extract arguments but (1) require large labeled datasets that induce large annotation costs and (2) work well for known viewpoints, but not for novel points of view. We propose HyEnA, a hybrid (human + AI) method for extracting arguments from opinionated texts, combining the speed of automated processing with the understanding and reasoning capabilities of humans. We evaluate HyEnA on three citizen feedback corpora. We find that, on the one hand, HyEnA achieves higher coverage and precision than a state-of-the-art automated method when compared to a common set of diverse opinions, justifying the need for human insight. On the other hand, HyEnA requires less human effort and does not compromise quality compared to (fully manual) expert analysis, demonstrating the benefit of combining human and artificial intelligence.
Submitted 11 March, 2024;
originally announced March 2024.
-
Value Preferences Estimation and Disambiguation in Hybrid Participatory Systems
Authors:
Enrico Liscio,
Luciano C. Siebert,
Catholijn M. Jonker,
Pradeep K. Murukannaiah
Abstract:
Understanding citizens' values in participatory systems is crucial for citizen-centric policy-making. We envision a hybrid participatory system where participants make choices and provide motivations for those choices, and AI agents estimate their value preferences by interacting with them. We focus on situations where a conflict is detected between participants' choices and motivations, and propose methods for estimating value preferences while addressing detected inconsistencies by interacting with the participants. We operationalize the philosophical stance that "valuing is deliberatively consequential." That is, if a participant's choice is based on a deliberation of value preferences, the value preferences can be observed in the motivation the participant provides for the choice. Thus, we propose and compare value estimation methods that prioritize the values estimated from motivations over the values estimated from choices alone. Then, we introduce a disambiguation strategy that addresses the detected inconsistencies between choices and motivations by directly interacting with the participants. We evaluate the proposed methods on a dataset of a large-scale survey on energy transition. The results show that explicitly addressing inconsistencies between choices and motivations improves the estimation of an individual's value preferences. The disambiguation strategy does not show substantial improvements when compared to similar baselines--however, we discuss how the novelty of the approach can open new research avenues and propose improvements to address the current limitations.
Submitted 26 February, 2024;
originally announced February 2024.
-
An Empirical Analysis of Diversity in Argument Summarization
Authors:
Michiel van der Meer,
Piek Vossen,
Catholijn M. Jonker,
Pradeep K. Murukannaiah
Abstract:
Presenting high-level arguments is a crucial task for fostering participation in online societal discussions. Current argument summarization approaches miss an important facet of this task -- capturing diversity -- which is important for accommodating multiple perspectives. We introduce three aspects of diversity: those of opinions, annotators, and sources. We evaluate approaches to a popular argument summarization task called Key Point Analysis, which shows how these approaches struggle to (1) represent arguments shared by few people, (2) deal with data from various sources, and (3) align with subjectivity in human-provided annotations. We find that both general-purpose LLMs and dedicated KPA models exhibit this behavior, but have complementary strengths. Further, we observe that diversification of training data may ameliorate generalization. Addressing diversity in argument summarization requires a mix of strategies to deal with subjectivity.
Submitted 14 February, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Enabling the Digital Democratic Revival: A Research Program for Digital Democracy
Authors:
Davide Grossi,
Ulrike Hahn,
Michael Mäs,
Andreas Nitsche,
Jan Behrens,
Niclas Boehmer,
Markus Brill,
Ulle Endriss,
Umberto Grandi,
Adrian Haret,
Jobst Heitzig,
Nicolien Janssens,
Catholijn M. Jonker,
Marijn A. Keijzer,
Axel Kistner,
Martin Lackner,
Alexandra Lieben,
Anna Mikhaylovskaya,
Pradeep K. Murukannaiah,
Carlo Proietti,
Manon Revel,
Élise Rouméas,
Ehud Shapiro,
Gogulapati Sreedurga,
Björn Swierczek
, et al. (4 additional authors not shown)
Abstract:
This white paper outlines a long-term scientific vision for the development of digital-democracy technology. We contend that if digital democracy is to meet the ambition of enabling a participatory renewal in our societies, then a comprehensive multi-methods research effort is required that could, over the years, support its development in a democratically principled, empirically and computationally informed way. The paper is co-authored by an international and interdisciplinary team of researchers and arose from the Lorentz Center Workshop on "Algorithmic Technology for Democracy" (Leiden, October 2022).
Submitted 30 January, 2024;
originally announced January 2024.
-
A Systematic Review on Fostering Appropriate Trust in Human-AI Interaction
Authors:
Siddharth Mehrotra,
Chadha Degachi,
Oleksandra Vereschak,
Catholijn M. Jonker,
Myrthe L. Tielman
Abstract:
Appropriate Trust in Artificial Intelligence (AI) systems has rapidly become an important area of focus for both researchers and practitioners. Various approaches have been used to achieve it, such as confidence scores, explanations, trustworthiness cues, or uncertainty communication. However, a comprehensive understanding of the field is lacking due to the diversity of perspectives arising from various backgrounds that influence it and the lack of a single definition for appropriate trust. To investigate this topic, this paper presents a systematic review to identify current practices in building appropriate trust, different ways to measure it, types of tasks used, and potential challenges associated with it. We also propose a Belief, Intentions, and Actions (BIA) mapping to study commonalities and differences in the concepts related to appropriate trust by (a) describing the existing disagreements on defining appropriate trust, and (b) providing an overview of the concepts and definitions related to appropriate trust in AI from the existing literature. Finally, the challenges identified in studying appropriate trust are discussed, and observations are summarized as current trends, potential gaps, and research opportunities for future work. Overall, the paper provides insights into the complex concept of appropriate trust in human-AI interaction and presents research opportunities to advance our understanding on this topic.
Submitted 8 November, 2023;
originally announced November 2023.
-
Do Differences in Values Influence Disagreements in Online Discussions?
Authors:
Michiel van der Meer,
Piek Vossen,
Catholijn M. Jonker,
Pradeep K. Murukannaiah
Abstract:
Disagreements are common in online discussions. Disagreement may foster collaboration and improve the quality of a discussion under some conditions. Although there exist methods for recognizing disagreement, a deeper understanding of factors that influence disagreement is lacking in the literature. We investigate a hypothesis that differences in personal values are indicative of disagreement in online discussions. We show how state-of-the-art models can be used for estimating values in online discussions and how the estimated values can be aggregated into value profiles. We evaluate the estimated value profiles based on human-annotated agreement labels. We find that the dissimilarity of value profiles correlates with disagreement in specific cases. We also find that including value information in agreement prediction improves performance.
Submitted 24 October, 2023;
originally announced October 2023.
-
Reflective Hybrid Intelligence for Meaningful Human Control in Decision-Support Systems
Authors:
Catholijn M. Jonker,
Luciano Cavalcante Siebert,
Pradeep K. Murukannaiah
Abstract:
With the growing capabilities and pervasiveness of AI systems, societies must collectively choose between reduced human autonomy, endangered democracies and limited human rights, and AI that is aligned to human and social values, nurturing collaboration, resilience, knowledge and ethical behaviour. In this chapter, we introduce the notion of self-reflective AI systems for meaningful human control over AI systems. Focusing on decision support systems, we propose a framework that integrates knowledge from psychology and philosophy with formal reasoning methods and machine learning approaches to create AI systems responsive to human values and social norms. We also propose a possible research approach to design and develop self-reflective capability in AI systems. Finally, we argue that self-reflective AI systems can lead to self-reflective hybrid systems (human + AI), thus increasing meaningful human control and empowering human moral reasoning by providing comprehensible information and insights on possible human moral blind spots.
Submitted 12 July, 2023;
originally announced July 2023.
-
Registered Report : Perception of Other's Musical Preferences Based on Their Personal Values
Authors:
Sandy Manolios,
Catholijn M. Jonker,
Cynthia C. S. Liem
Abstract:
The present work is part of a research line seeking to uncover the mysteries of what lies behind people's musical preferences in order to provide better music recommendations. More specifically, it takes the angle of personal values. Personal values are what we as people strive for, and are a popular tool in marketing research to understand customer preferences for certain types of product. Therefore, it makes sense to explore their usefulness in the music domain. Based on a previous qualitative work using the Means-End theory, we designed a survey in an attempt to more quantitatively approach the relationship between personal values and musical preferences. We support our approach with a simulation study as a tool to improve the experimental procedure and decisions.
Submitted 20 February, 2023;
originally announced February 2023.
-
AI Alignment Dialogues: An Interactive Approach to AI Alignment in Support Agents
Authors:
Pei-Yu Chen,
Myrthe L. Tielman,
Dirk K. J. Heylen,
Catholijn M. Jonker,
M. Birna van Riemsdijk
Abstract:
AI alignment is about ensuring AI systems only pursue goals and activities that are beneficial to humans. Most of the current approach to AI alignment is to learn what humans value from their behavioural data. This paper proposes a different way of looking at the notion of alignment, namely by introducing AI Alignment Dialogues: dialogues with which users and agents try to achieve and maintain alignment via interaction. We argue that alignment dialogues have a number of advantages in comparison to data-driven approaches, especially for behaviour support agents, which aim to support users in achieving their desired future behaviours rather than their current behaviours. The advantages of alignment dialogues include allowing the users to directly convey higher-level concepts to the agent, and making the agent more transparent and trustworthy. In this paper we outline the concept and high-level structure of alignment dialogues. Moreover, we conducted a qualitative focus group user study from which we developed a model that describes how alignment dialogues affect users, and created design suggestions for AI alignment dialogues. Through this we establish foundations for AI alignment dialogues and shed light on what requires further development and research.
Submitted 5 October, 2023; v1 submitted 16 January, 2023;
originally announced January 2023.
-
Automated Configuration and Usage of Strategy Portfolios for Bargaining
Authors:
Bram M. Renting,
Holger H. Hoos,
Catholijn M. Jonker
Abstract:
Bargaining can be used to resolve mixed-motive games in multi-agent systems. Although there is an abundance of negotiation strategies implemented in automated negotiating agents, most agents are based on single fixed strategies, while it is widely acknowledged that there is no single best-performing strategy for all negotiation settings.
In this paper, we focus on bargaining settings where opponents are repeatedly encountered, but the bargaining problems change. We introduce a novel method that automatically creates and deploys a portfolio of complementary negotiation strategies using a training set and optimise pay-off in never-before-seen bargaining settings through per-setting strategy selection. Our method relies on the following contributions. We introduce a feature representation that captures characteristics for both the opponent and the bargaining problem. We model the behaviour of an opponent during a negotiation based on its actions, which is indicative of its negotiation strategy, in order to be more effective in future encounters.
Our combination of feature-based methods generalises to new negotiation settings, as in practice, over time, it selects effective counter strategies in future encounters. Our approach is tested in an ANAC-like tournament, and we show that we are capable of winning such a tournament with a 5.6% increase in pay-off compared to the runner-up agent.
Submitted 20 December, 2022;
originally announced December 2022.
-
Exploring Effectiveness of Explanations for Appropriate Trust: Lessons from Cognitive Psychology
Authors:
Ruben S. Verhagen,
Siddharth Mehrotra,
Mark A. Neerincx,
Catholijn M. Jonker,
Myrthe L. Tielman
Abstract:
The rapid development of Artificial Intelligence (AI) requires developers and designers of AI systems to focus on the collaboration between humans and machines. AI explanations of system behavior and reasoning are vital for effective collaboration by fostering appropriate trust, ensuring understanding, and addressing issues of fairness and bias. However, various contextual and subjective factors can influence an AI system explanation's effectiveness. This work draws inspiration from findings in cognitive psychology to understand how effective explanations can be designed. We identify four components to which explanation designers can pay special attention: perception, semantics, intent, and user & context. We illustrate the use of these four explanation components with an example of estimating food calories by combining text with visuals, probabilities with exemplars, and intent communication with both user and context in mind. We propose that the significant challenge for effective AI explanations is an additional step between explanation generation using algorithms not producing interpretable explanations and explanation communication. We believe this extra step will benefit from carefully considering the four explanation components outlined in our work, which can positively affect the explanation's effectiveness.
Submitted 5 October, 2022;
originally announced October 2022.
-
MOPaC: The Multiple Offers Protocol for Multilateral Negotiations with Partial Consensus
Authors:
Pradeep K. Murukannaiah,
Catholijn M. Jonker
Abstract:
Existing protocols for multilateral negotiation require a full consensus among the negotiating parties. In contrast, we propose a protocol for multilateral negotiation that allows partial consensus, wherein only a subset of the negotiating parties can reach an agreement. We motivate problems that require such a protocol and describe the protocol formally.
Submitted 13 May, 2022;
originally announced May 2022.
-
Towards a Real-time Measure of the Perception of Anthropomorphism in Human-robot Interaction
Authors:
Maria Tsfasman,
Avinash Saravanan,
Dekel Viner,
Daan Goslinga,
Sarah de Wolf,
Chirag Raman,
Catholijn M. Jonker,
Catharine Oertel
Abstract:
How human-like do conversational robots need to look to enable long-term human-robot conversation? One essential aspect of long-term interaction is a human's ability to adapt to the varying degrees of a conversational partner's engagement and emotions. Prosodically, this can be achieved through (dis)entrainment. While speech-synthesis has been a limiting factor for many years, restrictions in this regard are increasingly mitigated. These advancements now emphasise the importance of studying the effect of robot embodiment on human entrainment. In this study, we conducted a between-subjects online human-robot interaction experiment in an educational use-case scenario where a tutor was either embodied through a human or a robot face. 43 English-speaking participants took part in the study for whom we analysed the degree of acoustic-prosodic entrainment to the human or robot face, respectively. We found that the degree of subjective and objective perception of anthropomorphism positively correlates with acoustic-prosodic entrainment.
Submitted 24 January, 2022;
originally announced January 2022.
-
Meaningful human control: actionable properties for AI system development
Authors:
Luciano Cavalcante Siebert,
Maria Luce Lupetti,
Evgeni Aizenberg,
Niek Beckers,
Arkady Zgonnikov,
Herman Veluwenkamp,
David Abbink,
Elisa Giaccardi,
Geert-Jan Houben,
Catholijn M. Jonker,
Jeroen van den Hoven,
Deborah Forster,
Reginald L. Lagendijk
Abstract:
How can humans remain in control of artificial intelligence (AI)-based systems designed to perform tasks autonomously? Such systems are increasingly ubiquitous, creating benefits - but also undesirable situations where moral responsibility for their actions cannot be properly attributed to any particular person or group. The concept of meaningful human control has been proposed to address responsibility gaps and mitigate them by establishing conditions that enable a proper attribution of responsibility for humans; however, clear requirements for researchers, designers, and engineers do not yet exist, making the development of AI-based systems that remain under meaningful human control challenging. In this paper, we address the gap between philosophical theory and engineering practice by identifying, through an iterative process of abductive thinking, four actionable properties for AI-based systems under meaningful human control, which we discuss making use of two application scenarios: automated vehicles and AI-based hiring. First, a system in which humans and AI algorithms interact should have an explicitly defined domain of morally loaded situations within which the system ought to operate. Second, humans and AI agents within the system should have appropriate and mutually compatible representations. Third, responsibility attributed to a human should be commensurate with that human's ability and authority to control the system. Fourth, there should be explicit links between the actions of the AI agents and actions of humans who are aware of their moral responsibility. We argue that these four properties will support practically-minded professionals to take concrete steps toward designing and engineering for AI systems that facilitate meaningful human control.
Submitted 19 May, 2022; v1 submitted 25 November, 2021;
originally announced December 2021.
-
Towards Social Situation Awareness in Support Agents
Authors:
Ilir Kola,
Pradeep K. Murukannaiah,
Catholijn M. Jonker,
M. Birna van Riemsdijk
Abstract:
Artificial agents that support people in their daily activities (e.g., virtual coaches and personal assistants) are increasingly prevalent. Since many daily activities are social in nature, support agents should understand a user's social situation to offer comprehensive support. However, there are no systematic approaches for developing support agents that are social situation aware. We identify key requirements for a support agent to be social situation aware and propose steps to realize those requirements. These steps are presented through a conceptual architecture centered on two key ideas: (1) conceptualizing social situation awareness as an instantiation of 'general' situation awareness, and (2) using situation taxonomies for such instantiation. This enables support agents to represent a user's social situation, comprehend its meaning, and assess its impact on the user's behavior. We discuss empirical results supporting the effectiveness of the proposed approach and illustrate how the architecture can be used in support agents through two use cases.
Submitted 4 April, 2022; v1 submitted 19 October, 2021;
originally announced October 2021.
-
Using Psychological Characteristics of Situations for Social Situation Comprehension in Support Agents
Authors:
Ilir Kola,
Catholijn M. Jonker,
M. Birna van Riemsdijk
Abstract:
Support agents that help users in their daily lives need to take into account not only the user's characteristics, but also the social situation of the user. Existing work on including social context uses some type of situation cue as an input to information processing techniques in order to assess the expected behavior of the user. However, research shows that it is important to also determine the meaning of a situation, a step which we refer to as social situation comprehension. We propose using psychological characteristics of situations, which have been proposed in social science for ascribing meaning to situations, as the basis for social situation comprehension. Using data from user studies, we evaluate this proposal from two perspectives. First, from a technical perspective, we show that psychological characteristics of situations can be used as input to predict the priority of social situations, and that psychological characteristics of situations can be predicted from the features of a social situation. Second, we investigate the role of the comprehension step in human-machine meaning making. We show that psychological characteristics can be successfully used as a basis for explanations given to users about the decisions of an agenda management personal assistant agent.
Submitted 13 July, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
From Organisational Structure to Organisational Behaviour Formalisation
Authors:
Catholijn M. Jonker,
Jan Treur
Abstract:
To understand how an organisational structure relates to organisational behaviour is an interesting fundamental challenge in the area of organisation modelling. Specifications of organisational structure usually have a diagrammatic form that abstracts from more detailed dynamics. Dynamic properties of agent systems, on the other hand, are often specified in the form of a set of logical formulae in some temporal language. This paper addresses the question how these two perspectives can be combined in one framework. It is shown how for different aggregation levels and other elements within an organisation structure, sets of dynamic properties can be specified. Organisational structure provides a structure of (interlevel) relationships between these multiple sets of dynamic properties. Thus organisational structure is reflected in the formalisation of the dynamics of organisational behaviour. To illustrate the effectiveness of the approach a formal foundation is presented for the integrated specification of both structure and behaviour of an AGR organisation model.
Submitted 29 September, 2021;
originally announced September 2021.
-
Reason Against the Machine: Future Directions for Mass Online Deliberation
Authors:
Ruth Shortall,
Anatol Itten,
Michiel van der Meer,
Pradeep K. Murukannaiah,
Catholijn M. Jonker
Abstract:
Designers of online deliberative platforms aim to counter the degrading quality of online debates. Support technologies such as machine learning and natural language processing open avenues for widening the circle of people involved in deliberation, moving from small groups to "crowd" scale. Numerous design features of large-scale online discussion systems allow larger numbers of people to discuss shared problems, enhance critical thinking, and formulate solutions. We review the transdisciplinary literature on the design of digital mass deliberation platforms and examine the commonly featured design aspects (e.g., argumentation support, automated facilitation, and gamification) that attempt to facilitate scaling up. We find that the literature is largely focused on developing technical fixes for scaling up deliberation, but may neglect the more nuanced requirements of high quality deliberation. Current design research is carried out with a small, atypical segment of the world's population, and much research is still needed on how to facilitate and accommodate different genders or cultures in deliberation, how to deal with the implications of pre-existing social inequalities, how to build motivation and self-efficacy in certain groups, and how to deal with differences in cognitive abilities and cultural or linguistic differences. Few studies bridge disciplines between deliberative theory, design and engineering. As a result, scaling up deliberation will likely advance in separate systemic siloes. We make design and process recommendations to correct this course and suggest avenues for future research.
Submitted 31 January, 2022; v1 submitted 27 July, 2021;
originally announced July 2021.
-
A Data-Driven Method for Recognizing Automated Negotiation Strategies
Authors:
Ming Li,
Pradeep K. Murukannaiah,
Catholijn M. Jonker
Abstract:
Understanding an opponent agent helps in negotiating with it. Existing works on understanding opponents focus on preference modeling (or estimating the opponent's utility function). An important but largely unexplored direction is recognizing an opponent's negotiation strategy, which captures the opponent's tactics, e.g., to be tough at the beginning but to concede toward the deadline. Recognizing complex, state-of-the-art negotiation strategies is extremely challenging, and simple heuristics may not be adequate for this purpose. We propose a novel data-driven approach for recognizing an opponent's negotiation strategy. Our approach includes a data generation method for an agent to generate domain-independent sequences by negotiating with a variety of opponents across domains, a feature engineering method for representing negotiation data as time series with time-step features and overall features, and a hybrid (recurrent neural network-based) deep learning method for recognizing an opponent's strategy from the time series of bids. We perform extensive experiments, spanning four problem scenarios, to demonstrate the effectiveness of our approach.
Submitted 7 October, 2021; v1 submitted 3 July, 2021;
originally announced July 2021.
-
Synthesising Reinforcement Learning Policies through Set-Valued Inductive Rule Learning
Authors:
Youri Coppens,
Denis Steckelmacher,
Catholijn M. Jonker,
Ann Nowé
Abstract:
Today's advanced Reinforcement Learning algorithms produce black-box policies that are often difficult for a person to interpret and trust. We introduce a policy distilling algorithm, building on the CN2 rule mining algorithm, that distills the policy into a rule-based decision system. At the core of our approach is the fact that an RL process does not just learn a policy, a mapping from states to actions, but also produces extra meta-information, such as action values indicating the quality of alternative actions. This meta-information can indicate whether more than one action is near-optimal for a certain state. We extend CN2 to make it able to leverage knowledge about equally-good actions to distill the policy into fewer rules, increasing its interpretability by a person. Then, to ensure that the rules explain a valid, non-degenerate policy, we introduce a refinement algorithm that fine-tunes the rules to obtain good performance when executed in the environment. We demonstrate the applicability of our algorithm on the Mario AI benchmark, a complex task that requires modern reinforcement learning algorithms including neural networks. The explanations we produce capture the learned policy in only a few rules that allow a person to understand what the black-box agent learned. Source code: https://gitlab.ai.vub.ac.be/yocoppen/svcn2
Submitted 10 June, 2021;
originally announced June 2021.
-
More Similar Values, More Trust? -- the Effect of Value Similarity on Trust in Human-Agent Interaction
Authors:
Siddharth Mehrotra,
Catholijn M. Jonker,
Myrthe L. Tielman
Abstract:
As AI systems are increasingly involved in decision making, it also becomes important that they elicit appropriate levels of trust from their users. To achieve this, it is first important to understand which factors influence trust in AI. We identify that a research gap exists regarding the role of personal values in trust in AI. Therefore, this paper studies how human and agent Value Similarity (VS) influences a human's trust in that agent. To explore this, 89 participants teamed up with five different agents, which were designed with varying levels of value similarity to that of the participants. In a within-subjects, scenario-based experiment, agents gave suggestions on what to do when entering the building to save a hostage. We analyzed the agent's scores on subjective value similarity, trust and qualitative data from open-ended questions. Our results show that agents rated as having more similar values also scored higher on trust, indicating a positive effect between the two. With this result, we add to the existing understanding of human-agent trust by providing insight into the role of value-similarity.
Submitted 19 May, 2021;
originally announced May 2021.
-
Modelling Human Routines: Conceptualising Social Practice Theory for Agent-Based Simulation
Authors:
Rijk Mercuur,
Virginia Dignum,
Catholijn M. Jonker
Abstract:
Our routines play an important role in a wide range of social challenges such as climate change, disease outbreaks and coordinating staff and patients in a hospital. To use agent-based simulations (ABS) to understand the role of routines in social challenges we need an agent framework that integrates routines. This paper provides the domain-independent Social Practice Agent (SoPrA) framework that satisfies requirements from the literature to simulate our routines. By choosing the appropriate concepts from the literature on agent theory, social psychology and social practice theory we ensure SoPrA correctly depicts current evidence on routines. By creating a consistent, modular and parsimonious framework suitable for multiple domains we enhance the usability of SoPrA. SoPrA provides ABS researchers with a conceptual, formal and computational framework to simulate routines and gain new insights into social systems.
Submitted 22 December, 2020;
originally announced December 2020.
-
Model-based Reinforcement Learning: A Survey
Authors:
Thomas M. Moerland,
Joost Broekens,
Aske Plaat,
Catholijn M. Jonker
Abstract:
Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects like: where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two sections, we also discuss implicit model-based RL as an end-to-end alternative for model learning and planning, and we cover the potential benefits of model-based RL. Along the way, the survey also draws connections to several related RL fields, like hierarchical RL and transfer learning. Altogether, the survey presents a broad conceptual overview of the combination of planning and learning for MDP optimization.
Submitted 31 March, 2022; v1 submitted 30 June, 2020;
originally announced June 2020.
-
A Unifying Framework for Reinforcement Learning and Planning
Authors:
Thomas M. Moerland,
Joost Broekens,
Aske Plaat,
Catholijn M. Jonker
Abstract:
Sequential decision making, commonly formalized as optimization of a Markov Decision Process, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are reinforcement learning and planning, which both largely have their own research communities. However, if both research fields solve the same problem, then we might be able to disentangle the common factors in their solution approaches. Therefore, this paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have to decide. At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions. Altogether, the framework may help provide deeper insight into the algorithmic design space of planning and reinforcement learning.
Submitted 31 March, 2022; v1 submitted 26 June, 2020;
originally announced June 2020.
-
The Second Type of Uncertainty in Monte Carlo Tree Search
Authors:
Thomas M Moerland,
Joost Broekens,
Aske Plaat,
Catholijn M Jonker
Abstract:
Monte Carlo Tree Search (MCTS) efficiently balances exploration and exploitation in tree search based on count-derived uncertainty. However, these local visit counts ignore a second type of uncertainty induced by the size of the subtree below an action. We first show how, due to the lack of this second uncertainty type, MCTS may completely fail in well-known sparse exploration problems, known from the reinforcement learning community. We then introduce a new algorithm, which estimates the size of the subtree below an action, and leverages this information in the UCB formula to better direct exploration. Subsequently, we generalize these ideas by showing that loops, i.e., the repeated occurrence of (approximately) the same state in the same trace, are actually a special case of subtree depth variation. Testing on a variety of tasks shows that our algorithms increase sample efficiency, especially when the planning budget per timestep is small.
Submitted 19 May, 2020;
originally announced May 2020.
-
Think Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning
Authors:
Thomas M. Moerland,
Anna Deichler,
Simone Baldi,
Joost Broekens,
Catholijn M. Jonker
Abstract:
Planning and reinforcement learning are two key approaches to sequential decision making. Multi-step approximate real-time dynamic programming, a recently successful algorithm class of which AlphaZero [Silver et al., 2018] is an example, combines both by nesting planning within a learning loop. However, the combination of planning and learning introduces a new question: how should we balance time spent on planning, learning and acting? The importance of this trade-off has not been explicitly studied before. We show that it is actually of key importance, with computational results indicating that we should neither plan too long nor too short. Conceptually, we identify a new spectrum of planning-learning algorithms which ranges from exhaustive search (long planning) to model-free RL (no planning), with optimal performance achieved midway.
Submitted 15 May, 2020;
originally announced May 2020.
-
Automated Configuration of Negotiation Strategies
Authors:
Bram M. Renting,
Holger H. Hoos,
Catholijn M. Jonker
Abstract:
Bidding and acceptance strategies have a substantial impact on the outcome of negotiations in scenarios with linear additive and nonlinear utility functions. Over the years, it has become clear that there is no single best strategy for all negotiation settings, yet many fixed strategies are still being developed. We envision a shift in the strategy design question from: What is a good strategy?, towards: What could be a good strategy? For this purpose, we developed a method leveraging automated algorithm configuration to find the best strategies for a specific set of negotiation settings. By empowering automated negotiating agents using automated algorithm configuration, we obtain a flexible negotiation agent that can be configured automatically for a rich space of opponents and negotiation scenarios.
To critically assess our approach, the agent was tested in an ANAC-like bilateral automated negotiation tournament setting against past competitors. We show that our automatically configured agent outperforms all other agents, with a 5.1% increase in negotiation payoff compared to the next-best agent. We note that without our agent in the tournament, the top-ranked agent wins by a margin of only 0.01%.
Submitted 31 March, 2020;
originally announced April 2020.
-
Towards Agent-based Models of Rumours in Organizations: A Social Practice Theory Approach
Authors:
Amir Ebrahimi Fard,
Rijk Mercuur,
Virginia Dignum,
Catholijn M. Jonker,
Bartel van de Walle
Abstract:
Rumour is a collective emergent phenomenon with a potential for provoking a crisis. Modelling approaches have been deployed for five decades; however, the focus has mostly been on the epidemic behaviour of rumours, which does not take into account the differences between agents. We use social practice theory to model agent decision making in organizational rumourmongering. Such an approach provides us with an opportunity to model rumourmongering agents with a layer of cognitive realism and study the impacts of various intervention strategies for prevention and control of rumours in organizations.
Submitted 10 April, 2019; v1 submitted 3 December, 2018;
originally announced December 2018.
-
Modelling Agents Endowed with Social Practices: Static Aspects
Authors:
Rijk Mercuur,
Virginia Dignum,
Catholijn M. Jonker
Abstract:
To understand societal phenomena through simulation, we need computational variants of socio-cognitive theories. Social Practice Theory has provided a unique understanding of social phenomena regarding the routinized, social and interconnected aspects of behaviour. This paper provides the Social Practice Agent (SoPrA) model that enables the use of Social Practice Theory (SPT) for agent-based simulations. We extract requirements from SPT, construct a computational model in the Unified Modelling Language, verify its implementation in Netlogo and Protégé and show how SoPrA maps onto a use case of commuting. The next step is to model the dynamic aspect of SPT and validate SoPrA's ability to provide understanding in different scenarios. This paper provides the groundwork with a computational model that is a correct depiction of SPT, is computationally feasible and can be directly mapped to the habitual, social and interconnected aspects of a target scenario.
Submitted 27 November, 2018;
originally announced November 2018.
-
The Potential of the Return Distribution for Exploration in RL
Authors:
Thomas M. Moerland,
Joost Broekens,
Catholijn M. Jonker
Abstract:
This paper studies the potential of the return distribution for exploration in deterministic reinforcement learning (RL) environments. We study network losses and propagation mechanisms for Gaussian, Categorical and Gaussian mixture distributions. Combined with exploration policies that leverage this return distribution, we solve, for example, a randomized Chain task of length 100, which has not been reported before when learning with neural networks.
Submitted 2 July, 2018; v1 submitted 11 June, 2018;
originally announced June 2018.
-
A0C: Alpha Zero in Continuous Action Space
Authors:
Thomas M. Moerland,
Joost Broekens,
Aske Plaat,
Catholijn M. Jonker
Abstract:
A core novelty of Alpha Zero is the interleaving of tree search and deep learning, which has proven very successful in board games like Chess, Shogi and Go. These games have a discrete action space. However, many real-world reinforcement learning domains have continuous action spaces, for example in robotic control, navigation and self-driving cars. This paper presents the necessary theoretical extensions of Alpha Zero to deal with continuous action space. We also provide some preliminary experiments on the Pendulum swing-up task, empirically showing the feasibility of our approach. Thereby, this work provides a first step towards the application of iterated search and learning in domains with a continuous action space.
Submitted 24 May, 2018;
originally announced May 2018.
-
Monte Carlo Tree Search for Asymmetric Trees
Authors:
Thomas M. Moerland,
Joost Broekens,
Aske Plaat,
Catholijn M. Jonker
Abstract:
We present an extension of Monte Carlo Tree Search (MCTS) that strongly increases its efficiency for trees with asymmetry and/or loops. Asymmetric termination of search trees introduces a type of uncertainty for which the standard upper confidence bound (UCB) formula does not account. Our first algorithm (MCTS-T), which assumes a non-stochastic environment, backs up tree structure uncertainty and leverages it for exploration in a modified UCB formula. Results show vastly improved efficiency in a well-known asymmetric domain in which MCTS performs arbitrarily badly. Next, we connect the ideas about asymmetric termination to the presence of loops in the tree, where the same state appears multiple times in a single trace. An extension to our algorithm (MCTS-T+), which in addition to non-stochasticity assumes full state observability, further increases search efficiency for domains with loops as well. Benchmark testing on a set of OpenAI Gym and Atari 2600 games indicates that our algorithms always perform better than or at least equivalent to standard MCTS, and could be first-choice tree search algorithms for non-stochastic, fully-observable environments.
Submitted 23 May, 2018;
originally announced May 2018.
-
Volunteers in the Smart City: Comparison of Contribution Strategies on Human-Centered Measures
Authors:
Stefano Bennati,
Ivana Dusparic,
Rhythima Shinde,
Catholijn M. Jonker
Abstract:
Several smart city services rely on user contributions, e.g., data, which can be costly for the users in terms of privacy. High costs lead to reduced user participation, which undermines the success of smart city technologies. This work develops a scenario-independent design principle, based on public good theory, for resource management in smart city applications, where provision of a service depends on contributors and free-riders, who benefit from the service without contributing their own resources. Following this design principle, different classes of algorithms for resource management are evaluated with respect to human-centered measures, i.e., privacy, fairness and social welfare. Trade-offs that characterize algorithms are discussed across two smart city application scenarios. These results might help Smart City application designers to choose a suitable algorithm given a scenario-specific set of requirements, and users to choose a service based on an algorithm that matches their preferences.
Submitted 23 May, 2018;
originally announced May 2018.
-
Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making
Authors:
Luisa M Zintgraf,
Diederik M Roijers,
Sjoerd Linders,
Catholijn M Jonker,
Ann Nowé
Abstract:
In multi-objective decision planning and learning, much attention is paid to producing optimal solution sets that contain an optimal policy for every possible user preference profile. We argue that the step that follows, i.e., determining which policy to execute by maximising the user's intrinsic utility function over this (possibly infinite) set, is under-studied. This paper aims to fill this gap. We build on previous work on Gaussian processes and pairwise comparisons for preference modelling, extend it to the multi-objective decision support scenario, and propose new ordered preference elicitation strategies based on ranking and clustering. Our main contribution is an in-depth evaluation of these strategies using computer and human-based experiments. We show that our proposed elicitation strategies outperform the currently used pairwise methods, and find that users prefer ranking most. Our experiments further show that utilising monotonicity information in GPs, by using a linear prior mean at the start and virtual comparisons to the nadir and ideal points, increases performance. We demonstrate our decision support framework in a real-world study on traffic regulation, conducted with the city of Amsterdam.
Submitted 21 February, 2018;
originally announced February 2018.
-
Efficient exploration with Double Uncertain Value Networks
Authors:
Thomas M. Moerland,
Joost Broekens,
Catholijn M. Jonker
Abstract:
This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration. The first originates from limited data (parametric uncertainty), while the second originates from the distribution of the returns (return uncertainty). We identify methods to learn these distributions with deep neural networks, where we estimate parametric uncertainty with Bayesian drop-out, while return uncertainty is propagated through the Bellman equation as a Gaussian distribution. Then, we identify that both can be jointly estimated in one network, which we call the Double Uncertain Value Network. The policy is directly derived from the learned distributions based on Thompson sampling. Experimental results show that both types of uncertainty may vastly improve learning in domains with a strong exploration challenge.
Submitted 29 November, 2017;
originally announced November 2017.
-
Emotion in Reinforcement Learning Agents and Robots: A Survey
Authors:
Thomas M. Moerland,
Joost Broekens,
Catholijn M. Jonker
Abstract:
This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action selection. Therefore, computational emotion models are usually grounded in the agent's decision making architecture, of which RL is an important subclass. Studying emotions in RL-based agents is useful for three research fields. For machine learning (ML) researchers, emotion models may improve learning efficiency. For the interactive ML and human-robot interaction (HRI) community, emotions can communicate state and enhance user investment. Lastly, it allows affective modelling (AM) researchers to investigate their emotion theories in a successful AI agent class. This survey provides background on emotion theory and RL. It systematically addresses 1) from what underlying dimensions (e.g., homeostasis, appraisal) emotions can be derived and how these can be modelled in RL-agents, 2) what types of emotions have been derived from these dimensions, and 3) how these emotions may either influence the learning efficiency of the agent or be useful as social signals. We also systematically compare evaluation criteria, and draw connections to important RL sub-domains like (intrinsic) motivation and model-based RL. In short, this survey provides both a practical overview for engineers wanting to implement emotions in their RL agents, and identifies challenges and directions for future emotion-RL research.
Submitted 15 May, 2017;
originally announced May 2017.
-
Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning
Authors:
Thomas M. Moerland,
Joost Broekens,
Catholijn M. Jonker
Abstract:
In this paper we study how to learn stochastic, multimodal transition dynamics in reinforcement learning (RL) tasks. We focus on evaluating transition function estimation, while we defer planning over this model to future work. Stochasticity is a fundamental property of many task environments. However, discriminative function approximators have difficulty estimating multimodal stochasticity. In contrast, deep generative models do capture complex high-dimensional outcome distributions. First we discuss why, amongst such models, conditional variational inference (VI) is theoretically most appealing for model-based RL. Subsequently, we compare different VI models on their ability to learn complex stochasticity on simulated functions, as well as on a typical RL gridworld with multimodal dynamics. Results show VI successfully predicts multimodal outcomes, but also robustly ignores these for deterministic parts of the transition dynamics. In summary, we show a robust method to learn multimodal transitions using function approximation, which is a key preliminary for model-based RL in stochastic domains.
Submitted 8 August, 2017; v1 submitted 1 May, 2017;
originally announced May 2017.
-
PriMaL: A Privacy-Preserving Machine Learning Method for Event Detection in Distributed Sensor Networks
Authors:
Stefano Bennati,
Catholijn M. Jonker
Abstract:
This paper introduces PriMaL, a general PRIvacy-preserving MAchine-Learning method for reducing the privacy cost of information transmitted through a network. Distributed sensor networks are often used for automated classification and detection of abnormal events in high-stakes situations, e.g. fire in buildings, earthquakes, or crowd disasters. Such networks might transmit privacy-sensitive information, e.g. GPS location of smartphones, which might be disclosed if the network is compromised. Privacy concerns might slow down the adoption of the technology, in particular in the scenario of social sensing where participation is voluntary; solutions are therefore needed that improve privacy without compromising event detection accuracy. PriMaL is implemented as a machine-learning layer that works on top of an existing event detection algorithm. Experiments are run in a general simulation framework, for several network topologies and parameter values. The privacy footprint of state-of-the-art event detection algorithms is compared within the proposed framework. Results show that PriMaL is able to reduce the privacy cost of a distributed event detection algorithm below that of the corresponding centralized algorithm, within the bounds of some assumptions about the protocol. Moreover, the performance of the distributed algorithm is not statistically worse than that of the centralized algorithm.
Submitted 21 March, 2017;
originally announced March 2017.
-
Can we reach Pareto optimal outcomes using bottom-up approaches?
Authors:
Victor Sanchez-Anguix,
Reyhan Aydogan,
Tim Baarslag,
Catholijn M. Jonker
Abstract:
Traditionally, researchers in decision making have focused on attempting to reach Pareto optimality using horizontal approaches, where optimality is calculated taking into account every participant at the same time. Sometimes, this may prove to be a difficult task (e.g., conflict, mistrust, no information sharing, etc.). In this paper, we explore the possibility of achieving Pareto optimal outcomes in a group by using a bottom-up approach: discovering Pareto optimal outcomes by interacting in subgroups. We analytically show that Pareto optimal outcomes in a subgroup are also Pareto optimal in a supergroup of those agents in the case of strict, transitive, and complete preferences. Then, we empirically analyze the prospective usability and practicality of bottom-up approaches in a variety of decision-making domains.
Submitted 3 July, 2016;
originally announced July 2016.