Search | arXiv e-print repository

Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning

Authors: Peter Vamplew, Cameron Foale, Conor F. Hayes, Patrick Mannion, Enda Howley, Richard Dazeley, Scott Johnson, Johan Källström, Gabriel Ramos, Roxana Rădulescu, Willem Röpke, Diederik M. Roijers

Abstract: Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perfor… ▽ More Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL. We also examine the algorithmic implications of adopting a utility-based approach. △ Less

Submitted 4 February, 2024; originally announced February 2024.

Comments: Accepted for the Blue Sky Track at AAMAS'24

arXiv:2305.05560 [pdf, other]

Distributional Multi-Objective Decision Making

Authors: Willem Röpke, Conor F. Hayes, Patrick Mannion, Enda Howley, Ann Nowé, Diederik M. Roijers

Abstract: For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly.… ▽ More For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly. Based on this criterion, we present the distributional undominated set and show that it contains optimal policies otherwise ignored by the Pareto front. In addition, we propose the convex distributional undominated set and prove that it comprises all policies that maximise expected utility for multivariate risk-averse decision makers. We propose a novel algorithm to learn the distributional undominated set and further contribute pruning operators to reduce the set to the convex distributional undominated set. Through experiments, we demonstrate the feasibility and effectiveness of these methods, making this a valuable new approach for decision support in real-world problems. △ Less

Submitted 18 July, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: Accepted at IJCAI 2023

arXiv:2211.13032 [pdf, other]

Monte Carlo Tree Search Algorithms for Risk-Aware and Multi-Objective Reinforcement Learning

Authors: Conor F. Hayes, Mathieu Reymond, Diederik M. Roijers, Enda Howley, Patrick Mannion

Abstract: In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. Making decisions using just the expected future returns -- known in r… ▽ More In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. Making decisions using just the expected future returns -- known in reinforcement learning as the value -- cannot account for the potential range of adverse or positive outcomes a decision may have. Therefore, we should use the distribution over expected future returns differently to represent the critical information that the agent requires at decision time by taking both the future and accrued returns into consideration. In this paper, we propose two novel Monte Carlo tree search algorithms. Firstly, we present a Monte Carlo tree search algorithm that can compute policies for nonlinear utility functions (NLU-MCTS) by optimising the utility of the different possible returns attainable from individual policy executions, resulting in good policies for both risk-aware and multi-objective settings. Secondly, we propose a distributional Monte Carlo tree search algorithm (DMCTS) which extends NLU-MCTS. DMCTS computes an approximate posterior distribution over the utility of the returns, and utilises Thompson sampling during planning to compute policies in risk-aware and multi-objective settings. Both algorithms outperform the state-of-the-art in multi-objective reinforcement learning for the expected utility of the returns. △ Less

Submitted 6 December, 2022; v1 submitted 23 November, 2022; originally announced November 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2102.00966

arXiv:2209.04376 [pdf, other]

Challenges of Implementing Agile Processes in Remote-First Companies

Authors: Lulit Asfaw, Mikael Clemmons, Cody Hayes, Elise Letnaunchyn, Elnaz Rabieinejad

Abstract: The trend of remote work, especially in the IT sector, has been on the rise in recent years, and its popularity has especially increased since the COVID-19 pandemic. In addition to adopting remote work, companies also have been migrating toward managing their projects using agile processes. Agile processes promote small and continuous feedback loops powered by effective communication. In this surv… ▽ More The trend of remote work, especially in the IT sector, has been on the rise in recent years, and its popularity has especially increased since the COVID-19 pandemic. In addition to adopting remote work, companies also have been migrating toward managing their projects using agile processes. Agile processes promote small and continuous feedback loops powered by effective communication. In this survey, we look to discover the challenges of implementing these processes in a remote setting, specifically focusing on the impact on communication. We examine the role communication plays in an agile setting and look for ways to mitigate the risk remote environments impose on it. Lastly, we present other miscellaneous challenges companies could experience that still carry dangers but are less impactful overall to agile implementation. △ Less

Submitted 9 September, 2022; originally announced September 2022.

arXiv:2207.00368 [pdf, other]

Multi-Objective Coordination Graphs for the Expected Scalarised Returns with Generative Flow Models

Authors: Conor F. Hayes, Timothy Verstraeten, Diederik M. Roijers, Enda Howley, Patrick Mannion

Abstract: Many real-world problems contain multiple objectives and agents, where a trade-off exists between objectives. Key to solving such problems is to exploit sparse dependency structures that exist between agents. For example, in wind farm control a trade-off exists between maximising power and minimising stress on the systems components. Dependencies between turbines arise due to the wake effect. We m… ▽ More Many real-world problems contain multiple objectives and agents, where a trade-off exists between objectives. Key to solving such problems is to exploit sparse dependency structures that exist between agents. For example, in wind farm control a trade-off exists between maximising power and minimising stress on the systems components. Dependencies between turbines arise due to the wake effect. We model such sparse dependencies between agents as a multi-objective coordination graph (MO-CoG). In multi-objective reinforcement learning a utility function is typically used to model a users preferences over objectives, which may be unknown a priori. In such settings a set of optimal policies must be computed. Which policies are optimal depends on which optimality criterion applies. If the utility function of a user is derived from multiple executions of a policy, the scalarised expected returns (SER) must be optimised. If the utility of a user is derived from a single execution of a policy, the expected scalarised returns (ESR) criterion must be optimised. For example, wind farms are subjected to constraints and regulations that must be adhered to at all times, therefore the ESR criterion must be optimised. For MO-CoGs, the state-of-the-art algorithms can only compute a set of optimal policies for the SER criterion, leaving the ESR criterion understudied. To compute a set of optimal polices under the ESR criterion, also known as the ESR set, distributions over the returns must be maintained. Therefore, to compute a set of optimal policies under the ESR criterion for MO-CoGs, we present a novel distributional multi-objective variable elimination (DMOVE) algorithm. We evaluate DMOVE in realistic wind farm simulations. Given the returns in real-world wind farm settings are continuous, we utilise a model known as real-NVP to learn the continuous return distributions to calculate the ESR set. △ Less

Submitted 1 July, 2022; originally announced July 2022.

arXiv:2204.05027 [pdf, ps, other]

Exploring the Pareto front of multi-objective COVID-19 mitigation policies using reinforcement learning

Authors: Mathieu Reymond, Conor F. Hayes, Lander Willem, Roxana Rădulescu, Steven Abrams, Diederik M. Roijers, Enda Howley, Patrick Mannion, Niel Hens, Ann Nowé, Pieter Libin

Abstract: Infectious disease outbreaks can have a disruptive impact on public health and societal processes. As decision making in the context of epidemic mitigation is hard, reinforcement learning provides a methodology to automatically learn prevention strategies in combination with complex epidemic models. Current research focuses on optimizing policies w.r.t. a single objective, such as the pathogen's a… ▽ More Infectious disease outbreaks can have a disruptive impact on public health and societal processes. As decision making in the context of epidemic mitigation is hard, reinforcement learning provides a methodology to automatically learn prevention strategies in combination with complex epidemic models. Current research focuses on optimizing policies w.r.t. a single objective, such as the pathogen's attack rate. However, as the mitigation of epidemics involves distinct, and possibly conflicting criteria (i.a., prevalence, mortality, morbidity, cost), a multi-objective approach is warranted to learn balanced policies. To lift this decision-making process to real-world epidemic models, we apply deep multi-objective reinforcement learning and build upon a state-of-the-art algorithm, Pareto Conditioned Networks (PCN), to learn a set of solutions that approximates the Pareto front of the decision problem. We consider the first wave of the Belgian COVID-19 epidemic, which was mitigated by a lockdown, and study different deconfinement strategies, aiming to minimize both COVID-19 cases (i.e., infections and hospitalizations) and the societal burden that is induced by the applied mitigation measures. We contribute a multi-objective Markov decision process that encapsulates the stochastic compartment model that was used to inform policy makers during the COVID-19 epidemic. As these social mitigation measures are implemented in a continuous action space that modulates the contact matrix of the age-structured epidemic model, we extend PCN to this setting. We evaluate the solution returned by PCN, and observe that it correctly learns to reduce the social burden whenever the hospitalization rates are sufficiently low. In this work, we thus show that multi-objective reinforcement learning is attainable in complex epidemiological models and provides essential insights to balance complex mitigation policies. △ Less

Submitted 11 April, 2022; originally announced April 2022.

arXiv:2112.15422 [pdf, other]

Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)

Authors: Peter Vamplew, Benjamin J. Smith, Johan Kallstrom, Gabriel Ramos, Roxana Radulescu, Diederik M. Roijers, Conor F. Hayes, Fredrik Heintz, Patrick Mannion, Pieter J. K. Libin, Richard Dazeley, Cameron Foale

Abstract: The recent paper `"Reward is Enough" by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and co… ▽ More The recent paper `"Reward is Enough" by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, it is still undesirable to use this approach for the development of artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour. △ Less

Submitted 24 November, 2021; originally announced December 2021.

arXiv:2106.01048 [pdf, other]

doi 10.1007/s00521-022-07334-x

Expected Scalarised Returns Dominance: A New Solution Concept for Multi-Objective Decision Making

Authors: Conor F. Hayes, Timothy Verstraeten, Diederik M. Roijers, Enda Howley, Patrick Mannion

Abstract: In many real-world scenarios, the utility of a user is derived from the single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user's preferences over objectives (also known as the utility function) are unknown or difficult to specify. In such scenarios, a set of optimal pol… ▽ More In many real-world scenarios, the utility of a user is derived from the single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user's preferences over objectives (also known as the utility function) are unknown or difficult to specify. In such scenarios, a set of optimal policies must be learned. However, settings where the expected utility must be maximised have been largely overlooked by the multi-objective reinforcement learning community and, as a consequence, a set of optimal solutions has yet to be defined. In this paper we address this challenge by proposing first-order stochastic dominance as a criterion to build solution sets to maximise expected utility. We also propose a new dominance criterion, known as expected scalarised returns (ESR) dominance, that extends first-order stochastic dominance to allow a set of optimal policies to be learned in practice. We then define a new solution concept called the ESR set, which is a set of policies that are ESR dominant. Finally, we define a new multi-objective distributional tabular reinforcement learning (MOT-DRL) algorithm to learn the ESR set in a multi-objective multi-armed bandit setting. △ Less

Submitted 1 July, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

arXiv:2103.09568 [pdf, other]

doi 10.1007/s10458-022-09552-y

A Practical Guide to Multi-Objective Reinforcement Learning and Planning

Authors: Conor F. Hayes, Roxana Rădulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, Enda Howley, Athirai A. Irissappane, Patrick Mannion, Ann Nowé, Gabriel Ramos, Marcello Restelli, Peter Vamplew, Diederik M. Roijers

Abstract: Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying pr… ▽ More Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems. △ Less

Submitted 17 March, 2021; originally announced March 2021.

Journal ref: Auton Agent Multi-Agent Syst 36, 26 (2022)

arXiv:2102.00966 [pdf, other]

Risk Aware and Multi-Objective Decision Making with Distributional Monte Carlo Tree Search

Authors: Conor F. Hayes, Mathieu Reymond, Diederik M. Roijers, Enda Howley, Patrick Mannion

Abstract: In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from the single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. When making a decision, just the expected return -- known in reinfo… ▽ More In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from the single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. When making a decision, just the expected return -- known in reinforcement learning as the value -- cannot account for the potential range of adverse or positive outcomes a decision may have. Our key insight is that we should use the distribution over expected future returns differently to represent the critical information that the agent requires at decision time. In this paper, we propose Distributional Monte Carlo Tree Search, an algorithm that learns a posterior distribution over the utility of the different possible returns attainable from individual policy executions, resulting in good policies for both risk-aware and multi-objective settings. Moreover, our algorithm outperforms the state-of-the-art in multi-objective reinforcement learning for the expected utility of the returns. △ Less

Submitted 2 February, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

Comments: 8 pages, 4 figures

arXiv:2010.16361 [pdf, other]

Towards Preference Learning for Autonomous Ground Robot Navigation Tasks

Authors: Cory Hayes, Matthew Marge

Abstract: We are interested in the design of autonomous robot behaviors that learn the preferences of users over continued interactions, with the goal of efficiently executing navigation behaviors in a way that the user expects. In this paper, we discuss our work in progress to modify a general model for robot navigation behaviors in an exploration task on a per-user basis using preference-based reinforceme… ▽ More We are interested in the design of autonomous robot behaviors that learn the preferences of users over continued interactions, with the goal of efficiently executing navigation behaviors in a way that the user expects. In this paper, we discuss our work in progress to modify a general model for robot navigation behaviors in an exploration task on a per-user basis using preference-based reinforcement learning. The novel contribution of this approach is that it combines reinforcement learning, motion planning, and natural language processing to allow an autonomous agent to learn from sustained dialogue with a human teammate as opposed to one-off instructions. △ Less

Submitted 5 November, 2020; v1 submitted 30 October, 2020; originally announced October 2020.

Comments: Accepted for publication at AI-HRI 2020 (arXiv:2010.13830)

arXiv:1910.05624 [pdf, other]

A Research Platform for Multi-Robot Dialogue with Humans

Authors: Matthew Marge, Stephen Nogar, Cory J. Hayes, Stephanie M. Lukin, Jesse Bloecker, Eric Holder, Clare Voss

Abstract: This paper presents a research platform that supports spoken dialogue interaction with multiple robots. The demonstration showcases our crafted MultiBot testing scenario in which users can verbally issue search, navigate, and follow instructions to two robotic teammates: a simulated ground robot and an aerial robot. This flexible language and robotic platform takes advantage of existing tools for… ▽ More This paper presents a research platform that supports spoken dialogue interaction with multiple robots. The demonstration showcases our crafted MultiBot testing scenario in which users can verbally issue search, navigate, and follow instructions to two robotic teammates: a simulated ground robot and an aerial robot. This flexible language and robotic platform takes advantage of existing tools for speech recognition and dialogue management that are compatible with new domains, and implements an inter-agent communication protocol (tactical behavior specification), where verbal instructions are encoded for tasks assigned to the appropriate robot. △ Less

Submitted 12 October, 2019; originally announced October 2019.

Comments: Accepted for publication at NAACL 2019; also presented at AI-HRI 2019 (arXiv:1909.04812)

Report number: AI-HRI/2019/05

arXiv:1812.05001 [pdf, other]

Temporal Analysis of Entity Relatedness and its Evolution using Wikipedia and DBpedia

Authors: Narumol Prangnawarat, John P. McCrae, Conor Hayes

Abstract: Many researchers have made use of the Wikipedia network for relatedness and similarity tasks. However, most approaches use only the most recent information and not historical changes in the network. We provide an analysis of entity relatedness using temporal graph-based approaches over different versions of the Wikipedia article link network and DBpedia, which is an open-source knowledge base extr… ▽ More Many researchers have made use of the Wikipedia network for relatedness and similarity tasks. However, most approaches use only the most recent information and not historical changes in the network. We provide an analysis of entity relatedness using temporal graph-based approaches over different versions of the Wikipedia article link network and DBpedia, which is an open-source knowledge base extracted from Wikipedia. We consider creating the Wikipedia article link network as both a union and intersection of edges over multiple time points and present a novel variation of the Jaccard index to weight edges based on their transience. We evaluate our results against the KORE dataset, which was created in 2010, and show that using the 2010 Wikipedia article link network produces the strongest result, suggesting that semantic similarity is time sensitive. We then show that integrating multiple time frames in our methods can give a better overall similarity demonstrating that temporal evolution can have an important effect on entity relatedness. △ Less

Submitted 12 December, 2018; originally announced December 2018.

arXiv:1810.02017 [pdf, other]

Balancing Efficiency and Coverage in Human-Robot Dialogue Collection

Authors: Matthew Marge, Claire Bonial, Stephanie Lukin, Cory Hayes, Ashley Foots, Ron Artstein, Cassidy Henry, Kimberly Pollard, Carla Gordon, Felix Gervits, Anton Leuski, Susan Hill, Clare Voss, David Traum

Abstract: We describe a multi-phased Wizard-of-Oz approach to collecting human-robot dialogue in a collaborative search and navigation task. The data is being used to train an initial automated robot dialogue system to support collaborative exploration tasks. In the first phase, a wizard freely typed robot utterances to human participants. For the second phase, this data was used to design a GUI that includ… ▽ More We describe a multi-phased Wizard-of-Oz approach to collecting human-robot dialogue in a collaborative search and navigation task. The data is being used to train an initial automated robot dialogue system to support collaborative exploration tasks. In the first phase, a wizard freely typed robot utterances to human participants. For the second phase, this data was used to design a GUI that includes buttons for the most common communications, and templates for communications with varying parameters. Comparison of the data gathered in these phases show that the GUI enabled a faster pace of dialogue while still maintaining high coverage of suitable responses, enabling more efficient targeted data collection, and improvements in natural language understanding using GUI-collected data. As a promising first step towards interactive learning, this work shows that our approach enables the collection of useful training data for navigation-based HRI tasks. △ Less

Submitted 7 October, 2018; v1 submitted 3 October, 2018; originally announced October 2018.

Comments: Presented at AI-HRI AAAI-FSS, 2018 (arXiv:1809.06606)

Report number: AI-HRI/2018/01

arXiv:1807.08074 [pdf, other]

ScoutBot: A Dialogue System for Collaborative Navigation

Authors: Stephanie M. Lukin, Felix Gervits, Cory J. Hayes, Anton Leuski, Pooja Moolchandani, John G. Rogers III, Carlos Sanchez Amaro, Matthew Marge, Clare R. Voss, David Traum

Abstract: ScoutBot is a dialogue interface to physical and simulated robots that supports collaborative exploration of environments. The demonstration will allow users to issue unconstrained spoken language commands to ScoutBot. ScoutBot will prompt for clarification if the user's instruction needs additional input. It is trained on human-robot dialogue collected from Wizard-of-Oz experiments, where robot r… ▽ More ScoutBot is a dialogue interface to physical and simulated robots that supports collaborative exploration of environments. The demonstration will allow users to issue unconstrained spoken language commands to ScoutBot. ScoutBot will prompt for clarification if the user's instruction needs additional input. It is trained on human-robot dialogue collected from Wizard-of-Oz experiments, where robot responses were initiated by a human wizard in previous interactions. The demonstration will show a simulated ground robot (Clearpath Jackal) in a simulated environment supported by ROS (Robot Operating System). △ Less

Submitted 20 July, 2018; originally announced July 2018.

Comments: Originally published in the Proceedings of the Association for Computational Linguistics (ACL) 2018, System Demonstrations, 93-98

arXiv:1710.06406 [pdf, other]

Laying Down the Yellow Brick Road: Development of a Wizard-of-Oz Interface for Collecting Human-Robot Dialogue

Authors: Claire Bonial, Matthew Marge, Ron artstein, Ashley Foots, Felix Gervits, Cory J. Hayes, Cassidy Henry, Susan G. Hill, Anton Leuski, Stephanie M. Lukin, Pooja Moolchandani, Kimberly A. Pollard, David Traum, Clare R. Voss

Abstract: We describe the adaptation and refinement of a graphical user interface designed to facilitate a Wizard-of-Oz (WoZ) approach to collecting human-robot dialogue data. The data collected will be used to develop a dialogue system for robot navigation. Building on an interface previously used in the development of dialogue systems for virtual agents and video playback, we add templates with open param… ▽ More We describe the adaptation and refinement of a graphical user interface designed to facilitate a Wizard-of-Oz (WoZ) approach to collecting human-robot dialogue data. The data collected will be used to develop a dialogue system for robot navigation. Building on an interface previously used in the development of dialogue systems for virtual agents and video playback, we add templates with open parameters which allow the wizard to quickly produce a wide variety of utterances. Our research demonstrates that this approach to data collection is viable as an intermediate step in develo** a dialogue system for physical robots in remote locations from their users - a domain in which the human and robot need to regularly verify and update a shared understanding of the physical environment. We show that our WoZ interface and the fixed set of utterances and templates therein provide for a natural pace of dialogue with good coverage of the navigation domain. △ Less

Submitted 17 October, 2017; originally announced October 2017.

Comments: 7 pages, 2 figures, accepted for oral presentation at the Symposium on Natural Communication for Human-Robot Collaboration, AAAI Fall Symposium Series, November 9-11, 2017, https://www.aaai.org/ocs/index.php/FSS/FSS17

arXiv:1606.02485 [pdf, other]

doi 10.1109/ROMAN.2016.7745138

Exploring Implicit Human Responses to Robot Mistakes in a Learning from Demonstration Task

Authors: Cory J. Hayes, Maryam Moosaei, Laurel D. Riek

Abstract: As robots enter human environments, they will be expected to accomplish a tremendous range of tasks. It is not feasible for robot designers to pre-program these behaviors or know them in advance, so one way to address this is through end-user programming, such as via learning from demonstration (LfD). While significant work has been done on the mechanics of enabling robot learning from human teach… ▽ More As robots enter human environments, they will be expected to accomplish a tremendous range of tasks. It is not feasible for robot designers to pre-program these behaviors or know them in advance, so one way to address this is through end-user programming, such as via learning from demonstration (LfD). While significant work has been done on the mechanics of enabling robot learning from human teachers, one unexplored aspect is enabling mutual feedback between both the human teacher and robot during the learning process, i.e., implicit learning. In this paper, we explore one aspect of this mutual understanding, grounding sequences, where both a human and robot provide non-verbal feedback to signify their mutual understanding during interaction. We conducted a study where people taught an autonomous humanoid robot a dance, and performed gesture analysis to measure people's responses to the robot during correct and incorrect demonstrations. △ Less

Submitted 8 June, 2016; originally announced June 2016.

Comments: 7 pages, 2 figures, IEEE RO-MAN 2016, IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 2016)

arXiv:1210.1480 [pdf, other]

doi 10.1140/epjst/e2012-01692-1

Theoretical And Technological Building Blocks For An Innovation Accelerator

Authors: Frank van Harmelen, George Kampis, Katy Borner, Peter van den Besselaar, Erik Schultes, Carole Goble, Paul Groth, Barend Mons, Stuart Anderson, Stefan Decker, Conor Hayes, Thierry Buecheler, Dirk Helbing

Abstract: The scientific system that we use today was devised centuries ago and is inadequate for our current ICT-based society: the peer review system encourages conservatism, journal publications are monolithic and slow, data is often not available to other scientists, and the independent validation of results is limited. Building on the Innovation Accelerator paper by Helbing and Balietti (2011) this pap… ▽ More The scientific system that we use today was devised centuries ago and is inadequate for our current ICT-based society: the peer review system encourages conservatism, journal publications are monolithic and slow, data is often not available to other scientists, and the independent validation of results is limited. Building on the Innovation Accelerator paper by Helbing and Balietti (2011) this paper takes the initial global vision and reviews the theoretical and technological building blocks that can be used for implementing an innovation (in first place: science) accelerator platform driven by re-imagining the science system. The envisioned platform would rest on four pillars: (i) Redesign the incentive scheme to reduce behavior such as conservatism, herding and hy**; (ii) Advance scientific publications by breaking up the monolithic paper unit and introducing other building blocks such as data, tools, experiment workflows, resources; (iii) Use machine readable semantics for publications, debate structures, provenance etc. in order to include the computer as a partner in the scientific process, and (iv) Build an online platform for collaboration, including a network of trust and reputation among the different types of stakeholders in the scientific system: scientists, educators, funding agencies, policy makers, students and industrial innovators among others. Any such improvements to the scientific system must support the entire scientific process (unlike current tools that chop up the scientific process into disconnected pieces), must facilitate and encourage collaboration and interdisciplinarity (again unlike current tools), must facilitate the inclusion of intelligent computing in the scientific process, must facilitate not only the core scientific process, but also accommodate other stakeholders such science policy makers, industrial innovators, and the general public. △ Less

Submitted 4 October, 2012; originally announced October 2012.

arXiv:1201.2277 [pdf, ps, other]

A Time Decoupling Approach for Studying Forum Dynamics

Authors: Andrey Kan, Jeffrey Chan, Conor Hayes, Bernie Hogan, James Bailey, Christopher Leckie

Abstract: Online forums are rich sources of information about user communication activity over time. Finding temporal patterns in online forum communication threads can advance our understanding of the dynamics of conversations. The main challenge of temporal analysis in this context is the complexity of forum data. There can be thousands of interacting users, who can be numerically described in many differ… ▽ More Online forums are rich sources of information about user communication activity over time. Finding temporal patterns in online forum communication threads can advance our understanding of the dynamics of conversations. The main challenge of temporal analysis in this context is the complexity of forum data. There can be thousands of interacting users, who can be numerically described in many different ways. Moreover, user characteristics can evolve over time. We propose an approach that decouples temporal information about users into sequences of user events and inter-event times. We develop a new feature space to represent the event sequences as paths, and we model the distribution of the inter-event times. We study over 30,000 users across four Internet forums, and discover novel patterns in user communication. We find that users tend to exhibit consistency over time. Furthermore, in our feature space, we observe regions that represent unlikely user behaviors. Finally, we show how to derive a numerical representation for each forum, and we then use this representation to derive a novel clustering of multiple forums. △ Less

Submitted 11 January, 2012; originally announced January 2012.

Comments: This submission is the paper draft after a major revision, it is currently under review in World Wide Web journal. The supplementary data can be downloaded from: http://people.eng.unimelb.edu.au/akan/user-paths/supp.pdf (please contact the authors if that doesn't work for some reason)

arXiv:1010.4327 [pdf]

Cross-Community Dynamics in Science: How Information Retrieval Affects Semantic Web and Vice Versa

Authors: Václav Belák, Marcel Karnstedt, Conor Hayes

Abstract: Community effects on the behaviour of individuals, the community itself and other communities can be observed in a wide range of applications. This is true in scientific research, where communities of researchers have increasingly to justify their impact and progress to funding agencies. While previous work has tried to explain and analyse such phenomena, there is still a great potential for incre… ▽ More Community effects on the behaviour of individuals, the community itself and other communities can be observed in a wide range of applications. This is true in scientific research, where communities of researchers have increasingly to justify their impact and progress to funding agencies. While previous work has tried to explain and analyse such phenomena, there is still a great potential for increasing the quality and accuracy of this analysis, especially in the context of cross-community effects. In this work, we propose a general framework consisting of several different techniques to analyse and explain such dynamics. The proposed methodology works with arbitrary community algorithms and incorporates meta-data to improve the overall quality and expressiveness of the analysis. We suggest and discuss several approaches to understand, interpret and explain particular phenomena, which themselves are identified in an automated manner. We illustrate the benefits and strengths of our approach by exposing highly interesting in-depth details of cross-community effects between two closely related and well established areas of scientific research. We finally conclude and highlight the important open issues on the way towards understanding, defining and eventually predicting typical life-cycles and classes of communities in the context of cross-community effects. △ Less

Submitted 30 November, 2010; v1 submitted 20 October, 2010; originally announced October 2010.

Comments: Extended version of a paper 'Life-Cycles and Mutual Effects of Scientific Communities' presented at ASNA 2010 conference in Zurich (http://www.asna.ch). 28 pages, 7 tables, and 12 figures

Showing 1–20 of 20 results for author: Hayes, C