Showing 1–19 of 19 results for author: Vamplew, P

Searching in archive cs.
  1. arXiv:2402.06266  [pdf, other]

    cs.LG

    Value function interference and greedy action selection in value-based multi-objective reinforcement learning

    Authors: Peter Vamplew, Cameron Foale, Richard Dazeley

    Abstract: Multi-objective reinforcement learning (MORL) algorithms extend conventional reinforcement learning (RL) to the more general case of problems with multiple, conflicting objectives, represented by vector-valued rewards. Widely-used scalar RL methods such as Q-learning can be modified to handle multiple objectives by (1) learning vector-valued value functions, and (2) performing action selection usi…

    Submitted 9 February, 2024; originally announced February 2024.

  2. arXiv:2402.02665  [pdf, ps, other]

    cs.LG

    Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning

    Authors: Peter Vamplew, Cameron Foale, Conor F. Hayes, Patrick Mannion, Enda Howley, Richard Dazeley, Scott Johnson, Johan Källström, Gabriel Ramos, Roxana Rădulescu, Willem Röpke, Diederik M. Roijers

    Abstract: Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perfor…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted for the Blue Sky Track at AAMAS'24

  3. arXiv:2401.03163  [pdf, other]

    cs.LG

    An Empirical Investigation of Value-Based Multi-objective Reinforcement Learning for Stochastic Environments

    Authors: Kewen Ding, Peter Vamplew, Cameron Foale, Richard Dazeley

    Abstract: One common approach to solve multi-objective reinforcement learning (MORL) problems is to extend conventional Q-learning by using vector Q-values in combination with a utility function. However issues can arise with this approach in the context of stochastic environments, particularly when optimising for the Scalarised Expected Reward (SER) criterion. This paper extends prior research, providing a…

    Submitted 6 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.08669

  4. arXiv:2305.19223  [pdf, other]

    cs.AI cs.CY cs.HC

    Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety

    Authors: Catalin Mitelut, Ben Smith, Peter Vamplew

    Abstract: The rapid advancement of artificial intelligence (AI) systems suggests that artificial general intelligence (AGI) systems may soon arrive. Many researchers are concerned that AIs and AGIs will harm humans via intentional misuse (AI-misuse) or through accidents (AI-accidents). In respect of AI-accidents, there is an increasing effort focused on developing algorithms and paradigms that ensure AI sys…

    Submitted 30 May, 2023; originally announced May 2023.

  5. arXiv:2210.05187  [pdf, other]

    cs.AI cs.LG cs.RO

    Broad-persistent Advice for Interactive Reinforcement Learning Scenarios

    Authors: Francisco Cruz, Adam Bignold, Hung Son Nguyen, Richard Dazeley, Peter Vamplew

    Abstract: The use of interactive advice in reinforcement learning scenarios allows for speeding up the learning process for autonomous agents. Current interactive reinforcement learning research has been limited to real-time interactions that offer relevant user advice to the current state only. Moreover, the information provided by each interaction is not retained and instead discarded by the agent after a…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: Extended abstract accepted at the 2nd RL-CONFORM Workshop at IEEE/RSJ IROS'22 Conference. 5 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:2102.02441, arXiv:2110.08003

  6. arXiv:2210.03325  [pdf, other]

    cs.LG

    Elastic Step DQN: A novel multi-step algorithm to alleviate overestimation in Deep QNetworks

    Authors: Adrian Ly, Richard Dazeley, Peter Vamplew, Francisco Cruz, Sunil Aryal

    Abstract: Deep Q-Networks algorithm (DQN) was the first reinforcement learning algorithm using deep neural network to successfully surpass human level performance in a number of Atari learning environments. However, divergent and unstable behaviour have been long standing issues in DQNs. The unstable behaviour is often characterised by overestimation in the $Q$-values, commonly referred to as the overestima…

    Submitted 7 October, 2022; originally announced October 2022.

  7. arXiv:2207.03214  [pdf, other]

    cs.AI

    Evaluating Human-like Explanations for Robot Actions in Reinforcement Learning Scenarios

    Authors: Francisco Cruz, Charlotte Young, Richard Dazeley, Peter Vamplew

    Abstract: Explainable artificial intelligence is a research field that tries to provide more transparency for autonomous intelligent systems. Explainability has been used, particularly in reinforcement learning and robotic scenarios, to better understand the robot decision-making process. Previous work, however, has been widely focused on providing technical explanations that can be better understood by AI…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: 8 pages, 8 figures

  8. arXiv:2112.15422  [pdf, other]

    cs.AI

    Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)

    Authors: Peter Vamplew, Benjamin J. Smith, Johan Kallstrom, Gabriel Ramos, Roxana Radulescu, Diederik M. Roijers, Conor F. Hayes, Fredrik Heintz, Patrick Mannion, Pieter J. K. Libin, Richard Dazeley, Cameron Foale

    Abstract: The recent paper "Reward is Enough" by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and co…

    Submitted 24 November, 2021; originally announced December 2021.

  9. arXiv:2108.09003  [pdf, other]

    cs.AI

    Explainable Reinforcement Learning for Broad-XAI: A Conceptual Framework and Survey

    Authors: Richard Dazeley, Peter Vamplew, Francisco Cruz

    Abstract: Broad Explainable Artificial Intelligence moves away from interpreting individual decisions based on a single datum and aims to provide integrated explanations from multiple machine learning algorithms into a coherent explanation of an agent's behaviour that is aligned to the communication needs of the explainee. Reinforcement Learning (RL) methods, we propose, provide a potential backbone for the…

    Submitted 20 August, 2021; originally announced August 2021.

    Comments: 22 pages, 7 figures

  10. Levels of explainable artificial intelligence for human-aligned conversational explanations

    Authors: Richard Dazeley, Peter Vamplew, Cameron Foale, Charlotte Young, Sunil Aryal, Francisco Cruz

    Abstract: Over the last few years there has been rapid research growth into eXplainable Artificial Intelligence (XAI) and the closely aligned Interpretable Machine Learning (IML). Drivers for this growth include recent legislative changes and increased investments by industry and governments, along with increased concern from the general public. People are affected by autonomous decisions every day and the…

    Submitted 7 July, 2021; originally announced July 2021.

    Comments: 35 pages, 13 figures

    Journal ref: Artificial Intelligence, 299, 103525 (2021)

  11. A Practical Guide to Multi-Objective Reinforcement Learning and Planning

    Authors: Conor F. Hayes, Roxana Rădulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, Enda Howley, Athirai A. Irissappane, Patrick Mannion, Ann Nowé, Gabriel Ramos, Marcello Restelli, Peter Vamplew, Diederik M. Roijers

    Abstract: Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying pr…

    Submitted 17 March, 2021; originally announced March 2021.

    Journal ref: Auton Agent Multi-Agent Syst 36, 26 (2022)

  12. arXiv:2102.02441  [pdf, other]

    cs.AI cs.MA

    Persistent Rule-based Interactive Reinforcement Learning

    Authors: Adam Bignold, Francisco Cruz, Richard Dazeley, Peter Vamplew, Cameron Foale

    Abstract: Interactive reinforcement learning has allowed speeding up the learning process in autonomous agents by including a human trainer providing extra information to the agent in real-time. Current interactive reinforcement learning research has been limited to real-time interactions that offer relevant user advice to the current state only. Additionally, the information provided by each interaction is…

    Submitted 2 September, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: 24 pages, 7 figures

  13. Human Engagement Providing Evaluative and Informative Advice for Interactive Reinforcement Learning

    Authors: Adam Bignold, Francisco Cruz, Richard Dazeley, Peter Vamplew, Cameron Foale

    Abstract: Interactive reinforcement learning proposes the use of externally-sourced information in order to speed up the learning process. When interacting with a learner agent, humans may provide either evaluative or informative advice. Prior research has focused on the effect of human-sourced advice by including real-time feedback on the interactive reinforcement learning process, specifically aiming to i…

    Submitted 7 July, 2022; v1 submitted 20 September, 2020; originally announced September 2020.

    Comments: 23 pages, 15 figures

    Journal ref: Neural Computing and Applications, 1-16 (2022)

  14. A Conceptual Framework for Externally-influenced Agents: An Assisted Reinforcement Learning Review

    Authors: Adam Bignold, Francisco Cruz, Matthew E. Taylor, Tim Brys, Richard Dazeley, Peter Vamplew, Cameron Foale

    Abstract: A long-term goal of reinforcement learning agents is to be able to perform tasks in complex real-world scenarios. The use of external information is one way of scaling agents to more complex problems. However, there is a general lack of collaboration or interoperability between different approaches using external information. In this work, while reviewing externally-influenced methods, we propose…

    Submitted 19 September, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: 33 pages, 9 figures

  15. Explainable robotic systems: Understanding goal-driven actions in a reinforcement learning scenario

    Authors: Francisco Cruz, Richard Dazeley, Peter Vamplew, Ithan Moreira

    Abstract: Robotic systems are more present in our society every day. In human-robot environments, it is crucial that end-users may correctly understand their robotic team-partners, in order to collaboratively complete a task. To increase action understanding, users demand more explainability about the decisions by the robot in particular situations. Recently, explainable robotic systems have emerged as an al…

    Submitted 2 September, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: 26 pages, 10 figures

  16. arXiv:2005.02057  [pdf, other]

    cs.LG stat.ML

    Discrete-to-Deep Supervised Policy Learning

    Authors: Budi Kurniawan, Peter Vamplew, Michael Papasimeon, Richard Dazeley, Cameron Foale

    Abstract: Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. For years, scholars have got around this by employing experience replay or an asynchronous parallel-agent system. This paper proposes Discrete-to-Deep Supervised Policy Learning (D2D-SPL) for training neural networks in RL. D2D-SPL discretises th…

    Submitted 5 May, 2020; originally announced May 2020.

    Comments: 9 pages, 9 figures. Adaptive and Learning Agents Workshop at AAMAS 2020, Auckland, New Zealand

    MSC Class: 68T05; ACM Class: I.2.6

  17. arXiv:2004.06277  [pdf, other]

    cs.LG cs.MA stat.ML

    A Demonstration of Issues with Value-Based Multiobjective Reinforcement Learning Under Stochastic State Transitions

    Authors: Peter Vamplew, Cameron Foale, Richard Dazeley

    Abstract: We report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in the context of environments with stochastic state transitions. An example multiobjective Markov Decision Process (MOMDP) is used to demonstrate that under such conditions these approaches may be unable to discover the policy which maximises the Scalarised Expected Return, a…

    Submitted 13 April, 2020; originally announced April 2020.

    Comments: 6 pages. Accepted for presentation in the Adaptive and Learning Agents Workshop, AAMAS 2020

    Journal ref: The impact of environmental stochasticity on value-based multiobjective reinforcement learning, Neural Computing and Applications, 2021

  18. A Multi-Objective Deep Reinforcement Learning Framework

    Authors: Thanh Thi Nguyen, Ngoc Duy Nguyen, Peter Vamplew, Saeid Nahavandi, Richard Dazeley, Chee Peng Lim

    Abstract: This paper introduces a new scalable multi-objective deep reinforcement learning (MODRL) framework based on deep Q-networks. We develop a high-performance MODRL framework that supports both single-policy and multi-policy strategies, as well as both linear and non-linear approaches to action selection. The experimental results on two benchmark problems (two-objective deep sea treasure environment a…

    Submitted 19 June, 2020; v1 submitted 7 March, 2018; originally announced March 2018.

    Comments: 21 pages

    Report number: Volume 96, November 2020, 103915

    Journal ref: Engineering Applications of Artificial Intelligence, 2020

  19. A Survey of Multi-Objective Sequential Decision-Making

    Authors: Diederik Marijn Roijers, Peter Vamplew, Shimon Whiteson, Richard Dazeley

    Abstract: Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, l…

    Submitted 3 February, 2014; originally announced February 2014.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 48, pages 67-113, 2013