Showing 1–11 of 11 results for author: Bossens, D M

  1. arXiv:2308.11267  [pdf, other]

    cs.LG cs.AI cs.NE

    Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes

    Authors: David M. Bossens

    Abstract: The robust constrained Markov decision process (RCMDP) is a recent task-modelling framework for reinforcement learning that incorporates behavioural constraints and that provides robustness to errors in the transition dynamics model through the use of an uncertainty set. Simulating RCMDPs requires computing the worst-case dynamics based on value estimates for each state, an approach which has prev…

    Submitted 15 May, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

  2. arXiv:2212.03932  [pdf, other]

    cs.LG cs.AI

    Low Variance Off-policy Evaluation with State-based Importance Sampling

    Authors: David M. Bossens, Philip S. Thomas

    Abstract: In many domains, the exploration process of reinforcement learning will be too costly as it requires trying out suboptimal policies, resulting in a need for off-policy evaluation, in which a target policy is evaluated based on data collected from a known behaviour policy. In this context, importance sampling estimators provide estimates for the expected return by weighting the trajectory based on…

    Submitted 4 May, 2024; v1 submitted 7 December, 2022; originally announced December 2022.

  3. arXiv:2209.02066  [pdf, other]

    cs.AI cs.LG cs.RO

    Trust in Language Grounding: a new AI challenge for human-robot teams

    Authors: David M. Bossens, Christine Evers

    Abstract: The challenge of language grounding is to fully understand natural language by grounding language in real-world referents. While AI techniques are available, the widespread adoption and effectiveness of such technologies for human-robot teams relies critically on user trust. This survey provides three contributions relating to the newly emerging field of trust in language grounding, including a) a…

    Submitted 5 September, 2022; originally announced September 2022.

  4. Resilient robot teams: a review integrating decentralised control, change-detection, and learning

    Authors: David M. Bossens, Sarvapali Ramchurn, Danesh Tarapore

    Abstract: Purpose of review: This paper reviews opportunities and challenges for decentralised control, change-detection, and learning in the context of resilient robot teams. Recent findings: Exogenous fault detection methods can provide a generic detection or a specific diagnosis with a recovery solution. Robot teams can perform active and distributed sensing for detecting changes in the environment, in…

    Submitted 21 April, 2022; originally announced April 2022.

    Comments: Accepted for Current Robotics Reports

  5. Explicit Explore, Exploit, or Escape ($E^4$): near-optimal safety-constrained reinforcement learning in polynomial time

    Authors: David M. Bossens, Nicholas Bishop

    Abstract: In reinforcement learning (RL), an agent must explore an initially unknown environment in order to learn a desired behaviour. When RL agents are deployed in real world environments, safety is of primary concern. Constrained Markov decision processes (CMDPs) can provide long-term safety constraints; however, the agent may violate the constraints in an effort to explore its environment. This paper p…

    Submitted 23 June, 2022; v1 submitted 14 November, 2021; originally announced November 2021.

    Comments: Accepted at Machine Learning

  6. Quality-Diversity Meta-Evolution: customising behaviour spaces to a meta-objective

    Authors: David M. Bossens, Danesh Tarapore

    Abstract: Quality-Diversity (QD) algorithms evolve behaviourally diverse and high-performing solutions. To illuminate the elite solutions for a space of behaviours, QD algorithms require the definition of a suitable behaviour space. If the behaviour space is high-dimensional, a suitable dimensionality reduction technique is required to maintain a limited number of behavioural niches. While current methodolo…

    Submitted 8 September, 2021; originally announced September 2021.

  7. arXiv:2106.01741  [pdf, other]

    cs.LG cs.AI cs.NE

    Lifetime policy reuse and the importance of task capacity

    Authors: David M. Bossens, Adam J. Sobey

    Abstract: A long-standing challenge in artificial intelligence is lifelong reinforcement learning, where learners are given many tasks in sequence and must transfer knowledge between tasks while avoiding catastrophic forgetting. Policy reuse and other multi-policy reinforcement learning techniques can learn multiple tasks but may generate many policies. This paper presents two novel contributions, namely 1)…

    Submitted 20 October, 2023; v1 submitted 3 June, 2021; originally announced June 2021.

  8. arXiv:2105.10317  [pdf, other]

    cs.NE cs.AI cs.RO

    On the use of feature-maps and parameter control for improved quality-diversity meta-evolution

    Authors: David M. Bossens, Danesh Tarapore

    Abstract: In Quality-Diversity (QD) algorithms, which evolve a behaviourally diverse archive of high-performing solutions, the behaviour space is a difficult design choice that should be tailored to the target application. In QD meta-evolution, one evolves a population of QD algorithms to optimise the behaviour space based on an archive-level objective, the meta-fitness. This paper proposes an improved meta…

    Submitted 21 May, 2021; originally announced May 2021.

    Comments: extended version of GECCO'21 paper

  9. arXiv:2012.11444  [pdf, other]

    cs.RO cs.AI cs.MA cs.NE

    Rapidly adapting robot swarms with Swarm Map-based Bayesian Optimisation

    Authors: David M. Bossens, Danesh Tarapore

    Abstract: Rapid performance recovery from unforeseen environmental perturbations remains a grand challenge in swarm robotics. To solve this challenge, we investigate a behaviour adaptation approach, where one searches an archive of controllers for potential recovery solutions. To apply behaviour adaptation in swarm robotic systems, we propose two algorithms: (i) Swarm Map-based Optimisation (SMBO), which se…

    Submitted 21 December, 2020; originally announced December 2020.

  10. arXiv:2003.04599  [pdf, other]

    cs.RO

    ASVLite: a high-performance simulator for autonomous surface vehicles

    Authors: Toby Thomas, David M. Bossens, Danesh Tarapore

    Abstract: The energy of ocean waves is the key distinguishing factor of marine environments compared to other aquatic environments such as lakes and rivers. Waves significantly affect the dynamics of marine vehicles; hence it is imperative to consider the dynamics of vehicles in waves when developing efficient control strategies for autonomous surface vehicles (ASVs). However, most marine simulators availab…

    Submitted 13 April, 2021; v1 submitted 10 March, 2020; originally announced March 2020.

  11. QED: using Quality-Environment-Diversity to evolve resilient robot swarms

    Authors: David M. Bossens, Danesh Tarapore

    Abstract: In swarm robotics, any of the robots in a swarm may be affected by different faults, resulting in significant performance declines. To allow fault recovery from randomly injected faults to different robots in a swarm, a model-free approach may be preferable due to the accumulation of faults in models and the difficulty to predict the behaviour of neighbouring robots. One model-free approach to fau…

    Submitted 4 March, 2020; originally announced March 2020.