AutoInspect: Towards Long-Term Autonomous Industrial Inspection

Authors: Michal Staniaszek, Tobit Flatscher, Joseph Rowell, Hanlin Niu, Wenxing Liu, Yang You, Robert Skilton, Maurice Fallon, Nick Hawes

Abstract: We give an overview of AutoInspect, a ROS-based software system for robust and extensible mission-level autonomy. Over the past three years AutoInspect has been deployed in a variety of environments, including at a mine, a chemical plant, a mock oil rig, decommissioned nuclear power plants, and a fusion reactor for durations ranging from hours to weeks. The system combines robust mapping and localisation with graph-based autonomous navigation, mission execution, and scheduling to achieve a complete autonomous inspection system. The time from arrival at a new site to autonomous mission execution can be under an hour. It is deployed on a Boston Dynamics Spot robot using a custom sensing and compute payload called Frontier. In this work we go into detail of the system's performance in two long-term deployments of 49 days at a robotics test facility, and 35 days at the Joint European Torus (JET) fusion reactor in Oxfordshire, UK.

Submitted 23 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024

arXiv:2404.10446

Watching Grass Grow: Long-term Visual Navigation and Mission Planning for Autonomous Biodiversity Monitoring

Authors: Matthew Gadd, Daniele De Martini, Luke Pitt, Wayne Tubby, Matthew Towlson, Chris Prahacs, Oliver Bartlett, John Jackson, Man Qi, Paul Newman, Andrew Hector, Roberto Salguero-Gómez, Nick Hawes

Abstract: We describe a challenging robotics deployment in a complex ecosystem to monitor a rich plant community. The study site is dominated by dynamic grassland vegetation and is thus visually ambiguous and liable to drastic appearance change over the course of a day and especially through the growing season. This dynamism and complexity in appearance seriously impact the stability of the robotics platform, as localisation is a foundational part of that control loop, and so routes must be carefully taught and retaught until autonomy is robust and repeatable. Our system is demonstrated over a 6-week period monitoring the response of grass species to experimental climate change manipulations. We also discuss the applicability of our pipeline to monitor biodiversity in other complex natural settings.

Submitted 1 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: to be presented at the Workshop on Field Robotics - ICRA 2024

arXiv:2404.07732

Monte Carlo Tree Search with Boltzmann Exploration

Authors: Michael Painter, Mohamed Baioumy, Nick Hawes, Bruno Lacerda

Abstract: Monte-Carlo Tree Search (MCTS) methods, such as Upper Confidence Bound applied to Trees (UCT), are instrumental to automated planning techniques. However, UCT can be slow to explore an optimal action when it initially appears inferior to other actions. Maximum ENtropy Tree-Search (MENTS) incorporates the maximum entropy principle into an MCTS approach, utilising Boltzmann policies to sample actions, naturally encouraging more exploration. In this paper, we highlight a major limitation of MENTS: optimal actions for the maximum entropy objective do not necessarily correspond to optimal actions for the original objective. We introduce two algorithms, Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS), that address these limitations and preserve the benefits of Boltzmann policies, such as allowing actions to be sampled faster by using the Alias method. Our empirical analysis shows that our algorithms show consistent high performance across several benchmark domains, including the game of Go.

Submitted 11 April, 2024; originally announced April 2024.

Comments: Camera ready version of NeurIPS2023 paper

Journal ref: Advances in Neural Information Processing Systems 36 (2024)
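
Illustrative note: the abstract above refers to Boltzmann policies for sampling actions. The following is a minimal sketch of a generic Boltzmann (softmax) policy over action-value estimates, not the BTS or DENTS algorithms themselves. The function names and the `temperature` parameter are assumptions introduced for illustration, and sampling uses the standard library rather than the Alias method mentioned in the abstract.

```python
import math
import random


def boltzmann_policy(q_values, temperature):
    """Return action probabilities proportional to exp(Q / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(q_values, temperature, rng=random):
    """Sample an action index from the Boltzmann policy."""
    probs = boltzmann_policy(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A high temperature pushes the policy towards uniform sampling (more exploration), while a low temperature concentrates probability on the action with the highest estimate.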

arXiv:2404.02795

Robust Pushing: Exploiting Quasi-static Belief Dynamics and Contact-informed Optimization

Authors: Julius Jankowski, Lara Brudermüller, Nick Hawes, Sylvain Calinon

Abstract: Non-prehensile manipulation such as pushing is typically subject to uncertain, non-smooth dynamics. However, modeling the uncertainty of the dynamics typically results in intractable belief dynamics, making data-efficient planning under uncertainty difficult. This article focuses on the problem of efficiently generating robust open-loop pushing plans. First, we investigate how the belief over object configurations propagates through quasi-static contact dynamics. We exploit the simplified dynamics to predict the variance of the object configuration without sampling from a perturbation distribution. In a sampling-based trajectory optimization algorithm, the gain of the variance is constrained in order to enforce robustness of the plan. Second, we propose an informed trajectory sampling mechanism for drawing robot trajectories that are likely to make contact with the object. This sampling mechanism is shown to significantly improve chances of finding robust solutions, especially when making-and-breaking contacts is required. We demonstrate that the proposed approach is able to synthesize bi-manual pushing trajectories, resulting in successful long-horizon pushing maneuvers without exteroceptive feedback such as vision or tactile feedback. We furthermore deploy the proposed approach in a model-predictive control scheme, demonstrating additional robustness against unmodeled perturbations.

Submitted 27 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: submitted to the International Journal of Robotics Research (IJRR)

arXiv:2402.01370

CC-VPSTO: Chance-Constrained Via-Point-based Stochastic Trajectory Optimisation for Safe and Efficient Online Robot Motion Planning

Authors: Lara Brudermüller, Guillaume Berger, Julius Jankowski, Raunak Bhattacharyya, Raphaël Jungers, Nick Hawes

Abstract: Safety in the face of uncertainty is a key challenge in robotics. We introduce a real-time capable framework to generate safe and task-efficient robot motions for stochastic control problems. We frame this as a chance-constrained optimisation problem constraining the probability of the controlled system to violate a safety constraint to be below a set threshold. To estimate this probability we propose a Monte Carlo approximation. We suggest several ways to construct the problem given a fixed number of uncertainty samples, such that it is a reliable over-approximation of the original problem, i.e. any solution to the sample-based problem adheres to the original chance-constraint with high confidence. To solve the resulting problem, we integrate it into our motion planner VP-STO and name the enhanced framework Chance-Constrained (CC)-VPSTO. The strengths of our approach lie in i) its generality, without assumptions on the underlying uncertainty distribution, system dynamics, cost function, or the form of inequality constraints; and ii) its applicability to MPC-settings. We demonstrate the validity and efficiency of our approach on both simulation and real-world robot experiments.

Submitted 9 April, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 17 pages, 11 figures, submitted to IEEE Transactions on Robotics
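
Illustrative note: the abstract above estimates a constraint-violation probability from a fixed number of uncertainty samples. The sketch below shows one generic way to do this, assuming independent samples: an empirical violation rate plus a one-sided Hoeffding margin. It is not the construction used in CC-VPSTO, and the names `violation_rate`, `satisfies_chance_constraint`, `is_violated`, and `confidence` are hypothetical.

```python
import math


def violation_rate(is_violated, samples):
    """Empirical fraction of uncertainty samples that violate the constraint."""
    return sum(1 for s in samples if is_violated(s)) / len(samples)


def satisfies_chance_constraint(is_violated, samples, threshold, confidence=0.95):
    """Conservative check that the violation probability is below `threshold`.

    Adds a one-sided Hoeffding margin to the empirical rate, so that for
    i.i.d. samples the true probability exceeds (rate + margin) with
    probability at most 1 - confidence.
    """
    n = len(samples)
    margin = math.sqrt(math.log(1.0 / (1.0 - confidence)) / (2.0 * n))
    return violation_rate(is_violated, samples) + margin <= threshold
```

Because the margin shrinks only as the inverse square root of the sample count, a check of this kind is conservative for small sample sets.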

arXiv:2311.10090

JaxMARL: Multi-Agent RL Environments in JAX

Authors: Alexander Rutherford, Benjamin Ellis, Matteo Gallici, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktaschel, Chris Lu, Jakob Nicolaus Foerster

Abstract: Benchmarks play an important role in the development of machine learning algorithms. For example, research in reinforcement learning (RL) has been heavily influenced by available environments and benchmarks. However, RL environments are traditionally run on the CPU, limiting their scalability with typical academic compute. Recent advancements in JAX have enabled the wider use of hardware acceleration to overcome these computational hurdles, enabling massively parallel RL training pipelines and environments. This is particularly useful for multi-agent reinforcement learning (MARL) research. First of all, multiple agents must be considered at each environment step, adding computational burden, and secondly, the sample complexity is increased due to non-stationarity, decentralised partial observability, or other MARL challenges. In this paper, we present JaxMARL, the first open-source code base that combines ease-of-use with GPU enabled efficiency, and supports a large number of commonly used MARL environments as well as popular baseline algorithms. When considering wall clock time, our experiments show that per-run our JAX-based training pipeline is up to 12500x faster than existing approaches. This enables efficient and thorough evaluations, with the potential to alleviate the evaluation crisis of the field. We also introduce and benchmark SMAX, a vectorised, simplified version of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. We provide code at https://github.com/flairox/jaxmarl.

Submitted 19 December, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2307.00633

Effects of Explanation Specificity on Passengers in Autonomous Driving

Authors: Daniel Omeiza, Raunak Bhattacharyya, Nick Hawes, Marina Jirotka, Lars Kunze

Abstract: The nature of explanations provided by an explainable AI algorithm has been a topic of interest in the explainable AI and human-computer interaction community. In this paper, we investigate the effects of natural language explanations' specificity on passengers in autonomous driving. We extended an existing data-driven tree-based explainer algorithm by adding a rule-based option for explanation generation. We generated auditory natural language explanations with different levels of specificity (abstract and specific) and tested these explanations in a within-subject user study (N=39) using an immersive physical driving simulation setup. Our results showed that both abstract and specific explanations had similar positive effects on passengers' perceived safety and the feeling of anxiety. However, the specific explanations influenced the desire of passengers to take over driving control from the autonomous vehicle (AV), while the abstract explanations did not. We conclude that natural language auditory explanations are useful for passengers in autonomous driving, and their specificity levels could influence how much in-vehicle participants would wish to be in control of the driving activity.

Submitted 2 July, 2023; originally announced July 2023.

arXiv:2306.09211

A Framework for Learning from Demonstration with Minimal Human Effort

Authors: Marc Rigter, Bruno Lacerda, Nick Hawes

Abstract: We consider robot learning in the context of shared autonomy, where control of the system can switch between a human teleoperator and autonomous control. In this setting we address reinforcement learning, and learning from demonstration, where there is a cost associated with human time. This cost represents the human time required to teleoperate the robot, or recover the robot from failures. For each episode, the agent must choose between requesting human teleoperation, or using one of its autonomous controllers. In our approach, we learn to predict the success probability for each controller, given the initial state of an episode. This is used in a contextual multi-armed bandit algorithm to choose the controller for the episode. A controller is learnt online from demonstrations and reinforcement learning so that autonomous performance improves, and the system becomes less reliant on the teleoperator with more experience. We show that our approach to controller selection reduces the human cost to perform two simulated tasks and a single real-world task.

Submitted 15 June, 2023; originally announced June 2023.

Comments: Preprint version of IEEE Robotics and Automation Letters paper

arXiv:2302.03086

DITTO: Offline Imitation Learning with World Models

Authors: Branton DeMoss, Paul Duckworth, Nick Hawes, Ingmar Posner

Abstract: We propose DITTO, an offline imitation learning algorithm which uses world models and on-policy reinforcement learning to address the problem of covariate shift, without access to an oracle or any additional online interactions. We discuss how world models enable offline, on-policy imitation learning, and propose a simple intrinsic reward defined in the world model latent space that induces imitation learning by reinforcement learning. Theoretically, we show that our formulation induces a divergence bound between expert and learner, in turn bounding the difference in reward. We test our method on difficult Atari environments from pixels alone, and achieve state-of-the-art performance in the offline setting.

Submitted 6 February, 2023; originally announced February 2023.

arXiv:2212.00124

One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning

Authors: Marc Rigter, Bruno Lacerda, Nick Hawes

Abstract: Offline reinforcement learning (RL) is suitable for safety-critical domains where online exploration is too costly or dangerous. In such safety-critical settings, decision-making should take into consideration the risk of catastrophic outcomes. In other words, decision-making should be risk-sensitive. Previous works on risk in offline RL combine together offline RL techniques, to avoid distributional shift, with risk-sensitive RL algorithms, to achieve risk-sensitivity. In this work, we propose risk-sensitivity as a mechanism to jointly address both of these issues. Our model-based approach is risk-averse to both epistemic and aleatoric uncertainty. Risk-aversion to epistemic uncertainty prevents distributional shift, as areas not covered by the dataset have high epistemic uncertainty. Risk-aversion to aleatoric uncertainty discourages actions that may result in poor outcomes due to environment stochasticity. Our experiments show that our algorithm achieves competitive performance on deterministic benchmarks, and outperforms existing approaches for risk-sensitive objectives in stochastic domains.

Submitted 30 October, 2023; v1 submitted 30 November, 2022; originally announced December 2022.

Comments: NeurIPS 2023

arXiv:2210.04067

VP-STO: Via-point-based Stochastic Trajectory Optimization for Reactive Robot Behavior

Authors: Julius Jankowski, Lara Brudermüller, Nick Hawes, Sylvain Calinon

Abstract: Achieving reactive robot behavior in complex dynamic environments is still challenging as it relies on being able to solve trajectory optimization problems quickly enough, such that we can replan the future motion at frequencies which are sufficiently high for the task at hand. We argue that current limitations in Model Predictive Control (MPC) for robot manipulators arise from inefficient, high-dimensional trajectory representations and the negligence of time-optimality in the trajectory optimization process. Therefore, we propose a motion optimization framework that optimizes jointly over space and time, generating smooth and timing-optimal robot trajectories in joint-space. While being task-agnostic, our formulation can incorporate additional task-specific requirements, such as collision avoidance, and yet maintain real-time control rates, demonstrated in simulation and real-world robot experiments on closed-loop manipulation. For additional material, please visit https://sites.google.com/oxfordrobotics.institute/vp-sto.

Submitted 14 March, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

Comments: *Authors contributed equally

arXiv:2207.13409

Unbiased Active Inference for Classical Control

Authors: Mohamed Baioumy, Corrado Pezzato, Riccardo Ferrari, Nick Hawes

Abstract: Active inference is a mathematical framework that originated in computational neuroscience. Recently, it has been demonstrated as a promising approach for constructing goal-driven behavior in robotics. Specifically, the active inference controller (AIC) has been successful on several continuous control and state-estimation tasks. Despite its relative success, some established design choices lead to a number of practical limitations for robot control. These include having a biased estimate of the state, and only an implicit model of control actions. In this paper, we highlight these limitations and propose an extended version of the unbiased active inference controller (u-AIC). The u-AIC maintains all the compelling benefits of the AIC and removes its limitations. Simulation results on a 2-DOF arm and experiments on a real 7-DOF manipulator show the improved performance of the u-AIC with respect to the standard AIC. The code can be found at https://github.com/cpezzato/unbiased_aic.

Submitted 27 July, 2022; originally announced July 2022.

Comments: 8 pages, 8 figures. Accepted at IROS 2022

arXiv:2204.12581

RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning

Authors: Marc Rigter, Bruno Lacerda, Nick Hawes

Abstract: Offline reinforcement learning (RL) aims to find performant policies from logged data without further environment interaction. Model-based algorithms, which learn a model of the environment from the dataset and perform conservative policy optimisation within that model, have emerged as a promising approach to this problem. In this work, we present Robust Adversarial Model-Based Offline RL (RAMBO), a novel approach to model-based offline RL. We formulate the problem as a two-player zero sum game against an adversarial environment model. The model is trained to minimise the value function while still accurately predicting the transitions in the dataset, forcing the policy to act conservatively in areas not covered by the dataset. To approximately solve the two-player game, we alternate between optimising the policy and adversarially optimising the model. The problem formulation that we address is theoretically grounded, resulting in a probably approximately correct (PAC) performance guarantee and a pessimistic value function which lower bounds the value function in the true environment. We evaluate our approach on widely studied offline RL benchmarks, and demonstrate that it outperforms existing state-of-the-art baselines.

Submitted 11 October, 2022; v1 submitted 26 April, 2022; originally announced April 2022.

Comments: NeurIPS 2022

arXiv:2204.08035

Beta Residuals: Improving Fault-Tolerant Control for Sensory Faults via Bayesian Inference and Precision Learning

Authors: Mohamed Baioumy, William Hartemink, Riccardo M. G. Ferrari, Nick Hawes

Abstract: Model-based fault-tolerant control (FTC) often consists of two distinct steps: fault detection & isolation (FDI), and fault accommodation. In this work we investigate posing fault-tolerant control as a single Bayesian inference problem. Previous work showed that precision learning allows for stochastic FTC without an explicit fault detection step. While this leads to implicit fault recovery, information on sensor faults is not provided, which may be essential for triggering other impact-mitigation actions. In this paper, we introduce a precision-learning based Bayesian FTC approach and a novel beta residual for fault detection. Simulation results are presented, supporting the use of beta residual against competing approaches.

Submitted 17 April, 2022; originally announced April 2022.

Comments: 7 pages, 2 figures. Accepted at the 11th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes - SAFEPROCESS 2022

arXiv:2110.12746

Planning for Risk-Aversion and Expected Value in MDPs

Authors: Marc Rigter, Paul Duckworth, Bruno Lacerda, Nick Hawes

Abstract: Planning in Markov decision processes (MDPs) typically optimises the expected cost. However, optimising the expectation does not consider the risk that for any given run of the MDP, the total cost received may be unacceptably high. An alternative approach is to find a policy which optimises a risk-averse objective such as conditional value at risk (CVaR). However, optimising the CVaR alone may result in poor performance in expectation. In this work, we begin by showing that there can be multiple policies which obtain the optimal CVaR. This motivates us to propose a lexicographic approach which minimises the expected cost subject to the constraint that the CVaR of the total cost is optimal. We present an algorithm for this problem and evaluate our approach on four domains. Our results demonstrate that our lexicographic approach improves the expected cost compared to the state of the art algorithm, while achieving the optimal CVaR.

Submitted 10 March, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: Accepted to ICAPS 2022
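
Illustrative note: the abstract above contrasts expected cost with conditional value at risk (CVaR) and observes that several policies can share the same CVaR. The sketch below uses a simple sample-based estimator, assuming the convention that CVaR at level `alpha` is the mean of the worst `alpha` fraction of total costs (conventions for `alpha` vary between papers). The cost samples for the two policies are made-up numbers for illustration only, and this is not the algorithm proposed in the paper.

```python
import math


def cvar(costs, alpha):
    """Mean of the worst ceil(alpha * n) sampled costs (higher cost is worse)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    worst_first = sorted(costs, reverse=True)
    tail_size = max(1, math.ceil(alpha * len(worst_first)))
    tail = worst_first[:tail_size]
    return sum(tail) / len(tail)


def expected_cost(costs):
    """Sample mean of the total cost."""
    return sum(costs) / len(costs)


# Hypothetical total-cost samples for two policies with the same tail risk.
policy_a = [1.0, 2.0, 3.0, 10.0]
policy_b = [3.0, 3.0, 3.0, 10.0]

# Lexicographic comparison: CVaR first, expected cost as the tie-breaker.
best = min([policy_a, policy_b], key=lambda c: (cvar(c, 0.25), expected_cost(c)))
```

Both hypothetical policies have the same CVaR at `alpha = 0.25`, so the lexicographic comparison falls back to expected cost and selects `policy_a`.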

arXiv:2109.11287

Risk-Aware Motion Planning in Partially Known Environments

Authors: Fernando S. Barbosa, Bruno Lacerda, Paul Duckworth, Jana Tumova, Nick Hawes

Abstract: Recent trends envisage robots being deployed in areas deemed dangerous to humans, such as buildings with gas and radiation leaks. In such situations, the model of the underlying hazardous process might be unknown to the agent a priori, giving rise to the problem of planning for safe behaviour in partially known environments. We employ Gaussian process regression to create a probabilistic model of the hazardous process from local noisy samples. The result of this regression is then used by a risk metric, such as the Conditional Value-at-Risk, to reason about the safety at a certain state. The outcome is a risk function that can be employed in optimal motion planning problems. We demonstrate the use of the proposed function in two approaches. First is a sampling-based motion planning algorithm with an event-based trigger for online replanning. Second is an adaptation to the incremental Gaussian Process motion planner (iGPMP2), allowing it to quickly react and adapt to the environment. Both algorithms are evaluated in representative simulation scenarios, where they demonstrate the ability of avoiding high-risk areas.

Submitted 23 September, 2021; originally announced September 2021.

Comments: 7 pages, 2 figures, to be published in CDC 2021

arXiv:2109.05870

doi 10.1007/978-3-030-93736-2_48

Towards Stochastic Fault-tolerant Control using Precision Learning and Active Inference

Authors: Mohamed Baioumy, Corrado Pezzato, Carlos Hernandez Corbato, Nick Hawes, Riccardo Ferrari

Abstract: This work presents a fault-tolerant control scheme for sensory faults in robotic manipulators based on active inference. In the majority of existing schemes, a binary decision of whether a sensor is healthy (functional) or faulty is made based on measured data. The decision boundary is called a threshold and it is usually deterministic. Following a faulty decision, fault recovery is obtained by excluding the malfunctioning sensor. We propose a stochastic fault-tolerant scheme based on active inference and precision learning which does not require a priori threshold definitions to trigger fault recovery. Instead, the sensor precision, which represents its health status, is learned online in a model-free way allowing the system to gradually, and not abruptly exclude a failing unit. Experiments on a robotic manipulator show promising results and directions for future work are discussed.

Submitted 13 September, 2021; originally announced September 2021.

Comments: Presented at the International Workshop on Active Inference (IWAI) 2021; 11 pages, 3 figures

Journal ref: Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham

arXiv:2109.05866

On Solving a Stochastic Shortest-Path Markov Decision Process as Probabilistic Inference

Authors: Mohamed Baioumy, Bruno Lacerda, Paul Duckworth, Nick Hawes

Abstract: Previous work on planning as active inference addresses finite horizon problems and solutions valid for online planning. We propose solving the general Stochastic Shortest-Path Markov Decision Process (SSP MDP) as probabilistic inference. Furthermore, we discuss online and offline methods for planning under uncertainty. In an SSP MDP, the horizon is indefinite and unknown a priori. SSP MDPs generalize finite and infinite horizon MDPs and are widely used in the artificial intelligence community. Additionally, we highlight some of the differences between solving an MDP using dynamic programming approaches widely used in the artificial intelligence community and approaches used in the active inference community.

Submitted 13 September, 2021; originally announced September 2021.

Comments: Presented at the second International Workshop on Active Inference (IWAI 2021); 11 pages, 2 figures

arXiv:2104.01817 [pdf, other]

Fault-tolerant Control of Robot Manipulators with Sensory Faults using Unbiased Active Inference

Authors: Mohamed Baioumy, Corrado Pezzato, Riccardo Ferrari, Carlos Hernandez Corbato, Nick Hawes

Abstract: This work presents a novel fault-tolerant control scheme based on active inference. Specifically, a new formulation of active inference which, unlike previous solutions, provides unbiased state estimation and simplifies the definition of probabilistically robust thresholds for fault-tolerant control of robotic systems using the free-energy. The proposed solution makes use of the sensory prediction errors in the free-energy for the generation of residuals and thresholds for fault detection and isolation of sensory faults, and it does not require additional controllers for fault recovery. Results validating the benefits in a simulated 2-DOF manipulator are presented, and future directions to improve the current fault recovery approach are discussed.

Submitted 5 April, 2021; originally announced April 2021.

Comments: 7 pages, 6 figures, Accepted at the European Control Conference (ECC) 2021

arXiv:2102.05762 [pdf, other]

Risk-Averse Bayes-Adaptive Reinforcement Learning

Authors: Marc Rigter, Bruno Lacerda, Nick Hawes

Abstract: In this work, we address risk-averse Bayes-adaptive reinforcement learning. We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs). We show that a policy optimising CVaR in this setting is risk-averse to both the parametric uncertainty due to the prior distribution over MDPs, and the internal uncertainty due to the inherent stochasticity of MDPs. We reformulate the problem as a two-player stochastic game and propose an approximate algorithm based on Monte Carlo tree search and Bayesian optimisation. Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.

Submitted 26 October, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: Full version of NeurIPS 2021 paper

arXiv:2012.04626 [pdf, other]

Minimax Regret Optimisation for Robust Planning in Uncertain Markov Decision Processes

Authors: Marc Rigter, Bruno Lacerda, Nick Hawes

Abstract: The parameters for a Markov Decision Process (MDP) often cannot be specified exactly. Uncertain MDPs (UMDPs) capture this model ambiguity by defining sets which the parameters belong to. Minimax regret has been proposed as an objective for planning in UMDPs to find robust policies which are not overly conservative. In this work, we focus on planning for Stochastic Shortest Path (SSP) UMDPs with uncertain cost and transition functions. We introduce a Bellman equation to compute the regret for a policy. We propose a dynamic programming algorithm that utilises the regret Bellman equation, and show that it optimises minimax regret exactly for UMDPs with independent uncertainties. For coupled uncertainties, we extend our approach to use options to enable a trade off between computation and solution quality. We evaluate our approach on both synthetic and real-world domains, showing that it significantly outperforms existing baselines.

Submitted 12 February, 2023; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: Full version of AAAI 2021 paper, with corrigendum attached that describes error in original paper

arXiv:2005.05894 [pdf, other]

Active Inference for Integrated State-Estimation, Control, and Learning

Authors: Mohamed Baioumy, Paul Duckworth, Bruno Lacerda, Nick Hawes

Abstract: This work presents an approach for control, state-estimation and learning model (hyper)parameters for robotic manipulators. It is based on the active inference framework, prominent in computational neuroscience as a theory of the brain, where behaviour arises from minimizing variational free-energy. The robotic manipulator shows adaptive and robust behaviour compared to state-of-the-art methods. Additionally, we show the exact relationship to classic methods such as PID control. Finally, we show that by learning a temporal parameter and model variances, our approach can deal with unmodelled dynamics, damps oscillations, and is robust against disturbances and poor initial parameters. The approach is validated on the 'Franka Emika Panda' 7 DoF manipulator.

Submitted 30 March, 2021; v1 submitted 12 May, 2020; originally announced May 2020.

Comments: 7 pages, 6 figures, accepted for presentation at the International Conference on Robotics and Automation (ICRA) 2021

arXiv:2003.04445 [pdf, other]

Convex Hull Monte-Carlo Tree Search

Authors: Michael Painter, Bruno Lacerda, Nick Hawes

Abstract: This work investigates Monte-Carlo planning for agents in stochastic environments, with multiple objectives. We propose the Convex Hull Monte-Carlo Tree-Search (CHMCTS) framework, which builds upon Trial Based Heuristic Tree Search and Convex Hull Value Iteration (CHVI), as a solution to multi-objective planning in large environments. Moreover, we consider how to pose the problem of approximating multiobjective planning solutions as a contextual multi-armed bandits problem, giving a principled motivation for how to select actions from the view of contextual regret. This leads us to the use of Contextual Zooming for action selection, yielding Zooming CHMCTS. We evaluate our algorithm using the Generalised Deep Sea Treasure environment, demonstrating that Zooming CHMCTS can achieve a sublinear contextual regret and scales better than CHVI on a given computational budget.

Submitted 23 March, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

Comments: Camera-ready version of paper accepted to ICAPS 2020, along with relevant appendices

arXiv:1911.04848 [pdf, other]

doi 10.1145/3472206

Mixed-Initiative variable autonomy for remotely operated mobile robots

Authors: Manolis Chiou, Nick Hawes, Rustam Stolkin

Abstract: This paper presents an Expert-guided Mixed-Initiative Control Switcher (EMICS) for remotely operated mobile robots. The EMICS enables switching between different levels of autonomy during task execution initiated by either the human operator and/or the EMICS. The EMICS is evaluated in two disaster response inspired experiments, one with a simulated robot and test arena, and one with a real robot in a realistic environment. Analyses from the two experiments provide evidence that: a) Human-Initiative (HI) systems outperform systems with single modes of operation, such as pure teleoperation, in navigation tasks; b) in the context of the simulated robot experiment, Mixed-Initiative (MI) systems provide improved performance in navigation tasks, improved operator performance in cognitively demanding secondary tasks, and improved operator workload compared to HI. Results also reinforce previous human-robot interaction evidence regarding the importance of the operator's personality traits and their trust in the autonomous system. Lastly, our experiment on a physical robot provides empirical evidence that identifies two major challenges for MI control: a) the design of context-aware MI control systems; and b) the conflict for control between the robot's MI control system and the operator. Insights regarding these challenges are discussed and ways to tackle them are proposed.

Submitted 6 October, 2020; v1 submitted 12 November, 2019; originally announced November 2019.

Comments: Submitted for journal publication, under review

Journal ref: ACM Transactions on Human-Robot Interaction, Volume 10, Issue 4, 2021

arXiv:1807.05196 [pdf, ps, other]

Artificial Intelligence for Long-Term Robot Autonomy: A Survey

Authors: Lars Kunze, Nick Hawes, Tom Duckett, Marc Hanheide, Tomáš Krajník

Abstract: Autonomous systems will play an essential role in many applications across diverse domains including space, marine, air, field, road, and service robotics. They will assist us in our daily routines and perform dangerous, dirty and dull tasks. However, enabling robotic systems to perform autonomously in complex, real-world scenarios over extended time periods (i.e. weeks, months, or years) poses many challenges. Some of these have been investigated by sub-disciplines of Artificial Intelligence (AI) including navigation & mapping, perception, knowledge representation & reasoning, planning, interaction, and learning. The different sub-disciplines have developed techniques that, when re-integrated within an autonomous system, can enable robots to operate effectively in complex, long-term scenarios. In this paper, we survey and discuss AI techniques as 'enablers' for long-term robot autonomy, current progress in integrating these techniques within long-running robotic systems, and the future challenges and opportunities for AI in long-term autonomy.

Submitted 13 July, 2018; originally announced July 2018.

Comments: Accepted for publication in the IEEE Robotics and Automation Letters (RA-L)

arXiv:1803.02906 [pdf, other]

Simultaneous Task Allocation and Planning Under Uncertainty

Authors: Fatma Faruq, Bruno Lacerda, Nick Hawes, David Parker

Abstract: We propose novel techniques for task allocation and planning in multi-robot systems operating in uncertain environments. Task allocation is performed simultaneously with planning, which provides more detailed information about individual robot behaviour, but also exploits independence between tasks to do so efficiently. We use Markov decision processes to model robot behaviour and linear temporal logic to specify tasks and safety constraints. Building upon techniques and tools from formal verification, we show how to generate a sequence of multi-robot policies, iteratively refining them to reallocate tasks if individual robots fail, and providing probabilistic guarantees on the performance (and safe operation) of the team of robots under the resulting policy. We implement our approach and evaluate it on a benchmark multi-robot example.

Submitted 10 August, 2018; v1 submitted 7 March, 2018; originally announced March 2018.

arXiv:1702.08513 [pdf, other]

doi 10.1109/IROS.2017.8206444

Learning Deep Visual Object Models From Noisy Web Data: How to Make it Work

Authors: Nizar Massouh, Francesca Babiloni, Tatiana Tommasi, Jay Young, Nick Hawes, Barbara Caputo

Abstract: Deep networks thrive when trained on large scale data collections. This has given ImageNet a central role in the development of deep architectures for visual object classification. However, ImageNet was created during a specific period in time, and as such it is prone to aging, as well as dataset bias issues. Moving beyond fixed training datasets will lead to more robust visual systems, especially when deployed on robots in new environments which must train on the objects they encounter there. To make this possible, it is important to break free from the need for manual annotators. Recent work has begun to investigate how to use the massive amount of images available on the Web in place of manual image annotations. We contribute to this research thread with two findings: (1) a study correlating a given level of noisy labels to the expected drop in accuracy, for two deep architectures, on two different types of noise, that clearly identifies GoogLeNet as a suitable architecture for learning from Web data; (2) a recipe for the creation of Web datasets with minimal noise and maximum visual variability, based on a visual and natural language processing concept expansion strategy. By combining these two results, we obtain a method for learning powerful deep object models automatically from the Web. We confirm the effectiveness of our approach through object categorization experiments using our Web-derived version of ImageNet on a popular robot vision benchmark database, and on a lifelong object discovery task on a mobile robot.

Submitted 28 February, 2017; originally announced February 2017.

Comments: 8 pages, 7 figures, 3 tables

Journal ref: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:1604.04384 [pdf, other]

doi 10.1109/MRA.2016.2636359

The STRANDS Project: Long-Term Autonomy in Everyday Environments

Authors: Nick Hawes, Chris Burbridge, Ferdian Jovan, Lars Kunze, Bruno Lacerda, Lenka Mudrová, Jay Young, Jeremy Wyatt, Denise Hebesberger, Tobias Körtner, Rares Ambrus, Nils Bore, John Folkesson, Patric Jensfelt, Lucas Beyer, Alexander Hermans, Bastian Leibe, Aitor Aldoma, Thomas Fäulhammer, Michael Zillich, Markus Vincze, Eris Chinellato, Muhannad Al-Omari, Paul Duckworth, Yiannis Gatsoulis , et al. (8 additional authors not shown)

Abstract: Thanks to the efforts of the robotics and autonomous systems community, robots are becoming ever more capable. There is also an increasing demand from end-users for autonomous service robots that can operate in real environments for extended periods. In the STRANDS project we are tackling this demand head-on by integrating state-of-the-art artificial intelligence and robotics research into mobile service robots, and deploying these systems for long-term installations in security and care environments. Over four deployments, our robots have been operational for a combined duration of 104 days autonomously performing end-user defined tasks, covering 116km in the process. In this article we describe the approach we have used to enable long-term autonomous operation in everyday environments, and how our robots are able to use their long run times to improve their own performance.

Submitted 14 October, 2016; v1 submitted 15 April, 2016; originally announced April 2016.

Showing 1–28 of 28 results for author: Hawes, N