-
Autoregressive Policies for Continuous Control Deep Reinforcement Learning
Authors:
Dmytro Korenkevych,
A. Rupam Mahmood,
Gautham Vasan,
James Bergstra
Abstract:
Reinforcement learning algorithms rely on exploration to discover new behaviors, which is typically achieved by following a stochastic policy. In continuous control tasks, policies with a Gaussian distribution have been widely adopted. Gaussian exploration, however, does not result in smooth trajectories that generally correspond to safe and rewarding behaviors in practical tasks. In addition, Gaussian policies do not result in an effective exploration of an environment and become increasingly inefficient as the action rate increases. This contributes to a low sample efficiency often observed in learning continuous control tasks. We introduce a family of stationary autoregressive (AR) stochastic processes to facilitate exploration in continuous control domains. We show that the proposed processes possess two desirable features: subsequent process observations are temporally coherent with a continuously adjustable degree of coherence, and the process stationary distribution is standard normal. We derive an autoregressive policy (ARP) that implements such processes while maintaining the standard agent-environment interface. We show how ARPs can be easily used with existing off-the-shelf learning algorithms. Empirically, we demonstrate that using ARPs results in improved exploration and sample efficiency in both simulated and real-world domains and, furthermore, provides smooth exploration trajectories that enable safe operation of robotic hardware.
Submitted 27 March, 2019;
originally announced March 2019.
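The two properties named in the abstract above (temporal coherence with an adjustable degree, and a standard normal stationary distribution) can be illustrated with the simplest first-order case. The sketch below is not the paper's construction, which covers a broader family of AR processes; it is a textbook AR(1) recursion, and the function name `ar1_noise` and the parameter `alpha` are assumptions introduced here for illustration.

```python
import math
import random


def ar1_noise(alpha, n_steps, seed=0):
    """Sample a stationary AR(1) sequence with N(0, 1) marginals.

    alpha in [0, 1) sets the lag-1 correlation: alpha = 0 gives white
    Gaussian noise, values near 1 give slowly varying, smooth noise.
    (Hypothetical helper for illustration; not from the paper.)
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    rng = random.Random(seed)
    # Scaling the innovation by sqrt(1 - alpha^2) keeps the variance at 1:
    # Var[x_t] = alpha^2 * 1 + (1 - alpha^2) * 1 = 1.
    scale = math.sqrt(1.0 - alpha * alpha)
    x = rng.gauss(0.0, 1.0)  # start in the stationary distribution
    samples = [x]
    for _ in range(n_steps - 1):
        x = alpha * x + scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples
```

Under these assumptions, such a sequence could stand in for the independent Gaussian noise usually added to a policy's mean action, with `alpha` trading off smoothness against decorrelation speed.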
-
Benchmarking Reinforcement Learning Algorithms on Real-World Robots
Authors:
A. Rupam Mahmood,
Dmytro Korenkevych,
Gautham Vasan,
William Ma,
James Bergstra
Abstract:
Through many recent successes in simulation, model-free reinforcement learning has emerged as a promising approach to solving continuous control robotic tasks. The research community is now able to reproduce, analyze and build quickly on these results due to open source implementations of learning algorithms and simulated benchmark tasks. To carry forward these successes to real-world applications, it is crucial to withhold utilizing the unique advantages of simulations that do not transfer to the real world and experiment directly with physical robots. However, reinforcement learning research with physical robots faces substantial resistance due to the lack of benchmark tasks and supporting source code. In this work, we introduce several reinforcement learning tasks with multiple commercially available robots that present varying levels of learning difficulty, setup, and repeatability. On these tasks, we test the learning performance of off-the-shelf implementations of four reinforcement learning algorithms and analyze sensitivity to their hyper-parameters to determine their readiness for applications in various real-world tasks. Our results show that with a careful setup of the task interface and computations, some of these implementations can be readily applicable to physical robots. We find that state-of-the-art learning algorithms are highly sensitive to their hyper-parameters and their relative ordering does not transfer across tasks, indicating the necessity of re-tuning them for each task for best performance. On the other hand, the best hyper-parameter configuration from one task may often result in effective learning on held-out tasks even with different robots, providing a reasonable default. We make the benchmark tasks publicly available to enhance reproducibility in real-world reinforcement learning.
Submitted 20 September, 2018;
originally announced September 2018.
-
Automated Treatment Planning in Radiation Therapy using Generative Adversarial Networks
Authors:
Rafid Mahmood,
Aaron Babier,
Andrea McNiven,
Adam Diamant,
Timothy C. Y. Chan
Abstract:
Knowledge-based planning (KBP) is an automated approach to radiation therapy treatment planning that involves predicting desirable treatment plans before they are then corrected to deliverable ones. We propose a generative adversarial network (GAN) approach for predicting desirable 3D dose distributions that eschews the previous paradigms of site-specific feature engineering and predicting low-dimensional representations of the plan. Experiments on a dataset of oropharyngeal cancer patients show that our approach significantly outperforms previous methods on several clinical satisfaction criteria and similarity metrics.
Submitted 17 July, 2018;
originally announced July 2018.
-
Learning to Optimize Contextually Constrained Problems for Real-Time Decision-Generation
Authors:
Aaron Babier,
Timothy C. Y. Chan,
Adam Diamant,
Rafid Mahmood
Abstract:
The topic of learning to solve optimization problems has received interest from both the operations research and machine learning communities. In this work, we combine techniques from both fields to address the problem of learning to generate decisions to instances of continuous optimization problems where the feasible set varies with contextual features. We propose a novel framework for training a generative model to estimate optimal decisions by combining interior point methods and adversarial learning, which we further embed within a data generation algorithm. Decisions generated by our model satisfy in-sample and out-of-sample optimality guarantees. Finally, we investigate case studies in portfolio optimization and personalized treatment design, demonstrating that our approach yields advantages over predict-then-optimize and supervised deep learning techniques, respectively.
Submitted 21 April, 2022; v1 submitted 23 May, 2018;
originally announced May 2018.
-
Setting up a Reinforcement Learning Task with a Real-World Robot
Authors:
A. Rupam Mahmood,
Dmytro Korenkevych,
Brent J. Komer,
James Bergstra
Abstract:
Reinforcement learning is a promising approach to developing hard-to-engineer adaptive solutions for complex and diverse robotic tasks. However, learning with real-world robots is often unreliable and difficult, which resulted in their low adoption in reinforcement learning research. This difficulty is worsened by the lack of guidelines for setting up learning tasks with robots. In this work, we develop a learning task with a UR5 robotic arm to bring to light some key elements of a task setup and study their contributions to the challenges with robots. We find that learning performance can be highly sensitive to the setup, and thus oversights and omissions in setup details can make effective learning, reproducibility, and fair comparison hard. Our study suggests some mitigating steps to help future experimenters avoid difficulties and pitfalls. We show that highly reliable and repeatable experiments can be performed in our setup, indicating the possibility of reinforcement learning research extensively based on real-world robots.
Submitted 19 March, 2018;
originally announced March 2018.
-
An Empirical Evaluation of True Online TD(λ)
Authors:
Harm van Seijen,
A. Rupam Mahmood,
Patrick M. Pilarski,
Richard S. Sutton
Abstract:
The true online TD(λ) algorithm has recently been proposed (van Seijen and Sutton, 2014) as a universal replacement for the popular TD(λ) algorithm, in temporal-difference learning and reinforcement learning. True online TD(λ) has better theoretical properties than conventional TD(λ), and the expectation is that it also results in faster learning. In this paper, we put this hypothesis to the test. Specifically, we compare the performance of true online TD(λ) with that of TD(λ) on challenging examples, random Markov reward processes, and a real-world myoelectric prosthetic arm. We use linear function approximation with tabular, binary, and non-binary features. We assess the algorithms along three dimensions: computational cost, learning speed, and ease of use. Our results confirm the strength of true online TD(λ): 1) for sparse feature vectors, the computational overhead with respect to TD(λ) is minimal; for non-sparse features the computation time is at most twice that of TD(λ), 2) across all domains/representations the learning speed of true online TD(λ) is often better, but never worse than that of TD(λ), and 3) true online TD(λ) is easier to use, because it does not require choosing between trace types, and it is generally more stable with respect to the step-size. Overall, our results suggest that true online TD(λ) should be the first choice when looking for an efficient, general-purpose TD method.
Submitted 1 July, 2015;
originally announced July 2015.
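For readers unfamiliar with the method evaluated in the entry above, the following is a minimal sketch of true online TD(λ) with linear function approximation and dutch-style traces, written in its commonly presented textbook form. The abstract does not give the update equations, so the exact form shown is an assumption, as are the function name, the episode format (lists of `(x, reward, x_next)` with an all-zero `x_next` at termination), and the use of plain Python lists.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def true_online_td_lambda(episodes, n_features, alpha, gamma, lam):
    """Estimate linear value-function weights with true online TD(lambda).

    episodes: iterable of episodes, each a list of (x, reward, x_next)
    transitions; x and x_next are feature vectors (lists), and x_next is
    all zeros when the episode terminates. (Assumed data format.)
    """
    w = [0.0] * n_features
    for episode in episodes:
        z = [0.0] * n_features  # eligibility trace, reset every episode
        v_old = 0.0
        for x, reward, x_next in episode:
            v = dot(w, x)
            v_next = dot(w, x_next)
            delta = reward + gamma * v_next - v  # TD error
            # Dutch trace update; this is where the method differs from
            # accumulating or replacing traces in conventional TD(lambda).
            zx = dot(z, x)
            z = [gamma * lam * zi + (1.0 - alpha * gamma * lam * zx) * xi
                 for zi, xi in zip(z, x)]
            # Weight update with the correction term (v - v_old).
            w = [wi + alpha * (delta + v - v_old) * zi - alpha * (v - v_old) * xi
                 for wi, zi, xi in zip(w, z, x)]
            v_old = v_next
    return w
```

With tabular (one-hot) features, each weight is the value estimate of one state, which makes the sketch easy to check on small deterministic chains.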