
Showing 1–6 of 6 results for author: Mahmood, R

  1. arXiv:1903.11524  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Autoregressive Policies for Continuous Control Deep Reinforcement Learning

    Authors: Dmytro Korenkevych, A. Rupam Mahmood, Gautham Vasan, James Bergstra

    Abstract: Reinforcement learning algorithms rely on exploration to discover new behaviors, which is typically achieved by following a stochastic policy. In continuous control tasks, policies with a Gaussian distribution have been widely adopted. Gaussian exploration however does not result in smooth trajectories that generally correspond to safe and rewarding behaviors in practical tasks. In addition, Gauss…

    Submitted 27 March, 2019; originally announced March 2019.

    Comments: Submitted to 28th International Joint Conference on Artificial Intelligence (IJCAI 2019). Video: https://youtu.be/NCpyXBNqNmw Code: https://github.com/dkorenkevych/arp

  2. arXiv:1809.07731  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Benchmarking Reinforcement Learning Algorithms on Real-World Robots

    Authors: A. Rupam Mahmood, Dmytro Korenkevych, Gautham Vasan, William Ma, James Bergstra

    Abstract: Through many recent successes in simulation, model-free reinforcement learning has emerged as a promising approach to solving continuous control robotic tasks. The research community is now able to reproduce, analyze and build quickly on these results due to open source implementations of learning algorithms and simulated benchmark tasks. To carry forward these successes to real-world applications…

    Submitted 20 September, 2018; originally announced September 2018.

    Comments: Appears in Proceedings of the Second Conference on Robot Learning (CoRL 2018). Companion video at https://youtu.be/ovDfhvjpQd8 and source code at https://github.com/kindredresearch/SenseAct

  3. arXiv:1807.06489  [pdf, other]

    cs.LG physics.med-ph stat.ML

    Automated Treatment Planning in Radiation Therapy using Generative Adversarial Networks

    Authors: Rafid Mahmood, Aaron Babier, Andrea McNiven, Adam Diamant, Timothy C. Y. Chan

    Abstract: Knowledge-based planning (KBP) is an automated approach to radiation therapy treatment planning that involves predicting desirable treatment plans before they are then corrected to deliverable ones. We propose a generative adversarial network (GAN) approach for predicting desirable 3D dose distributions that eschews the previous paradigms of site-specific feature engineering and predicting low-dim…

    Submitted 17 July, 2018; originally announced July 2018.

    Comments: 15 pages. Accepted for publication in PMLR. Presented at Machine Learning for Health Care

  4. arXiv:1805.09293  [pdf, other]

    cs.LG math.OC stat.ML

    Learning to Optimize Contextually Constrained Problems for Real-Time Decision-Generation

    Authors: Aaron Babier, Timothy C. Y. Chan, Adam Diamant, Rafid Mahmood

    Abstract: The topic of learning to solve optimization problems has received interest from both the operations research and machine learning communities. In this work, we combine techniques from both fields to address the problem of learning to generate decisions to instances of continuous optimization problems where the feasible set varies with contextual features. We propose a novel framework for training…

    Submitted 21 April, 2022; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: 72 pages

  5. arXiv:1803.07067  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Setting up a Reinforcement Learning Task with a Real-World Robot

    Authors: A. Rupam Mahmood, Dmytro Korenkevych, Brent J. Komer, James Bergstra

    Abstract: Reinforcement learning is a promising approach to developing hard-to-engineer adaptive solutions for complex and diverse robotic tasks. However, learning with real-world robots is often unreliable and difficult, which resulted in their low adoption in reinforcement learning research. This difficulty is worsened by the lack of guidelines for setting up learning tasks with robots. In this work, we d…

    Submitted 19 March, 2018; originally announced March 2018.

    Comments: Submitted to 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  6. arXiv:1507.00353  [pdf, other]

    cs.AI cs.LG stat.ML

    An Empirical Evaluation of True Online TD(λ)

    Authors: Harm van Seijen, A. Rupam Mahmood, Patrick M. Pilarski, Richard S. Sutton

    Abstract: The true online TD(λ) algorithm has recently been proposed (van Seijen and Sutton, 2014) as a universal replacement for the popular TD(λ) algorithm, in temporal-difference learning and reinforcement learning. True online TD(λ) has better theoretical properties than conventional TD(λ), and the expectation is that it also results in faster learning. In this paper, we put this hypothesis to the test.…

    Submitted 1 July, 2015; originally announced July 2015.

    Comments: European Workshop on Reinforcement Learning (EWRL) 2015