Showing 1–7 of 7 results for author: Roderick, M

Searching in archive cs.
  1. arXiv:2312.17411  [pdf, other]

    cs.LG stat.ML

    Generative Posterior Networks for Approximately Bayesian Epistemic Uncertainty Estimation

    Authors: Melrose Roderick, Felix Berkenkamp, Fatemeh Sheikholeslami, Zico Kolter

    Abstract: In many real-world problems, there is a limited set of training data, but an abundance of unlabeled data. We propose a new method, Generative Posterior Networks (GPNs), that uses unlabeled data to estimate epistemic uncertainty in high-dimensional problems. A GPN is a generative model that, given a prior distribution over functions, approximates the posterior distribution directly by regularizing…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: 10 pages, 3 figures, 2 tables

  2. arXiv:2311.14885  [pdf, other]

    cs.LG

    Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning

    Authors: Melrose Roderick, Gaurav Manek, Felix Berkenkamp, J. Zico Kolter

    Abstract: A key problem in off-policy Reinforcement Learning (RL) is the mismatch, or distribution shift, between the dataset and the distribution over states and actions visited by the learned policy. This problem is exacerbated in the fully offline setting. The main approach to correct this shift has been through importance sampling, which leads to high-variance gradients. Other approaches, such as conser…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: 10 pages

  3. arXiv:2011.08105  [pdf, other]

    cs.LG math.OC

    Enforcing robust control guarantees within neural network policies

    Authors: Priya L. Donti, Melrose Roderick, Mahyar Fazlyab, J. Zico Kolter

    Abstract: When designing controllers for safety-critical systems, practitioners often face a challenging tradeoff between robustness and performance. While robust control methods provide rigorous guarantees on system stability under certain worst-case disturbances, they often yield simple controllers that perform poorly in the average (non-worst) case. In contrast, nonlinear control methods trained using de…

    Submitted 28 January, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

    Comments: Code available online: https://github.com/locuslab/robust-nn-control

    Journal ref: International Conference on Learning Representations 2021

  4. arXiv:2007.03574  [pdf, other]

    cs.LG cs.AI stat.ML

    Provably Safe PAC-MDP Exploration Using Analogies

    Authors: Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter

    Abstract: A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of "safe exploration," most existing techniques either 1) do not guarantee safety during the…

    Submitted 22 March, 2021; v1 submitted 7 July, 2020; originally announced July 2020.

    Comments: 10 pages, 3 figures, In proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

  5. arXiv:1711.07478  [pdf, other]

    cs.LG cs.AI

    Implementing the Deep Q-Network

    Authors: Melrose Roderick, James MacGlashan, Stefanie Tellex

    Abstract: The Deep Q-Network proposed by Mnih et al. [2015] has become a benchmark and building point for much deep reinforcement learning research. However, replicating results for complex systems is often challenging since original scientific publications are not always able to describe in detail every important parameter setting and software engineering solution. In this paper, we present results from ou…

    Submitted 20 November, 2017; originally announced November 2017.

  6. arXiv:1710.00459  [pdf, other]

    cs.LG cs.AI

    Deep Abstract Q-Networks

    Authors: Melrose Roderick, Christopher Grimm, Stefanie Tellex

    Abstract: We examine the problem of learning and planning on high-dimensional domains with long horizons and sparse rewards. Recent approaches have shown great successes in many Atari 2600 domains. However, domains with long horizons and sparse rewards, such as Montezuma's Revenge and Venture, remain challenging for existing methods. Methods using abstraction (Dietterich 2000; Sutton, Precup, and Singh 1999…

    Submitted 25 August, 2018; v1 submitted 1 October, 2017; originally announced October 2017.

  7. arXiv:1709.00503  [pdf, other]

    stat.ML cs.AI cs.LG

    Mean Actor Critic

    Authors: Cameron Allen, Kavosh Asadi, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman

    Abstract: We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. We prove that this approach reduces variance in the policy gradient estimate rel…

    Submitted 22 May, 2018; v1 submitted 1 September, 2017; originally announced September 2017.