
Showing 1–30 of 30 results for author: O'Donoghue, B

  1. arXiv:2311.13294

    cs.LG cs.AI

    Probabilistic Inference in Reinforcement Learning Done Right

    Authors: Jean Tarbouriech, Tor Lattimore, Brendan O'Donoghue

    Abstract: A popular perspective in reinforcement learning (RL) casts the problem as probabilistic inference on a graphical model of the Markov decision process (MDP). The core object of study is the probability of each state-action pair being visited under the optimal policy. Previous approaches to approximate this quantity can be arbitrarily poor, leading to algorithms that do not implement genuine statist…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023

  2. arXiv:2302.09339

    cs.LG cs.AI

    Efficient Exploration via Epistemic-Risk-Seeking Policy Optimization

    Authors: Brendan O'Donoghue

    Abstract: Exploration remains a key challenge in deep reinforcement learning (RL). Optimism in the face of uncertainty is a well-known heuristic with theoretical guarantees in the tabular setting, but how best to translate the principle to deep reinforcement learning, which involves online stochastic gradients and deep network function approximators, is not fully understood. In this paper we propose a new,…

    Submitted 4 June, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

  3. arXiv:2302.01275

    cs.LG

    ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs

    Authors: Ted Moskovitz, Brendan O'Donoghue, Vivek Veeriah, Sebastian Flennerhag, Satinder Singh, Tom Zahavy

    Abstract: In recent years, Reinforcement Learning (RL) has been applied to real-world problems with increasing success. Such applications often require putting constraints on the agent's behavior. Existing algorithms for constrained RL (CRL) rely on gradient descent-ascent, but this approach comes with a caveat. While these algorithms are guaranteed to converge on average, they do not guarantee last-iterate…

    Submitted 5 March, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

  4. arXiv:2301.03236

    cs.LG cs.AI math.OC

    Optimistic Meta-Gradients

    Authors: Sebastian Flennerhag, Tom Zahavy, Brendan O'Donoghue, Hado van Hasselt, András György, Satinder Singh

    Abstract: We study the connection between gradient-based meta-learning and convex optimisation. We observe that gradient descent with momentum is a special case of meta-gradients, and building on recent results in optimisation, we prove convergence rates for meta-learning in the single task setting. While a meta-learned update rule can yield faster convergence up to a constant factor, it is not sufficient fo…

    Submitted 9 January, 2023; originally announced January 2023.

  5. arXiv:2212.14530

    cs.AI cs.LG

    POMRL: No-Regret Learning-to-Plan with Increasing Horizons

    Authors: Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy

    Abstract: We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task. The agent can use its experience in each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm to meta-learn the underlying struc…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: 24 pages, 6 figures

  6. arXiv:2210.12160

    cs.LG cs.AI math.OC

    On the connection between Bregman divergence and value in regularized Markov decision processes

    Authors: Brendan O'Donoghue

    Abstract: In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among others.

    Submitted 6 November, 2022; v1 submitted 21 October, 2022; originally announced October 2022.
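    As a small illustration (not the paper's derivation), assume entropy regularization, a common special case of a regularized MDP. The Bregman divergence generated by the negative entropy h(p) = sum_i p_i log p_i then reduces to the KL divergence between two action distributions. The sketch below checks this identity numerically; the two distributions are arbitrary assumptions.

```python
import numpy as np

def neg_entropy(p):
    # h(p) = sum_i p_i * log(p_i), the negative Shannon entropy.
    return float(np.sum(p * np.log(p)))

def grad_neg_entropy(p):
    # Gradient of h: log(p_i) + 1.
    return np.log(p) + 1.0

def bregman(h, grad_h, p, q):
    # D_h(p, q) = h(p) - h(q) - <grad h(q), p - q>.
    return h(p) - h(q) - float(np.dot(grad_h(q), p - q))

def kl(p, q):
    # KL divergence between two distributions on the same finite support.
    return float(np.sum(p * np.log(p / q)))

# Two hypothetical policies over three actions at a single state.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])
print(bregman(neg_entropy, grad_neg_entropy, p, q), kl(p, q))
```

    The leftover linear term sum_i (p_i - q_i) vanishes because both vectors sum to one, which is why the two printed values agree up to rounding.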

  7. arXiv:2110.15688

    stat.ML cs.LG

    Variational Bayesian Optimistic Sampling

    Authors: Brendan O'Donoghue, Tor Lattimore

    Abstract: We consider online sequential decision problems where an agent must balance exploration and exploitation. We derive a set of Bayesian 'optimistic' policies which, in the stochastic multi-armed bandit case, includes the Thompson sampling policy. We provide a new analysis showing that any algorithm producing policies in the optimistic set enjoys $\tilde O(\sqrt{AT})$ Bayesian regret for a problem wi…

    Submitted 29 October, 2021; originally announced October 2021.
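    The abstract notes that, in the stochastic multi-armed bandit case, the optimistic set includes Thompson sampling. For reference only, here is a minimal sketch of plain Thompson sampling for Bernoulli arms with independent Beta(1, 1) priors. It is not the variational policy proposed in the paper, and the arm means, horizon, and function name are illustrative assumptions.

```python
import random

def thompson_bernoulli(true_means, horizon, seed=0):
    """Thompson sampling for Bernoulli arms with independent Beta(1, 1) priors.

    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(horizon):
        # Sample a mean for each arm from its Beta posterior, then play the largest sample.
        samples = [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated Bernoulli reward drawn from the (hypothetical) true mean of that arm.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
    return [successes[a] + failures[a] for a in range(n_arms)]

pulls = thompson_bernoulli([0.2, 0.5, 0.8], horizon=2000)
print(pulls)
```

    Because each arm is played with the posterior probability that it is the best one, pulls concentrate on the arm with mean 0.8 as evidence accumulates.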

  8. arXiv:2110.04629

    cs.LG cs.AI stat.ML

    The Neural Testbed: Evaluating Joint Predictions

    Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy

    Abstract: Predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed: an open-source benchmark for controlled and principled evaluation of agents that generate such predictions. Crucially, the testbed assesses agents not only on the quality of their marginal predictions per input, but also on their joint predictions across many inputs. We evaluate a…

    Submitted 1 November, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

  9. arXiv:2106.04756

    math.OC

    Practical Large-Scale Linear Programming using Primal-Dual Hybrid Gradient

    Authors: David Applegate, Mateo Díaz, Oliver Hinder, Haihao Lu, Miles Lubin, Brendan O'Donoghue, Warren Schudy

    Abstract: We present PDLP, a practical first-order method for linear programming (LP) that can solve to the high levels of accuracy that are expected in traditional LP applications. In addition, it can scale to very large problems because its core operation is matrix-vector multiplications. PDLP is derived by applying the primal-dual hybrid gradient (PDHG) method, popularized by Chambolle and Pock (2011), t…

    Submitted 7 January, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021
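    The PDHG iteration that PDLP starts from can be sketched for a standard-form LP, minimize c^T x subject to Ax = b and x >= 0, assuming the usual saddle-point formulation min over x >= 0 of max over y of c^T x - y^T (Ax - b). This is a minimal sketch of plain PDHG with fixed step sizes satisfying tau * sigma * ||A||^2 < 1; it leaves out all of the practical enhancements the paper adds on top of PDHG, and the function name and toy LP are hypothetical.

```python
import numpy as np

def pdhg_lp(c, A, b, iters=10000):
    """Plain PDHG for: minimize c @ x subject to A @ x == b, x >= 0.

    Works on the saddle point min_{x >= 0} max_y c @ x - y @ (A @ x - b).
    No restarts, preconditioning, or adaptive step sizes.
    """
    m, n = A.shape
    x = np.zeros(n)
    y = np.zeros(m)
    # Equal primal and dual steps, so tau * sigma * ||A||_2**2 = 0.81 < 1.
    tau = sigma = 0.9 / np.linalg.norm(A, 2)
    for _ in range(iters):
        # Primal: projected gradient step on the Lagrangian, keeping x >= 0.
        x_new = np.maximum(x - tau * (c - A.T @ y), 0.0)
        # Dual: ascent step evaluated at the extrapolated primal point 2 * x_new - x.
        y = y + sigma * (b - A @ (2.0 * x_new - x))
        x = x_new
    return x, y

# Hypothetical toy LP: minimize x1 + 2*x2 + 3*x3 s.t. x1 + x2 + x3 = 1, x1 - x2 = 0, x >= 0.
c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]])
b = np.array([1.0, 0.0])
x_sol, y_sol = pdhg_lp(c, A, b)
print(x_sol.round(4), y_sol.round(4), c @ x_sol)
```

    The toy LP has the unique primal solution x = (0.5, 0.5, 0) with objective 1.5 and the unique dual solution y = (1.5, -0.5), so the iterates should approach those values. Note that each iteration only needs products with A and A^T, which is the property the abstract highlights for scaling.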

  10. arXiv:2106.00669

    cs.AI cs.LG stat.ML

    Discovering Diverse Nearly Optimal Policies with Successor Features

    Authors: Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

    Abstract: Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness. We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while ass…

    Submitted 4 January, 2022; v1 submitted 1 June, 2021; originally announced June 2021.

  11. arXiv:2106.00661

    cs.AI cs.LG stat.ML

    Reward is enough for convex MDPs

    Authors: Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh

    Abstract: Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP). However, not all goals can be captured in this manner. In this paper we study convex MDPs in which goals are expressed as convex functions of the stationary distribution and show that t…

    Submitted 2 June, 2023; v1 submitted 1 June, 2021; originally announced June 2021.

  12. arXiv:2102.04323

    cs.AI cs.LG

    Discovering a set of policies for the worst case reward

    Authors: Tom Zahavy, Andre Barreto, Daniel J Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh

    Abstract: We study the problem of how to construct a set of policies that can be composed together to solve a collection of reinforcement learning tasks. Each task is a different reward function defined as a linear combination of known features. We consider a specific class of policy compositions which we call set improving policies (SIPs): given a set of policies and a set of tasks, a SIP is any compositio…

    Submitted 10 December, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

  13. arXiv:2012.13349

    math.OC cs.AI cs.DM cs.LG cs.NE

    Solving Mixed Integer Programs Using Neural Networks

    Authors: Vinod Nair, Sergey Bartunov, Felix Gimeno, Ingrid von Glehn, Pawel Lichocki, Ivan Lobov, Brendan O'Donoghue, Nicolas Sonnerat, Christian Tjandraatmadja, Pengming Wang, Ravichandra Addanki, Tharindi Hapuarachchi, Thomas Keck, James Keeling, Pushmeet Kohli, Ira Ktena, Yujia Li, Oriol Vinyals, Yori Zwols

    Abstract: Mixed Integer Programming (MIP) solvers rely on an array of sophisticated heuristics developed with decades of research to solve large-scale MIP instances encountered in practice. Machine learning offers to automatically construct better heuristics from data by exploiting shared structure among instances in the data. This paper applies learning to the two key sub-tasks of a MIP solver, generating…

    Submitted 29 July, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

  14. arXiv:2010.11364

    cs.LG math.OC

    Sample Efficient Reinforcement Learning with REINFORCE

    Authors: Junzi Zhang, Jongho Kim, Brendan O'Donoghue, Stephen Boyd

    Abstract: Policy gradient methods are among the most effective methods for large-scale reinforcement learning, and their empirical success has prompted several works that develop the foundation of their global convergence theory. However, prior works have either required exact gradients or state-action visitation measure based mini-batch stochastic gradients with a diverging batch size, which limit their ap…

    Submitted 24 December, 2020; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: Accepted to AAAI 2021. Fixed typos in constants and enriched the literature review

  15. arXiv:2006.05145

    cs.LG stat.CO stat.ML

    Matrix games with bandit feedback

    Authors: Brendan O'Donoghue, Tor Lattimore, Ian Osband

    Abstract: We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each other's actions and a noisy payoff. This generalizes the usual matrix game, where the payoff matrix is known to the players. Despite numerous applications, this problem has received relatively little attention. Although adversarial bandit algorithms achieve lo…

    Submitted 12 June, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  16. arXiv:2004.02177

    math.OC

    Operator splitting for a homogeneous embedding of the linear complementarity problem

    Authors: Brendan O'Donoghue

    Abstract: The linear complementarity problem (LCP) is a general set membership problem that includes quadratic cone programming as a special case. In this work we consider a homogeneous embedding of the LCP, which encodes both the optimality conditions of the original LCP as well as certificates of infeasibility. The resulting problem can be expressed as a monotone inclusion problem involving the sum of two…

    Submitted 12 June, 2021; v1 submitted 5 April, 2020; originally announced April 2020.

  17. arXiv:2001.00805

    cs.LG cs.AI

    Making Sense of Reinforcement Learning and Probabilistic Inference

    Authors: Brendan O'Donoghue, Ian Osband, Catalin Ionescu

    Abstract: Reinforcement learning (RL) combines a control problem with statistical estimation: The system dynamics are not known to the agent, but can be learned through experience. A recent line of research casts 'RL as inference' and suggests a particular framework to generalize the RL problem as probabilistic inference. Our paper surfaces a key shortcoming in that approach, and clarifies the sense in whic…

    Submitted 4 November, 2020; v1 submitted 3 January, 2020; originally announced January 2020.

    Comments: ICLR 2020

  18. arXiv:1906.02608

    math.OC

    Hamiltonian descent for composite objectives

    Authors: Brendan O'Donoghue, Chris J. Maddison

    Abstract: In optimization the duality gap between the primal and the dual problems is a measure of the suboptimality of any primal-dual point. In classical mechanics the equations of motion of a system can be derived from the Hamiltonian function, which is a quantity that describes the total energy of the system. In this paper we consider a convex optimization problem consisting of the sum of two convex fun…

    Submitted 17 November, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

  19. arXiv:1902.09592

    cs.LG stat.ML

    Verification of Non-Linear Specifications for Neural Networks

    Authors: Chongli Qin, Krishnamurthy Dvijotham, Brendan O'Donoghue, Rudy Bunel, Robert Stanforth, Sven Gowal, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli

    Abstract: Prior work on neural network verification has focused on specifications that are linear functions of the output of the network, e.g., invariance of the classifier output under adversarial perturbations of the input. In this paper, we extend verification algorithms to be able to certify richer properties of neural networks. To do this we introduce the class of convex-relaxable specifications, which…

    Submitted 25 February, 2019; originally announced February 2019.

    Comments: ICLR conference paper

  20. arXiv:1811.09300

    cs.NE cs.CR cs.LG

    Strength in Numbers: Trading-off Robustness and Computation via Adversarially-Trained Ensembles

    Authors: Edward Grefenstette, Robert Stanforth, Brendan O'Donoghue, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli

    Abstract: While deep learning has led to remarkable results on a number of challenging problems, researchers have discovered a vulnerability of neural networks in adversarial settings, where small but carefully chosen perturbations to the input can make the models produce extremely inaccurate outputs. This makes these models particularly unsuitable for safety-critical application domains (e.g. self-driving…

    Submitted 22 November, 2018; originally announced November 2018.

    Comments: 12 pages

  21. arXiv:1809.05042

    math.OC cs.LG stat.ML

    Hamiltonian Descent Methods

    Authors: Chris J. Maddison, Daniel Paulin, Yee Whye Teh, Brendan O'Donoghue, Arnaud Doucet

    Abstract: We propose a family of optimization methods that achieve linear convergence using first-order gradient information and constant step sizes on a class of convex functions much larger than the smooth and strongly convex ones. This larger class includes functions whose second derivatives may be singular or unbounded at their minima. Our methods are discretizations of conformal Hamiltonian dynamics, w…

    Submitted 13 September, 2018; originally announced September 2018.

  22. arXiv:1808.03971

    math.OC

    Globally Convergent Type-I Anderson Acceleration for Non-Smooth Fixed-Point Iterations

    Authors: Junzi Zhang, Brendan O'Donoghue, Stephen Boyd

    Abstract: We consider the application of the type-I Anderson acceleration to solving general non-smooth fixed-point problems. By interleaving with safeguarding steps, and employing a Powell-type regularization and a restart check for strong linear independence of the updates, we propose the first globally convergent variant of Anderson acceleration assuming only that the fixed-point iteration is non-ex…

    Submitted 12 August, 2018; originally announced August 2018.

    Comments: 47 pages

  23. arXiv:1807.09647

    cs.LG cs.AI stat.ML

    Variational Bayesian Reinforcement Learning with Regret Bounds

    Authors: Brendan O'Donoghue

    Abstract: In reinforcement learning the Q-values summarize the expected future rewards that the agent will attain. However, they cannot capture the epistemic uncertainty about those rewards. In this work we derive a new Bellman operator with associated fixed point we call the 'knowledge values'. These K-values compress both the expected future rewards and the epistemic uncertainty into a single value, so th…

    Submitted 6 December, 2022; v1 submitted 25 July, 2018; originally announced July 2018.

  24. arXiv:1805.10265

    cs.LG stat.ML

    Training verified learners with learned verifiers

    Authors: Krishnamurthy Dvijotham, Sven Gowal, Robert Stanforth, Relja Arandjelovic, Brendan O'Donoghue, Jonathan Uesato, Pushmeet Kohli

    Abstract: This paper proposes a new algorithmic framework, predictor-verifier training, to train neural networks that are verifiable, i.e., networks that provably satisfy some desired input-output properties. The key idea is to simultaneously train two networks: a predictor network that performs the task at hand, e.g., predicting labels given inputs, and a verifier network that computes a bound on how well t…

    Submitted 29 May, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

  25. arXiv:1802.05666

    cs.LG cs.CR stat.ML

    Adversarial Risk and the Dangers of Evaluating Against Weak Attacks

    Authors: Jonathan Uesato, Brendan O'Donoghue, Aaron van den Oord, Pushmeet Kohli

    Abstract: This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate 'adversarial risk' as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optim…

    Submitted 12 June, 2018; v1 submitted 15 February, 2018; originally announced February 2018.

  26. arXiv:1709.05380

    cs.AI cs.LG math.OC stat.ML

    The Uncertainty Bellman Equation and Exploration

    Authors: Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih

    Abstract: We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-s…

    Submitted 22 October, 2018; v1 submitted 15 September, 2017; originally announced September 2017.
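    A minimal tabular sketch of the idea, assuming a fixed policy pi, known transition probabilities P, a given local uncertainty term nu(s, a) for each state-action pair, and a recursion of the form u = nu + gamma^2 P^pi u (the squared discount reflecting how variances scale). Because this assumed equation is linear in u, its fixed point can be found with a single linear solve. The exact form of the equation and of nu should be checked against the paper; the function name and the random problem below are hypothetical.

```python
import numpy as np

def solve_uncertainty_bellman(P, pi, nu, gamma):
    """Solve u = nu + gamma**2 * P_pi @ u for a fixed policy (assumed form of the UBE).

    P:     transition probabilities, shape (S, A, S); P[s, a, s2] = Pr(s2 | s, a)
    pi:    policy, shape (S, A); each row sums to one
    nu:    local uncertainty per state-action pair, shape (S, A)
    gamma: discount factor in [0, 1)
    """
    S, A, _ = P.shape
    # P_pi[(s, a), (s2, a2)] = P[s, a, s2] * pi[s2, a2]: state-action transition matrix.
    P_pi = np.einsum("ijk,kl->ijkl", P, pi).reshape(S * A, S * A)
    u = np.linalg.solve(np.eye(S * A) - gamma**2 * P_pi, nu.reshape(S * A))
    return u.reshape(S, A)

# Hypothetical random problem with 4 states and 2 actions.
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
pi = rng.random((S, A))
pi /= pi.sum(axis=1, keepdims=True)
nu = rng.random((S, A))
u = solve_uncertainty_bellman(P, pi, nu, gamma)
print(u)
```

    Since P^pi is row-stochastic and gamma^2 < 1, the matrix I - gamma^2 P^pi is invertible, so the solve is well defined, and with nonnegative nu the solution satisfies u >= nu.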

  27. arXiv:1611.01626

    cs.LG cs.AI math.OC stat.ML

    Combining policy gradient and Q-learning

    Authors: Brendan O'Donoghue, Remi Munos, Koray Kavukcuoglu, Volodymyr Mnih

    Abstract: Policy gradient is an efficient technique for improving a policy in a reinforcement learning setting. However, vanilla online variants are on-policy only and not able to take advantage of off-policy data. In this paper we describe a new technique that combines policy gradient with off-policy Q-learning, drawing experience from a replay buffer. This is motivated by making a connection between the f…

    Submitted 7 April, 2017; v1 submitted 5 November, 2016; originally announced November 2016.

  28. Large-Scale Convex Optimization for Dense Wireless Cooperative Networks

    Authors: Yuanming Shi, Jun Zhang, Brendan O'Donoghue, Khaled B. Letaief

    Abstract: Convex optimization is a powerful tool for resource allocation and signal processing in wireless networks. As the network density is expected to drastically increase in order to accommodate the exponentially growing mobile data traffic, performance optimization problems are entering a new era characterized by a high dimension and/or a large number of constraints, which poses significant design and…

    Submitted 2 June, 2015; originally announced June 2015.

    Comments: to appear in IEEE Trans. Signal Process., 2015. Simulation code is available at https://github.com/SHIYUANMING/large-scale-convex-optimization

  29. arXiv:1312.3039

    math.OC

    Conic Optimization via Operator Splitting and Homogeneous Self-Dual Embedding

    Authors: Brendan O'Donoghue, Eric Chu, Neal Parikh, Stephen Boyd

    Abstract: We introduce a first-order method for solving very large convex cone programs. The method uses an operator splitting method, the alternating direction method of multipliers, to solve the homogeneous self-dual embedding, an equivalent feasibility problem involving finding a nonzero point in the intersection of a subspace and a cone. This approach has several favorable properties. Compared to inter…

    Submitted 25 July, 2016; v1 submitted 11 December, 2013; originally announced December 2013.

    Comments: 23 pages, no figures

    Journal ref: Journal of Optimization Theory and Applications, 169(3):1042-1068, June 2016

  30. arXiv:1204.3982

    math.OC

    Adaptive Restart for Accelerated Gradient Schemes

    Authors: Brendan O'Donoghue, Emmanuel Candes

    Abstract: In this paper we demonstrate a simple heuristic adaptive restart technique that can dramatically improve the convergence rate of accelerated gradient schemes. The analysis of the technique relies on the observation that these schemes exhibit two modes of behavior depending on how much momentum is applied. In what we refer to as the 'high momentum' regime the iterates generated by an accelerated gr…

    Submitted 18 April, 2012; originally announced April 2012.

    Comments: 17 pages, 7 figures
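    A minimal sketch of the idea, assuming Nesterov's accelerated gradient with the standard t_k momentum sequence and one simple restart test chosen for illustration: reset the momentum whenever the objective value increases. The function name, the step size 1/L, and the ill-conditioned quadratic test problem are assumptions, not the paper's experiments.

```python
import numpy as np

def accelerated_gradient(f, grad, x0, step, iters=1000, restart=True):
    """Nesterov accelerated gradient with an optional function-value restart."""
    x = x0.copy()
    y = x0.copy()
    t = 1.0
    f_prev = f(x)
    for _ in range(iters):
        # Gradient step from the extrapolated point y.
        x_new = y - step * grad(y)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Momentum (extrapolation) step with weight (t - 1) / t_new.
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        f_new = f(x_new)
        if restart and f_new > f_prev:
            # Objective went up: drop the accumulated momentum and restart the t sequence.
            t_new = 1.0
            y = x_new.copy()
        x, t, f_prev = x_new, t_new, f_new
    return x

# Hypothetical ill-conditioned quadratic f(x) = 0.5 * sum(d_i * x_i**2), with L = 1 and mu = 0.01.
d = np.array([0.01, 0.1, 1.0])
f = lambda x: 0.5 * float(np.dot(d, x * x))
grad = lambda x: d * x
x0 = np.ones(3)
x_plain = accelerated_gradient(f, grad, x0, step=1.0, restart=False)
x_restart = accelerated_gradient(f, grad, x0, step=1.0, restart=True)
print(f(x_plain), f(x_restart))
```

    Without restarts the momentum weight keeps growing toward one, which is the 'high momentum' regime where the iterates oscillate around the minimizer; resetting it whenever the objective rises cuts those oscillations short, so on this toy problem the restarted run should end at a much smaller objective value after the same number of iterations.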