
Showing 1–4 of 4 results for author: Spooner, T

  1. arXiv:2205.07338 [pdf, other]

    cs.AI cs.CC cs.LG math.PR stat.ML

    Reductive MDPs: A Perspective Beyond Temporal Horizons

    Authors: Thomas Spooner, Rui Silva, Joshua Lockhart, Jason Long, Vacslav Glukhov

    Abstract: Solving general Markov decision processes (MDPs) is a computationally hard problem. Solving finite-horizon MDPs, on the other hand, is highly tractable with well-known polynomial-time algorithms. What drives this extreme disparity, and do problems exist that lie between these diametrically opposed complexities? In this paper we identify and analyse a sub-class of stochastic shortest path problems…

    Submitted 15 May, 2022; originally announced May 2022.

    Comments: 15 pages, 10 figures, 1 algorithm

  2. arXiv:2102.10362 [pdf, other]

    cs.LG cs.AI cs.MA stat.ML

    Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs

    Authors: Thomas Spooner, Nelson Vadori, Sumitra Ganesh

    Abstract: Policy gradient methods can solve complex tasks but often fail when the dimensionality of the action-space or objective multiplicity grows very large. This occurs, in part, because the variance on score-based gradient estimators scales quadratically. In this paper, we address this problem through a factor baseline which exploits independence structure encoded in a novel action-target influence netw…

    Submitted 23 November, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021; 19 pages, 19 figures, 1 table

  3. arXiv:2007.04203 [pdf, other]

    cs.LG cs.AI q-fin.CP q-fin.PM stat.ML

    A Natural Actor-Critic Algorithm with Downside Risk Constraints

    Authors: Thomas Spooner, Rahul Savani

    Abstract: Existing work on risk-sensitive reinforcement learning - both for symmetric and downside risk measures - has typically used direct Monte-Carlo estimation of policy gradients. While this approach yields unbiased gradient estimates, it also suffers from high variance and decreased sample efficiency compared to temporal-difference methods. In this paper, we study prediction and control with aversion…

    Submitted 8 July, 2020; originally announced July 2020.

    Comments: 14 pages, 5 figures

  4. arXiv:2003.01820 [pdf, other]

    q-fin.TR cs.AI cs.LG stat.ML

    Robust Market Making via Adversarial Reinforcement Learning

    Authors: Thomas Spooner, Rahul Savani

    Abstract: We show that adversarial reinforcement learning (ARL) can be used to produce market making agents that are robust to adversarial and adaptively-chosen market conditions. To apply ARL, we turn the well-studied single-agent model of Avellaneda and Stoikov [2008] into a discrete-time zero-sum game between a market maker and adversary. The adversary acts as a proxy for other market participants that…

    Submitted 8 July, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: 7 pages, 3 figures; IJCAI-PRICAI '20 Conference Proceedings