Showing 1–9 of 9 results for author: Mann, T A

  1. arXiv:1807.09387

    cs.LG stat.ML

    Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems

    Authors: Timothy A. Mann, Sven Gowal, András György, Ray Jiang, Huiyi Hu, Balaji Lakshminarayanan, Praveen Srinivasan

    Abstract: Predicting delayed outcomes is an important problem in recommender systems (e.g., if customers will finish reading an ebook). We formalize the problem as an adversarial, delayed online learning problem and consider how a proxy for the delayed outcome (e.g., if customers read a third of the book in 24 hours) can help minimize regret, even though the proxy is not available when making a prediction.…

    Submitted 15 October, 2019; v1 submitted 24 July, 2018; originally announced July 2018.

  2. arXiv:1803.04848

    cs.LG cs.AI stat.ML

    Soft-Robust Actor-Critic Policy-Gradient

    Authors: Esther Derman, Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

    Abstract: Robust Reinforcement Learning aims to derive optimal behavior that accounts for model uncertainty in dynamical systems. However, previous studies have shown that by considering the worst case scenario, robust policies can be overly conservative. Our soft-robust framework is an attempt to overcome this issue. In this paper, we present a novel Soft-Robust Actor-Critic algorithm (SR-AC). It learns an…

    Submitted 24 October, 2018; v1 submitted 11 March, 2018; originally announced March 2018.

    Comments: UAI 2018

  3. arXiv:1803.01682

    stat.ML cs.LG

    Beyond Greedy Ranking: Slate Optimization via List-CVAE

    Authors: Ray Jiang, Sven Gowal, Timothy A. Mann, Danilo J. Rezende

    Abstract: The conventional solution to the recommendation problem greedily ranks individual document candidates by prediction scores. However, this method fails to optimize the slate as a whole, and hence, often struggles to capture biases caused by the page layout and document interdependencies. The slate recommendation problem aims to directly find the optimally ordered subset of documents (i.e. slates) th…

    Submitted 23 February, 2019; v1 submitted 5 March, 2018; originally announced March 2018.

  4. arXiv:1802.03236

    cs.AI cs.LG stat.ML

    Learning Robust Options

    Authors: Daniel J. Mankowitz, Timothy A. Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor

    Abstract: Robust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters have strong uncertainty. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. We present a Robust Options Polic…

    Submitted 9 February, 2018; originally announced February 2018.

  5. arXiv:1612.09465

    cs.LG cs.AI stat.ML

    Adaptive Lambda Least-Squares Temporal Difference Learning

    Authors: Timothy A. Mann, Hugo Penedones, Shie Mannor, Todd Hester

    Abstract: Temporal Difference learning or TD($λ$) is a fundamental algorithm in the field of reinforcement learning. However, setting TD's $λ$ parameter, which controls the timescale of TD updates, is generally left up to the practitioner. We formalize the $λ$ selection problem as a bias-variance trade-off where the solution is the value of $λ$ that leads to the smallest Mean Squared Value Error (MSVE). To…

    Submitted 30 December, 2016; originally announced December 2016.

  6. arXiv:1602.03351

    cs.LG cs.AI stat.ML

    Adaptive Skills, Adaptive Partitions (ASAP)

    Authors: Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

    Abstract: We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them. We believe that both (1) and (2) are necessary for a truly general skill learning framework, which is a key building block needed to scale up to lifelong learning agents. The ASAP framework can also solve related new tasks…

    Submitted 7 June, 2016; v1 submitted 10 February, 2016; originally announced February 2016.

  7. arXiv:1602.03348

    cs.LG cs.AI

    Iterative Hierarchical Optimization for Misspecified Problems (IHOMP)

    Authors: Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

    Abstract: For complex, high-dimensional Markov Decision Processes (MDPs), it may be necessary to represent the policy with function approximation. A problem is misspecified whenever the representation cannot express any policy with acceptable performance. We introduce IHOMP: an approach for solving misspecified problems. IHOMP iteratively learns a set of context specialized options and combines these opti…

    Submitted 7 June, 2016; v1 submitted 10 February, 2016; originally announced February 2016.

    Comments: arXiv admin note: text overlap with arXiv:1506.03624

  8. arXiv:1506.03624

    cs.AI

    Bootstrapping Skills

    Authors: Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

    Abstract: The monolithic approach to policy representation in Markov Decision Processes (MDPs) looks for a single policy that can be represented as a function from states to actions. For the monolithic approach to succeed (and this is not always possible), a complex feature representation is often necessary since the policy is a complex object that has to prescribe what actions to take all over the state sp…

    Submitted 11 June, 2015; originally announced June 2015.

  9. arXiv:1504.04114

    stat.ML cs.LG cs.SI

    Actively Learning to Attract Followers on Twitter

    Authors: Nir Levine, Timothy A. Mann, Shie Mannor

    Abstract: Twitter, a popular social network, presents great opportunities for on-line machine learning research. However, previous research has focused almost entirely on learning from passively collected data. We study the problem of learning to acquire followers through normative user behavior, as opposed to the mass following policies applied by many bots. We formalize the problem as a contextual bandit…

    Submitted 16 April, 2015; originally announced April 2015.