
Showing 151–181 of 181 results for author: Precup, D

  1. Singular value automata and approximate minimization

    Authors: Borja Balle, Prakash Panangaden, Doina Precup

    Abstract: The present paper uses spectral theory of linear operators to construct approximately minimal realizations of weighted languages. Our new contributions are: (i) a new algorithm for the singular value decomposition (SVD) of infinite Hankel matrices based on their representation in terms of weighted automata, (ii) a new canonical form for weighted automata arising from the SVD of its corresponding Hankel matrix and…

    Submitted 27 May, 2019; v1 submitted 16 November, 2017; originally announced November 2017.

    Journal ref: Math. Struct. Comp. Sci. 29 (2019) 1444-1478

  2. arXiv:1711.03817  [pdf, other]

    cs.AI

    Learning with Options that Terminate Off-Policy

    Authors: Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe

    Abstract: A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal, and cannot express the primitive optimal…

    Submitted 2 December, 2017; v1 submitted 10 November, 2017; originally announced November 2017.

    Comments: AAAI 2018

  3. arXiv:1709.06683  [pdf, other]

    cs.LG

    OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

    Authors: Peter Henderson, Wei-Di Chang, Pierre-Luc Bacon, David Meger, Joelle Pineau, Doina Precup

    Abstract: Reinforcement learning has shown promise in learning policies that can solve complex problems. However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectories…

    Submitted 24 November, 2017; v1 submitted 19 September, 2017; originally announced September 2017.

    Comments: Accepted to the Thirty-Second AAAI Conference On Artificial Intelligence (AAAI), 2018

  4. arXiv:1709.06560  [pdf, other]

    cs.LG stat.ML

    Deep Reinforcement Learning that Matters

    Authors: Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, David Meger

    Abstract: In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL). Reproducing existing work and accurately judging the improvements offered by novel methods is vital to sustaining this progress. Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determ…

    Submitted 29 January, 2019; v1 submitted 19 September, 2017; originally announced September 2017.

    Comments: Accepted to the Thirty-Second AAAI Conference On Artificial Intelligence (AAAI), 2018

  5. arXiv:1709.04571  [pdf, other]

    cs.AI

    When Waiting is not an Option: Learning Options with a Deliberation Cost

    Authors: Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup

    Abstract: Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance. While the problem of "how" to learn options is increasingly well understood, the question of "what" good options should be has remained elusive. We formulate our answer to what "good" options should be in the bounded rationality framework (Simon, 1957) through…

    Submitted 13 September, 2017; originally announced September 2017.

  6. arXiv:1709.04380  [pdf, other]

    cs.FL cs.AI cs.CL cs.LG

    Neural Network Based Nonlinear Weighted Finite Automata

    Authors: Tianyu Li, Guillaume Rabusseau, Doina Precup

    Abstract: Weighted finite automata (WFA) can expressively model functions defined over strings but are inherently linear models. Given the recent successes of nonlinear models in machine learning, it is natural to wonder whether extending WFA to the nonlinear setting would be beneficial. In this paper, we propose a novel neural network based nonlinear WFA model (NL-WFA) along with a learning algori…

    Submitted 21 December, 2017; v1 submitted 13 September, 2017; originally announced September 2017.

    Comments: AISTATS 2018

  7. arXiv:1708.04133  [pdf, other]

    cs.LG

    Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control

    Authors: Riashat Islam, Peter Henderson, Maziar Gomrokchi, Doina Precup

    Abstract: Policy gradient methods in reinforcement learning have become increasingly prevalent for state-of-the-art performance in continuous control tasks. Novel methods typically benchmark against a few key algorithms such as deep deterministic policy gradients and trust region policy optimization. As such, it is important to present and use consistent baseline experiments. However, this can be difficult…

    Submitted 10 August, 2017; originally announced August 2017.

    Comments: Accepted to Reproducibility in Machine Learning Workshop, ICML'17

  8. arXiv:1708.01289  [pdf, other]

    cs.LG cs.AI stat.ML

    Independently Controllable Factors

    Authors: Valentin Thomas, Jules Pondard, Emmanuel Bengio, Marc Sarfati, Philippe Beaudoin, Marie-Jean Meurs, Joelle Pineau, Doina Precup, Yoshua Bengio

    Abstract: It has been postulated that a good representation is one that disentangles the underlying explanatory factors of variation. However, it remains an open question what kind of training framework could potentially achieve that. Whereas most previous work focuses on the static setting (e.g., with images), we postulate that some of the causal factors could be discovered if the learner is allowed to int…

    Submitted 25 August, 2017; v1 submitted 3 August, 2017; originally announced August 2017.

  9. arXiv:1708.00805  [pdf, other]

    cs.LG

    Variational Generative Stochastic Networks with Collaborative Shaping

    Authors: Philip Bachman, Doina Precup

    Abstract: We develop an approach to training generative models based on unrolling a variational auto-encoder into a Markov chain, and shaping the chain's trajectories using a technique inspired by recent work in Approximate Bayesian computation. We show that the global minimizer of the resulting objective is achieved when the generative model reproduces the target distribution. To allow finer control over t…

    Submitted 2 August, 2017; originally announced August 2017.

    Comments: Old paper, from ICML 2015

  10. arXiv:1705.09322  [pdf, other]

    cs.LG

    Convergent Tree Backup and Retrace with Function Approximation

    Authors: Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent

    Abstract: Off-policy learning is key to scaling up reinforcement learning as it allows learning about a target policy from the experience generated by a different behavior policy. Unfortunately, it has been challenging to combine off-policy learning with function approximation and multi-step bootstrapping in a way that leads to both stable and efficient algorithms. In this work, we show that the \textsc{Tre…

    Submitted 22 October, 2018; v1 submitted 25 May, 2017; originally announced May 2017.

    Journal ref: ICML 2018, Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4955-4964, 2018

  11. arXiv:1704.05495  [pdf, other]

    cs.AI cs.LG

    Investigating Recurrence and Eligibility Traces in Deep Q-Networks

    Authors: Jean Harb, Doina Precup

    Abstract: Eligibility traces in reinforcement learning are used as a bias-variance trade-off and can often speed up training time by propagating knowledge back over time-steps in a single update. We investigate the use of eligibility traces in combination with recurrent networks in the Atari domain. We illustrate the benefits of both recurrent nets and eligibility traces in some Atari games, and highlight a…

    Submitted 18 April, 2017; originally announced April 2017.

    Comments: 8 pages, 3 figures, NIPS 2016 Deep Reinforcement Learning Workshop

  12. arXiv:1703.07718  [pdf, other]

    cs.LG

    Independently Controllable Features

    Authors: Emmanuel Bengio, Valentin Thomas, Joelle Pineau, Doina Precup, Yoshua Bengio

    Abstract: Finding features that disentangle the different causes of variation in real data is a difficult task that has nonetheless received considerable attention in static domains like natural images. Interactive environments, in which an agent can deliberately take actions, offer an opportunity to tackle this task better, because the agent can experiment with different actions and observe their effects.…

    Submitted 22 March, 2017; originally announced March 2017.

    Comments: RLDM submission

  13. arXiv:1703.06471  [pdf, other]

    cs.AI

    Multi-Timescale, Gradient Descent, Temporal Difference Learning with Linear Options

    Authors: Peeyush Kumar, Doina Precup

    Abstract: Deliberating on large or continuous state spaces has been a long-standing challenge in reinforcement learning. Temporal abstraction has somewhat made this possible, but efficiently planning using temporal abstraction still remains an issue. Moreover, using spatial abstractions to learn policies for various situations at once while using temporal abstraction models is an open problem. We propose her…

    Submitted 19 March, 2017; originally announced March 2017.

  14. arXiv:1612.00916  [pdf, ps, other]

    cs.AI

    A Matrix Splitting Perspective on Planning with Options

    Authors: Pierre-Luc Bacon, Doina Precup

    Abstract: We show that the Bellman operator underlying the options framework leads to a matrix splitting, an approach traditionally used to speed up convergence of iterative solvers for large linear systems of equations. Based on standard comparison theorems for matrix splittings, we then show how the asymptotic rate of convergence varies as a function of the inherent timescales of the options. This new per…

    Submitted 10 July, 2017; v1 submitted 2 December, 2016; originally announced December 2016.

    Comments: The results presented in the previous version of this paper were found to be applicable only to "gating execution" and not "call-and-return". We made this distinction clear in the text and added an extension to the call-and-return model

  15. arXiv:1609.05140  [pdf, other]

    cs.AI

    The Option-Critic Architecture

    Authors: Pierre-Luc Bacon, Jean Harb, Doina Precup

    Abstract: Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new opt…

    Submitted 2 December, 2016; v1 submitted 16 September, 2016; originally announced September 2016.

    Comments: Accepted to the Thirty-First AAAI Conference On Artificial Intelligence (AAAI), 2017

  16. arXiv:1605.05416  [pdf, other]

    cs.CL

    Leveraging Lexical Resources for Learning Entity Embeddings in Multi-Relational Data

    Authors: Teng Long, Ryan Lowe, Jackie Chi Kit Cheung, Doina Precup

    Abstract: Recent work in learning vector-space embeddings for multi-relational data has focused on combining relational information derived from knowledge bases with distributional information derived from large text corpora. We propose a simple approach that leverages the descriptions of entities or phrases available in lexical resources, in conjunction with distributional semantics, in order to derive a b…

    Submitted 17 May, 2016; originally announced May 2016.

    Comments: 6 pages. Accepted to ACL 2016 (short paper)

  17. arXiv:1603.02010  [pdf, other]

    cs.LG stat.ML

    Differentially Private Policy Evaluation

    Authors: Borja Balle, Maziar Gomrokchi, Doina Precup

    Abstract: We present the first differentially private algorithms for reinforcement learning, which apply to the task of evaluating a fixed policy. We establish two approaches for achieving differential privacy, provide a theoretical analysis of the privacy and utility of the two algorithms, and show promising results on simple empirical examples.

    Submitted 7 March, 2016; originally announced March 2016.

  18. arXiv:1512.04105  [pdf, other]

    cs.AI cs.LG

    Policy Gradient Methods for Off-policy Control

    Authors: Lucas Lehnert, Doina Precup

    Abstract: Off-policy learning refers to the problem of learning the value function of a way of behaving, or policy, while following a different policy. Gradient-based off-policy learning algorithms, such as GTD and TDC/GQ, converge even when using function approximation and incremental updates. However, they have been developed for the case of a fixed behavior policy. In control problems, one would like to…

    Submitted 13 December, 2015; originally announced December 2015.

  19. arXiv:1511.06297  [pdf, other]

    cs.LG

    Conditional Computation in Neural Networks for faster models

    Authors: Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, Doina Precup

    Abstract: Deep learning has become the state-of-the-art tool in many applications, but the evaluation and training of deep models can be time-consuming and computationally expensive. The conditional computation approach has been proposed to tackle this problem (Bengio et al., 2013; Davis & Arel, 2013). It operates by selectively activating only parts of the network at a time. In this paper, we use reinforcement…

    Submitted 7 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: ICLR 2016 submission, revised

  20. arXiv:1510.08949  [pdf, other]

    cs.LG

    Testing Visual Attention in Dynamic Environments

    Authors: Philip Bachman, David Krueger, Doina Precup

    Abstract: We investigate attention as the active pursuit of useful information. This contrasts with attention as a mechanism for the attenuation of irrelevant information. We also consider the role of short-term memory, whose use is critical to any model incapable of simultaneously perceiving all information on which its output depends. We present several simple synthetic tasks, which become considerably mo…

    Submitted 29 October, 2015; originally announced October 2015.

  21. arXiv:1506.03504  [pdf, other]

    cs.LG stat.ML

    Data Generation as Sequential Decision Making

    Authors: Philip Bachman, Doina Precup

    Abstract: We connect a broad class of generative models through their shared reliance on sequential decision making. Motivated by this view, we develop extensions to an existing model, and then explore the idea further in the context of data imputation -- perhaps the simplest setting in which to investigate the relation between unconditional and conditional generative modelling. We formulate data imputation…

    Submitted 2 November, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

    Comments: Accepted for publication at Advances in Neural Information Processing Systems (NIPS) 2015

  22. arXiv:1501.06841  [pdf, other]

    cs.FL

    A Canonical Form for Weighted Automata and Applications to Approximate Minimization

    Authors: Borja Balle, Prakash Panangaden, Doina Precup

    Abstract: We study the problem of constructing approximations to a weighted automaton. Weighted finite automata (WFA) are closely related to the theory of rational series. A rational series is a function from strings to real numbers that can be computed by a finite WFA. Among others, this includes probability distributions generated by hidden Markov models and probabilistic automata. The relationship betwee…

    Submitted 24 April, 2015; v1 submitted 27 January, 2015; originally announced January 2015.

  23. arXiv:1412.4864  [pdf, other]

    stat.ML cs.LG cs.NE

    Learning with Pseudo-Ensembles

    Authors: Philip Bachman, Ouais Alsharif, Doina Precup

    Abstract: We formalize the notion of a pseudo-ensemble, a (possibly infinite) collection of child models spawned from a parent model by perturbing it according to some noise process. E.g., dropout (Hinton et al., 2012) in a deep neural network trains a pseudo-ensemble of child subnetworks generated by randomly masking nodes in the parent network. We present a novel regularizer based on making the behavior o…

    Submitted 15 December, 2014; originally announced December 2014.

    Comments: To appear in Advances in Neural Information Processing Systems 27 (NIPS 2014), Dec. 2014

  24. arXiv:1407.5358  [pdf, other]

    cs.LG cs.AI stat.ML

    Practical Kernel-Based Reinforcement Learning

    Authors: André M. S. Barreto, Doina Precup, Joelle Pineau

    Abstract: Kernel-based reinforcement learning (KBRL) stands out among reinforcement learning algorithms for its strong theoretical guarantees. By casting the learning problem as a local kernel approximation, KBRL provides a way of computing a decision policy which is statistically consistent and converges to a unique solution. Unfortunately, the model constructed by KBRL grows with the number of sample tran…

    Submitted 20 July, 2014; originally announced July 2014.

    MSC Class: 68T05 (Primary); 93E35; 90C40; 93E20; 49L20 (Secondary)

    ACM Class: I.2.8; I.2.6; G.3

  25. arXiv:1407.0449  [pdf, other]

    cs.LG eess.SY math.OC stat.ML

    Classification-based Approximate Policy Iteration: Experiments and Extended Discussions

    Authors: Amir-massoud Farahmand, Doina Precup, André M. S. Barreto, Mohammad Ghavamzadeh

    Abstract: Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities, or intrinsic structure, of the problem at hand. Most current methods are geared towards exploiting the regularities of either the value function or the policy. We introduce a general classification-based approximate policy iteration (CAPI) framework, which encompasses a…

    Submitted 1 July, 2014; originally announced July 2014.

    MSC Class: 68T05 (Primary); 93E35; 93E20; 90C40; 49L20 (Secondary)

    ACM Class: I.2.6; I.2.8

  26. arXiv:1402.6028  [pdf, other]

    cs.AI cs.LG

    Algorithms for multi-armed bandit problems

    Authors: Volodymyr Kuleshov, Doina Precup

    Abstract: Although many algorithms for the multi-armed bandit problem are well-understood theoretically, empirical confirmation of their effectiveness is generally scarce. This paper presents a thorough empirical study of the most popular multi-armed bandit algorithms. Three important observations can be made from our results. Firstly, simple heuristics such as epsilon-greedy and Boltzmann exploration outpe…

    Submitted 24 February, 2014; originally announced February 2014.

  27. arXiv:1207.5554  [pdf, other]

    cs.LG stat.ML

    Bellman Error Based Feature Generation using Random Projections on Sparse Spaces

    Authors: Mahdi Milani Fard, Yuri Grinberg, Amir-massoud Farahmand, Joelle Pineau, Doina Precup

    Abstract: We address the problem of automatic generation of features for value function approximation. Bellman Error Basis Functions (BEBFs) have been shown to improve the error of policy evaluation with function approximation, with a convergence rate similar to that of value iteration. We propose a simple, fast and robust algorithm based on random projections to generate BEBFs for sparse feature spaces. We…

    Submitted 21 September, 2012; v1 submitted 23 July, 2012; originally announced July 2012.

  28. arXiv:1207.4114  [pdf]

    cs.AI

    Metrics for Finite Markov Decision Processes

    Authors: Norman Ferns, Prakash Panangaden, Doina Precup

    Abstract: We present metrics for measuring the similarity of states in a finite Markov decision process (MDP). The formulation of our metrics is based on the notion of bisimulation for MDPs, with an aim towards solving discounted infinite horizon reinforcement learning tasks. Such metrics can be used to aggregate states, as well as to better structure other value function approximators (e.g., memory-based o…

    Submitted 11 July, 2012; originally announced July 2012.

    Comments: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)

    Report number: UAI-P-2004-PG-162-169

  29. arXiv:1207.1386  [pdf]

    cs.AI

    Metrics for Markov Decision Processes with Infinite State Spaces

    Authors: Norman Ferns, Prakash Panangaden, Doina Precup

    Abstract: We present metrics for measuring state similarity in Markov decision processes (MDPs) with infinitely many states, including MDPs with continuous state spaces. Such metrics provide a stable quantitative analogue of the notion of bisimulation for MDPs, and are suitable for use in MDP approximation. We show that the optimal value function associated with a discounted infinite horizon planning task v…

    Submitted 4 July, 2012; originally announced July 2012.

    Comments: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)

    Report number: UAI-P-2005-PG-201-208

  30. arXiv:1206.6836  [pdf]

    cs.AI

    Methods for computing state similarity in Markov Decision Processes

    Authors: Norman Ferns, Pablo Samuel Castro, Doina Precup, Prakash Panangaden

    Abstract: A popular approach to solving large probabilistic systems relies on aggregating states based on a measure of similarity. Many approaches in the literature are heuristic. A number of recent methods rely instead on metrics based on the notion of bisimulation, or behavioral equivalence between states (Givan et al., 2001, 2003; Ferns et al., 2004). An integral component of such metrics is the Kantorovic…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI2006)

    Report number: UAI-P-2006-PG-174-181

  31. arXiv:1206.6385  [pdf]

    cs.LG stat.ME stat.ML

    Improved Estimation in Time Varying Models

    Authors: Doina Precup, Philip Bachman

    Abstract: Locally adapted parameterizations of a model (such as locally weighted regression) are expressive but often suffer from high variance. We describe an approach for reducing the variance, based on the idea of estimating simultaneously a transformed space for the model, as well as locally adapted parameterizations in this new space. We present a new problem formulation that captures this idea and ill…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)