Showing 1–20 of 20 results for author: Touati, A

Searching in archive cs.
  1. arXiv:2403.13097  [pdf, other]

    cs.LG cs.AI

    Simple Ingredients for Offline Reinforcement Learning

    Authors: Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann Ollivier, Ahmed Touati

    Abstract: Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task. Yet, leveraging a novel testbed (MOOD) in which trajectories come from heterogeneous sources, we show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline…

    Submitted 19 March, 2024; originally announced March 2024.

  2. arXiv:2311.02013  [pdf, other]

    cs.LG cs.AI cs.RO

    SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning

    Authors: Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy Zhang, Scott Niekum

    Abstract: Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions. Offline GCRL is pivotal for developing generalist agents capable of leveraging pre-existing datasets to learn diverse and reusable skills without hand-engineering reward functions. However, contemporary approaches to…

    Submitted 28 February, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: Published at the International Conference on Learning Representations (ICLR) 2024. 26 pages

  3. arXiv:2309.03710  [pdf, other]

    cs.LG

    A State Representation for Diminishing Rewards

    Authors: Ted Moskovitz, Samo Hromadka, Ahmed Touati, Diana Borsa, Maneesh Sahani

    Abstract: A common setting in multitask reinforcement learning (RL) demands that an agent rapidly adapt to various stationary reward functions randomly sampled from a fixed distribution. In such situations, the successor representation (SR) is a popular framework which supports rapid policy evaluation by decoupling a policy's expected discounted, cumulative state occupancies from a specific reward function.…

    Submitted 7 September, 2023; originally announced September 2023.

  4. arXiv:2210.13083  [pdf, other]

    cs.LG

    Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees

    Authors: Andrea Tirinzoni, Matteo Papini, Ahmed Touati, Alessandro Lazaric, Matteo Pirotta

    Abstract: We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action pair exactly), it has been recently shown that representations with certain spectral properties (called HLS) may be more effective for the explorat…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS 2022

  5. arXiv:2209.14935  [pdf, other]

    cs.LG

    Does Zero-Shot Reinforcement Learning Exist?

    Authors: Ahmed Touati, Jérémy Rapin, Yann Ollivier

    Abstract: A zero-shot RL agent is an agent that can solve any RL task in a given environment, instantly with no additional planning or learning, after an initial reward-free learning phase. This marks a shift from the reward-centric RL paradigm towards "controllable" agents that can follow arbitrary instructions in an environment. Current RL agents can solve families of related tasks at best, or require pla…

    Submitted 1 March, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Journal ref: International Conference on Learning Representations, 2023

  6. arXiv:2103.07945  [pdf, other]

    cs.LG cs.AI math.OC

    Learning One Representation to Optimize All Rewards

    Authors: Ahmed Touati, Yann Ollivier

    Abstract: We introduce the forward-backward (FB) representation of the dynamics of a reward-free Markov decision process. It provides explicit near-optimal policies for any reward specified a posteriori. During an unsupervised phase, we use reward-free interactions with the environment to learn two representations via off-the-shelf deep learning methods and temporal difference (TD) learning. In the test pha…

    Submitted 11 October, 2021; v1 submitted 14 March, 2021; originally announced March 2021.

  7. arXiv:2010.12870  [pdf, ps, other]

    cs.LG stat.ML

    Efficient Learning in Non-Stationary Linear Markov Decision Processes

    Authors: Ahmed Touati, Pascal Vincent

    Abstract: We study episodic reinforcement learning in non-stationary linear (a.k.a. low-rank) Markov Decision Processes (MDPs), i.e., both the reward and transition kernel are linear with respect to a given feature map and are allowed to evolve either slowly or abruptly over time. For this problem setting, we propose OPT-WLSVI, an optimistic model-free algorithm based on weighted least squares value iteration…

    Submitted 27 December, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

  8. arXiv:2010.03744  [pdf, other]

    cs.LG cs.AI stat.ML

    Maximum Reward Formulation In Reinforcement Learning

    Authors: Sai Krishna Gottipati, Yashaswi Pathak, Rohan Nuttall, Sahir, Raviteja Chunduru, Ahmed Touati, Sriram Ganapathi Subramanian, Matthew E. Taylor, Sarath Chandar

    Abstract: Reinforcement learning (RL) algorithms typically deal with maximizing the expected cumulative return (discounted or undiscounted, finite or infinite horizon). However, several crucial applications in the real world, such as drug discovery, do not fit within this framework because an RL agent only needs to identify states (molecules) that achieve the highest reward within a trajectory and does not…

    Submitted 18 December, 2023; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: 14 pages, 5 figures. Update based on reviewer feedback

  9. arXiv:2007.03749  [pdf, ps, other]

    cs.LG stat.ML

    Sharp Analysis of Smoothed Bellman Error Embedding

    Authors: Ahmed Touati, Pascal Vincent

    Abstract: The Smoothed Bellman Error Embedding algorithm (Dai et al., 2018), known as SBEED, was proposed as a provably convergent reinforcement learning algorithm with general nonlinear function approximation. It has been successfully implemented with neural networks and achieved strong empirical results. In this work, we study the theoretical behavior of SBEED in batch-mode reinforcement learni…

    Submitted 7 July, 2020; originally announced July 2020.

    Comments: Accepted at the ICML 2020 Workshop on Theoretical Foundations of Reinforcement Learning

  10. arXiv:2007.02786  [pdf, other]

    cs.LG stat.ML

    TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?

    Authors: Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau

    Abstract: We investigate whether Jacobi preconditioning, accounting for the bootstrap term in temporal difference (TD) learning, can help boost performance of adaptive optimizers. Our method, TDprop, computes a per parameter learning rate based on the diagonal preconditioning of the TD update rule. We show how this can be used in both $n$-step returns and TD($λ$). Our theoretical findings demonstrate that i…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: Presented at the Theoretical Foundations of Reinforcement Learning workshop at ICML 2020

  11. arXiv:2003.04108  [pdf, other]

    cs.LG stat.ML

    Stable Policy Optimization via Off-Policy Divergence Regularization

    Authors: Ahmed Touati, Amy Zhang, Joelle Pineau, Pascal Vincent

    Abstract: Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) are among the most successful policy gradient approaches in deep reinforcement learning (RL). While these methods achieve state-of-the-art performance across a wide range of challenging tasks, there is room for improvement in the stabilization of the policy learning and how the off-policy data are used. In this paper we…

    Submitted 19 June, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

    Journal ref: Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR volume 124, 2020

  12. arXiv:2003.04069  [pdf, other]

    cs.LG stat.ML

    Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces

    Authors: Ahmed Touati, Adrien Ali Taiga, Marc G. Bellemare

    Abstract: Despite the wealth of research into provably efficient reinforcement learning algorithms, most works focus on tabular representation and thus struggle to handle exponentially or infinitely large state-action spaces. In this paper, we consider episodic reinforcement learning with a continuous state-action space which is assumed to be equipped with a natural metric that characterizes the proximity b…

    Submitted 9 March, 2020; originally announced March 2020.

  13. arXiv:1906.04282  [pdf, other]

    cs.LG stat.ML

    Stochastic Neural Network with Kronecker Flow

    Authors: Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville

    Abstract: Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this w…

    Submitted 13 February, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

    Comments: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020

  14. arXiv:1906.03704  [pdf, other]

    cs.LG stat.ML

    SVRG for Policy Evaluation with Fewer Gradient Evaluations

    Authors: Zilun Peng, Ahmed Touati, Pascal Vincent, Doina Precup

    Abstract: Stochastic variance-reduced gradient (SVRG) is an optimization method originally designed for tackling machine learning problems with a finite sum structure. SVRG was later shown to work for policy evaluation, a problem in reinforcement learning in which one aims to estimate the value function of a given policy. SVRG makes use of gradient estimates at two scales. At the slower scale, SVRG computes…

    Submitted 19 June, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

    Comments: A short version of the paper is published in the proceedings of the 29th International Joint Conference on Artificial Intelligence and the 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI2020)

  15. arXiv:1903.01599  [pdf, other]

    stat.ML cs.LG

    Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future

    Authors: Nan Rosemary Ke, Amanpreet Singh, Ahmed Touati, Anirudh Goyal, Yoshua Bengio, Devi Parikh, Dhruv Batra

    Abstract: In model-based reinforcement learning, the agent interleaves between model learning and planning. These two components are inextricably intertwined. If the model is not able to provide sensible long-term prediction, the executed planner would exploit model flaws, which can yield catastrophic failures. This paper focuses on building a model that reasons about the long-term future and demonstrates h…

    Submitted 16 March, 2019; v1 submitted 4 March, 2019; originally announced March 2019.

    Comments: To appear at ICLR 2019

  16. arXiv:1902.01883  [pdf, other]

    cs.LG cs.AI stat.ML

    Separating value functions across time-scales

    Authors: Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier

    Abstract: In many finite horizon episodic reinforcement learning (RL) settings, it is desirable to optimize for the undiscounted return - in settings like Atari, for instance, the goal is to collect the most points while staying alive in the long run. Yet, it may be difficult (or even intractable) mathematically to learn with this target. As such, temporal discounting is often applied to optimize over a sho…

    Submitted 24 May, 2019; v1 submitted 5 February, 2019; originally announced February 2019.

    Comments: Full version accepted to ICML 2019. Extended abstract also to be presented at RLDM 2019

  17. arXiv:1806.02315  [pdf, other]

    cs.LG stat.ML

    Randomized Value Functions via Multiplicative Normalizing Flows

    Authors: Ahmed Touati, Harsh Satija, Joshua Romoff, Joelle Pineau, Pascal Vincent

    Abstract: Randomized value functions offer a promising approach towards the challenge of efficient exploration in complex environments with high dimensional state and action spaces. Unlike traditional point estimate methods, randomized value functions maintain a posterior distribution over action-space values. This prevents the agent's behavior policy from prematurely exploiting early estimates and falling…

    Submitted 28 June, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

    Journal ref: UAI 2019: Conference on Uncertainty in Artificial Intelligence 2019

  18. arXiv:1710.02248  [pdf, other]

    cs.LG cs.AI stat.ML

    Learnable Explicit Density for Continuous Latent Space and Variational Inference

    Authors: Chin-Wei Huang, Ahmed Touati, Laurent Dinh, Michal Drozdzal, Mohammad Havaei, Laurent Charlin, Aaron Courville

    Abstract: In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior. First, we decompose the learning of VAEs into layerwise density estimation, and argue that having a flexible prior is beneficial to both sample generation and inference. Second, we analyze the family of inverse autoregressive flows (inverse AF)…

    Submitted 5 October, 2017; originally announced October 2017.

    Comments: 2 figures, 5 pages, submitted to ICML Principled Approaches to Deep Learning workshop

  19. arXiv:1708.02511  [pdf, other]

    cs.LG stat.ML

    Parametric Adversarial Divergences are Good Losses for Generative Modeling

    Authors: Gabriel Huang, Hugo Berard, Ahmed Touati, Gauthier Gidel, Pascal Vincent, Simon Lacoste-Julien

    Abstract: Parametric adversarial divergences, which are a generalization of the losses used to train generative adversarial networks (GANs), have often been described as being approximations of their nonparametric counterparts, such as the Jensen-Shannon divergence, which can be derived under the so-called optimal discriminator assumption. In this position paper, we argue that despite being "non-optimal", p…

    Submitted 21 October, 2021; v1 submitted 8 August, 2017; originally announced August 2017.

  20. arXiv:1705.09322  [pdf, other]

    cs.LG

    Convergent Tree Backup and Retrace with Function Approximation

    Authors: Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent

    Abstract: Off-policy learning is key to scaling up reinforcement learning as it allows learning about a target policy from the experience generated by a different behavior policy. Unfortunately, it has been challenging to combine off-policy learning with function approximation and multi-step bootstrapping in a way that leads to both stable and efficient algorithms. In this work, we show that the \textsc{Tre…

    Submitted 22 October, 2018; v1 submitted 25 May, 2017; originally announced May 2017.

    Journal ref: ICML 2018, Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4955-4964, 2018