Showing 1–21 of 21 results for author: Mazoure, B

Searching in archive cs.
  1. arXiv:2406.07904

    cs.LG

    Grounding Multimodal Large Language Models in Actions

    Authors: Andrew Szot, Bogdan Mazoure, Harsh Agrawal, Devon Hjelm, Zsolt Kira, Alexander Toshev

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains, including Embodied AI. In this work, we study how to best ground a MLLM into different embodiments and their associated action spaces, with the goal of leveraging the multimodal world knowledge of the MLLM. We first generalize a number of methods through a unified architecture and the lens…

    Submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2310.17722

    cs.LG cs.AI cs.CL

    Large Language Models as Generalizable Policies for Embodied Tasks

    Authors: Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev

    Abstract: We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment. Using reinforcement learning, we train LLaRP to see and…

    Submitted 16 April, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

  3. arXiv:2306.07290

    cs.LG cs.AI

    Value function estimation using conditional diffusion models for control

    Authors: Bogdan Mazoure, Walter Talbott, Miguel Angel Bautista, Devon Hjelm, Alexander Toshev, Josh Susskind

    Abstract: A fairly reliable trend in deep reinforcement learning is that the performance scales with the number of parameters, provided a complementary scaling in amount of training data. As the appetite for large models increases, it is imperative to address, sooner than later, the potential problem of running out of high-quality demonstrations. In this case, instead of collecting only new data via costly…

    Submitted 9 June, 2023; originally announced June 2023.

  4. arXiv:2304.00046

    cs.LG cs.AI

    Accelerating exploration and representation learning with offline pre-training

    Authors: Bogdan Mazoure, Jake Bruce, Doina Precup, Rob Fergus, Ankit Anand

    Abstract: Sequential decision-making agents struggle with long horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge by improved credit assignment, introducing memory capability, altering the agent's intrinsic motivation (i.e. exploration) or its worldview (i.e. knowledge representation). Many of these components could be learned…

    Submitted 31 March, 2023; originally announced April 2023.

  5. arXiv:2211.02100

    cs.LG cs.AI

    Contrastive Value Learning: Implicit Models for Simple Offline RL

    Authors: Bogdan Mazoure, Benjamin Eysenbach, Ofir Nachum, Jonathan Tompson

    Abstract: Model-based reinforcement learning (RL) methods are appealing in the offline setting because they allow an agent to reason about the consequences of actions without interacting with the environment. Prior methods learn a 1-step dynamics model, which predicts the next state given the current state and action. These models do not immediately tell the agent which actions to take, but must be integrat…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: Deep Reinforcement Learning Workshop, NeurIPS 2022

  6. arXiv:2206.03923

    cs.LG cs.AI

    Sequential Density Estimation via Nonlinear Continuous Weighted Finite Automata

    Authors: Tianyu Li, Bogdan Mazoure, Guillaume Rabusseau

    Abstract: Weighted finite automata (WFAs) have been widely applied in many fields. One of the classic problems for WFAs is probability distribution estimation over sequences of discrete symbols. Although WFAs have been extended to deal with continuous input data, namely continuous WFAs (CWFAs), it is still unclear how to approximate density functions over sequences of continuous random variables using WFA-b…

    Submitted 12 December, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

  7. arXiv:2203.10351

    cs.LG

    The Sandbox Environment for Generalizable Agent Research (SEGAR)

    Authors: R Devon Hjelm, Bogdan Mazoure, Florian Golemo, Felipe Frujeri, Mihai Jalobeanu, Andrey Kolobov

    Abstract: A broad challenge of research on generalization for sequential decision-making tasks in interactive environments is designing benchmarks that clearly landmark progress. While there has been notable headway, current benchmarks either do not provide suitable exposure nor intuitive control of the underlying factors, are not easy-to-implement, customizable, or extensible, or are computationally expens…

    Submitted 19 March, 2022; originally announced March 2022.

  8. arXiv:2111.14629

    cs.LG cs.AI

    Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions

    Authors: Bogdan Mazoure, Ilya Kostrikov, Ofir Nachum, Jonathan Tompson

    Abstract: Reinforcement learning (RL) agents are widely used for solving complex sequential decision making tasks, but still exhibit difficulty in generalizing to scenarios not seen during training. While prior online approaches demonstrated that using additional signals beyond the reward function can lead to better generalization capabilities in RL agents, i.e. using self-supervised learning (SSL), they st…

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: Offline RL workshop at NeurIPS 2021

  9. arXiv:2106.02193

    cs.LG cs.AI

    Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL

    Authors: Bogdan Mazoure, Ahmed M. Ahmed, Patrick MacAlpine, R Devon Hjelm, Andrey Kolobov

    Abstract: A highly desirable property of a reinforcement learning (RL) agent -- and a major difficulty for deep RL approaches -- is the ability to generalize policies learned on a few tasks over a high-dimensional observation space to similar tasks not seen during training. Many promising approaches to this challenge consider RL as a process of training two functions simultaneously: a complex nonlinear enco…

    Submitted 16 March, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: ICLR 2022

  10. arXiv:2106.00589

    cs.LG

    Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Reinforcement Learning

    Authors: Bogdan Mazoure, Paul Mineiro, Pavithra Srinath, Reza Sharifi Sedeh, Doina Precup, Adith Swaminathan

    Abstract: We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility. Optimizing a long-term metric is challenging because the learning signal (whether the recommendations achieved their desired goals) is delayed and confounded by other user interactions with the system. Targeting immediately measurable proxies…

    Submitted 14 September, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

  11. arXiv:2010.04003

    cs.LG cs.AI stat.ML

    A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix

    Authors: Thang Doan, Mehdi Bennani, Bogdan Mazoure, Guillaume Rabusseau, Pierre Alquier

    Abstract: Continual learning (CL) is a setting in which an agent has to learn from an incoming stream of data during its entire lifetime. Although major advances have been made in the field, one recurring problem which remains unsolved is that of Catastrophic Forgetting (CF). While the issue has been extensively studied empirically, little attention has been paid from a theoretical angle. In this paper, we…

    Submitted 25 February, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Accepted to AISTATS 2021. Keywords: continual learning, catastrophic forgetting, NTK regime, orthogonal gradient descent

    Journal ref: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021)

  12. arXiv:2006.07217

    cs.LG stat.ML

    Deep Reinforcement and InfoMax Learning

    Authors: Bogdan Mazoure, Remi Tachet des Combes, Thang Doan, Philip Bachman, R Devon Hjelm

    Abstract: We begin with the hypothesis that a model-free agent whose representations are predictive of properties of future states (beyond expected rewards) will be more capable of solving and adapting to new RL problems. To test that hypothesis, we introduce an objective based on Deep InfoMax (DIM) which trains the agent to predict the future by maximizing the mutual information between its internal repres…

    Submitted 16 November, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  13. arXiv:2002.02863

    cs.LG stat.ML

    Representation of Reinforcement Learning Policies in Reproducing Kernel Hilbert Spaces

    Authors: Bogdan Mazoure, Thang Doan, Tianyu Li, Vladimir Makarenkov, Joelle Pineau, Doina Precup, Guillaume Rabusseau

    Abstract: We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box mode…

    Submitted 15 October, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

  14. arXiv:1911.05010

    cs.AI cs.LG

    Efficient Planning under Partial Observability with Unnormalized Q Functions and Spectral Learning

    Authors: Tianyu Li, Bogdan Mazoure, Doina Precup, Guillaume Rabusseau

    Abstract: Learning and planning in partially-observable domains is one of the most difficult problems in reinforcement learning. Traditional methods consider these two problems as independent, resulting in a classical two-stage paradigm: first learn the environment dynamics and then plan accordingly. This approach, however, disconnects the two problems and can consequently lead to algorithms that are sample…

    Submitted 21 November, 2019; v1 submitted 12 November, 2019; originally announced November 2019.

  15. arXiv:1909.07543

    cs.LG cs.AI cs.MA stat.ML

    Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning

    Authors: Thang Doan, Bogdan Mazoure, Moloud Abdar, Audrey Durand, Joelle Pineau, R Devon Hjelm

    Abstract: Continuous control tasks in reinforcement learning are important because they provide an important framework for learning in high-dimensional state spaces with deceptive rewards, where the agent can easily become trapped into suboptimal solutions. One way to avoid local optima is to use a population of agents to ensure coverage of the policy space, yet learning a population with the "best" coverag…

    Submitted 9 July, 2020; v1 submitted 16 September, 2019; originally announced September 2019.

  16. arXiv:1906.02719

    stat.ML cs.LG

    Learning Gaussian Graphical Models with Ordered Weighted L1 Regularization

    Authors: Cody Mazza-Anthony, Bogdan Mazoure, Mark Coates

    Abstract: We address the task of identifying densely connected subsets of multivariate Gaussian random variables within a graphical model framework. We propose two novel estimators based on the Ordered Weighted $\ell_1$ (OWL) norm: 1) The Graphical OWL (GOWL) is a penalized likelihood method that applies the OWL norm to the lower triangle components of the precision matrix. 2) The column-by-column Graphical…

    Submitted 19 November, 2020; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: Published in IEEE Transactions on Signal Processing

  17. arXiv:1905.06893

    cs.LG stat.ML

    Leveraging exploration in off-policy algorithms via normalizing flows

    Authors: Bogdan Mazoure, Thang Doan, Audrey Durand, R Devon Hjelm, Joelle Pineau

    Abstract: The ability to discover approximately optimal policies in domains with sparse rewards is crucial to applying reinforcement learning (RL) in many real-world scenarios. Approaches such as neural density models and continuous exploration (e.g., Go-Explore) have been proposed to maintain the high exploration rate necessary to find high performing and generalizable policies. Soft actor-critic (SAC) is a…

    Submitted 24 September, 2019; v1 submitted 16 May, 2019; originally announced May 2019.

    Comments: Accepted to 3rd Conference on Robot Learning (CoRL 2019); Keywords: Exploration, soft actor-critic, normalizing flow, off-policy; maximum entropy, reinforcement learning; deceptive reward; sparse reward; inverse autoregressive flow

  18. arXiv:1902.00570

    cs.HC cs.CL eess.AS

    Exploring attention mechanism for acoustic-based classification of speech utterances into system-directed and non-system-directed

    Authors: Atta Norouzian, Bogdan Mazoure, Dermot Connolly, Daniel Willett

    Abstract: Voice controlled virtual assistants (VAs) are now available in smartphones, cars, and standalone devices in homes. In most cases, the user needs to first "wake-up" the VA by saying a particular word/phrase every time he or she wants the VA to do something. Eliminating the need for saying the wake-up word for every interaction could improve the user experience. This would require the VA to have the…

    Submitted 1 February, 2019; originally announced February 2019.

    Comments: Accepted for presentation at ICASSP 2019

  19. arXiv:1808.00020

    cs.LG stat.ML

    On-line Adaptative Curriculum Learning for GANs

    Authors: Thang Doan, Joao Monteiro, Isabela Albuquerque, Bogdan Mazoure, Audrey Durand, Joelle Pineau, R Devon Hjelm

    Abstract: Generative Adversarial Networks (GANs) can successfully approximate a probability distribution and produce realistic samples. However, open questions such as sufficient convergence conditions and mode collapse still persist. In this paper, we build on existing work in the area by proposing a novel framework for training the generator against an ensemble of discriminator networks, which can be seen…

    Submitted 11 March, 2019; v1 submitted 31 July, 2018; originally announced August 2018.

    Comments: Accepted to the Thirty-Third AAAI Conference On Artificial Intelligence, 2019 (Added 128x128 CelebA samples to the end of the appendix)

    Journal ref: Proceedings of 33rd AAAI Conference on Artificial Intelligence (AAAI 2019)

  20. arXiv:1805.04874

    stat.ML cs.LG

    GAN Q-learning

    Authors: Thang Doan, Bogdan Mazoure, Clare Lyle

    Abstract: Distributional reinforcement learning (distributional RL) has seen empirical success in complex Markov Decision Processes (MDPs) in the setting of nonlinear function approximation. However, there are many different ways in which one can leverage the distributional approach to reinforcement learning. In this paper, we propose GAN Q-learning, a novel distributional RL method based on generative adve…

    Submitted 20 July, 2018; v1 submitted 13 May, 2018; originally announced May 2018.

  21. arXiv:1711.04345

    stat.ML cs.LG

    Alpha-Divergences in Variational Dropout

    Authors: Bogdan Mazoure, Riashat Islam

    Abstract: We investigate the use of alternative divergences to Kullback-Leibler (KL) in variational inference (VI), based on the Variational Dropout \cite{kingma2015}. Stochastic gradient variational Bayes (SGVB) \cite{aevb} is a general framework for estimating the evidence lower bound (ELBO) in Variational Bayes. In this work, we extend the SGVB estimator by using Alpha-Divergences, which are alternative…

    Submitted 12 November, 2017; originally announced November 2017.

    Comments: Bogdan Mazoure and Riashat Islam contributed equally