
Showing 1–50 of 181 results for author: Precup, D

  1. arXiv:2406.12241  [pdf, other]

    cs.LG cs.AI

    More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

    Authors: Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu

    Abstract: Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While the emerging approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal r…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: First two authors contributed equally. Accepted to the Reinforcement Learning Conference (RLC) 2024

  2. arXiv:2405.18751  [pdf, other]

    cs.CV cs.AI

    On the Limits of Multi-modal Meta-Learning with Auxiliary Task Modulation Using Conditional Batch Normalization

    Authors: Jordi Armengol-Estapé, Vincent Michalski, Ramnath Kumar, Pierre-Luc St-Charles, Doina Precup, Samira Ebrahimi Kahou

    Abstract: Few-shot learning aims to learn representations that can tackle novel tasks given a small number of examples. Recent studies show that cross-modal learning can improve representations for few-shot classification. More specifically, language is a rich modality that can be used to guide visual learning. In this work, we experiment with a multi-modal architecture for few-shot learning that consists o…

    Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  3. arXiv:2405.16899  [pdf, other]

    cs.LG cs.AI

    Partial Models for Building Adaptive Model-Based Reinforcement Learning Agents

    Authors: Safa Alver, Ali Rahimi-Kalahroudi, Doina Precup

    Abstract: In neuroscience, one of the key behavioral tests for determining whether a subject of study exhibits model-based behavior is to study its adaptiveness to local changes in the environment. In reinforcement learning, however, recent studies have shown that modern model-based agents display poor adaptivity to such changes. The main reason for this is that modern agents are typically designed to impro…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Published as a conference paper at CoLLAs 2024

  4. arXiv:2405.07838  [pdf, other]

    cs.LG cs.AI

    Adaptive Exploration for Data-Efficient General Value Function Evaluations

    Authors: Arushi Jain, Josiah P. Hanna, Doina Precup

    Abstract: General Value Functions (GVFs) (Sutton et al, 2011) are an established way to represent predictive knowledge in reinforcement learning. Each GVF computes the expected return for a given policy, based on a unique pseudo-reward. Multiple GVFs can be estimated in parallel using off-policy learning from a single stream of data, often sourced from a fixed behavior policy or pre-collected dataset. This…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 20 pages, 9 figures, Under Review

  5. arXiv:2405.01616  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Generative Active Learning for the Search of Small-molecule Protein Binders

    Authors: Maksym Korablyov, Cheng-Hao Liu, Moksh Jain, Almer M. van der Sloot, Eric Jolicoeur, Edward Ruediger, Andrei Cristian Nica, Emmanuel Bengio, Kostiantyn Lapchevskyi, Daniel St-Cyr, Doris Alexandra Schuetz, Victor Ion Butoi, Jarrid Rector-Brooks, Simon Blackburn, Leo Feng, Hadi Nekoei, SaiKrishna Gottipati, Priyesh Vijayan, Prateek Gupta, Ladislav Rampášek, Sasikanth Avancha, Pierre-Luc Bacon, William L. Hamilton, Brooks Paige, Sanchit Misra, et al. (9 additional authors not shown)

    Abstract: Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecu…

    Submitted 2 May, 2024; originally announced May 2024.

  6. arXiv:2403.11574  [pdf, ps, other]

    cs.LG

    Offline Multitask Representation Learning for Reinforcement Learning

    Authors: Haque Ishfaq, Thanh Nguyen-Tang, Songtao Feng, Raman Arora, Mengdi Wang, Ming Yin, Doina Precup

    Abstract: We study offline multitask representation learning in reinforcement learning (RL), where a learner is provided with an offline dataset from different tasks that share a common representation and is asked to learn the shared representation. We theoretically investigate offline multitask low-rank RL, and propose a new algorithm called MORL for offline multitask representation learning. Furthermore,…

    Submitted 18 March, 2024; originally announced March 2024.

  7. arXiv:2402.10309  [pdf, other]

    cs.LG

    Discrete Probabilistic Inference as Control in Multi-path Environments

    Authors: Tristan Deleu, Padideh Nouri, Nikolay Malkin, Doina Precup, Yoshua Bengio

    Abstract: We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem, where the objective is to find a stochastic policy such that objects are sampled at the end of this sequential process proportionally to some predefined reward. While we could use maximum entropy Reinforcement Learning (MaxEnt RL) to solve this problem for some distributions, it has be…

    Submitted 27 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  8. arXiv:2402.08609  [pdf, other]

    cs.LG cs.AI

    Mixtures of Experts Unlock Parameter Scaling for Deep RL

    Authors: Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro

    Abstract: The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-…

    Submitted 26 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  9. arXiv:2402.06137  [pdf, other]

    cs.LG cs.CR

    On the Privacy of Selection Mechanisms with Gaussian Noise

    Authors: Jonathan Lebensold, Doina Precup, Borja Balle

    Abstract: Report Noisy Max and Above Threshold are two classical differentially private (DP) selection mechanisms. Their output is obtained by adding noise to a sequence of low-sensitivity queries and reporting the identity of the query whose (noisy) answer satisfies a certain condition. Pure DP guarantees for these mechanisms are easy to obtain when Laplace noise is added to the queries. On the other hand,…

    Submitted 21 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: AISTATS 2024

  10. arXiv:2402.05234  [pdf, other]

    cs.LG

    QGFN: Controllable Greediness with Action Values

    Authors: Elaine Lau, Stephen Zhewen Lu, Ling Pan, Doina Precup, Emmanuel Bengio

    Abstract: Generative Flow Networks (GFlowNets; GFNs) are a family of reward/energy-based generative methods for combinatorial objects, capable of generating diverse and high-utility samples. However, biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value…

    Submitted 23 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Under review

  11. arXiv:2402.04764  [pdf, other]

    cs.LG

    Code as Reward: Empowering Reinforcement Learning with VLMs

    Authors: David Venuto, Sami Nur Islam, Martin Klissarov, Doina Precup, Sherry Yang, Ankit Anand

    Abstract: Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and provide feedback on task completion. In this paper, we aim to leverage these capabilities to support the training of reinforcement learning (RL) agents. In principle, VLMs are well suited for this purpose, as they can naturally analyze image-based observations…

    Submitted 7 February, 2024; originally announced February 2024.

  12. arXiv:2402.03675  [pdf, other]

    q-bio.BM cs.AI cs.CE cs.LG

    Effective Protein-Protein Interaction Exploration with PPIretrieval

    Authors: Chenqing Hua, Connor Coley, Guy Wolf, Doina Precup, Shuangjia Zheng

    Abstract: Protein-protein interactions (PPIs) are crucial in regulating numerous cellular functions, including signal transduction, transportation, and immune defense. As the accuracy of multi-chain protein complex structure prediction improves, the challenge has shifted towards effectively navigating the vast complex universe to identify potential PPIs. Herein, we propose PPIretrieval, the first deep learn…

    Submitted 5 February, 2024; originally announced February 2024.

  13. arXiv:2312.11669  [pdf, other]

    cs.LG cs.AI

    Prediction and Control in Continual Reinforcement Learning

    Authors: Nishanth Anand, Doina Precup

    Abstract: Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies. In this paper, we focus on value function estimation in continual reinforcement learning. We propose to decompose the value function into two components which update at different timescales: a permanent value function, which holds general knowledge tha…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Published at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  14. arXiv:2312.00886  [pdf, other]

    stat.ML cs.AI cs.GT cs.LG cs.MA

    Nash Learning from Human Feedback

    Authors: Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Typically, RLHF involves the initial step of learning a reward model from human feedback, often expressed as preferences between pairs of text generations produced by a pre-trained LLM. Subsequently, the LLM's policy is fine-tuned by optimizing it to…

    Submitted 11 June, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

  15. arXiv:2312.00231  [pdf, other]

    eess.AS

    Learning domain-invariant classifiers for infant cry sounds

    Authors: Charles C. Onu, Hemanth K. Sheetha, Arsenii Gorin, Doina Precup

    Abstract: The issue of domain shift remains a problematic phenomenon in most real-world datasets and clinical audio is no exception. In this work, we study the nature of domain shift in a clinical database of infant cry sounds acquired across different geographies. We find that though the pitches of infant cries are similarly distributed regardless of the place of birth, other characteristics introduce pecu…

    Submitted 30 November, 2023; originally announced December 2023.

  16. arXiv:2311.03583  [pdf, other]

    cs.AI cs.DM cs.LG

    Finding Increasingly Large Extremal Graphs with AlphaZero and Tabu Search

    Authors: Abbas Mehrabian, Ankit Anand, Hyunjik Kim, Nicolas Sonnerat, Matej Balog, Gheorghe Comanici, Tudor Berariu, Andrew Lee, Anian Ruoss, Anna Bulanova, Daniel Toyama, Sam Blackwell, Bernardino Romera Paredes, Petar Veličković, Laurent Orseau, Joonkyung Lee, Anurag Murty Naredla, Doina Precup, Adam Zsolt Wagner

    Abstract: This work studies a central extremal graph theory problem inspired by a 1975 conjecture of Erdős, which aims to find graphs with a given size (number of nodes) that maximize the number of edges without having 3- or 4-cycles. We formulate this problem as a sequential decision-making problem and compare AlphaZero, a neural network-guided tree search, with tabu search, a heuristic local search method…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted at MATH AI workshop at NeurIPS 2023, First three authors contributed equally, Last two authors have equal senior contribution

  17. arXiv:2311.01990  [pdf, other]

    cs.LG

    Conditions on Preference Relations that Guarantee the Existence of Optimal Policies

    Authors: Jonathan Colaço Carr, Prakash Panangaden, Doina Precup

    Abstract: Learning from Preferential Feedback (LfPF) plays an essential role in training Large Language Models, as well as certain types of interactive learning agents. However, a substantial gap exists between the theory and application of LfPF algorithms. Current results guaranteeing the existence of optimal policies in LfPF problems assume that both the preferences and transition dynamics are determined…

    Submitted 27 March, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: v2: replaced with accepted AISTATS 2024 version, containing a new summary figure and one extra example. Results and conclusions are unchanged

  18. arXiv:2310.19685  [pdf, other]

    cs.LG q-bio.BM

    DGFN: Double Generative Flow Networks

    Authors: Elaine Lau, Nikhil Vemgal, Doina Precup, Emmanuel Bengio

    Abstract: Deep learning is emerging as an effective tool in drug discovery, with potential applications in both predictive and generative models. Generative Flow Networks (GFlowNets/GFNs) are a recently introduced method recognized for the ability to generate diverse candidates, in particular in small molecule generation tasks. In this work, we introduce double GFlowNets (DGFNs). Drawing inspiration from re…

    Submitted 6 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023 Workshop

  19. arXiv:2310.09997  [pdf, other]

    cs.AI cs.LG eess.SY

    Forecaster: Towards Temporally Abstract Tree-Search Planning from Pixels

    Authors: Thomas Jiralerspong, Flemming Kondrup, Doina Precup, Khimya Khetarpal

    Abstract: The ability to plan at many different levels of abstraction enables agents to envision the long-term repercussions of their decisions and thus enables sample-efficient learning. This becomes particularly beneficial in complex environments from high-dimensional state space such as pixels, where the goal is distant and the reward sparse. We introduce Forecaster, a deep hierarchical reinforcement lea…

    Submitted 15 October, 2023; originally announced October 2023.

  20. arXiv:2310.08338  [pdf]

    eess.AS cs.SD q-bio.NC

    A cry for help: Early detection of brain injury in newborns

    Authors: Charles C. Onu, Samantha Latremouille, Arsenii Gorin, Junhao Wang, Innocent Udeogu, Uchenna Ekwochi, Peter O. Ubuane, Omolara A. Kehinde, Muhammad A. Salisu, Datonye Briggs, Yoshua Bengio, Doina Precup

    Abstract: Since the 1960s, neonatal clinicians have known that newborns suffering from certain neurological conditions exhibit altered crying patterns such as the high-pitched cry in birth asphyxia. Despite an annual burden of over 1.5 million infant deaths and disabilities, early detection of neonatal brain injuries due to asphyxia remains a challenge, particularly in developing countries where the majorit…

    Submitted 3 November, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

  21. arXiv:2310.00229  [pdf, other]

    cs.AI cs.LG

    Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning

    Authors: Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio

    Abstract: Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning framework utilizing spatio-temporal abstractions to generalize better in novel situations. It automatically decomposes the given task into smaller, more manageable subtasks, and thus enables sparse decision-making and focused computation on the relevant parts of the environment. The decomposition relies…

    Submitted 16 March, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: ICLR 2024 Camera-Ready

  22. arXiv:2308.15470  [pdf, other]

    cs.LG

    Policy composition in reinforcement learning via multi-objective policy optimization

    Authors: Shruti Mishra, Ankit Anand, Jordan Hoffmann, Nicolas Heess, Martin Riedmiller, Abbas Abdolmaleki, Doina Precup

    Abstract: We enable reinforcement learning agents to learn successful behavior policies by utilizing relevant pre-existing teacher policies. The teacher policies are introduced as objectives, in addition to the task objective, in a multi-objective policy optimization setting. Using the Multi-Objective Maximum a Posteriori Policy Optimization algorithm (Abdolmaleki et al. 2020), we show that teacher policies…

    Submitted 30 August, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

  23. arXiv:2307.11046  [pdf, other]

    cs.LG cs.AI

    A Definition of Continual Reinforcement Learning

    Authors: David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh

    Abstract: In a standard view of the reinforcement learning problem, an agent's goal is to efficiently identify a policy that maximizes long-term reward. However, this perspective is based on a restricted view of learning as finding a solution, rather than treating learning as endless adaptation. In contrast, continual reinforcement learning refers to the setting in which the best agents never stop learning.…

    Submitted 1 December, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  24. arXiv:2307.11044  [pdf, other]

    cs.LG cs.AI

    On the Convergence of Bounded Agents

    Authors: David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh

    Abstract: When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we shift the focus of our learning problem from the environment's state to the agent's state, the concept of an agent's convergence becomes significantly…

    Submitted 20 July, 2023; originally announced July 2023.

  25. arXiv:2307.07674  [pdf, other]

    cs.LG

    An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets

    Authors: Nikhil Vemgal, Elaine Lau, Doina Precup

    Abstract: Reinforcement Learning (RL) algorithms aim to learn an optimal policy by iteratively sampling actions to learn how to maximize the total expected return, $R(x)$. GFlowNets are a special class of algorithms designed to generate diverse candidates, $x$, from a discrete set, by learning a policy that approximates the proportional sampling of $R(x)$. GFlowNets exhibit improved mode discovery compared…

    Submitted 17 July, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted to ICML 2023 workshop on Structured Probabilistic Inference & Generative Modeling

  26. arXiv:2306.10587  [pdf, other]

    cs.LG cs.AI stat.ML

    Acceleration in Policy Optimization

    Authors: Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag

    Abstract: We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates. Leveraging the connection between policy iteration and policy gradient methods, we view policy optimization algorithms as iteratively solving a sequence of surrogate objectives, local lower bound…

    Submitted 5 September, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

  27. arXiv:2306.02451  [pdf, other]

    cs.LG cs.AI stat.ML

    For SALE: State-Action Representation Learning for Deep Reinforcement Learning

    Authors: Scott Fujimoto, Wei-Di Chang, Edward J. Smith, Shixiang Shane Gu, Doina Precup, David Meger

    Abstract: In the field of reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but is often overlooked for environments with low-level states, such as physical control problems. This paper introduces SALE, a novel approach for learning embeddings that model the nuanced interaction between state and action, enabling effective representation learning from low-le…

    Submitted 5 November, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023

  28. arXiv:2305.18246  [pdf, other]

    cs.LG

    Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo

    Authors: Haque Ishfaq, Qingfeng Lan, Pan Xu, A. Rupam Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli

    Abstract: We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings. We instead directly sample the Q function from its posterior distribution, by…

    Submitted 17 March, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: Published in The Twelfth International Conference on Learning Representations (ICLR) 2024

  29. arXiv:2305.05666  [pdf, other]

    cs.LG cs.AI

    Policy Gradient Methods in the Presence of Symmetries and State Abstractions

    Authors: Prakash Panangaden, Sahand Rezaei-Shoshtari, Rosie Zhao, David Meger, Doina Precup

    Abstract: Reinforcement learning (RL) on high-dimensional and complex problems relies on abstraction for improved efficiency and generalization. In this paper, we study abstraction in the continuous-control setting, and extend the definition of Markov decision process (MDP) homomorphisms to the setting of continuous state and action spaces. We derive a policy gradient theorem on the abstract MDP for both st…

    Submitted 7 March, 2024; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Published in the Journal of Machine Learning Research (JMLR). arXiv admin note: text overlap with arXiv:2209.07364

  30. arXiv:2305.00969  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds

    Authors: David Budaghyan, Charles C. Onu, Arsenii Gorin, Cem Subakan, Doina Precup

    Abstract: This paper describes the Ubenwa CryCeleb dataset - a labeled collection of infant cries - and the accompanying CryCeleb 2023 task, which is a public speaker verification challenge based on cry sounds. We released more than 6 hours of manually segmented cry sounds from 786 newborns for academic use, aiming to encourage research in infant cry analysis. The inaugural public competition attracted 59 p…

    Submitted 21 March, 2024; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: ICASSP 2024

  31. arXiv:2304.14621  [pdf, other]

    cs.LG q-bio.BM

    MUDiff: Unified Diffusion for Complete Molecule Generation

    Authors: Chenqing Hua, Sitao Luan, Minkai Xu, Rex Ying, Jie Fu, Stefano Ermon, Doina Precup

    Abstract: Molecule generation is a very important practical problem, with uses in drug discovery and material design, and AI methods promise to provide useful solutions. However, existing methods for molecule generation focus either on 2D graph structure or on 3D geometric structure, which is not sufficient to represent a complete molecule as 2D graph captures mainly topology while 3D geometry captures main…

    Submitted 5 February, 2024; v1 submitted 28 April, 2023; originally announced April 2023.

  32. arXiv:2304.14274  [pdf, other]

    cs.SI cs.LG

    When Do Graph Neural Networks Help with Node Classification? Investigating the Impact of Homophily Principle on Node Distinguishability

    Authors: Sitao Luan, Chenqing Hua, Minkai Xu, Qincheng Lu, Jiaqi Zhu, Xiao-Wen Chang, Jie Fu, Jure Leskovec, Doina Precup

    Abstract: Homophily principle, i.e., nodes with the same labels are more likely to be connected, has been believed to be the main reason for the performance superiority of Graph Neural Networks (GNNs) over Neural Networks on node classification tasks. Recent research suggests that, even in the absence of homophily, the advantage of GNNs still exists as long as nodes from the same class share similar neighbo…

    Submitted 1 January, 2024; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted by 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  33. arXiv:2304.00046  [pdf, other]

    cs.LG cs.AI

    Accelerating exploration and representation learning with offline pre-training

    Authors: Bogdan Mazoure, Jake Bruce, Doina Precup, Rob Fergus, Ankit Anand

    Abstract: Sequential decision-making agents struggle with long horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge by improved credit assignment, introducing memory capability, altering the agent's intrinsic motivation (i.e. exploration) or its worldview (i.e. knowledge representation). Many of these components could be learned…

    Submitted 31 March, 2023; originally announced April 2023.

  34. arXiv:2302.06784  [pdf, other]

    cs.CL

    The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis and Algorithm for Robust Natural Language Generation

    Authors: Kushal Arora, Timothy J. O'Donnell, Doina Precup, Jason Weston, Jackie C. K. Cheung

    Abstract: State-of-the-art language generation models can degenerate when applied to open-ended generation problems such as text completion, story generation, or dialog modeling. This degeneration usually shows up in the form of incoherence, lack of vocabulary diversity, and self-repetition or copying from the context. In this paper, we postulate that "human-like" generations usually lie in a narrow and n…

    Submitted 13 February, 2023; originally announced February 2023.

  35. arXiv:2301.10119  [pdf, other]

    cs.LG cs.AI

    Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning

    Authors: Safa Alver, Doina Precup

    Abstract: Learning models of the environment from pure interaction is often considered an essential component of building lifelong reinforcement learning agents. However, the common practice in model-based reinforcement learning is to learn models that model every aspect of the agent's environment, regardless of whether they are important in coming up with optimal decisions or not. In this paper, we argue t…

    Submitted 11 June, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

    Comments: Published as a conference paper at CoLLAs 2023

  36. arXiv:2301.00512  [pdf, other]

    cs.LG

    On the Challenges of using Reinforcement Learning in Precision Drug Dosing: Delay and Prolongedness of Action Effects

    Authors: Sumana Basu, Marc-André Legault, Adriana Romero-Soriano, Doina Precup

    Abstract: Drug dosing is an important application of AI, which can be formulated as a Reinforcement Learning (RL) problem. In this paper, we identify two major challenges of using RL for drug dosing: delayed and prolonged effects of administering medications, which break the Markov assumption of the RL framework. We focus on prolongedness and define PAE-POMDP (Prolonged Action Effect-Partially Observable Ma…

    Submitted 1 January, 2023; originally announced January 2023.

    Comments: Accepted to AAAI 2023

  37. arXiv:2212.14405  [pdf, other]

    cs.LG

    Offline Policy Optimization in RL with Variance Regularizaton

    Authors: Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Animesh Garg, Zhaoran Wang, Lihong Li, Doina Precup

    Abstract: Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications. This is often because off-policy RL algorithms suffer from distributional shift, due to mismatch between dataset and the target policy, leading to high variance and over-estimation of value functions. In this work, we propose variance regularization fo…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: Old Draft, Offline RL Workshop, NeurIPS'20

  38. arXiv:2212.10822  [pdf, other]

    cs.LG cs.AI

    Complete the Missing Half: Augmenting Aggregation Filtering with Diversification for Graph Convolutional Neural Networks

    Authors: Sitao Luan, Mingde Zhao, Chenqing Hua, Xiao-Wen Chang, Doina Precup

    Abstract: The core operation of current Graph Neural Networks (GNNs) is the aggregation enabled by the graph Laplacian or message passing, which filters the neighborhood information of nodes. Though effective for various tasks, in this paper, we show that they are potentially a problematic factor underlying all GNN models for learning on certain datasets, as they force the node representations similar, maki…

    Submitted 21 December, 2022; originally announced December 2022.

    Comments: Accepted as Oral Presentation at NeurIPS 2022 New Frontiers in Graph Learning Workshop (NeurIPS GLFrontiers 2022)

  39. arXiv:2211.13337  [pdf, other]

    cs.LG

    Multi-Environment Pretraining Enables Transfer to Action Limited Datasets

    Authors: David Venuto, Sherry Yang, Pieter Abbeel, Doina Precup, Igor Mordatch, Ofir Nachum

    Abstract: Using massive datasets to train large-scale models has emerged as a dominant approach for broad generalization in natural language and vision applications. In reinforcement learning, however, a key challenge is that available data of sequential decision making is often not annotated with actions - for example, videos of game-play are much more available than sequences of frames paired with their l…

    Submitted 5 December, 2022; v1 submitted 23 November, 2022; originally announced November 2022.

  40. arXiv:2211.12100  [pdf, other]

    cs.CV

    Simulating Human Gaze with Neural Visual Attention

    Authors: Leo Schwinn, Doina Precup, Bjoern Eskofier, Dario Zanca

    Abstract: Existing models of human visual attention are generally unable to incorporate direct task guidance and therefore cannot model an intent or goal when exploring a scene. To integrate guidance of any downstream visual task into attention modeling, we propose the Neural Visual Attention (NeVA) algorithm. To this end, we impose to neural networks the biological constraint of foveated vision and train a…

    Submitted 22 November, 2022; originally announced November 2022.

  41. arXiv:2211.03011  [pdf, other]

    cs.LG eess.SY stat.ML

    On learning history based policies for controlling Markov decision processes

    Authors: Gandharv Patil, Aditya Mahajan, Doina Precup

    Abstract: Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural nets or history-based state abstraction, perform better than their memory-less counterparts, due to the fact that function approximation in Markov decision processes (MDP) can be viewed as inducing a Partially observable MDP. However, there has been little formal analysis of such history-… ▽ More

    Submitted 5 November, 2022; originally announced November 2022.

  42. arXiv:2210.16979  [pdf, ps, other

    cs.LG

    When Do We Need Graph Neural Networks for Node Classification?

    Authors: Sitao Luan, Chenqing Hua, Qincheng Lu, Jiaqi Zhu, Xiao-Wen Chang, Doina Precup

    Abstract: Graph Neural Networks (GNNs) extend basic Neural Networks (NNs) by additionally making use of graph structure based on the relational inductive bias (edge bias), rather than treating the nodes as collections of independent and identically distributed (i.i.d.) samples. Though GNNs are believed to outperform basic NNs in real-world tasks, it is found that in some cases, GNNs have little performance… ▽ More

    Submitted 3 November, 2023; v1 submitted 30 October, 2022; originally announced October 2022.

    Comments: Accepted by 12th International Conference on Complex Networks and Their Applications

  43. arXiv:2210.07606  [pdf, other

    cs.LG cs.SI

    Revisiting Heterophily For Graph Neural Networks

    Authors: Sitao Luan, Chenqing Hua, Qincheng Lu, Jiaqi Zhu, Mingde Zhao, Shuyuan Zhang, Xiao-Wen Chang, Doina Precup

    Abstract: Graph Neural Networks (GNNs) extend basic Neural Networks (NNs) by using graph structures based on the relational inductive bias (homophily assumption). While GNNs have been commonly believed to outperform NNs in real-world tasks, recent work has identified a non-trivial set of datasets where their performance compared to NNs is not satisfactory. Heterophily has been considered the main cause of t… ▽ More

    Submitted 14 October, 2022; originally announced October 2022.

    Comments: Published at 36th Conference on Neural Information Processing Systems (NeurIPS 2022). arXiv admin note: substantial text overlap with arXiv:2109.05641

  44. arXiv:2210.05918  [pdf, ps, other

    cs.LG cs.AI eess.SY stat.ML

    Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

    Authors: Gandharv Patil, Prashanth L. A., Dheeraj Nagaraj, Doina Precup

    Abstract: We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges… ▽ More

    Submitted 11 September, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, 2023

  45. arXiv:2210.02552  [pdf, other

    cs.LG

    Towards Safe Mechanical Ventilation Treatment Using Deep Offline Reinforcement Learning

    Authors: Flemming Kondrup, Thomas Jiralerspong, Elaine Lau, Nathan de Lara, Jacob Shkrob, My Duc Tran, Doina Precup, Sumana Basu

    Abstract: Mechanical ventilation is a key form of life support for patients with pulmonary impairment. Healthcare workers are required to continuously adjust ventilator settings for each patient, a challenging and time consuming task. Hence, it would be beneficial to develop an automated decision support tool to optimize ventilation treatment. We present DeepVent, a Conservative Q-Learning (CQL) based offli… ▽ More

    Submitted 5 October, 2022; originally announced October 2022.

    Comments: to be published in IAAI (Innovative Applications of Artificial Intelligence) 2023

  46. arXiv:2210.01800  [pdf, other

    cs.LG cs.AI

    Bayesian Q-learning With Imperfect Expert Demonstrations

    Authors: Fengdi Che, Xiru Zhu, Doina Precup, David Meger, Gregory Dudek

    Abstract: Guided exploration with expert demonstrations improves data efficiency for reinforcement learning, but current algorithms often overuse expert information. We propose a novel algorithm to speed up Q-learning with the help of a limited amount of imperfect expert demonstrations. The algorithm avoids excessive reliance on expert data by relaxing the optimal expert assumption and gradually reducing th… ▽ More

    Submitted 1 October, 2022; originally announced October 2022.

  47. arXiv:2209.07364  [pdf, other

    cs.LG

    Continuous MDP Homomorphisms and Homomorphic Policy Gradient

    Authors: Sahand Rezaei-Shoshtari, Rosie Zhao, Prakash Panangaden, David Meger, Doina Precup

    Abstract: Abstraction has been widely studied as a way to improve the efficiency and generalization of reinforcement learning algorithms. In this paper, we study abstraction in the continuous-control setting. We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces. We derive a policy gradient theorem on the abstract MDP, which allows us to leverage approximat… ▽ More

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: NeurIPS 2022

  48. arXiv:2206.08442  [pdf, other

    cs.LG cs.AI

    Understanding Decision-Time vs. Background Planning in Model-Based Reinforcement Learning

    Authors: Safa Alver, Doina Precup

    Abstract: In model-based reinforcement learning, an agent can leverage a learned model to improve its way of behaving in different ways. Two prevalent approaches are decision-time planning and background planning. In this study, we are interested in understanding under what conditions and in which settings one of these two planning styles will perform better than the other in domains that require fast respo… ▽ More

    Submitted 16 June, 2022; originally announced June 2022.

  49. arXiv:2205.09619  [pdf, other

    cs.LG

    Improving Robustness against Real-World and Worst-Case Distribution Shifts through Decision Region Quantification

    Authors: Leo Schwinn, Leon Bungert, An Nguyen, René Raab, Falk Pulsmeyer, Doina Precup, Björn Eskofier, Dario Zanca

    Abstract: The reliability of neural networks is essential for their use in safety-critical applications. Existing approaches generally aim at improving the robustness of neural networks to either real-world distribution shifts (e.g., common corruptions and perturbations, spatial transformations, and natural adversarial examples) or worst-case distribution shifts (e.g., optimized adversarial examples). In th… ▽ More

    Submitted 19 May, 2022; originally announced May 2022.

  50. arXiv:2204.10374  [pdf, other

    cs.LG

    Learning how to Interact with a Complex Interface using Hierarchical Reinforcement Learning

    Authors: Gheorghe Comanici, Amelia Glaese, Anita Gergely, Daniel Toyama, Zafarali Ahmed, Tyler Jackson, Philippe Hamel, Doina Precup

    Abstract: Hierarchical Reinforcement Learning (HRL) allows interactive agents to decompose complex problems into a hierarchy of sub-tasks. Higher-level tasks can invoke the solutions of lower-level tasks as if they were primitive actions. In this work, we study the utility of hierarchical decompositions for learning an appropriate way to interact with a complex interface. Specifically, we train HRL agents t… ▽ More

    Submitted 21 April, 2022; originally announced April 2022.