Showing 1–13 of 13 results for author: Durugkar, I

  1. arXiv:2404.10740  [pdf, other]

    cs.AI

    N-Agent Ad Hoc Teamwork

    Authors: Caroline Wang, Arrasy Rahman, Ishan Durugkar, Elad Liebman, Peter Stone

    Abstract: Current approaches to learning cooperative behaviors in multi-agent settings assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls all agents in the scenario, while in ad hoc teamwork, the learning algorithm usually assumes control over only a single agent in the scenario. However, many cooperat…

    Submitted 16 April, 2024; originally announced April 2024.

    ACM Class: I.2.11; I.2.1; I.2.6; I.2.8

  2. arXiv:2310.06794  [pdf, other]

    cs.LG cs.AI cs.RO

    $f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences

    Authors: Siddhant Agarwal, Ishan Durugkar, Peter Stone, Amy Zhang

    Abstract: Goal-Conditioned Reinforcement Learning (RL) problems often have access to sparse rewards where the agent receives a reward signal only when it has achieved the goal, making policy optimization a difficult problem. Several works augment this sparse reward with a learned dense reward function, but this can lead to sub-optimal policies if the reward is misaligned. Moreover, recent works have demonst…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted at NeurIPS 2023

  3. arXiv:2211.04005  [pdf, other]

    cs.LG cs.AI

    ABC: Adversarial Behavioral Cloning for Offline Mode-Seeking Imitation Learning

    Authors: Eddy Hudson, Ishan Durugkar, Garrett Warnell, Peter Stone

    Abstract: Given a dataset of expert agent interactions with an environment of interest, a viable method to extract an effective agent policy is to estimate the maximum likelihood policy indicated by this data. This approach is commonly referred to as behavioral cloning (BC). In this work, we describe a key disadvantage of BC that arises due to the maximum likelihood objective function; namely that BC is mea…

    Submitted 7 November, 2022; originally announced November 2022.

  4. arXiv:2206.00233  [pdf, other]

    cs.MA cs.AI cs.LG cs.RO

    DM$^2$: Decentralized Multi-Agent Reinforcement Learning for Distribution Matching

    Authors: Caroline Wang, Ishan Durugkar, Elad Liebman, Peter Stone

    Abstract: Current approaches to multi-agent cooperation rely heavily on centralized mechanisms or explicit communication protocols to ensure convergence. This paper studies the problem of distributed multi-agent learning without resorting to centralized components or explicit communication. It examines the use of distribution matching to facilitate the coordination of independent agents. In the proposed sch…

    Submitted 12 March, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

    ACM Class: I.2.0; I.2.8; I.2.9; I.2.11

  5. arXiv:2110.15331  [pdf, other]

    cs.LG cs.AI

    Wasserstein Distance Maximizing Intrinsic Control

    Authors: Ishan Durugkar, Steven Hansen, Stephen Spencer, Volodymyr Mnih

    Abstract: This paper deals with the problem of learning a skill-conditioned policy that acts meaningfully in the absence of a reward signal. Mutual information based objectives have shown some success in learning skills that reach a diverse set of states in this setting. These objectives include a KL-divergence term, which is maximized by visiting distinct states even if those states are not far apart in th…

    Submitted 28 October, 2021; originally announced October 2021.

  6. arXiv:2105.13345  [pdf, other]

    cs.LG

    Adversarial Intrinsic Motivation for Reinforcement Learning

    Authors: Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

    Abstract: Learning with an objective to minimize the mismatch with a reference distribution has been shown to be useful for generative modeling and imitation learning. In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning (RL) tasks. Specifically,…

    Submitted 28 October, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

  7. arXiv:2008.06738  [pdf, other]

    cs.LG cs.AI stat.ML

    Reducing Sampling Error in Batch Temporal Difference Learning

    Authors: Brahma Pavse, Ishan Durugkar, Josiah Hanna, Peter Stone

    Abstract: Temporal difference (TD) learning is one of the main foundations of modern reinforcement learning. This paper studies the use of TD(0), a canonical TD algorithm, to estimate the value function of a given policy from a batch of data. In this batch setting, we show that TD(0) may converge to an inaccurate value function because the update following an action is weighted according to the number of ti…

    Submitted 15 August, 2020; originally announced August 2020.

    Comments: Accepted to International Conference on Machine Learning (ICML) 2020

  8. arXiv:2008.01594  [pdf, other]

    cs.AI cs.LG

    An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch

    Authors: Siddharth Desai, Ishan Durugkar, Haresh Karnan, Garrett Warnell, Josiah Hanna, Peter Stone

    Abstract: We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning. This problem is particularly important in sim-to-real transfer because simulators inevitably model real-world dynamics imperfectly. In this pape…

    Submitted 16 November, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

    Journal ref: Neural Information Processing Systems (NeurIPS 2020)

  9. arXiv:1904.03295  [pdf, other]

    cs.LG cs.AI stat.ML

    Multi-Preference Actor Critic

    Authors: Ishan Durugkar, Matthew Hausknecht, Adith Swaminathan, Patrick MacAlpine

    Abstract: Policy gradient algorithms typically combine discounted future rewards with an estimated value function, to compute the direction and magnitude of parameter updates. However, for most Reinforcement Learning tasks, humans can provide additional insight to constrain the policy learning. We introduce a general method to incorporate multiple different feedback channels into a single policy gradient lo…

    Submitted 5 April, 2019; originally announced April 2019.

    Comments: NeurIPS Workshop on Deep RL, 2018

  10. arXiv:1711.05851  [pdf, other]

    cs.CL cs.AI

    Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning

    Authors: Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, Andrew McCallum

    Abstract: Knowledge bases (KB), both automatically and manually constructed, are often incomplete: many valid facts can be inferred from the KB by synthesizing existing information. A popular approach to KB completion is to infer new relations by combinatory reasoning over the information found along other paths connecting a pair of entities. Given the enormous size of KBs and the exponential number of p…

    Submitted 30 December, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

    Comments: ICLR 2018

  11. arXiv:1611.01673  [pdf, other]

    cs.LG cs.MA cs.NE

    Generative Multi-Adversarial Networks

    Authors: Ishan Durugkar, Ian Gemp, Sridhar Mahadevan

    Abstract: Generative adversarial networks (GANs) are a framework for producing a generative model by way of a two-player minimax game. In this paper, we propose the Generative Multi-Adversarial Network (GMAN), a framework that extends GANs to multiple discriminators. In previous work, the successful training of GANs requires modifying the minimax objective to accelerate training early on. In contrast…

    Submitted 2 March, 2017; v1 submitted 5 November, 2016; originally announced November 2016.

    Comments: Accepted as a conference paper (poster) at ICLR 2017

  12. arXiv:1608.05983  [pdf, other]

    cs.LG stat.ML

    Inverting Variational Autoencoders for Improved Generative Accuracy

    Authors: Ian Gemp, Ishan Durugkar, Mario Parente, M. Darby Dyar, Sridhar Mahadevan

    Abstract: Recent advances in semi-supervised learning with deep generative models have shown promise in generalizing from small labeled datasets ($\mathbf{x},\mathbf{y}$) to large unlabeled ones ($\mathbf{x}$). In the case where the codomain has known structure, a large unfeatured dataset ($\mathbf{y}$) is potentially available. We develop a parameter-efficient, deep semi-supervised generative model for the…

    Submitted 24 August, 2017; v1 submitted 21 August, 2016; originally announced August 2016.

  13. arXiv:1606.04615  [pdf, other]

    cs.LG cs.AI cs.NE

    Deep Reinforcement Learning With Macro-Actions

    Authors: Ishan P. Durugkar, Clemens Rosenbaum, Stefan Dernbach, Sridhar Mahadevan

    Abstract: Deep reinforcement learning has been shown to be a powerful framework for learning policies from complex high-dimensional sensory inputs to actions in complex tasks, such as the Atari domain. In this paper, we explore output representation modeling in the form of temporal abstraction to improve convergence and reliability of deep reinforcement learning approaches. We concentrate on macro-actions,…

    Submitted 14 June, 2016; originally announced June 2016.