Showing 1–12 of 12 results for author: Asadi, K

  1. arXiv:2102.09225  [pdf, other]

    cs.LG stat.ML

    Continuous Doubly Constrained Batch Reinforcement Learning

    Authors: Rasool Fakoor, Jonas Mueller, Kavosh Asadi, Pratik Chaudhari, Alexander J. Smola

    Abstract: Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produc…

    Submitted 6 December, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021 conference paper

  2. arXiv:2002.05518  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning State Abstractions for Transfer in Continuous Control

    Authors: Kavosh Asadi, David Abel, Michael L. Littman

    Abstract: Can simple algorithms with a good representation solve challenging reinforcement learning problems? In this work, we answer this question in the affirmative, where we take "simple learning algorithm" to be tabular Q-Learning, the "good representations" to be a learned state abstraction, and "challenging problems" to be continuous control tasks. Our main contribution is a learning algorithm that ab…

    Submitted 8 February, 2020; originally announced February 2020.

  3. arXiv:2002.01883  [pdf, other]

    cs.LG cs.AI stat.ML

    Deep Radial-Basis Value Functions for Continuous Control

    Authors: Kavosh Asadi, Neev Parikh, Ronald E. Parr, George D. Konidaris, Michael L. Littman

    Abstract: A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep radial-basis value functions (RBVFs): value functions learned using a deep network with a radial-basis function (RBF) output layer. We show that the max…

    Submitted 13 March, 2021; v1 submitted 5 February, 2020; originally announced February 2020.

    Comments: In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI)

  4. arXiv:2001.05411  [pdf, other]

    cs.LG cs.AI stat.ML

    Lipschitz Lifelong Reinforcement Learning

    Authors: Erwan Lecarpentier, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel Rachelson, Michael L. Littman

    Abstract: We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the tasks space. These theoretical results lead us to a value-transfe…

    Submitted 22 March, 2021; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: In proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021), 21 pages, 11 figures

  5. arXiv:1905.13320  [pdf, other]

    cs.LG cs.AI stat.ML

    Combating the Compounding-Error Problem with a Multi-step Model

    Authors: Kavosh Asadi, Dipendra Misra, Seungchan Kim, Michael L. Littman

    Abstract: Model-based reinforcement learning is an appealing framework for creating agents that learn, plan, and act in sequential environments. Model-based algorithms typically involve learning a transition model that takes a state and an action and outputs the next state---a one-step model. This model can be composed with itself to enable predicting multiple steps into the future, but one-step prediction…

    Submitted 30 May, 2019; originally announced May 2019.

  6. arXiv:1901.11078  [pdf, other]

    cs.CV cs.LG stat.ML

    Real-world Mapping of Gaze Fixations Using Instance Segmentation for Road Construction Safety Applications

    Authors: Idris Jeelani, Khashayar Asadi, Hariharan Ramshankar, Kevin Han, Alex Albert

    Abstract: Research studies have shown that a large proportion of hazards remain unrecognized, which expose construction workers to unanticipated safety risks. Recent studies have also found that a strong correlation exists between viewing patterns of workers, captured using eye-tracking devices, and their hazard recognition performance. Therefore, it is important to analyze the viewing patterns of workers t…

    Submitted 1 February, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

    Comments: 2019 TRB Annual meeting

  7. arXiv:1811.00128  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards a Simple Approach to Multi-step Model-based Reinforcement Learning

    Authors: Kavosh Asadi, Evan Cater, Dipendra Misra, Michael L. Littman

    Abstract: When environmental interaction is expensive, model-based reinforcement learning offers a solution by planning ahead and avoiding costly mistakes. Model-based agents typically learn a single-step transition model. In this paper, we propose a multi-step model that predicts the outcome of an action sequence with variable length. We show that this model is easy to learn, and that the model can make po…

    Submitted 31 October, 2018; originally announced November 2018.

  8. arXiv:1806.01265  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Equivalence Between Wasserstein and Value-Aware Loss for Model-based Reinforcement Learning

    Authors: Kavosh Asadi, Evan Cater, Dipendra Misra, Michael L. Littman

    Abstract: Learning a generative model is a key component of model-based reinforcement learning. Though learning a good model in the tabular setting is a simple task, learning a useful model in the approximate setting is challenging. In this context, an important question is the loss function used for model learning as varying the loss function can have a remarkable impact on effectiveness of planning. Recen…

    Submitted 8 July, 2018; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: Accepted at the FAIM workshop "Prediction and Generative Modeling in Reinforcement Learning", Stockholm, Sweden, 2018

  9. arXiv:1804.07193  [pdf, other]

    cs.LG cs.AI stat.ML

    Lipschitz Continuity in Model-based Reinforcement Learning

    Authors: Kavosh Asadi, Dipendra Misra, Michael L. Littman

    Abstract: We examine the impact of learning Lipschitz continuous models in the context of model-based reinforcement learning. We provide a novel bound on multi-step prediction error of Lipschitz models where we quantify the error using the Wasserstein metric. We go on to prove an error bound for the value-function estimate arising from Lipschitz models and show that the estimated value function is itself Li…

    Submitted 27 July, 2018; v1 submitted 19 April, 2018; originally announced April 2018.

    Comments: Accepted for the 35th International Conference on Machine Learning (ICML 2018)

  10. arXiv:1709.00503  [pdf, other]

    stat.ML cs.AI cs.LG

    Mean Actor Critic

    Authors: Cameron Allen, Kavosh Asadi, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman

    Abstract: We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. We prove that this approach reduces variance in the policy gradient estimate rel…

    Submitted 22 May, 2018; v1 submitted 1 September, 2017; originally announced September 2017.

  11. arXiv:1612.06000  [pdf, other]

    cs.AI cs.LG stat.ML

    Sample-efficient Deep Reinforcement Learning for Dialog Control

    Authors: Kavosh Asadi, Jason D. Williams

    Abstract: Representing a dialog policy as a recurrent neural network (RNN) is attractive because it handles partial observability, infers a latent representation of state, and can be optimized with supervised learning (SL) or reinforcement learning (RL). For RL, a policy gradient approach is natural, but is sample inefficient. In this paper, we present 3 methods for reducing the number of dialogs required t…

    Submitted 18 December, 2016; originally announced December 2016.

  12. arXiv:1612.05628  [pdf, other]

    cs.AI cs.LG stat.ML

    An Alternative Softmax Operator for Reinforcement Learning

    Authors: Kavosh Asadi, Michael L. Littman

    Abstract: A softmax operator applied to a set of values acts somewhat like the maximization function and somewhat like an average. In sequential decision making, softmax is often used in settings where it is necessary to maximize utility but also to hedge against problems that arise from putting all of one's weight behind a single maximum utility decision. The Boltzmann softmax operator is the most commonly…

    Submitted 14 June, 2017; v1 submitted 16 December, 2016; originally announced December 2016.