
Showing 1–26 of 26 results for author: Asadi, K

  1. arXiv:2406.01838  [pdf, other]

    cs.LG cs.AI

    Learning the Target Network in Function Space

    Authors: Kavosh Asadi, Yao Liu, Shoham Sabach, Ming Yin, Rasool Fakoor

    Abstract: We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algo…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to International Conference on Machine Learning (ICML24)

  2. arXiv:2310.05905  [pdf, other]

    cs.LG cs.AI cs.RO

    TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models

    Authors: Zuxin Liu, Jesse Zhang, Kavosh Asadi, Yao Liu, Ding Zhao, Shoham Sabach, Rasool Fakoor

    Abstract: The full potential of large pretrained models remains largely untapped in control domains like robotics. This is mainly because of the scarcity of data and the computational challenges associated with training or fine-tuning these large models for such applications. Prior work mainly emphasizes either effective pretraining of large models for decision-making or single-task adaptation. But real-wor…

    Submitted 8 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Published at ICLR 2024

  3. arXiv:2306.17833  [pdf, other]

    cs.LG cs.AI

    Resetting the Optimizer in Deep RL: An Empirical Study

    Authors: Kavosh Asadi, Rasool Fakoor, Shoham Sabach

    Abstract: We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process is comprised of solving a sequence of optimization problems where the loss function changes per iteration. The common approach to solving this sequence of problems is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain th…

    Submitted 14 November, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted at Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

  4. arXiv:2306.17750  [pdf, other]

    cs.LG

    TD Convergence: An Optimization Perspective

    Authors: Kavosh Asadi, Shoham Sabach, Yao Liu, Omer Gottesman, Rasool Fakoor

    Abstract: We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where the function to be minimized changes per iteration. By carefully investigating the divergence displayed by TD on a classical counter example, we identify two f…

    Submitted 8 November, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted at Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

  5. arXiv:2205.05588  [pdf, other]

    cs.AI cs.LG

    Characterizing the Action-Generalization Gap in Deep Q-Learning

    Authors: Zhiyuan Zhou, Cameron Allen, Kavosh Asadi, George Konidaris

    Abstract: We study the action generalization ability of deep Q-learning in discrete action spaces. Generalization is crucial for efficient reinforcement learning (RL) because it allows agents to use knowledge learned from past experiences on new tasks. But while function approximation provides deep RL agents with a natural way to generalize over state inputs, the same generalization mechanism does not apply…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: To appear at the 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM2022)

  6. arXiv:2112.05848  [pdf, other]

    cs.LG cs.AI

    Faster Deep Reinforcement Learning with Slower Online Network

    Authors: Kavosh Asadi, Rasool Fakoor, Omer Gottesman, Taesup Kim, Michael L. Littman, Alexander J. Smola

    Abstract: Deep reinforcement learning algorithms often use two networks for value function optimization: an online network, and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against issues that arise when performing bootstrapping. In this paper we endow two popular deep reinforcement learning algorithms, namely DQN and Rainbow, with u…

    Submitted 17 April, 2023; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: Published at the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  7. arXiv:2110.12276  [pdf, other]

    cs.LG

    Coarse-Grained Smoothness for RL in Metric Spaces

    Authors: Omer Gottesman, Kavosh Asadi, Cameron Allen, Sam Lobel, George Konidaris, Michael Littman

    Abstract: Principled decision-making in continuous state--action spaces is impossible without some assumptions. A common approach is to assume Lipschitz continuity of the Q-function. We show that, unfortunately, this property fails to hold in many typical domains. We propose a new coarse-grained smoothness definition that generalizes the notion of Lipschitz continuity, is more widely applicable, and allows…

    Submitted 23 October, 2021; originally announced October 2021.

  8. arXiv:2109.07054  [pdf, other]

    cs.LG cs.AI cs.DS cs.HC

    Convergence of a Human-in-the-Loop Policy-Gradient Algorithm With Eligibility Trace Under Reward, Policy, and Advantage Feedback

    Authors: Ishaan Shah, David Halpern, Kavosh Asadi, Michael L. Littman

    Abstract: Fluid human-agent communication is essential for the future of human-in-the-loop reinforcement learning. An agent must respond appropriately to feedback from its human trainer even before they have significant experience working together. Therefore, it is important that learning agents respond well to various feedback schemes human trainers are likely to provide. This work analyzes the COnvergent…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: Accepted into ICML 2021 workshops Human-AI Collaboration in Sequential Decision-Making and Human in the Loop Learning

  9. arXiv:2102.09225  [pdf, other]

    cs.LG stat.ML

    Continuous Doubly Constrained Batch Reinforcement Learning

    Authors: Rasool Fakoor, Jonas Mueller, Kavosh Asadi, Pratik Chaudhari, Alexander J. Smola

    Abstract: Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produc…

    Submitted 6 December, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021 conference paper

  10. arXiv:2002.05518  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning State Abstractions for Transfer in Continuous Control

    Authors: Kavosh Asadi, David Abel, Michael L. Littman

    Abstract: Can simple algorithms with a good representation solve challenging reinforcement learning problems? In this work, we answer this question in the affirmative, where we take "simple learning algorithm" to be tabular Q-Learning, the "good representations" to be a learned state abstraction, and "challenging problems" to be continuous control tasks. Our main contribution is a learning algorithm that ab…

    Submitted 8 February, 2020; originally announced February 2020.

  11. arXiv:2002.01883  [pdf, other]

    cs.LG cs.AI stat.ML

    Deep Radial-Basis Value Functions for Continuous Control

    Authors: Kavosh Asadi, Neev Parikh, Ronald E. Parr, George D. Konidaris, Michael L. Littman

    Abstract: A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep radial-basis value functions (RBVFs): value functions learned using a deep network with a radial-basis function (RBF) output layer. We show that the max…

    Submitted 13 March, 2021; v1 submitted 5 February, 2020; originally announced February 2020.

    Comments: In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI)

  12. arXiv:2001.05411  [pdf, other]

    cs.LG cs.AI stat.ML

    Lipschitz Lifelong Reinforcement Learning

    Authors: Erwan Lecarpentier, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel Rachelson, Michael L. Littman

    Abstract: We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the tasks space. These theoretical results lead us to a value-transfe…

    Submitted 22 March, 2021; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: In proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021), 21 pages, 11 figures

  13. arXiv:1905.13320  [pdf, other]

    cs.LG cs.AI stat.ML

    Combating the Compounding-Error Problem with a Multi-step Model

    Authors: Kavosh Asadi, Dipendra Misra, Seungchan Kim, Michael L. Littman

    Abstract: Model-based reinforcement learning is an appealing framework for creating agents that learn, plan, and act in sequential environments. Model-based algorithms typically involve learning a transition model that takes a state and an action and outputs the next state---a one-step model. This model can be composed with itself to enable predicting multiple steps into the future, but one-step prediction…

    Submitted 30 May, 2019; originally announced May 2019.

  14. arXiv:1902.05099  [pdf]

    cs.CY

    Virtual Manipulation in an Immersive Virtual Environment: Simulation of Virtual Assembly

    Authors: Mojtaba Noghabaei, Khashayar Asadi, Kevin Han

    Abstract: To fill the lack of research efforts in virtual assembly of modules and training, this paper presents a virtual manipulation of building objects in an Immersive Virtual Environment (IVE). A worker wearing a Virtual Reality (VR) head-mounted device (HMD) virtually performs an assembly of multiple modules while identifying any issues. Hand motions of the worker are tracked by a motion sensor mounted…

    Submitted 30 January, 2019; originally announced February 2019.

    Comments: 8 pages, 4 figures

  15. arXiv:1901.11078  [pdf, other]

    cs.CV cs.LG stat.ML

    Real-world Mapping of Gaze Fixations Using Instance Segmentation for Road Construction Safety Applications

    Authors: Idris Jeelani, Khashayar Asadi, Hariharan Ramshankar, Kevin Han, Alex Albert

    Abstract: Research studies have shown that a large proportion of hazards remain unrecognized, which exposes construction workers to unanticipated safety risks. Recent studies have also found that a strong correlation exists between viewing patterns of workers, captured using eye-tracking devices, and their hazard recognition performance. Therefore, it is important to analyze the viewing patterns of workers t…

    Submitted 1 February, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

    Comments: 2019 TRB Annual meeting

  16. arXiv:1901.08630  [pdf]

    cs.RO

    Real-time Scene Segmentation Using a Light Deep Neural Network Architecture for Autonomous Robot Navigation on Construction Sites

    Authors: Khashayar Asadi, Pengyu Chen, Kevin Han, Tianfu Wu, Edgar Lobaton

    Abstract: Camera-equipped unmanned vehicles (UVs) have received a lot of attention in data collection for construction monitoring applications. To develop an autonomous platform, the UV should be able to process multiple modules (e.g., context-awareness, control, localization, and mapping) on an embedded platform. Pixel-wise semantic segmentation provides a UV with the ability to be contextually aware of it…

    Submitted 24 January, 2019; originally announced January 2019.

    Comments: The 2019 ASCE International Conference on Computing in Civil Engineering

  17. arXiv:1901.08180  [pdf]

    cs.RO

    Vision-based Obstacle Removal System for Autonomous Ground Vehicles Using a Robotic Arm

    Authors: Khashayar Asadi, Rahul Jain, Ziqian Qin, Mingda Sun, Mojtaba Noghabaei, Jeremy Cole, Kevin Han, Edgar Lobaton

    Abstract: Over the past few years, the use of camera-equipped robotic platforms for data collection and visually monitoring applications has exponentially grown. Cluttered construction sites with many objects (e.g., bricks, pipes, etc.) on the ground are challenging environments for a mobile unmanned ground vehicle (UGV) to navigate. To address this issue, this study presents a mobile UGV equipped with a st…

    Submitted 23 January, 2019; originally announced January 2019.

    Comments: The 2019 ASCE International Conference on Computing in Civil Engineering

  18. arXiv:1812.01129  [pdf, other]

    cs.LG cs.AI

    Mitigating Planner Overfitting in Model-Based Reinforcement Learning

    Authors: Dilip Arumugam, David Abel, Kavosh Asadi, Nakul Gopalan, Christopher Grimm, Jun Ki Lee, Lucas Lehnert, Michael L. Littman

    Abstract: An agent with an inaccurate model of its environment faces a difficult choice: it can ignore the errors in its model and act in the real world in whatever way it determines is optimal with respect to its model. Alternatively, it can take a more conservative stance and eschew its model in favor of optimizing its behavior solely via real-world interaction. This latter approach can be exceedingly slo…

    Submitted 19 March, 2020; v1 submitted 3 December, 2018; originally announced December 2018.

  19. arXiv:1811.00128  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards a Simple Approach to Multi-step Model-based Reinforcement Learning

    Authors: Kavosh Asadi, Evan Cater, Dipendra Misra, Michael L. Littman

    Abstract: When environmental interaction is expensive, model-based reinforcement learning offers a solution by planning ahead and avoiding costly mistakes. Model-based agents typically learn a single-step transition model. In this paper, we propose a multi-step model that predicts the outcome of an action sequence with variable length. We show that this model is easy to learn, and that the model can make po…

    Submitted 31 October, 2018; originally announced November 2018.

  20. arXiv:1806.01265  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Equivalence Between Wasserstein and Value-Aware Loss for Model-based Reinforcement Learning

    Authors: Kavosh Asadi, Evan Cater, Dipendra Misra, Michael L. Littman

    Abstract: Learning a generative model is a key component of model-based reinforcement learning. Though learning a good model in the tabular setting is a simple task, learning a useful model in the approximate setting is challenging. In this context, an important question is the loss function used for model learning as varying the loss function can have a remarkable impact on effectiveness of planning. Recen…

    Submitted 8 July, 2018; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: Accepted at the FAIM workshop "Prediction and Generative Modeling in Reinforcement Learning", Stockholm, Sweden, 2018

  21. arXiv:1804.07193  [pdf, other]

    cs.LG cs.AI stat.ML

    Lipschitz Continuity in Model-based Reinforcement Learning

    Authors: Kavosh Asadi, Dipendra Misra, Michael L. Littman

    Abstract: We examine the impact of learning Lipschitz continuous models in the context of model-based reinforcement learning. We provide a novel bound on multi-step prediction error of Lipschitz models where we quantify the error using the Wasserstein metric. We go on to prove an error bound for the value-function estimate arising from Lipschitz models and show that the estimated value function is itself Li…

    Submitted 27 July, 2018; v1 submitted 19 April, 2018; originally announced April 2018.

    Comments: Accepted for the 35th International Conference on Machine Learning (ICML 2018)

  22. arXiv:1803.01745  [pdf, other]

    cs.RO

    Building an Integrated Mobile Robotic System for Real-Time Applications in Construction

    Authors: Khashayar Asadi, Hariharan Ramshankar, Harish Pullagurla, Aishwarya Bhandare, Suraj Shanbhag, Pooja Mehta, Spondon Kundu, Kevin Han, Edgar Lobaton, Tianfu Wu

    Abstract: One of the major challenges of a real-time autonomous robotic system for construction monitoring is to simultaneously localize, map, and navigate over the lifetime of the robot, with little or no human intervention. Past research on Simultaneous Localization and Mapping (SLAM) and context-awareness are two active research areas in the computer vision and robotics communities. The studies that inte…

    Submitted 18 April, 2018; v1 submitted 5 March, 2018; originally announced March 2018.

  23. arXiv:1709.00503  [pdf, other]

    stat.ML cs.AI cs.LG

    Mean Actor Critic

    Authors: Cameron Allen, Kavosh Asadi, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, Michael Littman

    Abstract: We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent's explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. We prove that this approach reduces variance in the policy gradient estimate rel…

    Submitted 22 May, 2018; v1 submitted 1 September, 2017; originally announced September 2017.

  24. arXiv:1702.03274  [pdf, other]

    cs.AI cs.CL

    Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

    Authors: Jason D. Williams, Kavosh Asadi, Geoffrey Zweig

    Abstract: End-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors. We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates. Compared to existing end-to-end approaches, HCNs…

    Submitted 24 April, 2017; v1 submitted 10 February, 2017; originally announced February 2017.

    Comments: Accepted as a long paper for the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017)

  25. arXiv:1612.06000  [pdf, other]

    cs.AI cs.LG stat.ML

    Sample-efficient Deep Reinforcement Learning for Dialog Control

    Authors: Kavosh Asadi, Jason D. Williams

    Abstract: Representing a dialog policy as a recurrent neural network (RNN) is attractive because it handles partial observability, infers a latent representation of state, and can be optimized with supervised learning (SL) or reinforcement learning (RL). For RL, a policy gradient approach is natural, but is sample inefficient. In this paper, we present 3 methods for reducing the number of dialogs required t…

    Submitted 18 December, 2016; originally announced December 2016.

  26. arXiv:1612.05628  [pdf, other]

    cs.AI cs.LG stat.ML

    An Alternative Softmax Operator for Reinforcement Learning

    Authors: Kavosh Asadi, Michael L. Littman

    Abstract: A softmax operator applied to a set of values acts somewhat like the maximization function and somewhat like an average. In sequential decision making, softmax is often used in settings where it is necessary to maximize utility but also to hedge against problems that arise from putting all of one's weight behind a single maximum utility decision. The Boltzmann softmax operator is the most commonly…

    Submitted 14 June, 2017; v1 submitted 16 December, 2016; originally announced December 2016.