Showing 1–50 of 64 results for author: Bedi, A S

  1. arXiv:2406.10918  [pdf, other]

    cs.LG cs.AI cs.CL

    Embodied Question Answering via Multi-LLM Systems

    Authors: Bhrij Patel, Vishnu Sashank Dorbala, Dinesh Manocha, Amrit Singh Bedi

    Abstract: Embodied Question Answering (EQA) is an important problem, which involves an agent exploring the environment to answer user queries. In the existing literature, EQA has exclusively been studied in single-agent scenarios, where exploration can be time-consuming and costly. In this work, we consider EQA in a multi-agent framework involving multiple large language model (LLM) based agents independen…

    Submitted 25 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 17 pages, 13 Figures, 4 Tables

  2. arXiv:2406.10892  [pdf, other]

    cs.LG

    DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning

    Authors: Utsav Singh, Souradip Chakraborty, Wesley A. Suttle, Brian M. Sadler, Vinay P Namboodiri, Amrit Singh Bedi

    Abstract: Learning control policies to perform complex robotics tasks from human preference data presents significant challenges. On the one hand, the complexity of such tasks typically requires learning policies to perform a variety of subtasks, then combining them to achieve the overall goal. At the same time, comprehensive, well-engineered reward functions are typically unavailable in such problems, whil…

    Submitted 16 June, 2024; originally announced June 2024.

  3. arXiv:2405.20495  [pdf, other]

    cs.CL cs.LG

    Transfer Q Star: Principled Decoding for LLM Alignment

    Authors: Souradip Chakraborty, Soumya Suvra Ghosal, Ming Yin, Dinesh Manocha, Mengdi Wang, Amrit Singh Bedi, Furong Huang

    Abstract: Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable frame…

    Submitted 30 May, 2024; originally announced May 2024.

  4. arXiv:2405.13879  [pdf, other]

    cs.GT cs.DC cs.LG econ.TH

    FACT or Fiction: Can Truthful Mechanisms Eliminate Federated Free Riding?

    Authors: Marco Bornstein, Amrit Singh Bedi, Abdirisak Mohamed, Furong Huang

    Abstract: Standard federated learning (FL) approaches are vulnerable to the free-rider dilemma: participating agents can contribute little to nothing yet receive a well-trained aggregated model. While prior mechanisms attempt to solve the free-rider dilemma, none have addressed the issue of truthfulness. In practice, adversarial agents can provide false information to the server in order to cheat their way ou…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 18 pages, 5 figures

  5. arXiv:2405.01843  [pdf, ps, other]

    cs.LG cs.AI

    Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization

    Authors: Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal

    Abstract: The current state-of-the-art theoretical analysis of Actor-Critic (AC) algorithms significantly lags in addressing the practical aspects of AC implementations. This crucial gap needs bridging to bring the analysis in line with practical implementations of AC. To address this, we advocate for considering the MMCLG criteria: Multi-layer neural network parametrization for actor/critic, …

    Submitted 11 June, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted at ICML 2024. This is a revised version of arXiv:2306.10486, where we have gone from finite action space to continuous action space, from average iterate convergence to last iterate convergence and from $ε^{-4}$ to $ε^{-3}$ sample complexity

  6. arXiv:2404.13423  [pdf, other]

    cs.LG

    PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

    Authors: Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi

    Abstract: In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower primitive behavior, our relabeling-based approach is able to mit…

    Submitted 16 June, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  7. arXiv:2403.11925  [pdf, other]

    cs.LG

    Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles

    Authors: Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha

    Abstract: In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic due to the difficulty and expense of estimating…

    Submitted 20 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 26 Pages, 2 Figures

  8. arXiv:2403.09905  [pdf, other]

    cs.RO cs.CV

    Right Place, Right Time! Towards ObjectNav for Non-Stationary Goals

    Authors: Vishnu Sashank Dorbala, Bhrij Patel, Amrit Singh Bedi, Dinesh Manocha

    Abstract: We present a novel approach to tackle the ObjectNav task for non-stationary and potentially occluded targets in an indoor environment. We refer to this task as Portable ObjectNav (or P-ObjectNav), and in this work, present its formulation, feasibility, and a navigation benchmark using a novel memory-enhanced LLM-based policy. In contrast to ObjectNav, where target object locations are fixed for each epis…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 32

  9. arXiv:2402.10340  [pdf, other]

    cs.RO cs.AI

    Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics

    Authors: Xiyang Wu, Souradip Chakraborty, Ruiqi Xian, Jing Liang, Tianrui Guan, Fuxiao Liu, Brian M. Sadler, Dinesh Manocha, Amrit Singh Bedi

    Abstract: In this paper, we highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications. Recent works focus on using LLMs and VLMs to improve the performance of robotics tasks, such as manipulation and navigation. Despite these improvements, analyzing the safety of such systems remains underexplo…

    Submitted 16 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  10. arXiv:2402.08925  [pdf, other]

    cs.CL cs.AI cs.LG cs.RO

    MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences

    Authors: Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data. However, such an approach overlooks the rich diversity of human preferences inherent in data collected from multiple users. In this work, we first derive an impossibility result of alignment with single reward RLHF, thereby highlighting it…

    Submitted 13 February, 2024; originally announced February 2024.

  11. arXiv:2402.03494  [pdf, other]

    cs.AI cs.RO

    Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks

    Authors: Xingpeng Sun, Haoming Meng, Souradip Chakraborty, Amrit Singh Bedi, Aniket Bera

    Abstract: While LLMs excel in processing text in these human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses. These features are the as…

    Submitted 23 April, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 28 pages, 7 figures

  12. arXiv:2312.14436  [pdf, other]

    cs.RO cs.LG

    REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback

    Authors: Souradip Chakraborty, Anukriti Singh, Amisha Bhaskar, Pratap Tokekar, Dinesh Manocha, Amrit Singh Bedi

    Abstract: The effectiveness of reinforcement learning (RL) agents in continuous control robotics tasks is heavily dependent on the design of the underlying reward function. However, a misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world. Current methods to mitigate this misalignment work by learning reward functions from human preference…

    Submitted 14 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

  13. arXiv:2310.15264  [pdf, other]

    cs.CL cs.AI

    Towards Possibilities & Impossibilities of AI-generated Text Detection: A Survey

    Authors: Soumya Suvra Ghosal, Souradip Chakraborty, Jonas Geiping, Furong Huang, Dinesh Manocha, Amrit Singh Bedi

    Abstract: Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses. However, despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs such as spreading misinformation, generating fake news, plagiarism in academia, and contami…

    Submitted 23 October, 2023; originally announced October 2023.

  14. arXiv:2310.13681  [pdf, other]

    cs.GT cs.CY cs.DC cs.LG econ.TH

    Towards Realistic Mechanisms That Incentivize Federated Participation and Contribution

    Authors: Marco Bornstein, Amrit Singh Bedi, Anit Kumar Sahu, Furqan Khan, Furong Huang

    Abstract: Edge device participation in federated learning (FL) is typically studied through the lens of device-server communication (e.g., device dropout) and assumes an undying desire from edge devices to participate in FL. As a result, current FL frameworks are flawed when implemented in realistic settings, with many encountering the free-rider dilemma. In a step to push FL towards realistic settings, we…

    Submitted 22 May, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: 24 pages, 11 figures

  15. arXiv:2310.00481  [pdf, other]

    cs.RO

    LANCAR: Leveraging Language for Context-Aware Robot Locomotion in Unstructured Environments

    Authors: Chak Lam Shek, Xiyang Wu, Wesley A. Suttle, Carl Busart, Erin Zaroukian, Dinesh Manocha, Pratap Tokekar, Amrit Singh Bedi

    Abstract: Navigating robots through unstructured terrains is challenging, primarily due to the dynamic environmental changes. While humans adeptly navigate such terrains by using context from their observations, creating a similar context-aware navigation system for robots is difficult. The essence of the issue lies in the acquisition and interpretation of contextual information, a task complicated by the i…

    Submitted 19 March, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

  16. arXiv:2308.02585  [pdf, other]

    cs.LG

    PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Mengdi Wang, Furong Huang

    Abstract: We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback. We identify a major gap within current algorithmic designs for solving policy alignment due to a lack of precise characterization of the dependence of the alignment obj…

    Submitted 30 April, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

  17. arXiv:2306.10486  [pdf, ps, other]

    cs.LG

    On the Global Convergence of Natural Actor-Critic with Two-layer Neural Network Parametrization

    Authors: Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal

    Abstract: Actor-critic algorithms have shown remarkable success in solving state-of-the-art decision-making problems. However, despite their empirical effectiveness, their theoretical underpinnings remain relatively unexplored, especially with neural network parametrization. In this paper, we delve into the study of a natural actor-critic algorithm that utilizes neural networks to represent the critic. Our…

    Submitted 18 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: text overlap with arXiv:2211.07675

    ACM Class: F.2.1

  18. arXiv:2306.06236  [pdf, other]

    cs.MA cs.LG cs.RO

    iPLAN: Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning

    Authors: Xiyang Wu, Rohan Chandra, Tianrui Guan, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Navigating safely and efficiently in dense and heterogeneous traffic scenarios is challenging for autonomous vehicles (AVs) due to their inability to infer the behaviors or intentions of nearby drivers. In this work, we introduce a distributed multi-agent reinforcement learning (MARL) algorithm that can predict trajectories and intents in dense and heterogeneous traffic scenarios. Our approach for…

    Submitted 21 August, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  19. arXiv:2306.06192  [pdf, other]

    cs.RO cs.AI cs.LG

    Ada-NAV: Adaptive Trajectory Length-Based Sample Efficient Policy Learning for Robotic Navigation

    Authors: Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Tianyi Zhou, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Trajectory length stands as a crucial hyperparameter within reinforcement learning (RL) algorithms, significantly contributing to the sample inefficiency in robotics applications. Motivated by the pivotal role trajectory length plays in the training process, we introduce Ada-NAV, a novel adaptive trajectory length scheme designed to enhance the training sample efficiency of RL algorithms in roboti…

    Submitted 20 March, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 11 pages, 9 figures, 2 tables

  20. arXiv:2304.04736  [pdf, other]

    cs.CL cs.AI cs.LG

    On the Possibilities of AI-Generated Text Detection

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang

    Abstract: Our work addresses the critical issue of distinguishing text generated by Large Language Models (LLMs) from human-produced text, a task essential for numerous applications. Despite ongoing debate about the feasibility of such differentiation, we present evidence supporting its consistent achievability, except when human and machine text distributions are indistinguishable across their entire suppo…

    Submitted 2 October, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

  21. arXiv:2303.07622  [pdf, other]

    cs.RO cs.AI cs.LG

    RE-MOVE: An Adaptive Policy Design for Robotic Navigation Tasks in Dynamic Environments via Language-Based Feedback

    Authors: Souradip Chakraborty, Kasun Weerakoon, Prithvi Poddar, Mohamed Elnoor, Priya Narayanan, Carl Busart, Pratap Tokekar, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Reinforcement learning-based policies for continuous control robotic navigation tasks often fail to adapt to changes in the environment during real-time deployment, which may result in catastrophic failures. To address this limitation, we propose a novel approach called RE-MOVE (REquest help and MOVE on) to adapt an already trained policy to real-time changes in the environment without re-training vi…

    Submitted 17 September, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

  22. arXiv:2301.12083  [pdf, other]

    cs.LG math.OC stat.ML

    Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic

    Authors: Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec Koppel, Dinesh Manocha

    Abstract: Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection. Unfortunately, this assumption is violated for large state spaces or settings with sparse rewards, and the mixing time is unknown, mak…

    Submitted 1 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  23. arXiv:2301.12038  [pdf, other]

    cs.LG cs.AI stat.ML

    STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Mengdi Wang, Furong Huang, Dinesh Manocha

    Abstract: Directed Exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse. Information-directed sampling (IDS), which optimizes the information ratio, seeks to do so by augmenting regret with information gain. However, estimating information gain is computationally intractable or relies on restrictive assumptions which prohibit its use in many practical instanc…

    Submitted 18 September, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  24. arXiv:2210.14026  [pdf, other]

    cs.DC cs.LG math.OC

    SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication

    Authors: Marco Bornstein, Tahseen Rabbani, Evan Wang, Amrit Singh Bedi, Furong Huang

    Abstract: The decentralized Federated Learning (FL) setting avoids the role of a potentially unreliable or untrustworthy central host by utilizing groups of clients to collaboratively train a model via localized training and model/gradient sharing. Most existing decentralized FL algorithms require synchronization of client models where the speed of synchronization depends upon the slowest client. In this wo…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: 30 pages, 9 figures

  25. arXiv:2209.06415  [pdf, other]

    cs.RO

    DMCA: Dense Multi-agent Navigation using Attention and Communication

    Authors: Senthil Hariharan Arul, Amrit Singh Bedi, Dinesh Manocha

    Abstract: In decentralized multi-robot navigation, ensuring safe and efficient movement with limited environmental awareness remains a challenge. While robots traditionally navigate based on local observations, this approach falters in complex environments. A possible solution is to enhance understanding of the world through inter-agent communication, but mere information broadcasting falls short in efficie…

    Submitted 25 June, 2024; v1 submitted 14 September, 2022; originally announced September 2022.

  26. arXiv:2209.05738  [pdf, other]

    cs.RO cs.MA

    RTAW: An Attention Inspired Reinforcement Learning Method for Multi-Robot Task Allocation in Warehouse Environments

    Authors: Aakriti Agrawal, Amrit Singh Bedi, Dinesh Manocha

    Abstract: We present a novel reinforcement learning based algorithm for the multi-robot task allocation problem in warehouse environments. We formulate it as a Markov Decision Process and solve it via a novel deep multi-agent reinforcement learning method (called RTAW) with an attention-inspired policy architecture. Hence, our proposed policy network uses global embeddings that are independent of the number of robots…

    Submitted 27 February, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

    Journal ref: ICRA 2023

  27. arXiv:2209.02865  [pdf, other]

    cs.RO cs.LG cs.MA

    DC-MRTA: Decentralized Multi-Robot Task Allocation and Navigation in Complex Environments

    Authors: Aakriti Agrawal, Senthil Hariharan, Amrit Singh Bedi, Dinesh Manocha

    Abstract: We present a novel reinforcement learning (RL) based task allocation and decentralized navigation algorithm for mobile robots in warehouse environments. Our approach is designed for scenarios in which multiple robots are used to perform various pick up and delivery tasks. We consider the problem of joint decentralized task allocation and navigation and present a two-level approach to solve it. At…

    Submitted 6 September, 2022; originally announced September 2022.

    Journal ref: IROS-2022

  28. arXiv:2207.03694  [pdf, other]

    cs.RO

    HTRON: Efficient Outdoor Navigation with Sparse Rewards via Heavy Tailed Adaptive Reinforce Algorithm

    Authors: Kasun Weerakoon, Souradip Chakraborty, Nare Karapetyan, Adarsh Jagan Sathyamoorthy, Amrit Singh Bedi, Dinesh Manocha

    Abstract: We present a novel approach to improve the performance of deep reinforcement learning (DRL) based outdoor robot navigation systems. Most existing DRL methods are based on carefully designed dense reward functions that learn the efficient behavior in an environment. We circumvent this issue by working only with sparse rewards (which are easy to design), and propose a novel adaptive Heavy-Tailed Re…

    Submitted 10 October, 2022; v1 submitted 8 July, 2022; originally announced July 2022.

  29. arXiv:2206.10815  [pdf, other]

    cs.LG cs.DC math.OC

    FedBC: Calibrating Global and Local Models via Federated Learning Beyond Consensus

    Authors: Amrit Singh Bedi, Chen Fan, Alec Koppel, Anit Kumar Sahu, Brian M. Sadler, Furong Huang, Dinesh Manocha

    Abstract: In this work, we quantitatively calibrate the performance of global and local models in federated learning through a multi-criterion optimization-based framework, which we cast as a constrained program. The objective of a device is its local objective, which it seeks to minimize while satisfying nonlinear constraints that quantify the proximity between the local and the global model. By considerin…

    Submitted 1 February, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

  30. arXiv:2206.08829  [pdf, other]

    cs.LG cs.CR cs.DC stat.ML

    FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type Method for Federated Learning

    Authors: Anis Elgabli, Chaouki Ben Issaid, Amrit S. Bedi, Ketan Rajawat, Mehdi Bennis, Vaneet Aggarwal

    Abstract: Newton-type methods are popular in federated learning due to their fast convergence. Still, they suffer from two main issues, namely: low communication efficiency and low privacy due to the requirement of sending Hessian information from clients to the parameter server (PS). In this work, we introduce a novel framework called FedNew in which there is no need to transmit Hessian information from clien…

    Submitted 17 June, 2022; originally announced June 2022.

  31. arXiv:2206.05850  [pdf, other]

    cs.LG cs.AI eess.SY

    Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm

    Authors: Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal

    Abstract: We consider the problem of constrained Markov decision processes (CMDPs) in continuous state-action spaces, where the goal is to maximize the expected cumulative reward subject to some constraints. We propose a novel Conservative Natural Policy Gradient Primal-Dual Algorithm (C-NPG-PD) to achieve zero constraint violation while achieving state-of-the-art convergence results for the objective value fu…

    Submitted 16 May, 2024; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: The latest version fixed the error in the proof of Lemma 4 in AAAI2023

  32. arXiv:2206.05652  [pdf, other]

    cs.LG cs.RO eess.SY

    Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Pratap Tokekar, Dinesh Manocha

    Abstract: In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems. Sparse reward is common in continuous control robotics tasks such as manipulation and navigation, and makes the learning problem hard due to non-trivial estimation of value functions over the state space. This demands either rewa…

    Submitted 12 June, 2022; originally announced June 2022.

  33. arXiv:2206.01162  [pdf, other]

    cs.LG math.OC stat.ML

    Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Brian M. Sadler, Furong Huang, Pratap Tokekar, Dinesh Manocha

    Abstract: Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to the setting where the transition model is Gaussian or Lipschitz, and demand a posterior estimate whose representational complexity grows unbounded with time. In this work, we develop a novel MBRL method (i) which relaxes the assump…

    Submitted 4 May, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

  34. arXiv:2203.00851  [pdf, other]

    cs.RO math.OC

    Distributed Riemannian Optimization with Lazy Communication for Collaborative Geometric Estimation

    Authors: Yulun Tian, Amrit Singh Bedi, Alec Koppel, Miguel Calvo-Fullana, David M. Rosen, Jonathan P. How

    Abstract: We present the first distributed optimization algorithm with lazy communication for collaborative geometric estimation, the backbone of modern collaborative simultaneous localization and mapping (SLAM) and structure-from-motion (SfM) applications. Our method allows agents to cooperatively reconstruct a shared geometric model on a central server by fusing individual observations, but without the ne…

    Submitted 29 July, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: technical report (17 pages, 3 figures); to appear at IROS 2022

  35. arXiv:2201.12332  [pdf, other]

    cs.LG cs.AI math.OC

    On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces

    Authors: Amrit Singh Bedi, Souradip Chakraborty, Anjaly Parayil, Brian Sadler, Pratap Tokekar, Alec Koppel

    Abstract: We focus on parameterized policy search for reinforcement learning over continuous action spaces. Typically, one assumes the score function associated with a policy is bounded, which fails to hold even for Gaussian policies. To properly address this issue, one must introduce an exploration tolerance parameter to quantify the region in which it is bounded. Doing so incurs a persistent bias that app…

    Submitted 30 January, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

  36. arXiv:2110.12929  [pdf, other]

    math.OC stat.ML

    Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming

    Authors: Alec Koppel, Amrit Singh Bedi, Bhargav Ganguly, Vaneet Aggarwal

    Abstract: In tabular multi-agent reinforcement learning with average-cost criterion, a team of agents sequentially interacts with the environment and observes local incentives. We focus on the case that the global reward is a sum of local rewards, the joint policy factorizes into agents' marginals, and full state observability. To date, few global optimality guarantees exist even for this simple setting, as…

    Submitted 21 October, 2021; originally announced October 2021.

  37. arXiv:2110.11721  [pdf, other]

    math.OC cs.CC cs.LG

    Projection-Free Algorithm for Stochastic Bi-level Optimization

    Authors: Zeeshan Akhtar, Amrit Singh Bedi, Srujan Teja Thomdapu, Ketan Rajawat

    Abstract: This work presents the first projection-free algorithm to solve stochastic bi-level optimization problems, where the objective function depends on the solution of another stochastic optimization problem. The proposed Stochastic Bi-level Frank-Wolfe (SBFW) algorithm can be applied to streaming settings and does not make use of large batches or…

    Submitted 3 April, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

    Comments: 34 Pages

  38. arXiv:2109.06332  [pdf, other]

    cs.LG

    Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

    Authors: Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal

    Abstract: Reinforcement learning is widely used in applications where one needs to perform sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some safety constraints. The problem is mathematically formulated as a constrained Markov decision process (CMDP). In the literature, various algorithms are available to sol…

    Submitted 13 July, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: This paper is the arXiv version with Appendices of the published AAAI paper: "Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach," in Proc. AAAI, Feb 2022. The paper has been further extended with concave utilities and constraints in v2

    Journal ref: AAAI 2022

  39. arXiv:2107.12797  [pdf, other]

    stat.ML cs.LG

    Wasserstein-Splitting Gaussian Process Regression for Heterogeneous Online Bayesian Inference

    Authors: Michael E. Kepler, Alec Koppel, Amrit Singh Bedi, Daniel J. Stilwell

    Abstract: Gaussian processes (GPs) are a well-known nonparametric Bayesian inference technique, but they suffer from scalability problems for large sample sizes, and their performance can degrade for non-stationary or spatially heterogeneous data. In this work, we seek to overcome these issues through (i) employing variational free energy approximations of GPs operating in tandem with online expectation pro…

    Submitted 26 July, 2021; originally announced July 2021.

  40. arXiv:2106.08414  [pdf, other]

    cs.LG cs.AI eess.SY math.OC stat.ML

    On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

    Authors: Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel

    Abstract: Reinforcement learning is a framework for interactive decision-making with incentives sequentially revealed across time without a system dynamics model. Due to its scaling to continuous spaces, we focus on policy search where one iteratively improves a parameterized policy with stochastic policy gradient (PG) updates. In tabular Markov Decision Problems (MDPs), under persistent exploration and sui…

    Submitted 2 January, 2023; v1 submitted 15 June, 2021; originally announced June 2021.

  41. arXiv:2106.00543  [pdf, other]

    stat.ML cs.AI cs.LG math.OC

    MARL with General Utilities via Decentralized Shadow Reward Actor-Critic

    Authors: Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

    Abstract: We posit a new mechanism for cooperation in multi-agent reinforcement learning (MARL) based upon any nonlinear function of the team's long-term state-action occupancy measure, i.e., a general utility. This subsumes the cumulative return but also allows one to incorporate risk-sensitivity, exploration, and priors. We derive the Decentralized Shadow Reward Actor-Crit…

    Submitted 24 June, 2021; v1 submitted 29 May, 2021; originally announced June 2021.

  42. arXiv:2105.14772  [pdf, ps, other

    cs.LG cs.AI cs.DC

    Energy-Efficient and Federated Meta-Learning via Projected Stochastic Gradient Ascent

    Authors: Anis Elgabli, Chaouki Ben Issaid, Amrit S. Bedi, Mehdi Bennis, Vaneet Aggarwal

    Abstract: In this paper, we propose an energy-efficient federated meta-learning framework. The objective is to enable learning a meta-model that can be fine-tuned to a new task with a small number of samples in a distributed setting and at low computation and communication energy consumption. We assume that each task is owned by a separate agent, so a limited number of tasks is used to train a meta-model. Ass…

    Submitted 31 May, 2021; originally announced May 2021.

  43. arXiv:2008.05758  [pdf, other

    math.OC cs.LG eess.SP

    Conservative Stochastic Optimization with Expectation Constraints

    Authors: Zeeshan Akhtar, Amrit Singh Bedi, Ketan Rajawat

    Abstract: This paper considers stochastic convex optimization problems where the objective and constraint functions involve expectations with respect to the data indices or environmental variables, in addition to deterministic convex constraints on the domain of the variables. Although the setting is generic and arises in different machine learning applications, online and efficient approaches for solving s…

    Submitted 29 May, 2021; v1 submitted 13 August, 2020; originally announced August 2020.

  44. arXiv:2007.02151  [pdf, other

    cs.LG stat.ML

    Variational Policy Gradient Method for Reinforcement Learning with General Utilities

    Authors: Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvari, Mengdi Wang

    Abstract: In recent years, reinforcement learning (RL) systems with general goals beyond a cumulative sum of rewards have gained traction, such as in constrained problems, exploration, and acting upon prior experiences. In this paper, we consider policy optimization in Markov Decision Problems, where the objective is a general concave utility function of the state-action occupancy measure, which subsumes se…

    Submitted 4 July, 2020; originally announced July 2020.

  45. arXiv:2003.10550  [pdf, other

    cs.LG cs.IT stat.ML

    Regret and Belief Complexity Trade-off in Gaussian Process Bandits via Information Thresholding

    Authors: Amrit Singh Bedi, Dheeraj Peddireddy, Vaneet Aggarwal, Brian M. Sadler, Alec Koppel

    Abstract: Bayesian optimization is a framework for global search via maximum a posteriori updates rather than simulated annealing, and has gained prominence for decision-making under uncertainty. In this work, we cast Bayesian optimization as a multi-armed bandit problem, where the payoff function is sampled from a Gaussian process (GP). Further, we focus on action selections via upper confidence bound (UCB…

    Submitted 21 March, 2022; v1 submitted 23 March, 2020; originally announced March 2020.

  46. arXiv:2003.03281  [pdf, ps, other

    math.OC cs.MA cs.RO

    Asynchronous and Parallel Distributed Pose Graph Optimization

    Authors: Yulun Tian, Alec Koppel, Amrit Singh Bedi, Jonathan P. How

    Abstract: We present Asynchronous Stochastic Parallel Pose Graph Optimization (ASAPP), the first asynchronous algorithm for distributed pose graph optimization (PGO) in multi-robot simultaneous localization and mapping. By enabling robots to optimize their local trajectory estimates without synchronization, ASAPP offers resiliency against communication delays and alleviates the need to wait for stragglers i…

    Submitted 30 June, 2023; v1 submitted 6 March, 2020; originally announced March 2020.

    Comments: full paper with appendices

  47. arXiv:2002.12475  [pdf, other

    stat.ML cs.AI cs.LG eess.SY math.OC

    Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

    Authors: Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

    Abstract: We study the estimation of risk-sensitive policies in reinforcement learning problems defined by a Markov Decision Process (MDP) whose state and action spaces are countably finite. Prior efforts are predominately afflicted by computational challenges associated with the fact that risk-sensitive MDPs are time-inconsistent. To ameliorate this issue, we propose a new definition of risk, which we cal…

    Submitted 27 February, 2020; originally announced February 2020.

  48. arXiv:2001.00685  [pdf, other

    math.OC

    Online Trajectory Optimization Using Inexact Gradient Feedback for Time-Varying Environments

    Authors: Mohan Krishna Nutalapati, Amrit Singh Bedi, Ketan Rajawat, Marceau Coupechoux

    Abstract: This paper considers the problem of online trajectory design under time-varying environments. We formulate the general trajectory optimization problem within the framework of time-varying constrained convex optimization and propose a novel version of the online gradient ascent algorithm for such problems. Moreover, the gradient feedback is noisy and allows us to use the proposed algorithm for a r…

    Submitted 6 January, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

    Comments: arXiv admin note: text overlap with arXiv:1804.04860

  49. arXiv:1910.10453  [pdf, other

    cs.LG cs.DC cs.IT cs.NI stat.ML

    Q-GADMM: Quantized Group ADMM for Communication Efficient Decentralized Machine Learning

    Authors: Anis Elgabli, Jihong Park, Amrit S. Bedi, Chaouki Ben Issaid, Mehdi Bennis, Vaneet Aggarwal

    Abstract: In this article, we propose a communication-efficient decentralized machine learning (ML) algorithm, coined quantized group ADMM (Q-GADMM). To reduce the number of communication links, every worker in Q-GADMM communicates only with two neighbors, while updating its model via the group alternating direction method of multipliers (GADMM). Moreover, each worker transmits the quantized difference betw…

    Submitted 3 October, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: 19 pages, 8 figures; to appear in IEEE Transactions on Communications

  50. arXiv:1909.11555  [pdf, other

    eess.SP cs.LG math.OC

    Optimally Compressed Nonparametric Online Learning

    Authors: Alec Koppel, Amrit Singh Bedi, Ketan Rajawat, Brian M. Sadler

    Abstract: Batch training of machine learning models based on neural networks is now well established, whereas to date streaming methods are largely based on linear models. To go beyond linear in the online setting, nonparametric methods are of interest due to their universality and ability to stably incorporate new information via convexity or Bayes' Rule. Unfortunately, when used online, nonparametric meth…

    Submitted 17 January, 2020; v1 submitted 25 September, 2019; originally announced September 2019.