
Showing 1–50 of 64 results for author: Koppel, A

  1. arXiv:2406.15567  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    SAIL: Self-Improving Efficient Online Alignment of Large Language Models

    Authors: Mucong Ding, Souradip Chakraborty, Vibhu Agrawal, Zora Che, Alec Koppel, Mengdi Wang, Amrit Bedi, Furong Huang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which can lead to sub-optimal performance. On the other hand, recent literature has focused on designing online RLHF methods but still lacks a unified conc…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 24 pages, 6 figures, 3 tables

  2. arXiv:2406.13992  [pdf, ps, other]

    cs.MA eess.SY

    Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective

    Authors: Muhammad Aneeq uz Zaman, Mathieu Laurière, Alec Koppel, Tamer Başar

    Abstract: In this paper, we study the problem of robust cooperative multi-agent reinforcement learning (RL) where a large number of cooperative agents with distributed information aim to learn policies in the presence of stochastic and non-stochastic uncertainties whose distributions are respectively known and unknown. Focusing on policy optimization that accounts for both types of uncertainti…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted for publication in L4DC 2024

  3. arXiv:2405.07432  [pdf, other]

    stat.ML cs.LG eess.SY

    Compressed Online Learning of Conditional Mean Embedding

    Authors: Boya Hou, Sina Sanjari, Alec Koppel, Subhonmesh Bose

    Abstract: The conditional mean embedding (CME) encodes Markovian stochastic kernels through their actions on probability distributions embedded within the reproducing kernel Hilbert spaces (RKHS). The CME plays a key role in several well-known machine learning tasks such as reinforcement learning, analysis of dynamical systems, etc. We present an algorithm to learn the CME incrementally from data via an ope…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 39 pages

  4. arXiv:2403.11925  [pdf, other]

    cs.LG

    Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles

    Authors: Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha

    Abstract: In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic due to the difficulty and expense of estimating…

    Submitted 20 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 26 Pages, 2 Figures

  5. arXiv:2403.11345  [pdf, other]

    cs.LG cs.AI cs.GT cs.MA

    Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective

    Authors: Muhammad Aneeq uz Zaman, Alec Koppel, Mathieu Laurière, Tamer Başar

    Abstract: We address in this paper Reinforcement Learning (RL) among agents that are grouped into teams such that there is cooperation within each team but general-sum (non-zero sum) competition across different teams. To develop an RL method that provably achieves a Nash equilibrium, we focus on a linear-quadratic structure. Moreover, to tackle the non-stationarity induced by multi-agent interactions in th…

    Submitted 17 March, 2024; originally announced March 2024.

  6. arXiv:2403.08936  [pdf, other]

    cs.MA cs.AI cs.RO

    Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning

    Authors: Peihong Yu, Manav Mishra, Alec Koppel, Carl Busart, Priya Narayan, Dinesh Manocha, Amrit Bedi, Pratap Tokekar

    Abstract: Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential increase in the size of the joint state-action space. While demonstration-guided learning has proven beneficial in single-agent settings, its direct applicability to MARL is hindered by the practical difficulty of obtaining joint expert demonstrations. In this work, we introduce…

    Submitted 13 March, 2024; originally announced March 2024.

  7. arXiv:2402.08925  [pdf, other]

    cs.CL cs.AI cs.LG cs.RO

    MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences

    Authors: Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data. However, such an approach overlooks the rich diversity of human preferences inherent in data collected from multiple users. In this work, we first derive an impossibility result of alignment with single reward RLHF, thereby highlighting it…

    Submitted 13 February, 2024; originally announced February 2024.

  8. arXiv:2311.10927  [pdf, other]

    cs.GT cs.LG

    Learning Payment-Free Resource Allocation Mechanisms

    Authors: Sihan Zeng, Sujay Bhatt, Eleonora Kreacic, Parisa Hassanzadeh, Alec Koppel, Sumitra Ganesh

    Abstract: We consider the design of mechanisms that allocate limited resources among self-interested agents using neural networks. Unlike the recent works that leverage machine learning for revenue maximization in auctions, we consider welfare maximization as the key objective in the payment-free setting. Without payment exchange, it is unclear how we can align agents' incentives to achieve the desired obje…

    Submitted 12 April, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

  9. arXiv:2310.07320  [pdf, other]

    cs.LG eess.SY

    Byzantine-Resilient Decentralized Multi-Armed Bandits

    Authors: Jingxuan Zhu, Alec Koppel, Alvaro Velasquez, Ji Liu

    Abstract: In decentralized cooperative multi-armed bandits (MAB), each agent observes a distinct stream of rewards, and seeks to exchange information with others to select a sequence of arms so as to minimize its regret. Agents in the cooperative setting can outperform a single agent running a MAB method such as Upper-Confidence Bound (UCB) independently. In this work, we study how to recover such salient b…

    Submitted 11 October, 2023; originally announced October 2023.

  10. arXiv:2308.02585  [pdf, other]

    cs.LG

    PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Huazheng Wang, Mengdi Wang, Furong Huang

    Abstract: We present a novel unified bilevel optimization-based framework, PARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning using utility or preference-based feedback. We identify a major gap within current algorithmic designs for solving policy alignment due to a lack of precise characterization of the dependence of the alignment obj…

    Submitted 30 April, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

  11. arXiv:2306.15444  [pdf, other]

    math.OC cs.LG stat.ML

    Limited-Memory Greedy Quasi-Newton Method with Non-asymptotic Superlinear Convergence Rate

    Authors: Zhan Gao, Aryan Mokhtari, Alec Koppel

    Abstract: Non-asymptotic convergence analysis of quasi-Newton methods has gained attention with a landmark result establishing an explicit local superlinear rate of $\mathcal{O}((1/\sqrt{t})^t)$. The methods that obtain this rate, however, exhibit a well-known drawback: they require the storage of the previous Hessian approximation matrix or all past curvature information to form the current Hessian inverse approxima…

    Submitted 18 October, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  12. arXiv:2306.06192  [pdf, other]

    cs.RO cs.AI cs.LG

    Ada-NAV: Adaptive Trajectory Length-Based Sample Efficient Policy Learning for Robotic Navigation

    Authors: Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Tianyi Zhou, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Trajectory length stands as a crucial hyperparameter within reinforcement learning (RL) algorithms, significantly contributing to the sample inefficiency in robotics applications. Motivated by the pivotal role trajectory length plays in the training process, we introduce Ada-NAV, a novel adaptive trajectory length scheme designed to enhance the training sample efficiency of RL algorithms in roboti…

    Submitted 20 March, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 11 pages, 9 figures, 2 tables

  13. arXiv:2306.05046  [pdf, other]

    cs.LG

    A Gradient-based Approach for Online Robust Deep Neural Network Training with Noisy Labels

    Authors: Yifan Yang, Alec Koppel, Zheng Zhang

    Abstract: Learning with noisy labels is an important topic for scalable training in many real-world scenarios. However, little previous research considers this problem in the online setting, where the arrival of data is streaming. In this paper, we propose a novel gradient-based approach to enable the detection of noisy labels for the online learning of model parameters, named Online Gradient-based Robust Sele…

    Submitted 8 June, 2023; originally announced June 2023.

  14. arXiv:2305.17568  [pdf, other]

    cs.LG math.OC

    Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities

    Authors: Donghao Ying, Yunkai Zhang, Yuhao Ding, Alec Koppel, Javad Lavaei

    Abstract: We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints. The objective and constraints are described by general utilities, i.e., nonlinear functions of the long-term state-action occupancy measure, which encompass broader decision-making goals such as risk, exploratio…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: 50 pages

  15. arXiv:2305.17283  [pdf, other]

    math.OC cs.LG

    Sharpened Lazy Incremental Quasi-Newton Method

    Authors: Aakash Lahoti, Spandan Senapati, Ketan Rajawat, Alec Koppel

    Abstract: The problem of minimizing the sum of $n$ functions in $d$ dimensions is ubiquitous in machine learning and statistics. In many applications where the number of observations $n$ is large, it is necessary to use incremental or stochastic methods, as their per-iteration cost is independent of $n$. Of these, Quasi-Newton (QN) methods strike a balance between the per-iteration cost and the convergence…

    Submitted 12 March, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: 36 pages, 2 figures; Accepted to AISTATS 2024

  16. arXiv:2302.07938  [pdf, ps, other]

    cs.LG cs.AI cs.MA

    Scalable Multi-Agent Reinforcement Learning with General Utilities

    Authors: Donghao Ying, Yuhao Ding, Alec Koppel, Javad Lavaei

    Abstract: We study scalable multi-agent reinforcement learning (MARL) with general utilities, defined as nonlinear functions of the team's long-term state-action occupancy measure. The objective is to find a localized policy that maximizes the average of the team's local utility functions without the full observability of each agent in the team. By exploiting the spatial correlation decay property of th…

    Submitted 26 August, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

    Comments: Supplementary material for the contribution to American Control Conference 2023 under the same title

  17. arXiv:2301.12083  [pdf, other]

    cs.LG math.OC stat.ML

    Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic

    Authors: Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec Koppel, Dinesh Manocha

    Abstract: Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection. Unfortunately, this assumption is violated for large state spaces or settings with sparse rewards, and the mixing time is unknown, mak…

    Submitted 1 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  18. arXiv:2301.12038  [pdf, other]

    cs.LG cs.AI stat.ML

    STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Mengdi Wang, Furong Huang, Dinesh Manocha

    Abstract: Directed Exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse. Information-directed sampling (IDS), which optimizes the information ratio, seeks to do so by augmenting regret with information gain. However, estimating information gain is computationally intractable or relies on restrictive assumptions which prohibit its use in many practical instanc…

    Submitted 18 September, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  19. arXiv:2208.11639  [pdf, ps, other]

    cs.LG cs.GT cs.MA

    Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path

    Authors: Muhammad Aneeq uz Zaman, Alec Koppel, Sujay Bhatt, Tamer Başar

    Abstract: We consider online reinforcement learning in Mean-Field Games (MFGs). Unlike traditional approaches, we alleviate the need for a mean-field oracle by developing an algorithm that approximates the Mean-Field Equilibrium (MFE) using the single sample path of the generic agent. We call this Sandbox Learning, as it can be used as a warm-start for any agent learning in a multi-agent non-cooperati…

    Submitted 11 April, 2023; v1 submitted 24 August, 2022; originally announced August 2022.

    Comments: Accepted for publication in AISTATS 2023

  20. arXiv:2206.10815  [pdf, other]

    cs.LG cs.DC math.OC

    FedBC: Calibrating Global and Local Models via Federated Learning Beyond Consensus

    Authors: Amrit Singh Bedi, Chen Fan, Alec Koppel, Anit Kumar Sahu, Brian M. Sadler, Furong Huang, Dinesh Manocha

    Abstract: In this work, we quantitatively calibrate the performance of global and local models in federated learning through a multi-criterion optimization-based framework, which we cast as a constrained program. The objective of a device is its local objective, which it seeks to minimize while satisfying nonlinear constraints that quantify the proximity between the local and the global model. By considerin…

    Submitted 1 February, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

  21. arXiv:2206.05652  [pdf, other]

    cs.LG cs.RO eess.SY

    Dealing with Sparse Rewards in Continuous Control Robotics via Heavy-Tailed Policies

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Pratap Tokekar, Dinesh Manocha

    Abstract: In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-PSG) algorithm to deal with the challenges of sparse rewards in continuous control problems. Sparse reward is common in continuous control robotics tasks such as manipulation and navigation, and makes the learning problem hard due to non-trivial estimation of value functions over the state space. This demands either rewa…

    Submitted 12 June, 2022; originally announced June 2022.

  22. arXiv:2206.01162  [pdf, other]

    cs.LG math.OC stat.ML

    Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

    Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Brian M. Sadler, Furong Huang, Pratap Tokekar, Dinesh Manocha

    Abstract: Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to the setting when the transition model is Gaussian or Lipschitz, and demand a posterior estimate whose representational complexity grows unbounded with time. In this work, we develop a novel MBRL method (i) which relaxes the assump…

    Submitted 4 May, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

  23. arXiv:2203.00851  [pdf, other]

    cs.RO math.OC

    Distributed Riemannian Optimization with Lazy Communication for Collaborative Geometric Estimation

    Authors: Yulun Tian, Amrit Singh Bedi, Alec Koppel, Miguel Calvo-Fullana, David M. Rosen, Jonathan P. How

    Abstract: We present the first distributed optimization algorithm with lazy communication for collaborative geometric estimation, the backbone of modern collaborative simultaneous localization and mapping (SLAM) and structure-from-motion (SfM) applications. Our method allows agents to cooperatively reconstruct a shared geometric model on a central server by fusing individual observations, but without the ne…

    Submitted 29 July, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: technical report (17 pages, 3 figures); to appear at IROS 2022

  24. arXiv:2202.10538  [pdf, other]

    math.OC

    Sharpened Quasi-Newton Methods: Faster Superlinear Rate and Larger Local Convergence Neighborhood

    Authors: Qiujiang Jin, Alec Koppel, Ketan Rajawat, Aryan Mokhtari

    Abstract: Non-asymptotic analysis of quasi-Newton methods has gained traction recently. In particular, several works have established a non-asymptotic superlinear rate of $\mathcal{O}((1/\sqrt{t})^t)$ for the (classic) BFGS method by exploiting the fact that its error of Newton direction approximation approaches zero. Moreover, a greedy variant of BFGS was recently proposed which accelerates its convergenc…

    Submitted 15 June, 2022; v1 submitted 21 February, 2022; originally announced February 2022.

  25. arXiv:2201.12332  [pdf, other]

    cs.LG cs.AI math.OC

    On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces

    Authors: Amrit Singh Bedi, Souradip Chakraborty, Anjaly Parayil, Brian Sadler, Pratap Tokekar, Alec Koppel

    Abstract: We focus on parameterized policy search for reinforcement learning over continuous action spaces. Typically, one assumes the score function associated with a policy is bounded, which fails to hold even for Gaussian policies. To properly address this issue, one must introduce an exploration tolerance parameter to quantify the region in which it is bounded. Doing so incurs a persistent bias that app…

    Submitted 30 January, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

  26. arXiv:2201.08832  [pdf, other]

    cs.LG math.OC

    Occupancy Information Ratio: Infinite-Horizon, Information-Directed, Parameterized Policy Search

    Authors: Wesley A. Suttle, Alec Koppel, Ji Liu

    Abstract: In this work, we propose an information-directed objective for infinite-horizon reinforcement learning (RL), called the occupancy information ratio (OIR), inspired by the information ratio objectives used in previous information-directed sampling schemes for multi-armed bandits and Markov decision processes as well as recent advances in general utility RL. The OIR, comprised of a ratio between the…

    Submitted 28 December, 2023; v1 submitted 21 January, 2022; originally announced January 2022.

  27. arXiv:2201.07130  [pdf, other]

    cs.LG stat.ML

    Online, Informative MCMC Thinning with Kernelized Stein Discrepancy

    Authors: Cole Hawkins, Alec Koppel, Zheng Zhang

    Abstract: A fundamental challenge in Bayesian inference is efficient representation of a target distribution. Many non-parametric approaches do so by sampling a large number of points using variants of Markov Chain Monte Carlo (MCMC). We propose an MCMC variant that retains only those posterior samples which exceed a KSD threshold, which we call KSD Thinning. We establish the convergence and complexity trad…

    Submitted 21 April, 2022; v1 submitted 18 January, 2022; originally announced January 2022.

    Comments: New version corrects error in notation regarding dictionary size growth. Little o to Omega

  28. arXiv:2110.12929  [pdf, other]

    math.OC stat.ML

    Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming

    Authors: Alec Koppel, Amrit Singh Bedi, Bhargav Ganguly, Vaneet Aggarwal

    Abstract: In tabular multi-agent reinforcement learning with average-cost criterion, a team of agents sequentially interacts with the environment and observes local incentives. We focus on the case that the global reward is a sum of local rewards, the joint policy factorizes into agents' marginals, and full state observability. To date, few global optimality guarantees exist even for this simple setting, as…

    Submitted 21 October, 2021; originally announced October 2021.

  29. arXiv:2110.06401  [pdf, other]

    cs.RO

    Distributed Gaussian Process Mapping for Robot Teams with Time-varying Communication

    Authors: James Di, Ehsan Zobeidi, Alec Koppel, Nikolay Atanasov

    Abstract: Multi-agent mapping is a fundamentally important capability for autonomous robot task coordination and execution in complex environments. While successful algorithms have been proposed for mapping using individual platforms, cooperative online mapping for teams of robots remains largely a challenge. We focus on probabilistic variants of mapping due to its potential utility in downstream tasks such…

    Submitted 12 October, 2021; originally announced October 2021.

  30. arXiv:2109.06332  [pdf, other]

    cs.LG

    Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

    Authors: Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal

    Abstract: Reinforcement learning is widely used in applications where one needs to perform sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some safety constraints. The problem is mathematically formulated as a constrained Markov decision process (CMDP). In the literature, various algorithms are available to sol…

    Submitted 13 July, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: This paper is the arXiv version with Appendices of the published AAAI paper: "Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach," in Proc. AAAI, Feb 2022. The paper has been further extended with concave utilities and constraints in v2

    Journal ref: AAAI 2022

  31. arXiv:2107.12797  [pdf, other]

    stat.ML cs.LG

    Wasserstein-Splitting Gaussian Process Regression for Heterogeneous Online Bayesian Inference

    Authors: Michael E. Kepler, Alec Koppel, Amrit Singh Bedi, Daniel J. Stilwell

    Abstract: Gaussian processes (GPs) are a well-known nonparametric Bayesian inference technique, but they suffer from scalability problems for large sample sizes, and their performance can degrade for non-stationary or spatially heterogeneous data. In this work, we seek to overcome these issues through (i) employing variational free energy approximations of GPs operating in tandem with online expectation pro…

    Submitted 26 July, 2021; originally announced July 2021.

  32. arXiv:2106.08414  [pdf, other]

    cs.LG cs.AI eess.SY math.OC stat.ML

    On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

    Authors: Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel

    Abstract: Reinforcement learning is a framework for interactive decision-making with incentives sequentially revealed across time without a system dynamics model. Due to its scaling to continuous spaces, we focus on policy search where one iteratively improves a parameterized policy with stochastic policy gradient (PG) updates. In tabular Markov Decision Problems (MDPs), under persistent exploration and sui…

    Submitted 2 January, 2023; v1 submitted 15 June, 2021; originally announced June 2021.

  33. arXiv:2106.00543  [pdf, other]

    stat.ML cs.AI cs.LG math.OC

    MARL with General Utilities via Decentralized Shadow Reward Actor-Critic

    Authors: Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

    Abstract: We posit a new mechanism for cooperation in multi-agent reinforcement learning (MARL) based upon any nonlinear function of the team's long-term state-action occupancy measure, i.e., a general utility. This subsumes the cumulative return but also allows one to incorporate risk-sensitivity, exploration, and priors. We derive the Decentralized Shadow Reward Actor-Crit…

    Submitted 24 June, 2021; v1 submitted 29 May, 2021; originally announced June 2021.

  34. arXiv:2103.16170  [pdf, other]

    cs.RO

    Dense Incremental Metric-Semantic Mapping for Multi-Agent Systems via Sparse Gaussian Process Regression

    Authors: Ehsan Zobeidi, Alec Koppel, Nikolay Atanasov

    Abstract: We develop an online probabilistic metric-semantic mapping approach for mobile robot teams relying on streaming RGB-D observations. The generated maps contain full continuous distributional information about the geometric surfaces and semantic labels (e.g., chair, table, wall). Our approach is based on online Gaussian Process (GP) training and inference, and avoids the complexity of GP classificat…

    Submitted 30 March, 2021; originally announced March 2021.

  35. Composable Learning with Sparse Kernel Representations

    Authors: Ekaterina Tolstaya, Ethan Stump, Alec Koppel, Alejandro Ribeiro

    Abstract: We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space. We improve the sample complexity of this approach by imposing a structure of the state-action function through a normalized advantage function (NAF). This representation of the policy enables efficiently composing multiple learned models without additional training sa…

    Submitted 29 March, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

  36. arXiv:2011.07142  [pdf, other]

    stat.ML cs.CV cs.LG math.OC

    Sparse Representations of Positive Functions via First and Second-Order Pseudo-Mirror Descent

    Authors: Abhishek Chakraborty, Ketan Rajawat, Alec Koppel

    Abstract: We consider expected risk minimization problems when the range of the estimator is required to be nonnegative, motivated by the settings of maximum likelihood estimation (MLE) and trajectory optimization. To facilitate nonlinear interpolation, we hypothesize that the search space is a Reproducing Kernel Hilbert Space (RKHS). We develop first and second-order variants of stochastic mirror descent e…

    Submitted 3 May, 2022; v1 submitted 13 November, 2020; originally announced November 2020.

  37. arXiv:2009.04950  [pdf, other]

    cs.LG math.OC stat.ML

    A Markov Decision Process Approach to Active Meta Learning

    Authors: Bingjia Wang, Alec Koppel, Vikram Krishnamurthy

    Abstract: In supervised learning, we fit a single statistical model to a given data set, assuming that the data is associated with a singular task, which yields well-tuned models for specific use, but does not adapt well to new contexts. By contrast, in meta-learning, the data is associated with numerous tasks, and we seek a model that may perform well on all tasks simultaneously, in pursuit of greater gene…

    Submitted 10 September, 2020; originally announced September 2020.

  38. arXiv:2007.02151  [pdf, other]

    cs.LG stat.ML

    Variational Policy Gradient Method for Reinforcement Learning with General Utilities

    Authors: Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvari, Mengdi Wang

    Abstract: In recent years, reinforcement learning (RL) systems with general goals beyond a cumulative sum of rewards have gained traction, such as in constrained problems, exploration, and acting upon prior experiences. In this paper, we consider policy optimization in Markov Decision Problems, where the objective is a general concave utility function of the state-action occupancy measure, which subsumes se…

    Submitted 4 July, 2020; originally announced July 2020.

  39. arXiv:2007.01219  [pdf, ps, other]

    eess.SP cs.LG

    Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems

    Authors: Zhan Gao, Alec Koppel, Alejandro Ribeiro

    Abstract: Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, and forms the bedrock of modern machine learning and statistics. In this work, we seek to balance the fact that attenuating step-size is required for exact asymptotic convergence with the fact that constant step-size learns faster in finite time up to an error. To do so, rather than fixing the mini-bat…

    Submitted 9 July, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

  40. arXiv:2004.11094  [pdf, other]

    stat.ML cs.LG math.ST

    Consistent Online Gaussian Process Regression Without the Sample Complexity Bottleneck

    Authors: Alec Koppel, Hrusikesha Pradhan, Ketan Rajawat

    Abstract: Gaussian processes provide a framework for nonlinear nonparametric Bayesian inference widely applicable across science and engineering. Unfortunately, their computational burden scales cubically with the training sample size, which in the case that samples arrive in perpetuity, approaches infinity. This issue necessitates approximations for use with streaming data, which to date mostly lack conver…

    Submitted 15 July, 2021; v1 submitted 23 April, 2020; originally announced April 2020.

  41. arXiv:2004.04843  [pdf, other]

    cs.LG cs.MA eess.SY math.OC stat.ML

    Policy Gradient using Weak Derivatives for Reinforcement Learning

    Authors: Sujay Bhatt, Alec Koppel, Vikram Krishnamurthy

    Abstract: This paper considers policy search in continuous state-action reinforcement learning problems. Typically, one computes search directions using a classic expression for the policy gradient called the Policy Gradient Theorem, which decomposes the gradient of the value function into two factors: the score function and the Q-function. This paper presents four results: (i) an alternative policy gradient…

    Submitted 9 April, 2020; originally announced April 2020.

    Comments: 1 figure

  42. arXiv:2003.12637  [pdf, other]

    eess.SP cs.IT

    Collaborative Beamforming Under Localization Errors: A Discrete Optimization Approach

    Authors: Erfaun Noorani, Yagiz Savas, Alec Koppel, John Baras, Ufuk Topcu, Brian M. Sadler

    Abstract: We consider a network of agents that locate themselves in an environment through sensor measurements and aim to transmit a message signal to a base station via collaborative beamforming. The agents' sensor measurements result in localization errors, which degrade the quality of service at the base station due to unknown phase offsets that arise in the agents' communication channels. Assuming that…

    Submitted 17 March, 2021; v1 submitted 27 March, 2020; originally announced March 2020.

  43. arXiv:2003.10550  [pdf, other]

    cs.LG cs.IT stat.ML

    Regret and Belief Complexity Trade-off in Gaussian Process Bandits via Information Thresholding

    Authors: Amrit Singh Bedi, Dheeraj Peddireddy, Vaneet Aggarwal, Brian M. Sadler, Alec Koppel

    Abstract: Bayesian optimization is a framework for global search via maximum a posteriori updates rather than simulated annealing, and has gained prominence for decision-making under uncertainty. In this work, we cast Bayesian optimization as a multi-armed bandit problem, where the payoff function is sampled from a Gaussian process (GP). Further, we focus on action selections via upper confidence bound (UCB…

    Submitted 21 March, 2022; v1 submitted 23 March, 2020; originally announced March 2020.

  44. arXiv:2003.03281  [pdf, ps, other]

    math.OC cs.MA cs.RO

    Asynchronous and Parallel Distributed Pose Graph Optimization

    Authors: Yulun Tian, Alec Koppel, Amrit Singh Bedi, Jonathan P. How

    Abstract: We present Asynchronous Stochastic Parallel Pose Graph Optimization (ASAPP), the first asynchronous algorithm for distributed pose graph optimization (PGO) in multi-robot simultaneous localization and mapping. By enabling robots to optimize their local trajectory estimates without synchronization, ASAPP offers resiliency against communication delays and alleviates the need to wait for stragglers i…

    Submitted 30 June, 2023; v1 submitted 6 March, 2020; originally announced March 2020.

    Comments: full paper with appendices

  45. arXiv:2002.12475  [pdf, other]

    stat.ML cs.AI cs.LG eess.SY math.OC

    Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

    Authors: Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel

    Abstract: We study the estimation of risk-sensitive policies in reinforcement learning problems defined by a Markov Decision Process (MDP) whose state and action spaces are countably finite. Prior efforts are predominately afflicted by computational challenges associated with the fact that risk-sensitive MDPs are time-inconsistent. To ameliorate this issue, we propose a new definition of risk, which we cal…

    Submitted 27 February, 2020; originally announced February 2020.

  46. arXiv:1910.08412  [pdf, other]

    cs.LG math.OC stat.ML

    On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

    Authors: Harshat Kumar, Alec Koppel, Alejandro Ribeiro

    Abstract: Reinforcement learning, mathematically described by Markov Decision Problems, may be approached either through dynamic programming or policy search. Actor-critic algorithms combine the merits of both approaches by alternating between steps to estimate the value function and policy gradient updates. Due to the fact that the updates exhibit correlated noise and biased gradient updates, only the asym…

    Submitted 27 January, 2023; v1 submitted 18 October, 2019; originally announced October 2019.

  47. arXiv:1909.11555  [pdf, other]

    eess.SP cs.LG math.OC

    Optimally Compressed Nonparametric Online Learning

    Authors: Alec Koppel, Amrit Singh Bedi, Ketan Rajawat, Brian M. Sadler

    Abstract: Batch training of machine learning models based on neural networks is now well established, whereas to date streaming methods are largely based on linear models. To go beyond linear in the online setting, nonparametric methods are of interest due to their universality and ability to stably incorporate new information via convexity or Bayes' Rule. Unfortunately, when used online, nonparametric meth…

    Submitted 17 January, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

  48. arXiv:1909.10279  [pdf, other]

    math.ST cs.CC stat.CO

    Nearly Consistent Finite Particle Estimates in Streaming Importance Sampling

    Authors: Alec Koppel, Amrit Singh Bedi, Brian M. Sadler, Victor Elvira

    Abstract: In Bayesian inference, we seek to compute information about random variables such as moments or quantiles on the basis of available data and prior information. When the distribution of random variables is intractable, Monte Carlo (MC) sampling is usually required. Importance sampling is a standard MC tool that approximates this unavailable distribution with a set of weighted samples. This pr…

    Submitted 5 April, 2021; v1 submitted 23 September, 2019; originally announced September 2019.

  49. arXiv:1909.05442  [pdf, other]

    math.OC cs.LG eess.SP

    Nonstationary Nonparametric Online Learning: Balancing Dynamic Regret and Model Parsimony

    Authors: Amrit Singh Bedi, Alec Koppel, Ketan Rajawat, Brian M. Sadler

    Abstract: An open challenge in supervised learning is \emph{conceptual drift}: a data point begins as classified according to one label, but over time the notion of that label changes. Beyond linear autoregressive models, transfer and meta learning address drift, but require data that is representative of disparate domains at the outset of training. To relax this requirement, we propose a memory-efficient \…

    Submitted 11 September, 2019; originally announced September 2019.

  50. arXiv:1908.00510  [pdf, ps, other]

    math.OC eess.SP stat.ML

    Adaptive Kernel Learning in Heterogeneous Networks

    Authors: Hrusikesha Pradhan, Amrit Singh Bedi, Alec Koppel, Ketan Rajawat

    Abstract: We consider learning in decentralized heterogeneous networks: agents seek to minimize a convex functional that aggregates data across the network, while only having access to their local data streams. We focus on the case where agents seek to estimate a regression \emph{function} that belongs to a reproducing kernel Hilbert space (RKHS). To incentivize coordination while respecting network heterog…

    Submitted 1 June, 2021; v1 submitted 1 August, 2019; originally announced August 2019.