Showing 1–4 of 4 results for author: Alatur, P

  1. arXiv:2402.15776  [pdf, other]

    cs.LG stat.ML

    Truly No-Regret Learning in Constrained MDPs

    Authors: Adrian Müller, Pragnya Alatur, Volkan Cevher, Giorgia Ramponi, Niao He

    Abstract: Constrained Markov decision processes (CMDPs) are a common way to model safety constraints in reinforcement learning. State-of-the-art methods for efficiently solving CMDPs are based on primal-dual algorithms. For these algorithms, all currently known regret bounds allow for error cancellations -- one can compensate for a constraint violation in one round with a strict constraint satisfaction in a…

    Submitted 18 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  2. arXiv:2306.07749  [pdf, other]

    cs.LG cs.GT cs.MA

    Provably Learning Nash Policies in Constrained Markov Potential Games

    Authors: Pragnya Alatur, Giorgia Ramponi, Niao He, Andreas Krause

    Abstract: Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents, where each agent optimizes its own objective. In many real-world instances, the agents may not only want to optimize their objectives, but also ensure safe behavior. For example, in traffic routing, each car (agent) aims to reach its destination quickly (objective) while avoiding collision…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: 30 pages

  3. arXiv:2306.07001  [pdf, ps, other]

    cs.LG stat.ML

    Cancellation-Free Regret Bounds for Lagrangian Approaches in Constrained Markov Decision Processes

    Authors: Adrian Müller, Pragnya Alatur, Giorgia Ramponi, Niao He

    Abstract: Constrained Markov Decision Processes (CMDPs) are one of the common ways to model safe reinforcement learning problems, where constraint functions model the safety objectives. Lagrangian-based dual or primal-dual algorithms provide efficient methods for learning in CMDPs. For these algorithms, the currently known regret bounds in the finite-horizon setting allow for a "cancellation of errors"; one…

    Submitted 30 August, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

  4. arXiv:1902.08036  [pdf, other]

    cs.LG stat.ML

    Multi-Player Bandits: The Adversarial Case

    Authors: Pragnya Alatur, Kfir Y. Levy, Andreas Krause

    Abstract: We consider a setting where multiple players sequentially choose among a common set of actions (arms). Motivated by a cognitive radio networks application, we assume that players incur a loss upon colliding, and that communication between players is not possible. Existing approaches assume that the system is stationary. Yet this assumption is often violated in practice, e.g., due to signal strengt…

    Submitted 21 February, 2019; originally announced February 2019.