Showing 1–15 of 15 results for author: Kumar, P R

Searching in archive math.
  1. arXiv:2405.02769  [pdf, other]

    cs.LG cs.MA math.OC

    Linear Convergence of Independent Natural Policy Gradient in Games with Entropy Regularization

    Authors: Youbang Sun, Tao Liu, P. R. Kumar, Shahin Shahrampour

    Abstract: This work focuses on the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning. Agents are assumed to have access to an oracle with exact policy evaluation and seek to maximize their respective independent rewards. Each individual's reward is assumed to depend on the actions of all the agents in the multi-agent system, leading t…

    Submitted 4 May, 2024; originally announced May 2024.

  2. arXiv:2310.09727  [pdf, other]

    cs.LG math.OC

    Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games

    Authors: Youbang Sun, Tao Liu, Ruida Zhou, P. R. Kumar, Shahin Shahrampour

    Abstract: This work studies an independent natural policy gradient (NPG) algorithm for the multi-agent reinforcement learning problem in Markov potential games. It is shown that, under mild technical assumptions and the introduction of the \textit{suboptimality gap}, the independent NPG method with an oracle providing exact policy evaluation asymptotically reaches an $ε$-Nash Equilibrium (NE) within…

    Submitted 27 October, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

    Comments: Will appear in NeurIPS 2023

  3. arXiv:2307.08875  [pdf, other]

    cs.LG cs.RO math.OC

    Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation

    Authors: Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian

    Abstract: We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment. Previous policy-based robust RL algorithms mainly focus on the tabular setting under uncertainty sets that facilitate robust policy evaluation, but are no longer tractable when the number of states scales…

    Submitted 10 December, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  4. arXiv:2206.05357  [pdf, other]

    cs.LG math.OC

    Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning

    Authors: Ruida Zhou, Tao Liu, Dileep Kalathil, P. R. Kumar, Chao Tian

    Abstract: We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions, which are to be jointly optimized according to given criteria such as proportional fairness (smooth concave scalarization), hard constraints (constrained MDP), and max-min trade-off. We propose an Anchor-changing Regularized Natural Policy Gradient (ARNPG) framework, which can systematically inc…

    Submitted 18 October, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  5. arXiv:2201.10542  [pdf, other]

    math.OC cs.LG eess.SY

    Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems

    Authors: Akshay Mete, Rahul Singh, P. R. Kumar

    Abstract: We consider the problem of controlling an unknown stochastic linear system with quadratic costs - called the adaptive LQ control problem. We re-examine an approach called "Reward Biased Maximum Likelihood Estimate" (RBMLE) that was proposed more than forty years ago, and which predates the "Upper Confidence Bound" (UCB) method as well as the definition of "regret" for bandit problems. It sim…

    Submitted 24 March, 2023; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022). https://openreview.net/forum?id=7pNV4PCjbQy

  6. arXiv:2111.00552  [pdf, other]

    cs.LG cs.AI math.OC

    Policy Optimization for Constrained MDPs with Provable Fast Global Convergence

    Authors: Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian

    Abstract: We address the problem of finding the optimal policy of a constrained Markov decision process (CMDP) using a gradient descent-based algorithm. Previous results have shown that a primal-dual approach can achieve an $\mathcal{O}(1/\sqrt{T})$ global convergence rate for both the optimality gap and the constraint violation. We propose a new algorithm called policy mirror descent-primal dual (PMD-PD) a…

    Submitted 3 February, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

  7. arXiv:2003.09596  [pdf, ps, other]

    cs.LG cs.IT math.DS stat.ML

    Learning in Networked Control Systems

    Authors: Rahul Singh, P. R. Kumar

    Abstract: We design an adaptive controller (learning rule) for a networked control system (NCS) in which data packets containing control information are transmitted across a lossy wireless channel. We propose Upper Confidence Bounds for Networked Control Systems (UCB-NCS), a learning rule that maintains confidence intervals for the estimates of plant parameters $(A_{(\star)},B_{(\star)})$, and channel reliabil…

    Submitted 21 March, 2020; originally announced March 2020.

    Comments: Submitted to CDC and LCSS (http://ieee-cssletters.dei.unipd.it/index.php)

  8. arXiv:1903.00988  [pdf, other]

    math.OC

    Optimal Control of Thermostatic Loads for Planning Aggregate Consumption: Characterization of Solution and Explicit Strategies

    Authors: Fernando A. C. C. Fontes, Abhishek Halder, Jorge Becerril, P. R. Kumar

    Abstract: We consider the problem of planning the aggregate energy consumption for a set of thermostatically controlled loads for demand response, accounting for the price forecast trajectory and thermal comfort constraints. We address this as a continuous-time optimal control problem, and analytically characterize the structure of its solution in the general case. In the special case when the price forecast is mon…

    Submitted 8 May, 2019; v1 submitted 3 March, 2019; originally announced March 2019.

    Comments: Submitted to IEEE Conference on Decision and Control L-CSS+CDC 2019. An appendix with proofs of intermediate results, not present in the version submitted to L-CSS+CDC for lack of space, is added here for completeness and for reviewers' convenience

  9. arXiv:1902.07418  [pdf, other]

    cs.NI math.OC

    Optimal Decentralized Dynamic Policies for Video Streaming over Wireless Channels

    Authors: Rahul Singh, P. R. Kumar

    Abstract: The problem addressed is that of optimally controlling, in a decentralized fashion, the download of mobile video, which is expected to comprise 75% of total mobile data traffic by 2020. The server can dynamically choose which packets to download to clients, from among several packets which encode their videos at different resolutions, as well as the power levels of their transmissions. This allow…

    Submitted 20 February, 2019; originally announced February 2019.

  10. arXiv:1609.07229  [pdf, other]

    math.OC

    Optimal Power Consumption for Demand Response of Thermostatically Controlled Loads

    Authors: Abhishek Halder, Xinbo Geng, Fernando A. C. C. Fontes, P. R. Kumar, Le Xie

    Abstract: We consider the problem of determining the optimal aggregate power consumption of a population of thermostatically controlled loads. This is motivated by the problem of synthesizing the demand response for a load serving entity (LSE) serving a population of such customers. We show how the LSE can opportunistically design the aggregate reference consumption to minimize its energy procurement cost,…

    Submitted 25 June, 2018; v1 submitted 23 September, 2016; originally announced September 2016.

    Comments: 17 pages

  11. arXiv:1608.03013  [pdf, other]

    cs.RO math.OC

    Belief Space Planning Simplified: Trajectory-Optimized LQG (T-LQG) (Extended Report)

    Authors: Mohammadhussein Rafieisakhaei, Suman Chakravorty, P. R. Kumar

    Abstract: Planning under motion and observation uncertainties requires solution of a stochastic control problem in the space of feedback policies. In this paper, we reduce the general $(n^2+n)$-dimensional belief space planning problem to an $n$-dimensional problem by obtaining a Linear Quadratic Gaussian (LQG) design with the best nominal performance. Then, by taking the underlying trajectory of the LQG cont…

    Submitted 11 August, 2016; v1 submitted 9 August, 2016; originally announced August 2016.

    Comments: 20 pages, 12 figures. A shorter version has been submitted to WAFR, 2016

  12. arXiv:1606.09564  [pdf, other]

    eess.SY math.OC

    Architecture and Algorithms for Privacy Preserving Thermal Inertial Load Management by A Load Serving Entity

    Authors: Abhishek Halder, Xinbo Geng, P. R. Kumar, Le Xie

    Abstract: Motivated by the growing importance of demand response in modern power system's operations, we propose an architecture and supporting algorithms for privacy preserving thermal inertial load management as a service provided by the load serving entity (LSE). We focus on an LSE managing a population of its customers' air conditioners, and propose a contractual model where the LSE guarantees quality o…

    Submitted 29 November, 2016; v1 submitted 30 June, 2016; originally announced June 2016.

  13. arXiv:1606.08741  [pdf, other]

    eess.SY math.DS

    Dynamic Watermarking: Active Defense of Networked Cyber-Physical Systems

    Authors: Bharadwaj Satchidanandan, P. R. Kumar

    Abstract: The coming decades may see the large scale deployment of networked cyber-physical systems to address global needs in areas such as energy, water, healthcare, and transportation. However, as recent events have shown, such systems are vulnerable to cyber attacks. Being safety critical, their disruption or misbehavior can cause economic losses or injuries and loss of life. It is therefore important t…

    Submitted 27 June, 2016; originally announced June 2016.

  14. arXiv:1606.01608  [pdf, other]

    cs.NI eess.SY math.OC

    Throughput Optimal Decentralized Scheduling of Multi-Hop Networks with End-to-End Deadline Constraints: Unreliable Links

    Authors: Rahul Singh, P. R. Kumar

    Abstract: We consider unreliable multi-hop networks serving multiple flows in which packets not delivered to their destination nodes by their deadlines are dropped. We address the design of policies for routing and scheduling packets that optimize any specified weighted average of the throughputs of the flows. We provide a new approach which directly yields an optimal distributed scheduling policy that atta…

    Submitted 5 June, 2016; originally announced June 2016.

    MSC Class: 93E03; 49N30; 49N15; 60K25 ACM Class: C.2.1

    Journal ref: IEEE Transactions on Automatic Control, vol. 64, no. 1, pp. 127-142, Jan. 2019

  15. arXiv:1605.08926  [pdf, other]

    math.OC

    Decentralized Control via Dynamic Stochastic Prices: The Independent System Operator Problem

    Authors: Rahul Singh, P. R. Kumar, Le Xie

    Abstract: A smart grid connects wind or solar or storage farms, fossil fuel plants, industrial or commercial loads, or load serving entities, modeled as stochastic dynamical systems. In each time period, they consume or supply electrical energy, with the constraint that total generation equals consumption. Each agent's utility is either the benefit accrued from consumption, or negative of generation cost. Th…

    Submitted 29 June, 2016; v1 submitted 28 May, 2016; originally announced May 2016.

    MSC Class: 91B50; 93E03