
Showing 1–42 of 42 results for author: Kalathil, D

Searching in archive cs.
  1. arXiv:2404.07315

    eess.SY cs.AI cs.LG

    Structured Reinforcement Learning for Media Streaming at the Wireless Edge

    Authors: Archana Bura, Sarat Chandra Bobbili, Shreyas Rameshkumar, Desik Rengarajan, Dileep Kalathil, Srinivas Shakkottai

    Abstract: Media streaming is the dominant application over wireless edge (access) networks. The increasing softwarization of such networks has led to efforts at intelligent control, wherein application-specific actions may be dynamically taken to enhance the user experience. The goal of this work is to develop and demonstrate learning-based policies for optimal decision making to determine which clients to…

    Submitted 16 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: 15 pages, 14 figures

  2. arXiv:2402.13946

    cs.LG cs.CR

    AttackGNN: Red-Teaming GNNs in Hardware Security Using Reinforcement Learning

    Authors: Vasudev Gohil, Satwik Patnaik, Dileep Kalathil, Jeyavijayan Rajendran

    Abstract: Machine learning has shown great promise in addressing several critical hardware security problems. In particular, researchers have developed novel graph neural network (GNN)-based techniques for detecting intellectual property (IP) piracy, detecting hardware Trojans (HTs), and reverse engineering circuits, to name a few. These techniques have demonstrated outstanding accuracy and have received mu…

    Submitted 26 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: To appear in USENIX Security Symposium, 2024

  3. arXiv:2312.15340

    eess.SY cs.LG

    Meta-Learning-Based Adaptive Stability Certificates for Dynamical Systems

    Authors: Amit Jena, Dileep Kalathil, Le Xie

    Abstract: This paper addresses the problem of Neural Network (NN) based adaptive stability certification in a dynamical system. The state-of-the-art methods, such as Neural Lyapunov Functions (NLFs), use NN-based formulations to assess the stability of a non-linear dynamical system and compute a Region of Attraction (ROA) in the state space. However, under parametric uncertainty, if the values of system par…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: This article has been accepted for AAAI-24 (The 38th Annual AAAI Conference on Artificial Intelligence)

  4. arXiv:2311.00226

    eess.SP cs.LG

    Transformers are Provably Optimal In-context Estimators for Wireless Communications

    Authors: Vishnu Teja Kunde, Vicram Rajagopalan, Chandra Shekhara Kaushik Valmeekam, Krishna Narayanan, Srinivas Shakkottai, Dileep Kalathil, Jean-Francois Chamberland

    Abstract: Pre-trained transformers exhibit the capability of adapting to new tasks through in-context learning (ICL), where they efficiently utilize a limited set of prompts without explicit model optimization. The canonical communication problem of estimating transmitted symbols from received observations can be modelled as an in-context learning problem: Received observations are essentially a noisy fun…

    Submitted 14 June, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

    Comments: 13 pages, 2 figures, 2 tables, preprint; abstract, references, theory updated
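Entry 4 treats symbol estimation as an in-context problem: the context is a set of (transmitted, received) pairs and the query is a new received observation. As a reference point only, the sketch below shows a classical pilot-based baseline for the simplest such case, assuming a real scalar channel y = h·x + noise with BPSK symbols x in {-1, +1}. The function name `estimate_and_detect` is hypothetical, and this is not the transformer estimator the paper analyzes.

```python
from typing import Sequence, Tuple


def estimate_and_detect(context: Sequence[Tuple[float, float]], y_query: float) -> int:
    """Least-squares channel estimate from (x, y) context pairs, then BPSK detection.

    Assumption (for illustration only): real scalar channel y = h * x + noise,
    with transmitted symbols x in {-1, +1}.
    """
    num = sum(x * y for x, y in context)
    den = sum(x * x for x, _ in context)
    if den == 0:
        raise ValueError("context must contain at least one nonzero symbol")
    h_hat = num / den  # least-squares estimate of the channel gain h
    # For BPSK, the detected symbol is the sign of the equalized observation h_hat * y.
    return 1 if h_hat * y_query >= 0 else -1


if __name__ == "__main__":
    pilots = [(1.0, 1.5), (-1.0, -1.5), (1.0, 1.4)]
    print(estimate_and_detect(pilots, -2.9))
```

The context pairs play the role of pilots; a learned in-context estimator would instead have to infer the same mapping from the prompt without being told the channel model.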

  5. arXiv:2310.18434

    cs.LG stat.ML

    Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage

    Authors: Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

    Abstract: The goal of an offline reinforcement learning (RL) algorithm is to learn optimal policies using historical (offline) data, without access to the environment for online exploration. One of the main challenges in offline RL is the distribution shift, which refers to the difference between the state-action visitation distribution of the data generating policy and the learning policy. Many recent works…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: 33 pages, preprint

  6. arXiv:2307.08875

    cs.LG cs.RO math.OC

    Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation

    Authors: Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian

    Abstract: We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment. Previous policy-based robust RL algorithms mainly focus on the tabular setting under uncertainty sets that facilitate robust policy evaluation, but are no longer tractable when the number of states scales…

    Submitted 10 December, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023

  7. arXiv:2306.04050

    cs.IT cs.CL cs.LG

    LLMZip: Lossless Text Compression using Large Language Models

    Authors: Chandra Shekhara Kaushik Valmeekam, Krishna Narayanan, Dileep Kalathil, Jean-Francois Chamberland, Srinivas Shakkottai

    Abstract: We provide new estimates of an asymptotic upper bound on the entropy of English using the large language model LLaMA-7B as a predictor for the next token given a window of past tokens. This estimate is significantly smaller than currently available estimates in \cite{cover1978convergent}, \cite{lutati2023focus}. A natural byproduct is an algorithm for lossless compression of English text which com…

    Submitted 26 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: 7 pages, 4 figures, 4 tables, preprint, added results on using LLMs with arithmetic coding
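The estimate in entry 7 rests on a simple quantity: if a causal predictor assigns probability q_i to the token that actually occurs at position i, the average of -log2 q_i estimates an upper bound on the per-token entropy, and ideal arithmetic coding driven by the same predictor yields a code length within about two bits of the sum of the -log2 q_i (ignoring finite-precision effects). The sketch below computes that average with a hypothetical stand-in for LLaMA-7B: an add-one-smoothed character bigram model updated online, with the alphabet assumed known in advance.

```python
import math
from collections import defaultdict


def bits_per_symbol(text: str) -> float:
    """Average of -log2 q_i, where q_i is the probability a causal predictor
    assigned to the symbol that actually occurred at position i.

    Hypothetical stand-in predictor: an add-one-smoothed order-1 (bigram)
    character model, updated online after each prediction. The alphabet is
    assumed known in advance (read from the text itself for simplicity).
    """
    if not text:
        return 0.0
    alphabet_size = len(set(text))
    # counts[prev][cur] = how often `cur` has followed `prev` so far
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = math.log2(alphabet_size)  # first symbol: uniform prediction
    for prev, cur in zip(text, text[1:]):
        context = counts[prev]
        seen = sum(context.values())
        q = (context[cur] + 1) / (seen + alphabet_size)
        total_bits += -math.log2(q)
        context[cur] += 1  # update only after predicting, so the predictor stays causal
    return total_bits / len(text)


if __name__ == "__main__":
    print(bits_per_symbol("ab" * 200))
```

A stronger predictor concentrates more probability on the true next symbol, which lowers this average; that is the sense in which a better language model gives a tighter entropy estimate and a shorter compressed file.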

  8. arXiv:2305.03097

    cs.LG cs.AI

    Federated Ensemble-Directed Offline Reinforcement Learning

    Authors: Desik Rengarajan, Nitin Ragothaman, Dileep Kalathil, Srinivas Shakkottai

    Abstract: We consider the problem of federated offline reinforcement learning (RL), a scenario under which distributed learning agents must collaboratively learn a high-quality control policy only using small pre-collected datasets generated according to different unknown behavior policies. Naively combining a standard offline RL approach with a standard federated learning approach to solve this problem can…

    Submitted 4 May, 2023; originally announced May 2023.

  9. arXiv:2303.02783

    cs.LG cs.AI stat.ML

    Improved Sample Complexity Bounds for Distributionally Robust Reinforcement Learning

    Authors: Zaiyan Xu, Kishan Panaganti, Dileep Kalathil

    Abstract: We consider the problem of learning a control policy that is robust against the parameter mismatches between the training environment and testing environment. We formulate this as a distributionally robust reinforcement learning (DR-RL) problem where the objective is to learn the policy which maximizes the value function against the worst possible stochastic model of the environment in an uncertai…

    Submitted 20 May, 2023; v1 submitted 5 March, 2023; originally announced March 2023.

    Comments: Appeared in AISTATS 2023

    Journal ref: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:9728-9754, 2023
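Entries 9, 15, and 21 all pose robust RL as a max-min problem: choose the policy whose value is largest under the worst model in an uncertainty set. The sketch below shows the corresponding robust Bellman backup, V(s) = max_a min_{P in U(s,a)} [r(s,a) + γ Σ_{s'} P(s') V(s')], under a simplifying assumption that is not taken from these papers: each uncertainty set U(s,a) is a short finite list of candidate next-state distributions rather than a continuous set around a nominal model. The function name `robust_value_iteration` is hypothetical.

```python
from typing import List, Sequence

Distribution = Sequence[float]  # next-state probabilities, indexed by state


def robust_value_iteration(
    rewards: Sequence[Sequence[float]],                  # rewards[s][a]
    models: Sequence[Sequence[Sequence[Distribution]]],  # models[s][a] = candidate distributions
    gamma: float = 0.9,
    tol: float = 1e-10,
    max_iters: int = 10_000,
) -> List[float]:
    """Robust value iteration with a finite uncertainty set per (s, a) (an assumption)."""
    v = [0.0] * len(rewards)
    for _ in range(max_iters):
        new_v = []
        for s, rewards_s in enumerate(rewards):
            q_values = []
            for a, r in enumerate(rewards_s):
                # Inner minimization: the adversary picks the worst candidate model.
                worst = min(
                    sum(p * v[s2] for s2, p in enumerate(dist)) for dist in models[s][a]
                )
                q_values.append(r + gamma * worst)
            # Outer maximization: the agent picks the best action.
            new_v.append(max(q_values))
        converged = max(abs(x - y) for x, y in zip(new_v, v)) < tol
        v = new_v
        if converged:
            break
    return v


if __name__ == "__main__":
    # State 0: action 0 ("safe") pays 0.5 and stays; action 1 ("risky") pays 1.0 and
    # either stays or falls into absorbing state 1, which pays 0.
    rewards = [[0.5, 1.0], [0.0]]
    models = [[[[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]], [[[0.0, 1.0]]]]
    print(robust_value_iteration(rewards, models))
```

In this toy example the robust backup prefers the safe action (value 5.0 at γ = 0.9), whereas a nominal model that lists only the "stay" outcome for the risky action would value state 0 at 10.0.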

  10. arXiv:2302.12320

    math.OC cs.LG eess.SY

    Dynamic Regret Analysis of Safe Distributed Online Optimization for Convex and Non-convex Problems

    Authors: Ting-Jui Chang, Sapana Chaudhary, Dileep Kalathil, Shahin Shahrampour

    Abstract: This paper addresses safe distributed online optimization over an unknown set of linear safety constraints. A network of agents aims at jointly minimizing a global, time-varying function, which is only partially observable to each individual agent. Therefore, agents must engage in local communications to generate a safe sequence of actions competitive with the best minimizer sequence in hindsight,…

    Submitted 23 February, 2023; originally announced February 2023.

  11. arXiv:2209.13048

    cs.LG cs.RO

    Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments

    Authors: Desik Rengarajan, Sapana Chaudhary, Jaewon Kim, Dileep Kalathil, Srinivas Shakkottai

    Abstract: Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy. The meta-policy, when adapted over only a small (or just a single) number of steps, is able to perform near-optimally on a new, related task. However, a major challenge to adopting this approach to solve real-world problems is that they are often assoc…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: Accepted to NeurIPS 2022; first two authors contributed equally

  12. arXiv:2209.08381

    cs.RO

    Robust Reinforcement Learning Algorithm for Vision-based Ship Landing of UAVs

    Authors: Vishnu Saj, Bochan Lee, Dileep Kalathil, Moble Benedict

    Abstract: This paper addresses the problem of developing an algorithm for autonomous ship landing of vertical take-off and landing (VTOL) capable unmanned aerial vehicles (UAVs), using only a monocular camera in the UAV for tracking and localization. Ship landing is a challenging task due to the small landing space, six degrees of freedom ship deck motion, limited visual references for localization, and adv…

    Submitted 17 September, 2022; originally announced September 2022.

  13. arXiv:2208.12878

    cs.LG cs.AI cs.CR

    DETERRENT: Detecting Trojans using Reinforcement Learning

    Authors: Vasudev Gohil, Satwik Patnaik, Hao Guo, Dileep Kalathil, Jeyavijayan Rajendran

    Abstract: Insertion of hardware Trojans (HTs) in integrated circuits is a pernicious threat. Since HTs are activated under rare trigger conditions, detecting them using random logic simulations is infeasible. In this work, we design a reinforcement learning (RL) agent that circumvents the exponential search space and returns a minimal set of patterns that is most likely to detect HTs. Experimental results o…

    Submitted 26 August, 2022; originally announced August 2022.

    Comments: Published in 2022 Design Automation Conference (DAC)

  14. arXiv:2208.10259

    cs.LG eess.SY

    Meta-Learning Online Control for Linear Dynamical Systems

    Authors: Deepan Muthirayan, Dileep Kalathil, Pramod P. Khargonekar

    Abstract: In this paper, we consider the problem of finding a meta-learning online control algorithm that can learn across the tasks when faced with a sequence of $N$ (similar) control tasks. Each task involves controlling a linear dynamical system for a finite horizon of $T$ time steps. The cost function and system noise at each time step are adversarial and unknown to the controller before taking the cont…

    Submitted 18 August, 2022; originally announced August 2022.

  15. arXiv:2208.05129

    cs.LG cs.AI stat.ML

    Robust Reinforcement Learning using Offline Data

    Authors: Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh

    Abstract: The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the uncertainty in model parameters. Parameter uncertainty commonly occurs in many real-world RL applications due to simulator modeling errors, changes in the real-world system dynamics over time, and adversarial disturbances. Robust RL is typically formulated as a max-min problem, where the objective is to…

    Submitted 18 October, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

    Comments: Appeared in Neural Information Processing Systems (NeurIPS) 2022

  16. arXiv:2207.07731

    eess.SY cs.LG

    Distributed Learning of Neural Lyapunov Functions for Large-Scale Networked Dissipative Systems

    Authors: Amit Jena, Tong Huang, S. Sivaranjani, Dileep Kalathil, Le Xie

    Abstract: This paper considers the problem of characterizing the stability region of a large-scale networked system comprised of dissipative nonlinear subsystems, in a distributed and computationally tractable way. One standard approach to estimate the stability region of a general nonlinear system is to first find a Lyapunov function for the system and characterize its region of attraction as the stability…

    Submitted 15 July, 2022; originally announced July 2022.

  17. arXiv:2206.05357

    cs.LG math.OC

    Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning

    Authors: Ruida Zhou, Tao Liu, Dileep Kalathil, P. R. Kumar, Chao Tian

    Abstract: We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions, which are to be jointly optimized according to given criteria such as proportional fairness (smooth concave scalarization), hard constraints (constrained MDP), and max-min trade-off. We propose an Anchor-changing Regularized Natural Policy Gradient (ARNPG) framework, which can systematically inc…

    Submitted 18 October, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  18. arXiv:2202.13005

    cs.RO

    Intelligent Vision-based Autonomous Ship Landing of VTOL UAVs

    Authors: Bochan Lee, Vishnu Saj, Moble Benedict, Dileep Kalathil

    Abstract: The paper discusses an intelligent vision-based control solution for autonomous tracking and landing of Vertical Take-Off and Landing (VTOL) capable Unmanned Aerial Vehicles (UAVs) on ships without utilizing GPS signal. The central idea involves automating the Navy helicopter ship landing procedure where the pilot utilizes the ship as the visual reference for long-range tracking; however, refers t…

    Submitted 17 September, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

  19. arXiv:2202.04628

    cs.LG cs.AI

    Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

    Authors: Desik Rengarajan, Gargi Vaidya, Akshay Sarvesh, Dileep Kalathil, Srinivas Shakkottai

    Abstract: A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine grain feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. T…

    Submitted 13 February, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

  20. arXiv:2112.09865

    stat.ML cs.LG

    Off-Policy Evaluation Using Information Borrowing and Context-Based Switching

    Authors: Sutanoy Dasgupta, Yabo Niu, Kishan Panaganti, Dileep Kalathil, Debdeep Pati, Bani Mallick

    Abstract: We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using the data collected by a logging policy. Most popular approaches to the OPE are variants of the doubly robust (DR) estimator obtained by combining a direct method (DM) estimator and a correction term involving the inverse propensity score (IPS). Existing algori…

    Submitted 18 December, 2021; originally announced December 2021.

    Comments: 23 pages, 6 figures, manuscript under review
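Entry 20 describes the doubly robust (DR) estimator as a direct-method (DM) term plus an inverse-propensity-score (IPS) correction. For logged tuples (x_i, a_i, r_i) collected with logging propensities μ(a_i | x_i), a target policy π, and a reward model q̂, the standard form is V_DR = (1/n) Σ_i [ Σ_a π(a | x_i) q̂(x_i, a) + (π(a_i | x_i) / μ(a_i | x_i)) (r_i - q̂(x_i, a_i)) ]. The sketch below implements only that baseline formula; it is not the information-borrowing or context-based switching estimator the paper proposes, and all names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Policy = Callable[[int, int], float]       # (context, action) -> probability
RewardModel = Callable[[int, int], float]  # (context, action) -> predicted reward
LogEntry = Tuple[int, int, float, float]   # (context, action, reward, logging propensity)


def doubly_robust_value(
    logs: Sequence[LogEntry],
    target: Policy,
    q_hat: RewardModel,
    actions: Sequence[int],
) -> float:
    """Standard DR off-policy value estimate for a contextual bandit."""
    total = 0.0
    for x, a, r, mu in logs:
        # Direct-method term: expected modeled reward under the target policy.
        dm = sum(target(x, b) * q_hat(x, b) for b in actions)
        # IPS-weighted correction for the reward model's error on the logged action.
        weight = target(x, a) / mu
        total += dm + weight * (r - q_hat(x, a))
    return total / len(logs)


if __name__ == "__main__":
    # Toy data: reward 1 iff action == context parity; uniform logging over {0, 1}.
    logs = [(x, a, 1.0 if a == x % 2 else 0.0, 0.5) for x in range(10) for a in (0, 1)]
    pi = lambda x, a: 1.0 if a == x % 2 else 0.0
    print(doubly_robust_value(logs, pi, pi, actions=[0, 1]))
```

Setting q̂ to zero everywhere reduces the expression to plain IPS, while an exact q̂ makes the correction term vanish, so on noise-free toy data both choices should return the same value.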

  21. arXiv:2112.01506

    cs.LG stat.ML

    Sample Complexity of Robust Reinforcement Learning with a Generative Model

    Authors: Kishan Panaganti, Dileep Kalathil

    Abstract: The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an…

    Submitted 14 May, 2022; v1 submitted 2 December, 2021; originally announced December 2021.

    Comments: Published in the International Conference on Artificial Intelligence and Statistics (AISTATS) 2022

  22. arXiv:2112.00885

    cs.LG cs.AI

    DOPE: Doubly Optimistic and Pessimistic Exploration for Safe Reinforcement Learning

    Authors: Archana Bura, Aria HasanzadeZonuzy, Dileep Kalathil, Srinivas Shakkottai, Jean-Francois Chamberland

    Abstract: Safe reinforcement learning is extremely challenging: not only must the agent explore an unknown environment, it must do so while ensuring no safety constraint violations. We formulate this safe reinforcement learning (RL) problem using the framework of a finite-horizon Constrained Markov Decision Process (CMDP) with an unknown transition probability function, where we model the safety requirement…

    Submitted 17 October, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: Accepted to NeurIPS 2022

  23. arXiv:2111.15041

    cs.LG eess.SY

    Online Learning for Predictive Control with Provable Regret Guarantees

    Authors: Deepan Muthirayan, Jianjun Yuan, Dileep Kalathil, Pramod P. Khargonekar

    Abstract: We study the problem of online learning in predictive control of an unknown linear dynamical system with time varying cost functions which are unknown a priori. Specifically, we study the online learning problem where the control algorithm does not know the true system model and has only access to a fixed-length (that does not grow with the control horizon) preview of the future cost functions. The…

    Submitted 31 October, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

  24. arXiv:2111.07430

    cs.LG math.OC

    Safe Online Convex Optimization with Unknown Linear Safety Constraints

    Authors: Sapana Chaudhary, Dileep Kalathil

    Abstract: We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints. The goal is to select a sequence of actions to minimize the regret without violating the safety constraints at any time step (with high probability). The parameters that specify the linear safety constraints are unknown to the algorithm. The algorithm has acc…

    Submitted 14 November, 2021; originally announced November 2021.

    Comments: 18 pages

  25. arXiv:2111.00552

    cs.LG cs.AI math.OC

    Policy Optimization for Constrained MDPs with Provable Fast Global Convergence

    Authors: Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian

    Abstract: We address the problem of finding the optimal policy of a constrained Markov decision process (CMDP) using a gradient descent-based algorithm. Previous results have shown that a primal-dual approach can achieve an $\mathcal{O}(1/\sqrt{T})$ global convergence rate for both the optimality gap and the constraint violation. We propose a new algorithm called policy mirror descent-primal dual (PMD-PD) a…

    Submitted 3 February, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

  26. arXiv:2110.02332

    cs.RO cs.LG

    OTTR: Off-Road Trajectory Tracking using Reinforcement Learning

    Authors: Akhil Nagariya, Dileep Kalathil, Srikanth Saripalli

    Abstract: In this work, we present a novel Reinforcement Learning (RL) algorithm for the off-road trajectory tracking problem. Off-road environments involve varying terrain types and elevations, and it is difficult to model the interaction dynamics of specific off-road vehicles with such a diverse and complex environment. Standard RL policies trained on a simulator will fail to operate in such challenging r…

    Submitted 5 October, 2021; originally announced October 2021.

  27. arXiv:2106.02684

    cs.LG

    Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs

    Authors: Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian

    Abstract: We address the issue of safety in reinforcement learning. We pose the problem in an episodic framework of a constrained Markov decision process. Existing results have shown that it is possible to achieve a reward regret of $\tilde{\mathcal{O}}(\sqrt{K})$ while allowing an $\tilde{\mathcal{O}}(\sqrt{K})$ constraint violation in $K$ episodes. A critical question that arises is whether it is possible…

    Submitted 24 January, 2023; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: Appeared in NeurIPS 2021. Revised Algorithm 2 and the proof of Lemma 5.6

  28. arXiv:2008.05699

    cs.RO eess.SY

    A Vision-Based Control Method for Autonomous Landing of Vertical Flight Aircraft On a Moving Platform Without Using GPS

    Authors: Bochan Lee, Vishnu Saj, Moble Benedict, Dileep Kalathil

    Abstract: The paper discusses a novel vision-based estimation and control approach to enable fully autonomous tracking and landing of vertical take-off and landing (VTOL) capable unmanned aerial vehicles (UAVs) on moving platforms without relying on a GPS signal. A unique feature of the present method is that it accomplishes this task without tracking the landing pad itself; however, by utilizing a standard…

    Submitted 16 August, 2020; v1 submitted 13 August, 2020; originally announced August 2020.

    Comments: Presented at the VFS International 76th Annual Forum & Technology Display, October 6-8, 2020. Submitted to the Journal of Guidance, Control, and Dynamics (under review)

  29. arXiv:2008.00311

    cs.LG stat.ML

    Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs

    Authors: Aria HasanzadeZonuzy, Archana Bura, Dileep Kalathil, Srinivas Shakkottai

    Abstract: Many physical systems have underlying safety considerations that require that the policy employed ensures the satisfaction of a set of constraints. The analytical formulation usually takes the form of a Constrained Markov Decision Process (CMDP). We focus on the case where the CMDP is unknown, and RL algorithms obtain samples to discover the model and compute an optimal constrained policy. Our goa…

    Submitted 1 March, 2021; v1 submitted 1 August, 2020; originally announced August 2020.

  30. arXiv:2006.11683

    math.OC cs.GT cs.LG cs.MA

    Reinforcement Learning for Mean Field Games with Strategic Complementarities

    Authors: Kiyeob Lee, Desik Rengarajan, Dileep Kalathil, Srinivas Shakkottai

    Abstract: Mean Field Games (MFG) are the class of games with a very large number of agents and the standard equilibrium concept is a Mean Field Equilibrium (MFE). Algorithms for learning MFE in dynamic MFGs are unknown in general. Our focus is on an important subclass that possesses a monotonicity property called Strategic Complementarities (MFG-SC). We introduce a natural refinement to the equilibrium concep…

    Submitted 1 February, 2021; v1 submitted 20 June, 2020; originally announced June 2020.

  31. arXiv:2006.11608

    cs.LG eess.SY stat.ML

    Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees

    Authors: Kishan Panaganti, Dileep Kalathil

    Abstract: This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Process (RMDP) with large state spaces. The goal of the RMDP framework is to find a policy that is robust against the parameter uncertainties due to the mismatch between the simulator model and real-world settings. We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-…

    Submitted 11 February, 2021; v1 submitted 20 June, 2020; originally announced June 2020.

    Comments: 26 pages, 12 figures, 2 tables

  32. arXiv:2004.00472

    cs.NI eess.SY

    Learning to Cache and Caching to Learn: Regret Analysis of Caching Algorithms

    Authors: Archana Bura, Desik Rengarajan, Dileep Kalathil, Srinivas Shakkottai, Jean-Francois Chamberland-Tremblay

    Abstract: Crucial performance metrics of a caching algorithm include its ability to quickly and accurately learn a popularity distribution of requests. However, a majority of work on analytical performance analysis focuses on hit probability after an asymptotically large time has elapsed. We consider an online learning viewpoint, and characterize the "regret" in terms of the finite time difference between t…

    Submitted 1 April, 2020; originally announced April 2020.
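Entry 32 characterizes caching performance by a finite-time "regret" (the comparison target is cut off in the abstract above). The sketch below illustrates one plausible version under assumptions made only for this example: requests are i.i.d. from a fixed popularity distribution `probs`, the learner caches the `cache_size` items with the highest empirical request counts so far (an LFU-style rule, not necessarily any algorithm analyzed in the paper), and regret is the accumulated expected hit-probability gap relative to a genie that caches the truly most popular items. The function name is hypothetical.

```python
import random
from typing import Sequence, Set, Tuple


def empirical_lfu_regret(
    probs: Sequence[float], cache_size: int, horizon: int, seed: int = 0
) -> Tuple[Set[int], float]:
    """Cache the empirically most requested items; return (last cache, expected regret)."""
    rng = random.Random(seed)
    n = len(probs)
    counts = [0] * n
    # Genie benchmark: hit probability of caching the truly most popular items.
    best = sum(sorted(probs, reverse=True)[:cache_size])
    regret = 0.0
    cache: Set[int] = set()
    for _ in range(horizon):
        # Decide the cache from past counts only (ties broken by item index).
        ranked = sorted(range(n), key=lambda i: (-counts[i], i))
        cache = set(ranked[:cache_size])
        regret += best - sum(probs[i] for i in cache)
        # Observe the next request and update the popularity estimate.
        request = rng.choices(range(n), weights=probs)[0]
        counts[request] += 1
    return cache, regret


if __name__ == "__main__":
    print(empirical_lfu_regret([0.1, 0.1, 0.3, 0.5], cache_size=2, horizon=2000))
```

Because the regret here is computed from true hit probabilities rather than realized hits, it is nonnegative at every step and stops growing once the empirical ranking settles on the correct items.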

  33. Bounded Regret for Finitely Parameterized Multi-Armed Bandits

    Authors: Kishan Panaganti, Dileep Kalathil

    Abstract: We consider the problem of finitely parameterized multi-armed bandits where the model of the underlying stochastic environment can be characterized based on a common unknown parameter. The true parameter is unknown to the learning agent. However, the set of possible parameters, which is finite, is known a priori. We propose an algorithm that is simple and easy to implement, which we call Finitely…

    Submitted 7 November, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: 15 pages, 7 figures, Reinforcement Learning, Multi-armed Bandits, Sequential Decision Making

  34. arXiv:2002.07368

    math.OC cs.RO

    D2C 2.0: Decoupled Data-Based Approach for Learning to Control Stochastic Nonlinear Systems via Model-Free ILQR

    Authors: Karthikeya S Parunandi, Aayushman Sharma, Suman Chakravorty, Dileep Kalathil

    Abstract: In this paper, we propose a structured linear parameterization of a feedback policy to solve the model-free stochastic optimal control problem. This parametrization is corroborated by a decoupling principle that is shown to be near-optimal under a small noise assumption, both in theory and by empirical analyses. Further, we incorporate a model-free version of the Iterative Linear Quadratic Regulat…

    Submitted 17 February, 2020; originally announced February 2020.

  35. arXiv:1904.08361

    cs.LG cs.RO eess.SY stat.ML

    Decoupled Data Based Approach for Learning to Control Nonlinear Dynamical Systems

    Authors: Ran Wang, Karthikeya Parunandi, Dan Yu, Dileep Kalathil, Suman Chakravorty

    Abstract: This paper addresses the problem of learning the optimal control policy for a nonlinear stochastic dynamical system with continuous state space, continuous action space and unknown dynamics. This class of problems is typically addressed in stochastic adaptive control and reinforcement learning literature using model-based and model-free approaches respectively. Both methods rely on solving a dyna…

    Submitted 17 April, 2019; originally announced April 2019.

  36. arXiv:1901.00959

    cs.LG eess.IV stat.ML

    QFlow: A Learning Approach to High QoE Video Streaming at the Wireless Edge

    Authors: Rajarshi Bhattacharyya, Archana Bura, Desik Rengarajan, Mason Rumuly, Bainan Xia, Srinivas Shakkottai, Dileep Kalathil, Ricky K. P. Mok, Amogh Dhamdhere

    Abstract: The predominant use of wireless access networks is for media streaming applications, which are only gaining popularity as ever more devices become available for this purpose. However, current access networks treat all packets identically, and lack the agility to determine which clients are most in need of service at a given time. Software reconfigurability of networking devices has seen wide adopt…

    Submitted 13 May, 2020; v1 submitted 3 January, 2019; originally announced January 2019.

    Comments: Submitted to ToN in May, 2020

  37. arXiv:1801.00825

    cs.NI

    FlowBazaar: A Market-Mediated Software Defined Communications Ecosystem at the Wireless Edge

    Authors: Rajarshi Bhattacharyya, Bainan Xia, Desik Rengarajan, Srinivas Shakkottai, Dileep Kalathil

    Abstract: The predominant use of wireless access networks is for media streaming applications, which are only gaining popularity as ever more devices become available for this purpose. However, current access networks treat all packets identically, and lack the agility to determine which clients are most in need of service at a given time. Software reconfigurability of networking devices has seen wide adopt…

    Submitted 23 January, 2019; v1 submitted 2 January, 2018; originally announced January 2018.

    Comments: Submitted to WiOpt, 2019

  38. arXiv:1712.07742

    cs.GT math.OC

    Mechanism Design for Demand Response Programs

    Authors: Deepan Muthirayan, Dileep Kalathil, Kameshwar Poolla, Pravin Varaiya

    Abstract: Demand Response (DR) programs serve to reduce the consumption of electricity at times when the supply is scarce and expensive. The utility informs the aggregator of an anticipated DR event. The aggregator calls on a subset of its pool of recruited agents to reduce their electricity use. Agents are paid for reducing their energy consumption from contractually established baselines. Baselines are co…

    Submitted 29 April, 2019; v1 submitted 20 December, 2017; originally announced December 2017.

  39. arXiv:1505.00553

    stat.ML cs.LG

    On Regret-Optimal Learning in Decentralized Multi-player Multi-armed Bandits

    Authors: Naumaan Nayyar, Dileep Kalathil, Rahul Jain

    Abstract: We consider the problem of learning in single-player and multiplayer multiarmed bandit models. Bandit problems are classes of online learning problems that capture exploration versus exploitation tradeoffs. In a multiarmed bandit model, players can pick among many arms, and each play of an arm generates an i.i.d. reward from an unknown distribution. The objective is to design a policy that maximiz…

    Submitted 1 December, 2016; v1 submitted 4 May, 2015; originally announced May 2015.
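Entry 39 concerns the exploration-versus-exploitation trade-off when each arm pull yields an i.i.d. reward. As a single-player reference point only, the sketch below implements the classical UCB1 index policy of Auer, Cesa-Bianchi, and Fischer (2002): after pulling each arm once, it plays the arm maximizing the empirical mean plus sqrt(2 ln t / n_i). It is not the decentralized multi-player algorithm studied in the paper; the Bernoulli reward model and the function name are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence


def ucb1(means: Sequence[float], horizon: int, seed: int = 0) -> List[int]:
    """Run UCB1 on Bernoulli arms with the given means; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            # Optimism in the face of uncertainty: empirical mean plus confidence bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    print(ucb1([0.1, 0.9], horizon=2000, seed=1))
```

The confidence bonus shrinks as an arm is sampled more, so exploration of clearly worse arms grows only logarithmically with the horizon while most pulls go to the best arm.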

  40. arXiv:1412.0180

    math.OC cs.LG

    Empirical Q-Value Iteration

    Authors: Dileep Kalathil, Vivek S. Borkar, Rahul Jain

    Abstract: We propose a new simple and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov Decision Process (MDP) when the transition kernels are unknown. Unlike the classical learning algorithms for MDPs, such as Q-learning and actor-critic algorithms, this algorithm doesn't depend on a stochastic approximation-based method. We show that our algorithm, which we call the e…

    Submitted 29 January, 2019; v1 submitted 30 November, 2014; originally announced December 2014.
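Entry 40 replaces the stochastic-approximation updates of Q-learning with an empirical Bellman operator. A minimal sketch of that idea for a discounted-cost MDP, assuming access to a simulator `sample_next(s, a, rng)` that draws next states (a hypothetical interface): each iteration draws fresh next-state samples for every state-action pair and sets Q(s, a) to c(s, a) + γ · (1/n) Σ_j min_b Q(s'_j, b). Details such as the sample sizes used in the paper may differ from this sketch.

```python
import random
from typing import Callable, List

CostFn = Callable[[int, int], float]
Sampler = Callable[[int, int, random.Random], int]


def empirical_q_value_iteration(
    n_states: int,
    n_actions: int,
    cost: CostFn,
    sample_next: Sampler,
    gamma: float = 0.9,
    n_samples: int = 10,
    n_iters: int = 200,
    seed: int = 0,
) -> List[List[float]]:
    """Q-value iteration for a discounted-cost MDP with an empirical (sampled) Bellman operator."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iters):
        new_q = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                # Fresh samples each iteration replace the unknown expectation over P(.|s, a).
                nxt = [sample_next(s, a, rng) for _ in range(n_samples)]
                avg = sum(min(q[s2]) for s2 in nxt) / n_samples  # costs: take the min
                new_q[s][a] = cost(s, a) + gamma * avg
        q = new_q
    return q


if __name__ == "__main__":
    # Toy MDP: in state 0, action 0 costs 1 and stays; action 1 costs 0 and moves to
    # state 1, which is absorbing and cost-free.
    cost = lambda s, a: 1.0 if (s == 0 and a == 0) else 0.0
    step = lambda s, a, rng: 1 if (s == 1 or a == 1) else 0
    print(empirical_q_value_iteration(2, 2, cost, step))
```

With stochastic transitions the iterates fluctuate around the true fixed point instead of converging exactly, and more samples per step reduce that fluctuation.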

  41. arXiv:1411.0728

    cs.LG cs.GT eess.SY math.OC

    Approachability in Stackelberg Stochastic Games with Vector Costs

    Authors: Dileep Kalathil, Vivek Borkar, Rahul Jain

    Abstract: The notion of approachability was introduced by Blackwell [1] in the context of vector-valued repeated games. The famous Blackwell's approachability theorem prescribes a strategy for approachability, i.e., for 'steering' the average cost of a given agent towards a given target set, irrespective of the strategies of the other agents. In this paper, motivated by the multi-objective optimization/deci…

    Submitted 20 June, 2016; v1 submitted 3 November, 2014; originally announced November 2014.

    Comments: 18 Pages, Submitted to Dynamic Games and Applications

  42. arXiv:1206.3582

    math.OC cs.LG eess.SY

    Decentralized Learning for Multi-player Multi-armed Bandits

    Authors: Dileep Kalathil, Naumaan Nayyar, Rahul Jain

    Abstract: We consider the problem of distributed online learning with multiple players in multi-armed bandits (MAB) models. Each player can pick among multiple arms. When a player picks an arm, it gets a reward. We consider both i.i.d. reward model and Markovian reward model. In the i.i.d. model each arm is modelled as an i.i.d. process with an unknown distribution with an unknown mean. In the Markovian mod…

    Submitted 14 June, 2012; originally announced June 2012.

    Comments: 33 pages, 3 figures. Submitted to IEEE Transactions on Information Theory