Search | arXiv e-print repository

Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces

Authors: Perusha Moodley, Pramod Kaushik, Dhillu Thambi, Mark Trovinger, Praveen Paruchuri, Xia Hong, Benjamin Rosman

Abstract: Decision Transformers, in their vanilla form, struggle to perform on image-based environments with multi-discrete action spaces. Although enhanced Decision Transformer architectures have been developed to improve performance, these methods have not specifically addressed this problem of multi-discrete action spaces which hampers existing Decision Transformer architectures from learning good representations. To mitigate this, we propose Multi-State Action Tokenisation (M-SAT), an approach for tokenising actions in multi-discrete action spaces that enhances the model's performance in such environments. Our approach involves two key changes: disentangling actions to the individual action level and tokenising the actions with auxiliary state information. These two key changes also improve individual action level interpretability and visibility within the attention layers. We demonstrate the performance gains of M-SAT on challenging ViZDoom environments with multi-discrete action spaces and image-based state spaces, including the Deadly Corridor and My Way Home scenarios, where M-SAT outperforms the baseline Decision Transformer without any additional data or heavy computational overheads. Additionally, we find that removing positional encoding does not adversely affect M-SAT's performance and, in some cases, even improves it.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.19626 [pdf, other]

Safety through feedback in Constrained RL

Authors: Shashank Reddy Chirra, Pradeep Varakantham, Praveen Paruchuri

Abstract: In safety-critical RL settings, the inclusion of an additional cost function is often favoured over the arduous task of modifying the reward function to ensure the agent's safe behaviour. However, designing or evaluating such a cost function can be prohibitively expensive. For instance, in the domain of self-driving, designing a cost function that encompasses all unsafe behaviours (e.g. aggressive lane changes) is inherently complex. In such scenarios, the cost function can be learned from feedback collected offline in between training rounds. This feedback can be system generated or elicited from a human observing the training process. Previous approaches have not been able to scale to complex environments and are constrained to receiving feedback at the state level, which can be expensive to collect. To this end, we introduce an approach that scales to more complex domains and extends beyond state-level feedback, thus reducing the burden on the evaluator. Inferring the cost function in such settings poses challenges, particularly in assigning credit to individual states based on trajectory-level feedback. To address this, we propose a surrogate objective that transforms the problem into a state-level supervised classification task with noisy labels, which can be solved efficiently. Additionally, it is often infeasible to collect feedback on every trajectory generated by the agent; hence, two fundamental questions arise: (1) Which trajectories should be presented to the human? and (2) How many trajectories are necessary for effective learning?
To address these questions, we introduce novelty-based sampling that selectively involves the evaluator only when the agent encounters a novel trajectory. We showcase the efficiency of our method through experimentation on several benchmark Safety Gymnasium environments and realistic self-driving scenarios.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2307.12661 [pdf, other]

Algorithmic construction of Lyapunov functions for continuous vector fields via convex semi-infinite programs

Authors: Raavi Gupta, Sameep Chattopadhyay, Pradyumna Paruchuri, Debasish Chatterjee

Abstract: This article presents a novel numerically tractable technique for synthesizing Lyapunov functions for equilibria of nonlinear vector fields. In broad strokes, corresponding to an isolated equilibrium point of a given vector field, a selection is made of a compact neighborhood of the equilibrium and a dictionary of functions in which a Lyapunov function is expected to lie. Then an algorithmic procedure based on the recent work [DACC22] is deployed on the preceding neighborhood-dictionary pair and charged with the task of finding a function satisfying a compact family of inequalities that defines the behavior of a Lyapunov function on the selected neighborhood. The technique applies to continuous nonlinear vector fields without special algebraic structures and does not even require their analytical expressions to proceed. Several numerical examples are presented to illustrate our results.

Submitted 25 August, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

Comments: 29 pages. Submitted

MSC Class: 93D05; 93D20; 65K05; 65P40

arXiv:2307.01304 [pdf, other]

A numerical algorithm for attaining the Chebyshev bound in optimal learning

Authors: Pradyumna Paruchuri, Debasish Chatterjee

Abstract: Given a compact subset of a Banach space, the Chebyshev center problem consists of finding a minimal circumscribing ball containing the set. In this article we establish a numerically tractable algorithm for solving the Chebyshev center problem in the context of optimal learning from a finite set of data points. For a hypothesis space realized as a compact but not necessarily convex subset of a finite-dimensional subspace of some underlying Banach space, this algorithm computes the Chebyshev radius and the Chebyshev center of the hypothesis space, thereby solving the problem of optimal recovery of functions from data. The algorithm itself is based on, and significantly extends, recent results for near-optimal solutions of convex semi-infinite problems by means of targeted sampling, and it is of independent interest. Several examples of numerical computations of Chebyshev centers are included in order to illustrate the effectiveness of the algorithm.

Submitted 3 July, 2023; originally announced July 2023.

Comments: 22 pages, 16 figures

arXiv:2302.14442 [pdf, other]

City-scale Pollution Aware Traffic Routing by Sampling Max Flows using MCMC

Authors: Shreevignesh Suriyanarayanan, Praveen Paruchuri, Girish Varma

Abstract: A significant cause of air pollution in urban areas worldwide is the high volume of road traffic. Long-term exposure to severe pollution can cause serious health issues. One approach towards tackling this problem is to design a pollution-aware traffic routing policy that balances multiple objectives of i) avoiding extreme pollution in any area, ii) enabling short transit times, and iii) making effective use of the road capacities. We propose a novel sampling-based approach for this problem. We provide the first construction of a Markov Chain that can sample integer max flow solutions of a planar graph, with theoretical guarantees that the probabilities depend on the aggregate transit length. We designed a traffic policy using diverse samples and simulated traffic on real-world road maps using the SUMO traffic simulator. In experiments on maps of large cities across the world, we observe a considerable decrease in areas with severe pollution compared to other approaches.

Submitted 28 February, 2023; originally announced February 2023.

Comments: Accepted in AAAI 2023 (AI for Social Impact Track)

arXiv:2301.09892 [pdf, other]

Learning Effective Strategies for Moving Target Defense with Switching Costs

Authors: Vignesh Viswanathan, Megha Bose, Praveen Paruchuri

Abstract: Moving Target Defense (MTD) has emerged as a key technique in various security applications as it takes away the attacker's ability to perform reconnaissance for exploiting a system's vulnerabilities. However, most of the existing research in the field assumes unrealistic access to information about the attacker's motivations and/or actions when developing MTD strategies. Many of the existing approaches also assume complete knowledge regarding the vulnerabilities of a system and how each of these vulnerabilities can be exploited by an attacker. In this work, we aim to create algorithms that generate effective Moving Target Defense strategies that do not rely on prior knowledge about the attackers. Our work assumes that the only way the defender receives information about its own reward is via interaction with the attacker in a repeated game setting. Depending on the amount of information that can be obtained from the interactions, we devise two different algorithms using a multi-armed bandit formulation to identify efficient strategies. We then evaluate our algorithms using data mined from the National Vulnerability Database to showcase that they match the performance of the state-of-the-art techniques, despite using far less information.

Submitted 24 January, 2023; originally announced January 2023.

arXiv:2201.10127 [pdf, other]

Multi-unit Double Auctions: Equilibrium Analysis and Bidding Strategy using DDPG in Smart-grids

Authors: Sanjay Chandlekar, Easwar Subramanian, Sanjay Bhat, Praveen Paruchuri, Sujit Gujar

Abstract: Periodic double auctions (PDA) have applications in many areas such as in e-commerce, intra-day equity markets, and day-ahead energy markets in smart-grids. While the trades accomplished using PDAs are worth trillions of dollars, finding a reliable bidding strategy in such auctions is still a challenge as it requires the consideration of future auctions. A participating buyer in a PDA has to design its bidding strategy by planning for current and future auctions. Many equilibrium-based bidding strategies proposed are complex to use in real-time. In the current exposition, we propose a scale-based bidding strategy for buyers participating in PDA. We first present an equilibrium analysis for single-buyer single-seller multi-unit single-shot k-Double auctions. Specifically, we analyze the situation when a seller and a buyer trade two identical units of quantity in a double auction where both the buyer and the seller deploy a simple, scale-based bidding strategy. The equilibrium analysis becomes intractable as the number of participants increases. To be useful in more complex settings such as wholesale markets in smart-grids, we model equilibrium bidding strategy as a learning problem. We develop a deep deterministic policy gradient (DDPG) based learning strategy, DDPGBBS, for a participating agent in PDAs to suggest an action at any auction instance. DDPGBBS, which empirically follows the obtained theoretical equilibrium, is easily extendable when the number of buyers/sellers increases.
We take Power Trading Agent Competition's (PowerTAC) wholesale market PDA as a testbed to evaluate our novel bidding strategy. We benchmark our DDPG based strategy against several baselines and state-of-the-art bidding strategies of the PowerTAC wholesale market PDA and demonstrate the efficacy of DDPGBBS against several benchmarked strategies.

Submitted 22 February, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

Comments: Accepted for publication in the proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS-22)

arXiv:2112.05495 [pdf, other]

How Private Is Your RL Policy? An Inverse RL Based Analysis Framework

Authors: Kritika Prakash, Fiza Husain, Praveen Paruchuri, Sujit P. Gujar

Abstract: Reinforcement Learning (RL) enables agents to learn how to perform various tasks from scratch. In domains like autonomous driving, recommendation systems, and more, optimal RL policies learned could cause a privacy breach if the policies memorize any part of the private reward. We study the set of existing differentially-private RL policies derived from various RL algorithms such as Value Iteration, Deep Q Networks, and Vanilla Proximal Policy Optimization. We propose a new Privacy-Aware Inverse RL (PRIL) analysis framework, that performs reward reconstruction as an adversarial attack on private policies that the agents may deploy. For this, we introduce the reward reconstruction attack, wherein we seek to reconstruct the original reward from a privacy-preserving policy using an Inverse RL algorithm. An adversary must do poorly at reconstructing the original reward function if the agent uses a tightly private policy. Using this framework, we empirically test the effectiveness of the privacy guarantee offered by the private algorithms on multiple instances of the FrozenLake domain of varying complexities. Based on the analysis performed, we infer a gap between the current standard of privacy offered and the standard of privacy needed to protect reward functions in RL. We do so by quantifying the extent to which each private policy protects the reward function by measuring distances between the original and reconstructed rewards.

Submitted 10 December, 2021; originally announced December 2021.

Comments: 15 pages, 7 figures, 5 tables, version accepted at AAAI 2022

arXiv:1911.08260 [pdf, other]

Bidding in Smart Grid PDAs: Theory, Analysis and Strategy (Extended Version)

Authors: Susobhan Ghosh, Sujit Gujar, Praveen Paruchuri, Easwar Subramanian, Sanjay P. Bhat

Abstract: Periodic Double Auctions (PDAs) are commonly used in the real world for trading, e.g. in stock markets to determine stock opening prices, and energy markets to trade energy in order to balance net demand in smart grids, involving trillions of dollars in the process. A bidder, participating in such PDAs, has to plan for bids in the current auction as well as for the future auctions, which highlights the necessity of good bidding strategies. In this paper, we perform an equilibrium analysis of single unit single-shot double auctions with a certain clearing price and payment rule, which we refer to as ACPR, and find it intractable to analyze as the number of participating agents increases. We further derive the best response for a bidder with complete information in a single-shot double auction with ACPR. Leveraging the theory developed for single-shot double auction and taking the PowerTAC wholesale market PDA as our testbed, we proceed by modeling the PDA of PowerTAC as an MDP. We propose a novel bidding strategy, namely MDPLCPBS. We empirically show that MDPLCPBS follows the equilibrium strategy for double auctions that we previously analyzed. In addition, we benchmark our strategy against the baseline and the state-of-the-art bidding strategies for the PowerTAC wholesale market PDAs, and show that MDPLCPBS outperforms most of them consistently.

Submitted 23 November, 2019; v1 submitted 19 November, 2019; originally announced November 2019.

Comments: Accepted for publication in the proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

arXiv:1901.06562 [pdf, ps, other]

doi 10.1016/j.automatica.2019.06.009

Discrete time optimal control with frequency constraints for non-smooth systems

Authors: Shruti Kotpalliwar, Pradyumna Paruchuri, Debasish Chatterjee, Ravi Banavar

Abstract: We present a Pontryagin maximum principle for discrete time optimal control problems with (a) pointwise constraints on the control actions and the states, (b) frequency constraints on the control and the state trajectories, and (c) nonsmooth dynamical systems. Pointwise constraints on the states and the control actions represent desired and/or physical limitations on the states and the control values; such constraints are important and are widely present in the optimal control literature. Constraints of the type (b), while less standard in the literature, effectively serve the purpose of describing important spectral properties of inertial actuators and systems. The conjunction of constraints of the type (a) and (b) is a relatively new phenomenon in optimal control but is important for the synthesis of control trajectories with a high degree of fidelity. The maximum principle established here provides first order necessary conditions for optimality that serve as a starting point for the synthesis of control trajectories corresponding to a large class of constrained motion planning problems that have high accuracy in a computationally tractable fashion. Moreover, the ability to handle a reasonably large class of nonsmooth dynamical systems that arise in practice ensures broad applicability of our theory, and we include several illustrations of our results on standard problems.

Submitted 27 March, 2019; v1 submitted 19 January, 2019; originally announced January 2019.

arXiv:1803.03052 [pdf, ps, other]

A frequency-constrained geometric Pontryagin maximum principle on matrix Lie groups

Authors: Shruti Kotpalliwar, Pradyumna Paruchuri, Karmvir Singh Phogat, Debasish Chatterjee, Ravi Banavar

Abstract: In this article we present a geometric discrete-time Pontryagin maximum principle (PMP) on matrix Lie groups that incorporates frequency constraints on the controls in addition to pointwise constraints on the states and control actions directly at the stage of the problem formulation. This PMP gives first order necessary conditions for optimality, and leads to two-point boundary value problems that may be solved by shooting techniques to arrive at optimal trajectories. We validate our theoretical results with a numerical experiment on the attitude control of a spacecraft on the Lie group SO(3).

Submitted 27 March, 2019; v1 submitted 8 March, 2018; originally announced March 2018.

arXiv:1708.04419 [pdf, ps, other]

doi 10.1109/TAC.2019.2893160

Discrete time Pontryagin maximum principle for optimal control problems under state-action-frequency constraints

Authors: Pradyumna Paruchuri, Debasish Chatterjee

Abstract: We establish a Pontryagin maximum principle for discrete time optimal control problems under the following three types of constraints: a) constraints on the states pointwise in time, b) constraints on the control actions pointwise in time, and c) constraints on the frequency spectrum of the optimal control trajectories. While the first two types of constraints are already included in the existing versions of the Pontryagin maximum principle, it turns out that the third type of constraints cannot be recast in any of the standard forms of the existing results for the original control system. We provide two different proofs of our Pontryagin maximum principle in this article, and include several special cases fine-tuned to control-affine nonlinear and linear system models. In particular, for minimization of quadratic cost functions and linear time invariant control systems, we provide tight conditions under which the optimal controls under frequency constraints are either normal or abnormal.

Submitted 15 August, 2017; originally announced August 2017.

Comments: 31 pages

MSC Class: 49K21

Showing 1–12 of 12 results for author: Paruchuri, P