Showing 1–25 of 25 results for author: A, P L

  1. arXiv:2310.18743

    cs.LG

    Optimization of utility-based shortfall risk: A non-asymptotic viewpoint

    Authors: Sumedh Gupte, Prashanth L. A., Sanjay P. Bhat

    Abstract: We consider the problems of estimation and optimization of utility-based shortfall risk (UBSR), which is a popular risk measure in finance. In the context of UBSR estimation, we derive a non-asymptotic bound on the mean-squared error of the classical sample average approximation (SAA) of UBSR. Next, in the context of UBSR optimization, we derive an expression for the UBSR gradient under a smooth p…

    Submitted 30 March, 2024; v1 submitted 28 October, 2023; originally announced October 2023.
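
    A minimal sketch of the sample average approximation (SAA) of UBSR mentioned in this entry, assuming the loss-oriented convention SR(X) = inf{t : E[l(X - t)] <= lambda} with an increasing loss function l (sign conventions vary in the literature, and the paper's exact setup is not shown in the truncated abstract). The SAA replaces the expectation by a sample mean; since that mean is nonincreasing in t, the infimum can be located by bisection. The function name `ubsr_saa` and the exponential-loss check are illustrative choices, not taken from the paper.

```python
import math
from typing import Callable, Sequence


def ubsr_saa(samples: Sequence[float], loss: Callable[[float], float],
             lam: float, tol: float = 1e-10) -> float:
    """SAA estimate of UBSR: smallest t with mean(loss(x - t)) <= lam.

    Assumes `loss` is increasing (so g(t) below is nonincreasing in t) and that
    lam lies strictly above the infimum of the loss, so a crossing point exists.
    """
    def g(t: float) -> float:
        # Sample average replacing the expectation E[l(X - t)].
        return sum(loss(x - t) for x in samples) / len(samples)

    # Bracket the crossing point: g(lo) > lam and g(hi) <= lam.
    lo, hi = min(samples) - 1.0, max(samples) + 1.0
    step = 1.0
    while g(lo) <= lam:
        lo -= step
        step *= 2
    step = 1.0
    while g(hi) > lam:
        hi += step
        step *= 2
    # Bisection on the monotone function g.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) <= lam:
            hi = mid
        else:
            lo = mid
    return hi


samples = [0.0, 1.0, 2.0, -0.5]
estimate = ubsr_saa(samples, math.exp, lam=1.0)
# With l(x) = exp(x) and lam = 1, the SAA has the closed form log(mean(exp(x))).
closed_form = math.log(sum(math.exp(x) for x in samples) / len(samples))
print(estimate, closed_form)
```

    The exponential loss is used only because it gives a closed form to compare against; any increasing loss that satisfies the stated assumptions can be passed in.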

  2. arXiv:2304.10951

    cs.LG math.OC stat.ML

    A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning

    Authors: Mizhaan Prajit Maniyar, Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

    Abstract: We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the…

    Submitted 21 April, 2023; originally announced April 2023.

  3. arXiv:2210.05918

    cs.LG cs.AI eess.SY stat.ML

    Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

    Authors: Gandharv Patil, Prashanth L. A., Dheeraj Nagaraj, Doina Precup

    Abstract: We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges…

    Submitted 11 September, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Journal ref: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, 2023
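
    To illustrate the tail-averaging idea in this entry, here is a minimal sketch of TD(0) with linear function approximation on a small, hypothetical Markov chain, using a constant step size and returning the average of the last half of the iterates. The chain, features, step size and averaging fraction are assumptions for illustration; this is not the step-size rule or the regularised variant analysed in the paper.

```python
import numpy as np


def tail_averaged_td0(phi, P, r, gamma, alpha, n_steps, tail_frac=0.5, seed=0):
    """TD(0) with linear features phi[s] on a chain with transition matrix P and
    per-state reward r[s]; returns the average of the last tail_frac of iterates."""
    rng = np.random.default_rng(seed)
    n_states, d = phi.shape
    theta = np.zeros(d)
    tail_start = int(n_steps * (1 - tail_frac))
    tail_sum = np.zeros(d)
    s = 0
    for t in range(n_steps):
        s_next = rng.choice(n_states, p=P[s])
        # Temporal difference error for the linear value estimate phi[s] @ theta.
        td_error = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        theta = theta + alpha * td_error * phi[s]
        if t >= tail_start:
            tail_sum += theta  # accumulate only the tail of the trajectory
        s = s_next
    return tail_sum / (n_steps - tail_start)


phi = np.eye(2)  # one-hot features: a special case of linear approximation
P = np.array([[0.5, 0.5], [0.5, 0.5]])
r = np.array([1.0, 0.0])
gamma = 0.5
theta_bar = tail_averaged_td0(phi, P, r, gamma, alpha=0.05, n_steps=20000)
v_true = np.linalg.solve(np.eye(2) - gamma * P, r)  # exact values [1.5, 0.5]
print(theta_bar, v_true)
```

    With one-hot features the TD fixed point coincides with the true value function, which makes the averaged iterate easy to check against a direct linear solve.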

  4. arXiv:2208.00290

    math.OC cs.LG

    A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization

    Authors: Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

    Abstract: In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the delta sphere. We analyze the bias and variance of…

    Submitted 30 June, 2023; v1 submitted 30 July, 2022; originally announced August 2022.

  5. arXiv:2205.05843

    stat.ML cs.IT cs.LG

    A Survey of Risk-Aware Multi-Armed Bandits

    Authors: Vincent Y. F. Tan, Prashanth L. A., Krishna Jagannathan

    Abstract: In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucial role, and a risk-aware performance measure is preferable, so as to capture losses in the case of adverse events. This survey aims to consolidate and summarise…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: 11 pages; Unabridged version of a survey paper of the same title accepted to IJCAI-ECAI, 2022

  6. arXiv:2202.11046

    cs.LG

    A policy gradient approach for optimization of smooth risk measures

    Authors: Nithia Vijayan, Prashanth L. A.

    Abstract: We propose policy gradient algorithms for solving a risk-sensitive reinforcement learning (RL) problem in on-policy as well as off-policy settings. We consider episodic Markov decision processes, and model the risk using the broad class of smooth risk measures of the cumulative discounted reward. We propose two template policy gradient algorithms that optimize a smooth risk measure in on-policy an…

    Submitted 23 June, 2024; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: text overlap with arXiv:2107.04422

  7. arXiv:2107.04422

    cs.LG

    Policy Gradient Methods for Distortion Risk Measures

    Authors: Nithia Vijayan, Prashanth L. A.

    Abstract: We propose policy gradient algorithms which learn risk-sensitive policies in a reinforcement learning (RL) framework. Our proposed algorithms maximize the distortion risk measure (DRM) of the cumulative reward in an episodic Markov decision process in on-policy and off-policy RL settings, respectively. We derive a variant of the policy gradient theorem that caters to the DRM objective, and integra…

    Submitted 4 February, 2024; v1 submitted 9 July, 2021; originally announced July 2021.

  8. arXiv:2101.02137

    cs.LG

    Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint

    Authors: Nithia Vijayan, Prashanth L. A.

    Abstract: We propose two policy gradient algorithms for solving the problem of control in an off-policy reinforcement learning (RL) context. Both algorithms incorporate a smoothed functional (SF) based gradient estimation scheme. The first algorithm is a straightforward combination of importance sampling-based off-policy evaluation with SF-based gradient estimation. The second algorithm, inspired by the sto…

    Submitted 23 June, 2024; v1 submitted 6 January, 2021; originally announced January 2021.

  9. arXiv:2002.11440

    cs.LG math.OC stat.ML

    Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles

    Authors: Nirav Bhavsar, Prashanth L. A.

    Abstract: We introduce biased gradient oracles to capture a setting where the function measurements have an estimation error that can be controlled through a batch size parameter. Our proposed oracles are appealing in several practical contexts, for instance, risk measure estimation from a batch of independent and identically distributed (i.i.d.) samples, or simulation optimization, where the function measu…

    Submitted 16 May, 2021; v1 submitted 26 February, 2020; originally announced February 2020.

  10. arXiv:1912.10398

    cs.LG stat.ML

    Estimation of Spectral Risk Measures

    Authors: Ajay Kumar Pandey, Prashanth L. A., Sanjay P. Bhat

    Abstract: We consider the problem of estimating a spectral risk measure (SRM) from i.i.d. samples, and propose a novel method that is based on numerical integration. We show that our SRM estimate concentrates exponentially, when the underlying distribution has bounded support. Further, we also consider the case when the underlying distribution is either Gaussian or exponential, and derive a concentration bo…

    Submitted 22 December, 2019; originally announced December 2019.
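
    A minimal sketch of estimating an SRM by numerical integration, assuming the standard definition $M_\phi(X) = \int_0^1 \phi(u) F^{-1}(u)\,du$, where the risk spectrum $\phi$ is a nonnegative, nondecreasing weight function on [0, 1] that integrates to one. The sketch plugs in the empirical quantile function and applies a midpoint rule on m equal subintervals; the quadrature rule, the grid size and the helper names are illustrative assumptions, and the scheme used in the paper may differ.

```python
import numpy as np


def empirical_quantile(sorted_x, u):
    """Empirical quantile F_n^{-1}(u) = X_(ceil(n u)) for u in (0, 1]."""
    n = len(sorted_x)
    idx = np.clip(np.ceil(n * u).astype(int) - 1, 0, n - 1)
    return sorted_x[idx]


def srm_estimate(samples, spectrum, m=1000):
    """Midpoint-rule approximation of the integral of spectrum(u) * F_n^{-1}(u) on [0, 1]."""
    x = np.sort(np.asarray(samples, dtype=float))
    u = (np.arange(1, m + 1) - 0.5) / m  # midpoints of m equal subintervals
    return float(np.mean(spectrum(u) * empirical_quantile(x, u)))


samples = [1.0, 2.0, 3.0, 4.0]
# A flat spectrum recovers the sample mean.
mean_est = srm_estimate(samples, lambda u: np.ones_like(u))
# The spectrum 1{u >= alpha} / (1 - alpha) gives CVaR at level alpha.
alpha = 0.5
cvar_est = srm_estimate(samples, lambda u: (u >= alpha) / (1 - alpha))
print(mean_est, cvar_est)
```

    The two spectra are chosen only because their SRMs are easy to verify by hand: the sample mean (2.5) and the mean of the upper half of the samples (3.5).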

  11. arXiv:1902.10709

    math.ST cs.LG stat.ML

    A Wasserstein distance approach for concentration of empirical risk estimates

    Authors: Prashanth L. A., Sanjay P. Bhat

    Abstract: This paper presents a unified approach based on Wasserstein distance to derive concentration bounds for empirical estimates for two broad classes of risk measures defined in the paper. The classes of risk measures introduced include as special cases well-known risk measures from the finance literature such as conditional value at risk (CVaR), optimized certainty equivalent risk, spectral risk meas…

    Submitted 10 May, 2022; v1 submitted 27 February, 2019; originally announced February 2019.

  12. arXiv:1902.02953

    cs.LG stat.ML

    Correlated bandits or: How to minimize mean-squared error online

    Authors: Vinay Praneeth Boda, Prashanth L. A.

    Abstract: While the objective in traditional multi-armed bandit problems is to find the arm with the highest mean, in many settings, finding an arm that best captures information about other arms is of interest. This objective, however, requires learning the underlying correlation structure and not just the means of the arms. Sensor placement for industrial surveillance and cellular network monitoring are…

    Submitted 26 June, 2019; v1 submitted 8 February, 2019; originally announced February 2019.

  13. arXiv:1901.00997

    cs.LG stat.ML

    Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions

    Authors: Prashanth L. A., Krishna Jagannathan, Ravi Kumar Kolla

    Abstract: Conditional Value-at-Risk (CVaR) is a widely used risk metric in applications such as finance. We derive concentration bounds for CVaR estimates, considering separately the cases of light-tailed and heavy-tailed distributions. In the light-tailed case, we use a classical CVaR estimator based on the empirical distribution constructed from the samples. For heavy-tailed random variables, we assume a…

    Submitted 25 August, 2019; v1 submitted 4 January, 2019; originally announced January 2019.
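
    The classical light-tailed estimator mentioned in this entry can be sketched through the Rockafellar-Uryasev representation CVaR_alpha(X) = VaR_alpha(X) + E[(X - VaR_alpha(X))^+] / (1 - alpha), with VaR and the expectation replaced by their empirical counterparts. This assumes X is a loss (larger is worse); the estimator for heavy-tailed variables described in the paper is not shown, and the function names are illustrative.

```python
import math
from typing import Sequence


def empirical_var(samples: Sequence[float], alpha: float) -> float:
    """Empirical alpha-quantile X_(ceil(n * alpha)) of the samples."""
    xs = sorted(samples)
    k = max(1, math.ceil(len(xs) * alpha))
    return xs[k - 1]


def empirical_cvar(samples: Sequence[float], alpha: float) -> float:
    """Plug-in CVaR: VaR_n + mean((X_i - VaR_n)^+) / (1 - alpha)."""
    v = empirical_var(samples, alpha)
    excess = sum(max(x - v, 0.0) for x in samples) / len(samples)
    return v + excess / (1.0 - alpha)


losses = [float(i) for i in range(1, 101)]  # 1, 2, ..., 100
var_95 = empirical_var(losses, 0.95)
cvar_95 = empirical_cvar(losses, 0.95)
print(var_95, cvar_95)
```

    For these 100 equally weighted losses, the plug-in CVaR at level 0.95 equals the average of the five largest losses, 98, which is a convenient sanity check.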

  14. arXiv:1810.09126

    cs.LG math.OC stat.ML

    Risk-Sensitive Reinforcement Learning via Policy Gradient Search

    Authors: Prashanth L. A., Michael Fu

    Abstract: The objective in a traditional reinforcement learning (RL) problem is to find a policy that optimizes the expected value of a performance metric such as the infinite-horizon cumulative discounted or long-run average cost/reward. In practice, optimizing the expected value alone may not be satisfactory, in that it may be desirable to incorporate the notion of risk into the optimization problem formu…

    Submitted 23 May, 2022; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: To appear in "Foundations and Trends in Machine Learning"

  15. arXiv:1808.02871

    math.OC cs.LG

    Random directions stochastic approximation with deterministic perturbations

    Authors: Prashanth L. A., Shalabh Bhatnagar, Nirav Bhavsar, Michael Fu, Steven I. Marcus

    Abstract: We introduce deterministic perturbation schemes for the recently proposed random directions stochastic approximation (RDSA) [17], and propose new first-order and second-order algorithms. In the latter case, these are the first second-order algorithms to incorporate deterministic perturbations. We show that the gradient and/or Hessian estimates in the resulting algorithms with deterministic perturb…

    Submitted 28 March, 2019; v1 submitted 8 August, 2018; originally announced August 2018.

  16. arXiv:1808.01739

    cs.LG stat.ML

    Concentration bounds for empirical conditional value-at-risk: The unbounded case

    Authors: Ravi Kumar Kolla, Prashanth L. A., Sanjay P. Bhat, Krishna Jagannathan

    Abstract: In several real-world applications involving decision making under uncertainty, the traditional expected value objective may not be suitable, as it may be necessary to control losses in the case of a rare but extreme event. Conditional Value-at-Risk (CVaR) is a popular risk measure for modeling the aforementioned objective. We consider the problem of estimating CVaR from i.i.d. samples of an unbou…

    Submitted 6 August, 2018; originally announced August 2018.

  17. arXiv:1611.10283

    cs.LG stat.ML

    Bandit algorithms to emulate human decision making using probabilistic distortions

    Authors: Ravi Kumar Kolla, Prashanth L. A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus

    Abstract: Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward distributions: the classic $K$-armed bandit and the linearly parameterized bandit settings. We consider the aforementioned problems in the regret minimization as…

    Submitted 31 October, 2023; v1 submitted 30 November, 2016; originally announced November 2016.

    Comments: The material in this paper was presented in part at the 2017 AAAI Conference on Artificial Intelligence

  18. arXiv:1609.07087

    cs.LG stat.ML

    (Bandit) Convex Optimization with Biased Noisy Gradient Oracles

    Authors: Xiaowei Hu, Prashanth L. A., András György, Csaba Szepesvári

    Abstract: Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients. Depending on the properties of the function to be optimized and the nature of "noise" in the bandit feedback, the bias and variance of gradient estimates exhibit various tradeoffs. In t…

    Submitted 4 July, 2020; v1 submitted 22 September, 2016; originally announced September 2016.

  19. arXiv:1507.07984

    cs.LG math.OC

    A constrained optimization perspective on actor critic algorithms and application to network routing

    Authors: Prashanth L. A., H. L. Prasad, Shalabh Bhatnagar, Prakash Chandra

    Abstract: We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear optimization problem. We also discuss an extension to incorporate function approximation and demonstrate the practicality of our algorithms on a network routin…

    Submitted 28 July, 2015; originally announced July 2015.

  20. arXiv:1506.02632

    cs.LG math.OC

    Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

    Authors: Prashanth L. A., Cheng Jie, Michael Fu, Steve Marcus, Csaba Szepesvári

    Abstract: Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting pre…

    Submitted 26 February, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

  21. arXiv:1502.05577

    math.OC cs.LG

    Adaptive system optimization using random directions stochastic approximation

    Authors: Prashanth L. A., Shalabh Bhatnagar, Michael Fu, Steve Marcus

    Abstract: We present novel algorithms for simulation optimization using random directions stochastic approximation (RDSA). These include first-order (gradient) as well as second-order (Newton) schemes. We incorporate both continuous-valued as well as discrete-valued perturbations into both our algorithms. The former are chosen to be independent and identically distributed (i.i.d.) symmetric, uniformly distr…

    Submitted 8 August, 2015; v1 submitted 19 February, 2015; originally announced February 2015.
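
    A minimal sketch of a first-order, two-sided random-directions gradient estimate with continuous-valued perturbations, assuming i.i.d. components d_i ~ U[-eta, eta]. Because E[d d^T] = (eta^2 / 3) I, scaling the finite difference by 3 / eta^2 makes the estimate approximately unbiased for small delta. The normalisation, the test function and all parameter values are assumptions for illustration; the discrete-valued perturbations and the second-order (Newton) scheme from the paper are not shown.

```python
import numpy as np


def rdsa_gradient(f, x, delta, eta=1.0, rng=None):
    """Two-sided random-directions gradient estimate with d ~ U[-eta, eta]^n.

    E[d d^T] = (eta**2 / 3) I, hence the 3 / eta**2 normalisation.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = rng.uniform(-eta, eta, size=x.shape)
    diff = (f(x + delta * d) - f(x - delta * d)) / (2.0 * delta)
    return (3.0 / eta**2) * d * diff


def quadratic(x):
    return 0.5 * float(x @ x)  # true gradient is x itself


rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 0.5])
# Averaging many estimates should approach the true gradient x.
estimates = np.array([rdsa_gradient(quadratic, x, delta=0.1, rng=rng) for _ in range(20000)])
print(estimates.mean(axis=0))

# Plain stochastic-approximation descent driven by the estimate.
z = np.array([1.0, -2.0, 0.5])
for _ in range(500):
    z = z - 0.1 * rdsa_gradient(quadratic, z, delta=0.1, rng=rng)
print(np.linalg.norm(z))
```

    Each estimate needs only two function evaluations regardless of dimension, which is the usual motivation for random-directions schemes over coordinate-wise finite differences.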

  22. arXiv:1405.2690

    stat.ML cs.LG math.OC

    Policy Gradients for CVaR-Constrained MDPs

    Authors: Prashanth L. A.

    Abstract: We study a risk-constrained version of the stochastic shortest path (SSP) problem, where the risk measure considered is Conditional Value-at-Risk (CVaR). We propose two algorithms that obtain a locally risk-optimal policy by employing four tools: stochastic approximation, mini batches, policy gradients and importance sampling. Both algorithms incorporate a CVaR estimation procedure, along the…

    Submitted 12 May, 2014; originally announced May 2014.

  23. arXiv:1403.6530

    cs.LG math.OC stat.ML

    Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

    Authors: Prashanth L. A., Mohammad Ghavamzadeh

    Abstract: In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance-related risk measures are among the most common risk-sensitive criteria in finance and operations research. However, optimizing many such criteria is known to be a hard problem. In this paper, we consider both discounte…

    Submitted 18 March, 2015; v1 submitted 25 March, 2014; originally announced March 2014.

  24. arXiv:1312.7292

    eess.SY cs.LG

    Two Timescale Convergent Q-learning for Sleep-Scheduling in Wireless Sensor Networks

    Authors: Prashanth L. A., Abhranil Chatterjee, Shalabh Bhatnagar

    Abstract: In this paper, we consider an intrusion detection application for Wireless Sensor Networks (WSNs). We study the problem of scheduling the sleep times of the individual sensors to maximize the network lifetime while keeping the tracking error to a minimum. We formulate this problem as a partially-observable Markov decision process (POMDP) with continuous state-action spaces, in a manner similar to…

    Submitted 23 March, 2014; v1 submitted 27 December, 2013; originally announced December 2013.

  25. arXiv:1307.3176

    cs.LG stat.ML

    Fast gradient descent for drifting least squares regression, with application to bandits

    Authors: Nathaniel Korda, Prashanth L. A., Rémi Munos

    Abstract: Online learning algorithms often need to recompute least squares regression estimates of parameters. We study improving the computational complexity of such algorithms by using stochastic gradient descent (SGD) type schemes in place of classic regression solvers. We show that SGD schemes efficiently track the true solutions of the regression problems, even in the presence of a drift. This findi…

    Submitted 20 November, 2014; v1 submitted 11 July, 2013; originally announced July 2013.
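
    A minimal sketch of using SGD in place of a direct least squares solver, as described in this entry: each step samples one row (x_i, y_i) uniformly and takes a gradient step on (y_i - x_i^T theta)^2 / 2. The synthetic, noise-free data, the constant step size and the iteration count are illustrative assumptions; the drift-tracking analysis and the bandit application from the paper are not reproduced.

```python
import numpy as np


def sgd_least_squares(X, y, step, n_iters, rng):
    """SGD on the least squares objective, sampling one data point per step."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        i = rng.integers(len(y))  # sample one data point uniformly at random
        residual = y[i] - X[i] @ theta
        theta = theta + step * residual * X[i]  # gradient step on (y_i - x_i^T theta)^2 / 2
    return theta


rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.normal(size=(n, d))
theta_star = np.array([0.5, -1.0, 2.0])
y = X @ theta_star  # noise-free targets, so theta_star solves the regression exactly

theta_sgd = sgd_least_squares(X, y, step=0.05, n_iters=5000, rng=rng)
theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # classic solver for comparison
print(theta_sgd, theta_ls)
```

    Each SGD step costs O(d) instead of re-solving the full regression, which is the computational saving the abstract refers to; with noisy targets a constant step size would leave the iterate fluctuating around the least squares solution rather than converging to it.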