Showing 1–25 of 25 results for author: Bhatnagar, S

Searching in archive stat.
  1. arXiv:2304.10951  [pdf, ps, other]

    cs.LG math.OC stat.ML

    A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning

    Authors: Mizhaan Prajit Maniyar, Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

    Abstract: We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the…

    Submitted 21 April, 2023; originally announced April 2023.

  2. Case-Base Neural Networks: survival analysis with time-varying, higher-order interactions

    Authors: Jesse Islam, Maxime Turgeon, Robert Sladek, Sahir Bhatnagar

    Abstract: In the context of survival analysis, data-driven neural network-based methods have been developed to model complex covariate effects. While these methods may provide better predictive performance than regression-based approaches, not all can model time-varying interactions and complex baseline hazards. To address this, we propose Case-Base Neural Networks (CBNNs) as a new approach that combines th…

    Submitted 9 January, 2024; v1 submitted 16 January, 2023; originally announced January 2023.

  3. arXiv:2212.10477  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Generalized Simultaneous Perturbation-based Gradient Search with Reduced Estimator Bias

    Authors: Soumen Pachal, Shalabh Bhatnagar, L. A. Prashanth

    Abstract: We present in this paper a family of generalized simultaneous perturbation-based gradient search (GSPGS) estimators that use noisy function measurements. The number of function measurements required by each estimator is guided by the desired level of accuracy. We first present in detail unbalanced generalized simultaneous perturbation stochastic approximation (GSPSA) estimators and later present t…

    Submitted 12 November, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: The material in this paper was presented in part at the Conference on Information Sciences and Systems (CISS) in March 2023

  4. arXiv:2206.12267  [pdf, other]

    stat.ME stat.AP

    Efficient Penalized Generalized Linear Mixed Models for Variable Selection and Genetic Risk Prediction in High-Dimensional Data

    Authors: Julien St-Pierre, Karim Oualkacha, Sahir Rai Bhatnagar

    Abstract: Sparse regularized regression methods are now widely used in genome-wide association studies (GWAS) to address the multiple testing burden that limits discovery of potentially important predictors. Linear mixed models (LMMs) have become an attractive alternative to principal components (PC) adjustment to account for population structure and relatedness in high-dimensional penalized models. However…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: 26 pages, 5 figures

  5. arXiv:2205.13609  [pdf, ps, other]

    stat.ME

    Variable Selection for Individualized Treatment Rules with Discrete Outcomes

    Authors: Zeyu Bian, Erica EM Moodie, Susan M Shortreed, Sylvie D Lambert, Sahir Bhatnagar

    Abstract: An individualized treatment rule (ITR) is a decision rule that aims to improve individual patients' health outcomes by recommending optimal treatments according to patients' specific information. In observational studies, collected data may contain many variables that are irrelevant for making treatment decisions. Including all available variables in the statistical model for the ITR could yield a l…

    Submitted 29 September, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

  6. arXiv:2101.07359  [pdf, other]

    stat.ME stat.CO

    Variable Selection in Regression-based Estimation of Dynamic Treatment Regimes

    Authors: Zeyu Bian, Erica EM Moodie, Susan M Shortreed, Sahir Bhatnagar

    Abstract: Dynamic treatment regimes (DTRs) consist of a sequence of decision rules, one per stage of intervention, that finds effective treatments for individual patients according to patient information history. DTRs can be estimated from models which include the interaction between treatment and a small number of covariates which are often chosen a priori. However, with increasingly large and complex data…

    Submitted 3 December, 2021; v1 submitted 18 January, 2021; originally announced January 2021.

  7. arXiv:2009.10629  [pdf, ps, other]

    math.OC stat.CO stat.ML

    Accelerated Gradient Methods for Sparse Statistical Learning with Nonconvex Penalties

    Authors: Kai Yang, Masoud Asgharian, Sahir Bhatnagar

    Abstract: Nesterov's accelerated gradient (AG) is a popular technique to optimize objective functions comprising two components: a convex loss and a penalty function. While AG methods perform well for convex penalties, such as the LASSO, convergence issues may arise when it is applied to nonconvex penalties, such as SCAD. A recent proposal generalizes Nesterov's AG method to the nonconvex setting. The propo…

    Submitted 28 November, 2022; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: 42 pages, 13 figures

    Journal ref: Stat Comput 34, 59 (2024)

  8. arXiv:2009.10264  [pdf, other]

    stat.ME

    casebase: An Alternative Framework For Survival Analysis and Comparison of Event Rates

    Authors: Sahir Rai Bhatnagar, Maxime Turgeon, Jesse Islam, James A. Hanley, Olli Saarela

    Abstract: In epidemiological studies of time-to-event data, a quantity of interest to the clinician and the patient is the risk of an event given a covariate profile. However, methods relying on time matching or risk-set sampling (including Cox regression) eliminate the baseline hazard from the likelihood expression or the estimating function. The baseline hazard then needs to be estimated separately using…

    Submitted 21 September, 2020; originally announced September 2020.

    Comments: 31 pages, 10 figures

  9. arXiv:2008.13066  [pdf, other]

    stat.ML cs.LG stat.ME

    Computer Model Calibration with Time Series Data using Deep Learning and Quantile Regression

    Authors: Saumya Bhatnagar, Won Chang, Seonjin Kim, Jiali Wang

    Abstract: Computer models play a key role in many scientific and engineering problems. One major source of uncertainty in computer model experiment is input parameter uncertainty. Computer model calibration is a formal statistical procedure to infer input parameters by combining information from model runs and observational data. The existing standard calibration framework suffers from inferential issues wh…

    Submitted 8 September, 2020; v1 submitted 29 August, 2020; originally announced August 2020.

  10. arXiv:1911.05697  [pdf, other]

    cs.LG stat.ML

    A Convergent Off-Policy Temporal Difference Algorithm

    Authors: Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar

    Abstract: Learning the value function of a given policy (target policy) from the data samples obtained from a different policy (behavior policy) is an important problem in Reinforcement Learning (RL). This problem is studied under the setting of off-policy prediction. Temporal Difference (TD) learning algorithms are a popular class of algorithms for solving the prediction problem. TD algorithms with linear…

    Submitted 13 November, 2019; originally announced November 2019.

  11. arXiv:1906.06659  [pdf, ps, other]

    cs.LG cs.GT stat.ML

    A Generalized Minimax Q-learning Algorithm for Two-Player Zero-Sum Stochastic Games

    Authors: Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar

    Abstract: We consider the problem of two-player zero-sum games. This problem is formulated as a min-max Markov game in the literature. The solution of this game, which is the min-max payoff, starting from a given state is called the min-max value of the state. In this work, we compute the solution of the two-player zero-sum game utilizing the technique of successive relaxation that has been successfully app…

    Submitted 18 March, 2022; v1 submitted 16 June, 2019; originally announced June 2019.

  12. arXiv:1905.03970  [pdf, other]

    cs.LG cs.AI stat.ML

    Reinforcement Learning in Non-Stationary Environments

    Authors: Sindhu Padakandla, Prabuchandran K. J, Shalabh Bhatnagar

    Abstract: Reinforcement learning (RL) methods learn optimal decisions in the presence of a stationary environment. However, the stationary assumption on the environment is very restrictive. In many real world problems like traffic signal control, robotic applications, one often encounters situations with non-stationary environments and in these scenarios, RL methods yield sub-optimal decisions. In this pape…

    Submitted 19 May, 2020; v1 submitted 10 May, 2019; originally announced May 2019.

    Journal ref: Applied Intelligence 2020

  13. Generalized Second Order Value Iteration in Markov Decision Processes

    Authors: Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar

    Abstract: Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive at the optimal solution. Value iteration is a first order method and therefore it may take a large number of iterations to converge to the optimal solution. Su…

    Submitted 17 September, 2021; v1 submitted 10 May, 2019; originally announced May 2019.

    Comments: Accepted for publication at IEEE Transactions on Automatic Control

  14. Successive Over Relaxation Q-Learning

    Authors: Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar

    Abstract: In a discounted reward Markov Decision Process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the Bellman equation and a fixed point iteration scheme known as the value iteration is utilized to obtain the solution. In literature, a successive over-relaxation base…

    Submitted 13 June, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

    Journal ref: IEEE Control Systems Letters 2019

  15. An Online Sample Based Method for Mode Estimation using ODE Analysis of Stochastic Approximation Algorithms

    Authors: Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Prabuchandran K. J., Shalabh Bhatnagar

    Abstract: One of the popular measures of central tendency that provides better representation and interesting insights of the data compared to the other measures like mean and median is the metric mode. If the analytical form of the density function is known, mode is an argument of the maximum value of the density function and one can apply the optimization techniques to find mode. In many of the practical…

    Submitted 3 June, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

    Journal ref: IEEE Control Systems Letters 2019

  16. arXiv:1806.06720  [pdf, other]

    cs.LG stat.ML

    An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method

    Authors: Ajin George Joseph, Shalabh Bhatnagar

    Abstract: In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using the linear function approximation architecture and with memory and computation costs scaling quadratically in the size of the feature set. The algorithms employ the multi-timescale stochastic appro…

    Submitted 15 June, 2018; originally announced June 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1609.09449

  17. arXiv:1802.07935  [pdf, other]

    math.OC math.DS stat.ML

    Asynchronous stochastic approximations with asymptotically biased errors and deep multi-agent learning

    Authors: Arunselvan Ramaswamy, Shalabh Bhatnagar, Daniel E. Quevedo

    Abstract: Asynchronous stochastic approximations (SAs) are an important class of model-free algorithms, tools and techniques that are popular in multi-agent and distributed control scenarios. To counter Bellman's curse of dimensionality, such algorithms are coupled with function approximations. Although the learning/control problem becomes more tractable, function approximations affect stability and conver…

    Submitted 2 May, 2019; v1 submitted 22 February, 2018; originally announced February 2018.

    MSC Class: 62L20; 93E35; 49L20; 68T05

  18. arXiv:1709.04673  [pdf, ps, other]

    eess.SY math.DS stat.ML

    Analyzing Approximate Value Iteration Algorithms

    Authors: Arunselvan Ramaswamy, Shalabh Bhatnagar

    Abstract: In this paper, we consider the stochastic iterative counterpart of the value iteration scheme wherein only noisy and possibly biased approximations of the Bellman operator are available. We call this counterpart the approximate value iteration (AVI) scheme. Neural networks are often used as function approximators, in order to counter Bellman's curse of dimensionality. In this paper, they are us…

    Submitted 30 May, 2021; v1 submitted 14 September, 2017; originally announced September 2017.

    MSC Class: 62L20; 93E35; 37B25; 34A60; 90C39; 37C25

  19. arXiv:1604.00151  [pdf, other]

    eess.SY stat.ML

    Analysis of gradient descent methods with non-diminishing, bounded errors

    Authors: Arunselvan Ramaswamy, Shalabh Bhatnagar

    Abstract: The main aim of this paper is to provide an analysis of gradient descent (GD) algorithms with gradient errors that do not necessarily vanish, asymptotically. In particular, sufficient conditions are presented for both stability (almost sure boundedness of the iterates) and convergence of GD with bounded, (possibly) non-diminishing gradient errors. In addition to ensuring stability, such an algorit…

    Submitted 18 September, 2017; v1 submitted 1 April, 2016; originally announced April 2016.

    Comments: arXiv admin note: text overlap with arXiv:1502.01953, IEEE Transactions on Automatic Control, 2017

    MSC Class: 93E15; 93E35

  20. arXiv:1504.06043  [pdf, ps, other]

    eess.SY stat.ML

    Stability of Stochastic Approximations with `Controlled Markov' Noise and Temporal Difference Learning

    Authors: Arunselvan Ramaswamy, Shalabh Bhatnagar

    Abstract: We are interested in understanding stability (almost sure boundedness) of stochastic approximation algorithms (SAs) driven by a `controlled Markov' process. Analyzing this class of algorithms is important, since many reinforcement learning (RL) algorithms can be cast as SAs driven by a `controlled Markov' process. In this paper, we present easily verifiable sufficient conditions for stability and…

    Submitted 17 May, 2018; v1 submitted 23 April, 2015; originally announced April 2015.

    Comments: 18 pages

    MSC Class: 62L20; 93E03; 93E35; 34A60

  21. arXiv:1503.09105  [pdf, ps, other]

    math.DS cs.AI stat.ML

    Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

    Authors: Prasenjit Karmakar, Shalabh Bhatnagar

    Abstract: We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by `controlled' Markov noise. In particular, both the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in…

    Submitted 25 February, 2017; v1 submitted 31 March, 2015; originally announced March 2015.

    Comments: 23 pages (relaxed some important assumptions from the previous version), accepted in Mathematics of Operations Research in Feb, 2017

  22. arXiv:1502.01956  [pdf, ps, other]

    eess.SY math.DS stat.ML

    Stochastic recursive inclusion in two timescales with an application to the Lagrangian dual problem

    Authors: Arunselvan Ramaswamy, Shalabh Bhatnagar

    Abstract: In this paper we present a framework to analyze the asymptotic behavior of two timescale stochastic approximation algorithms including those with set-valued mean fields. This paper builds on the works of Borkar and Perkins & Leslie. The framework presented herein is more general as compared to the synchronous two timescale framework of Perkins & Leslie, however the assumptions involved are easily…

    Submitted 9 October, 2015; v1 submitted 6 February, 2015; originally announced February 2015.

    MSC Class: 62L20; 93E03; 93E35; 34A60

    Journal ref: Stochastics 2016

  23. arXiv:1502.01953  [pdf, ps, other]

    eess.SY math.DS stat.ML

    A Generalization of the Borkar-Meyn Theorem for Stochastic Recursive Inclusions

    Authors: Arunselvan Ramaswamy, Shalabh Bhatnagar

    Abstract: In this paper the stability theorem of Borkar and Meyn is extended to include the case when the mean field is a differential inclusion. Two different sets of sufficient conditions are presented that guarantee the stability and convergence of stochastic recursive inclusions. Our work builds on the works of Benaim, Hofbauer and Sorin as well as Borkar and Meyn. As a corollary to one of the main theo…

    Submitted 27 September, 2016; v1 submitted 6 February, 2015; originally announced February 2015.

    MSC Class: 62L20; 93E03; 93E35; 34A60

  24. arXiv:1401.2086  [pdf, ps, other]

    cs.GT cs.LG stat.ML

    Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games

    Authors: H. L Prasad, L. A. Prashanth, Shalabh Bhatnagar

    Abstract: We consider the problem of finding stationary Nash equilibria (NE) in a finite discounted general-sum stochastic game. We first generalize a non-linear optimization problem from Filar and Vrieze [2004] to a $N$-player setting and break down this problem into simpler sub-problems that ensure there is no Bellman error for a given state and an agent. We then provide a characterization of solution poi…

    Submitted 2 July, 2015; v1 submitted 8 January, 2014; originally announced January 2014.

  25. arXiv:1206.4832  [pdf, other]

    cs.IT cs.LG stat.ME

    Smoothed Functional Algorithms for Stochastic Optimization using q-Gaussian Distributions

    Authors: Debarghya Ghoshdastidar, Ambedkar Dukkipati, Shalabh Bhatnagar

    Abstract: Smoothed functional (SF) schemes for gradient estimation are known to be efficient in stochastic optimization algorithms, especially when the objective is to improve the performance of a stochastic system. However, the performance of these methods depends on several parameters, such as the choice of a suitable smoothing kernel. Different kernels have been studied in literature, which include Gaussi…

    Submitted 3 July, 2014; v1 submitted 21 June, 2012; originally announced June 2012.

    ACM Class: G.1.6; I.6.8