
Showing 1–7 of 7 results for author: Kamanchi, C

Searching in archive cs.
  1. arXiv:2007.02510  [pdf, other]

    stat.AP cs.LG

    An Application of Newsboy Problem in Supply Chain Optimisation of Online Fashion E-Commerce

    Authors: Chandramouli Kamanchi, Gopinath Ashok Kumar, Nachiappan Sundaram, Ravindra Babu T, Chaithanya Bandi

    Abstract: We describe a supply chain optimization model deployed in an online fashion e-commerce company in India called Myntra. Our model is simple, elegant and easy to put into service. The model utilizes historic data and predicts the quantity of Stock Keeping Units (SKUs) to hold so that the metrics "Fulfilment Index" and "Utilization Index" are optimized. We present the mathematics central to our model…

    Submitted 5 July, 2020; originally announced July 2020.

  2. arXiv:1911.05697  [pdf, other]

    cs.LG stat.ML

    A Convergent Off-Policy Temporal Difference Algorithm

    Authors: Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar

    Abstract: Learning the value function of a given policy (target policy) from the data samples obtained from a different policy (behavior policy) is an important problem in Reinforcement Learning (RL). This problem is studied under the setting of off-policy prediction. Temporal Difference (TD) learning algorithms are a popular class of algorithms for solving the prediction problem. TD algorithms with linear…

    Submitted 13 November, 2019; originally announced November 2019.

  3. Generalized Speedy Q-learning

    Authors: Indu John, Chandramouli Kamanchi, Shalabh Bhatnagar

    Abstract: In this paper, we derive a generalization of the Speedy Q-learning (SQL) algorithm that was proposed in the Reinforcement Learning (RL) literature to handle slow convergence of Watkins' Q-learning. In most RL algorithms such as Q-learning, the Bellman equation and the Bellman operator play an important role. It is possible to generalize the Bellman operator using the technique of successive relaxa…

    Submitted 12 February, 2020; v1 submitted 1 November, 2019; originally announced November 2019.

    Journal ref: IEEE Control Systems Letters, vol. 4, no. 3, pp. 524-529, July 2020

  4. arXiv:1906.06659  [pdf, ps, other]

    cs.LG cs.GT stat.ML

    A Generalized Minimax Q-learning Algorithm for Two-Player Zero-Sum Stochastic Games

    Authors: Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar

    Abstract: We consider the problem of two-player zero-sum games. This problem is formulated as a min-max Markov game in the literature. The solution of this game, which is the min-max payoff, starting from a given state is called the min-max value of the state. In this work, we compute the solution of the two-player zero-sum game utilizing the technique of successive relaxation that has been successfully app…

    Submitted 18 March, 2022; v1 submitted 16 June, 2019; originally announced June 2019.

  5. Generalized Second Order Value Iteration in Markov Decision Processes

    Authors: Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar

    Abstract: Value iteration is a fixed point iteration technique utilized to obtain the optimal value function and policy in a discounted reward Markov Decision Process (MDP). Here, a contraction operator is constructed and applied repeatedly to arrive at the optimal solution. Value iteration is a first order method and therefore it may take a large number of iterations to converge to the optimal solution. Su…

    Submitted 17 September, 2021; v1 submitted 10 May, 2019; originally announced May 2019.

    Comments: Accepted for publication at IEEE Transactions on Automatic Control

  6. Successive Over Relaxation Q-Learning

    Authors: Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar

    Abstract: In a discounted reward Markov Decision Process (MDP), the objective is to find the optimal value function, i.e., the value function corresponding to an optimal policy. This problem reduces to solving a functional equation known as the Bellman equation and a fixed point iteration scheme known as the value iteration is utilized to obtain the solution. In literature, a successive over-relaxation base…

    Submitted 13 June, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

    Journal ref: IEEE Control Systems Letters 2019

  7. An Online Sample Based Method for Mode Estimation using ODE Analysis of Stochastic Approximation Algorithms

    Authors: Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Prabuchandran K. J., Shalabh Bhatnagar

    Abstract: One of the popular measures of central tendency that provides better representation and interesting insights of the data compared to the other measures like mean and median is the metric mode. If the analytical form of the density function is known, mode is an argument of the maximum value of the density function and one can apply the optimization techniques to find mode. In many of the practical…

    Submitted 3 June, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

    Journal ref: IEEE Control Systems Letters 2019
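Entry 1 frames the choice of how many units of each SKU to hold as a newsboy (newsvendor) problem solved from historic data. The abstract does not give the deployed model or how the Fulfilment and Utilization Indices enter it, so the following is only a minimal sketch of the textbook newsvendor rule, not the paper's method. The per-unit underage cost (a lost sale) and overage cost (unsold stock) are assumed inputs, and the function name newsvendor_quantity is hypothetical. It returns the smallest observed demand whose empirical CDF reaches the critical fractile c_u / (c_u + c_o).

```python
def newsvendor_quantity(demand_history, underage_cost, overage_cost):
    """Textbook newsvendor order quantity from a sample of past demand.

    underage_cost: assumed cost per unit of unmet demand (lost sale).
    overage_cost: assumed cost per unit of stock left unsold.
    Returns the smallest observed demand q whose empirical CDF F(q)
    reaches the critical fractile c_u / (c_u + c_o).
    """
    if not demand_history:
        raise ValueError("demand_history must be non-empty")
    if underage_cost <= 0 or overage_cost <= 0:
        raise ValueError("costs must be positive")
    critical_fractile = underage_cost / (underage_cost + overage_cost)
    sorted_demand = sorted(demand_history)
    n = len(sorted_demand)
    for k, q in enumerate(sorted_demand):
        # Empirical CDF at the (k+1)-th smallest observation is (k + 1) / n.
        if (k + 1) / n >= critical_fractile:
            return q
    # Defensive fallback: the loop always returns, since F = 1 at the last observation.
    return sorted_demand[-1]
```

With demands [12, 7, 9, 15, 10, 8, 11, 14], an underage cost of 3 and an overage cost of 1 give a critical fractile of 0.75 and an order quantity of 12; equal costs give the lower empirical median, 10.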
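Entry 2 concerns off-policy prediction: learning the value of a target policy from samples generated by a behavior policy, using TD methods with linear function approximation. The convergent algorithm that paper proposes is not described in the truncated abstract, so the sketch below shows only the standard semi-gradient off-policy TD(0) update that such work starts from, with each step weighted by the importance-sampling ratio. This basic update is known not to be guaranteed to converge in the off-policy linear setting (Baird's counterexample is the classic illustration). The function name and argument layout are hypothetical.

```python
def off_policy_td0_step(theta, phi, phi_next, reward, rho, gamma, alpha):
    """One semi-gradient off-policy TD(0) update with linear features.

    theta: current weight vector; phi, phi_next: feature vectors of s and s'.
    rho: importance-sampling ratio pi(a|s) / mu(a|s) for the action actually
         taken, where pi is the target policy and mu the behavior policy.
    """
    v = sum(t * f for t, f in zip(theta, phi))            # linear estimate of V(s)
    v_next = sum(t * f for t, f in zip(theta, phi_next))  # linear estimate of V(s')
    td_error = reward + gamma * v_next - v
    # Scale the usual TD(0) step by rho to correct for sampling from mu.
    return [t + alpha * rho * td_error * f for t, f in zip(theta, phi)]
```

For a single self-looping state with reward 1, gamma = 0.5, rho = 1 and a constant feature 1.0, repeated updates drive the weight toward the true value 2.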
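Entries 5 and 6 describe value iteration as repeated application of a contraction operator to reach the fixed point of the Bellman equation, and entries 3 and 6 mention generalizing that operator by successive (over-)relaxation. A minimal sketch, assuming a small finite MDP stored as nested lists (P[a][i][j] for transition probabilities, R[a][i] for rewards; both layouts and the function name are hypothetical): it applies the Bellman optimality backup T and blends it with the previous iterate using a relaxation weight w, so that w = 1 is ordinary value iteration. This generic blend is only an illustration; the exact operators and admissible ranges of w in the papers may differ. For 0 < w ≤ 1 the blended map is a contraction with modulus 1 − w(1 − γ); for w > 1 the code checks no convergence condition.

```python
def relaxed_value_iteration(P, R, gamma, w=1.0, tol=1e-10, max_iter=10_000):
    """Value iteration with a relaxation weight w on a finite MDP.

    P[a][i][j]: probability of moving from state i to j under action a.
    R[a][i]: expected one-step reward for action a in state i.
    Iterates V <- w * T(V) + (1 - w) * V, where T is the Bellman optimality
    operator; w = 1 gives ordinary value iteration.
    Returns (V, greedy_policy, iterations_used).
    """
    n_actions, n_states = len(P), len(P[0])

    def q_value(V, i, a):
        # One-step lookahead: R(i, a) + gamma * sum_j P(j | i, a) * V(j)
        return R[a][i] + gamma * sum(P[a][i][j] * V[j] for j in range(n_states))

    V = [0.0] * n_states
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Bellman optimality backup T(V)
        TV = [max(q_value(V, i, a) for a in range(n_actions)) for i in range(n_states)]
        # Relaxation step: blend the backup with the previous iterate
        new_V = [w * tv + (1.0 - w) * v for tv, v in zip(TV, V)]
        change = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if change < tol:
            break
    # Greedy policy with respect to the final value estimate
    policy = [max(range(n_actions), key=lambda a: q_value(V, i, a)) for i in range(n_states)]
    return V, policy, iterations
```

On a one-state MDP that always loops back to itself, with reward 1 for the better action and γ = 0.9, both w = 1 and w = 1.2 converge to V = 10. The w = 1.2 run stops in fewer iterations: its update is V ← 1.2 + 0.88V, so the coefficient on V is 1 − w(1 − γ) = 0.88 rather than 0.9.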