Showing 1–1 of 1 results for author: Sangadi, T

  1. arXiv:2406.07892  [pdf, ps, other]

    cs.LG cs.AI

    Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP

    Authors: Tejaram Sangadi, L. A. Prashanth, Krishna Jagannathan

    Abstract: Motivated by risk-sensitive reinforcement learning scenarios, we consider the problem of policy evaluation for variance in a discounted reward Markov decision process (MDP). For this problem, a temporal difference (TD) type learning algorithm with linear function approximation (LFA) exists in the literature, though only asymptotic guarantees are available for this algorithm. We derive finite sampl…

    Submitted 12 June, 2024; originally announced June 2024.