A Concentration Bound for TD(0) with Function Approximation

Chandak, Siddharth; Borkar, Vivek S.

arXiv:2312.10424 (cs)

[Submitted on 16 Dec 2023]

Abstract: We derive a concentration bound of the type "for all $n \geq n_0$ for some $n_0$" for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm, with both martingale and Markov noises. Markov noise is handled using the Poisson equation, and the lack of almost sure guarantees on boundedness of iterates is handled using the concept of relaxed concentration inequalities.
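
For orientation, the setting described in the abstract (online TD(0) with linear function approximation, driven by one trajectory of a Markov chain rather than independent stationary samples) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `td0_linear` and `td_fixed_point`, the step-size schedule `step_c / (n + step_c)`, and the convention that the reward depends only on the current state are all assumptions made for the example. The second function computes the standard TD(0) fixed point for linear features, which is the point the iterates are expected to concentrate around.

```python
import numpy as np


def td0_linear(P, r, Phi, gamma, n_steps, step_c=10.0, seed=0):
    """Online TD(0) with linear features along a single sample path.

    P: (S, S) transition matrix, r: (S,) per-state reward, Phi: (S, d) features.
    The step size a_n = step_c / (n + step_c) is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    n_states, d = Phi.shape
    theta = np.zeros(d)
    s = 0  # arbitrary initial state; the chain is never reset
    for n in range(n_steps):
        s_next = rng.choice(n_states, p=P[s])  # one transition of the chain
        td_error = r[s] + gamma * (Phi[s_next] @ theta) - Phi[s] @ theta
        theta = theta + (step_c / (n + step_c)) * td_error * Phi[s]
        s = s_next
    return theta


def td_fixed_point(P, r, Phi, gamma):
    """Solve Phi^T D (I - gamma P) Phi theta = Phi^T D r, D = diag(stationary dist)."""
    n_states = P.shape[0]
    # Stationary distribution: pi^T P = pi^T and sum(pi) = 1, via least squares.
    M = np.vstack([P.T - np.eye(n_states), np.ones((1, n_states))])
    rhs = np.concatenate([np.zeros(n_states), [1.0]])
    pi = np.linalg.lstsq(M, rhs, rcond=None)[0]
    D = np.diag(pi)
    A = Phi.T @ D @ (np.eye(n_states) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)
```

Calling `td0_linear` on a small ergodic chain and comparing the result with `td_fixed_point` gives an empirical view of the distance whose tail the paper bounds for all sufficiently large n. The sketch itself makes no claim about the constants or rates in that bound.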

Comments:	Submitted to Stochastic Systems
Subjects:	Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
Cite as:	arXiv:2312.10424 [cs.LG]
	(or arXiv:2312.10424v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.10424

Submission history

From: Siddharth Chandak
[v1] Sat, 16 Dec 2023 11:33:12 UTC (17 KB)

