Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems

Gouverneur, Amaury; Rodríguez-Gálvez, Borja; Oechtering, Tobias J.; Skoglund, Mikael

arXiv:2403.03361 (stat)

[Submitted on 5 Mar 2024]

Abstract: This paper studies the Bayesian regret of a variant of the Thompson-Sampling algorithm for bandit problems. It builds upon the information-theoretic framework of [Russo and Van Roy, 2015] and, more specifically, on the rate-distortion analysis of [Dong and Van Roy, 2020], who proved a bound with a regret rate of $O(d\sqrt{T \log(T)})$ for the $d$-dimensional linear bandit setting. We focus on bandit problems with a metric action space and, using a chaining argument, we establish new bounds for a variant of Thompson-Sampling that depend on the metric entropy of the action space. Under a suitable continuity assumption on the rewards, our bound offers a tight rate of $O(d\sqrt{T})$ for $d$-dimensional linear bandit problems.
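
For readers unfamiliar with the setting, the following is a minimal sketch of standard Thompson Sampling on a $d$-dimensional linear bandit, tracking the cumulative regret that the bounds above control. It is not the variant analysed in the paper, whose details are not given in the abstract. The Gaussian prior, Gaussian noise, finite action set on the unit sphere, and all function and variable names are assumptions made for illustration only.

```python
import numpy as np

def linear_thompson_sampling(actions, theta_star, horizon, noise_std=1.0, seed=0):
    """Standard Thompson Sampling for a Gaussian linear bandit (illustrative only).

    actions: (K, d) array of action vectors; theta_star: (d,) true parameter.
    Assumes a prior theta ~ N(0, I) and rewards r = <a, theta> + N(0, noise_std^2).
    Returns the cumulative regret after each round.
    """
    rng = np.random.default_rng(seed)
    d = actions.shape[1]
    precision = np.eye(d)   # posterior precision, initialised to the prior precision
    b = np.zeros(d)         # accumulates reward-weighted actions
    best = np.max(actions @ theta_star)  # expected reward of the optimal action
    cumulative = np.empty(horizon)
    total = 0.0
    for t in range(horizon):
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2          # symmetrise against round-off
        mean = cov @ b
        # Sample a parameter from the posterior and act greedily on the sample.
        theta_sample = rng.multivariate_normal(mean, cov)
        a = actions[np.argmax(actions @ theta_sample)]
        reward = a @ theta_star + noise_std * rng.normal()
        # Conjugate Gaussian posterior update.
        precision += np.outer(a, a) / noise_std**2
        b += reward * a / noise_std**2
        total += best - a @ theta_star
        cumulative[t] = total
    return cumulative

# Hypothetical problem instance: 50 random unit-norm actions in dimension 5.
rng = np.random.default_rng(1)
d, K = 5, 50
actions = rng.normal(size=(K, d))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)
theta_star = rng.normal(size=d)
regret = linear_thompson_sampling(actions, theta_star, horizon=200)
```

Here `regret[t]` is the regret of a single simulated run, whereas the Bayesian regret studied in the paper is an expectation over the prior, the reward noise, and the algorithm's randomness; averaging many runs with `theta_star` redrawn from the prior would give a Monte Carlo estimate of it.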

Comments:	15 pages: 8 of main text and 7 of appendices
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2403.03361 [stat.ML]
	(or arXiv:2403.03361v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2403.03361

Submission history

From: Amaury Gouverneur
[v1] Tue, 5 Mar 2024 23:08:18 UTC (85 KB)
