An Option and Agent Selection Policy with Logarithmic Regret for Multi Agent Multi Armed Bandit Problems on Random Graphs

Pankayaraj, Pathmanathan; Maithripala, D. H. S.

arXiv:1910.02635v1 (cs)

[Submitted on 7 Oct 2019 (this version), latest version 21 Feb 2020 (v3)]

Abstract: Existing studies of the Multi-Agent Multi-Armed Bandit (MAMAB) problem, with very few exceptions, consider the case where agents observe their neighbors according to a static network graph. They also mostly rely on a running consensus to estimate the option rewards. Two of the exceptions consider a problem where agents observe the instantaneous rewards and actions of their neighbors through a communication strategy based on an i.i.d. Erdős-Rényi (ER) graph process. In this paper we propose a UCB-based option allocation rule that guarantees logarithmic regret even if the graph depends on the history of choices made by the agents. The paper also proposes a novel communication strategy that significantly outperforms the i.i.d. ER graph based communication strategy. For both the ER graph strategy and the history-dependent graph strategy, the regret is shown to depend on the connectivity of the graph in a notable way: there exists an optimal connectivity that is less than full connectivity.
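To make the setting in the abstract concrete, the following is a minimal illustrative sketch, not the allocation rule or communication strategy proposed in the paper. It assumes Bernoulli rewards, the textbook UCB1 index, and an undirected ER graph drawn independently at every step, with each agent adding its neighbors' instantaneous (option, reward) pairs to its own statistics. The function name `simulate` and all of its parameters are hypothetical.

```python
import math
import random


def simulate(num_agents=5, means=(0.2, 0.5, 0.8), horizon=2000, p=0.5, seed=0):
    """Cumulative expected regret, summed over agents (illustrative only).

    Assumptions (not taken from the paper): Bernoulli rewards, the standard
    UCB1 index, and an i.i.d. ER graph G(num_agents, p) redrawn at each step.
    """
    rng = random.Random(seed)
    k = len(means)
    best = max(means)
    counts = [[0] * k for _ in range(num_agents)]  # observations per agent and option
    sums = [[0.0] * k for _ in range(num_agents)]  # observed reward totals

    def ucb_index(i, a, t):
        # Empirical mean plus the UCB1 exploration bonus.
        return sums[i][a] / counts[i][a] + math.sqrt(2.0 * math.log(t) / counts[i][a])

    regret = 0.0
    for t in range(1, horizon + 1):
        # Each agent chooses an option from its own statistics.
        picks = []
        for i in range(num_agents):
            untried = [a for a in range(k) if counts[i][a] == 0]
            if untried:
                picks.append(untried[0])
            else:
                picks.append(max(range(k), key=lambda a: ucb_index(i, a, t)))
        rewards = [1.0 if rng.random() < means[a] else 0.0 for a in picks]
        regret += sum(best - means[a] for a in picks)

        # Each agent records its own observation.
        for i in range(num_agents):
            counts[i][picks[i]] += 1
            sums[i][picks[i]] += rewards[i]

        # Draw this step's ER graph; each edge lets both endpoints
        # observe the other's option and reward.
        for i in range(num_agents):
            for j in range(i + 1, num_agents):
                if rng.random() < p:
                    counts[i][picks[j]] += 1
                    sums[i][picks[j]] += rewards[j]
                    counts[j][picks[i]] += 1
                    sums[j][picks[i]] += rewards[i]
    return regret
```

Calling `simulate(p=q)` for several edge probabilities `q` in [0, 1] gives a rough way to explore how regret varies with connectivity in this simplified model. Whether it reproduces the interior optimum reported in the abstract depends on the paper's actual rule, which this sketch does not implement.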

Comments:	The preprint submitted for review to the 2020 European Control Conference (ECC)
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1910.02635 [cs.LG]
	(or arXiv:1910.02635v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1910.02635

Submission history

From: D. H. S. Maithripala
[v1] Mon, 7 Oct 2019 07:05:36 UTC (1,241 KB)
[v2] Tue, 15 Oct 2019 06:08:42 UTC (1,241 KB)
[v3] Fri, 21 Feb 2020 16:18:33 UTC (653 KB)
