Showing 1–3 of 3 results for author: Landgren, P

  1. arXiv:2003.01312  [pdf, other]

    math.OC cs.LG stat.ML

    Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits

    Authors: Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

    Abstract: We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB), and each agent makes sequential choices among arms to maximize its own individual reward. The agents cooperate by sharing their estimates over a fixed communication graph. We consider an unconstrained reward model in which two or more agents can choose the same arm and collect independen…

    Submitted 11 August, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

  2. arXiv:1606.00911  [pdf, other]

    eess.SY cs.LG math.OC

    Distributed Cooperative Decision-Making in Multiarmed Bandits: Frequentist and Bayesian Algorithms

    Authors: Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

    Abstract: We study distributed cooperative decision-making under the explore-exploit tradeoff in the multiarmed bandit (MAB) problem. We extend the state-of-the-art frequentist and Bayesian algorithms for single-agent MAB problems to cooperative distributed algorithms for multi-agent MAB problems in which agents communicate according to a fixed network graph. We rely on a running consensus algorithm for eac…

    Submitted 17 September, 2019; v1 submitted 2 June, 2016; originally announced June 2016.

    Comments: This revision provides a correction to the original paper, which appeared in the Proceedings of the 2016 IEEE Conference on Decision and Control (CDC). The second statement of Proposition 1 and Theorem 1 are new from arXiv:1512.06888v3 and Lemma 1 is new. These are used to prove regret bounds in Theorems 2 and 3.

  3. arXiv:1512.06888  [pdf, other]

    eess.SY cs.MA math.OC stat.ML

    On Distributed Cooperative Decision-Making in Multiarmed Bandits

    Authors: Peter Landgren, Vaibhav Srivastava, Naomi Ehrich Leonard

    Abstract: We study the explore-exploit tradeoff in distributed cooperative decision-making using the context of the multiarmed bandit (MAB) problem. For the distributed cooperative MAB problem, we design the cooperative UCB algorithm that comprises two interleaved distributed processes: (i) running consensus algorithms for estimation of rewards, and (ii) upper-confidence-bound-based heuristics for selection…

    Submitted 16 September, 2019; v1 submitted 21 December, 2015; originally announced December 2015.

    Comments: This revision provides a correction to the original paper, which appeared in the Proceedings of the 2016 European Control Conference (ECC). The second statement of Proposition 1, Theorem 1 and their proofs are new. The new Theorem 1 is used to prove the regret bounds in Theorem 2.