
Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances

Authors: Anusha Lalitha, Kousha Kalantari, Yifei Ma, Anoop Deoras, Branislav Kveton

Abstract: We study the problem of best-arm identification (BAI) in the fixed-budget setting with heterogeneous reward variances. We propose two variance-adaptive BAI algorithms for this setting: SHVar for known reward variances and SHAdaVar for unknown reward variances. Our algorithms rely on non-uniform budget allocations among the arms where the arms with higher reward variances are pulled more often than those with lower variances. The main algorithmic novelty is in the design of SHAdaVar, which allocates budget greedily based on overestimating the unknown reward variances. We bound probabilities of misidentifying the best arms in both SHVar and SHAdaVar. Our analyses rely on novel lower bounds on the number of pulls of an arm that do not require closed-form solutions to the budget allocation problem. Since one of our budget allocation problems is analogous to the optimal experiment design with unknown variances, we believe that our results are of broad interest. Our experiments validate our theory, and show that SHVar and SHAdaVar outperform algorithms from prior works with analytical guarantees.

Submitted 13 June, 2023; originally announced June 2023.

arXiv:2110.00751

Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams

Authors: Erdem Bıyık, Anusha Lalitha, Rajarshi Saha, Andrea Goldsmith, Dorsa Sadigh

Abstract: When humans collaborate with each other, they often make decisions by observing others and considering the consequences that their actions may have on the entire team, instead of greedily doing what is best for just themselves. We would like our AI agents to effectively collaborate in a similar way by capturing a model of their partners. In this work, we propose and analyze a decentralized Multi-Armed Bandit (MAB) problem with coupled rewards as an abstraction of more general multi-agent collaboration. We demonstrate that naïve extensions of single-agent optimal MAB algorithms fail when applied to decentralized bandit teams. Instead, we propose a Partner-Aware strategy for joint sequential decision-making that extends the well-known single-agent Upper Confidence Bound algorithm. We analytically show that our proposed strategy achieves logarithmic regret, and provide extensive experiments involving human-AI and human-robot collaboration to validate our theoretical findings. Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest humans prefer to collaborate with AI agents implementing our partner-aware strategy.

Submitted 16 December, 2021; v1 submitted 2 October, 2021; originally announced October 2021.

Comments: 14 pages, 13 figures. To be presented at "Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI) 2022". Also presented at "Artificial Intelligence for Human-Robot Interaction (AI-HRI) at AAAI Fall Symposium Series 2021"

Report number: AIHRI/2021/46

arXiv:2012.00077

On Error Exponents of Almost-Fixed-Length Channel Codes and Hypothesis Tests

Authors: Anusha Lalitha, Tara Javidi

Abstract: We examine a new class of channel coding strategies and hypothesis tests, referred to as almost-fixed-length strategies, that have little flexibility in the stopping time over fixed-length strategies. The stopping time of these strategies is allowed to be slightly large only on a rare set of sample paths with an exponentially small probability. We show that almost-fixed-length channel coding strategies can achieve Burnashev's optimal error exponent. Similarly, almost-fixed-length hypothesis tests are shown to bridge the gap between hypothesis testing with fixed sample size and sequential hypothesis testing and improve the trade-off between type-I and type-II error exponents.

Submitted 30 November, 2020; originally announced December 2020.

arXiv:2010.10569

Bayesian Algorithms for Decentralized Stochastic Bandits

Authors: Anusha Lalitha, Andrea Goldsmith

Abstract: We study a decentralized cooperative multi-agent multi-armed bandit problem with $K$ arms and $N$ agents connected over a network. In our model, each arm's reward distribution is the same for all agents, and rewards are drawn independently across agents and over time steps. In each round, agents choose an arm to play and subsequently send a message to their neighbors. The goal is to minimize cumulative regret averaged over the entire network. We propose a decentralized Bayesian multi-armed bandit framework that extends single-agent Bayesian bandit algorithms to the decentralized setting. Specifically, we study an information assimilation algorithm that can be combined with existing Bayesian algorithms, and using this, we propose a decentralized Thompson Sampling algorithm and a decentralized Bayes-UCB algorithm. We analyze the decentralized Thompson Sampling algorithm under Bernoulli rewards and establish a problem-dependent upper bound on the cumulative regret. We show that the regret incurred scales logarithmically over the time horizon with constants that match those of an optimal centralized agent with access to all observations across the network. Our analysis also characterizes the cumulative regret in terms of the network structure. Through extensive numerical studies, we show that our extensions of Thompson Sampling and Bayes-UCB incur lower cumulative regret than the state-of-the-art algorithms inspired by the Upper Confidence Bound algorithm. We implement our proposed decentralized Thompson Sampling under a gossip protocol, and over time-varying networks, where each communication link has a fixed probability of failure.

Submitted 27 October, 2020; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: Submitted to IEEE Journal on Selected Areas in Information Theory (JSAIT) issue on Sequential, Active, and Reinforcement Learning

arXiv:1905.10466

Decentralized Bayesian Learning over Graphs

Authors: Anusha Lalitha, Xinghan Wang, Osman Kilinc, Yongxi Lu, Tara Javidi, Farinaz Koushanfar

Abstract: We propose a decentralized learning algorithm over a general social network. The algorithm leaves the training data distributed on the mobile devices while utilizing a peer-to-peer model aggregation method. The proposed algorithm allows agents with local data to learn a shared model explaining the global training data in a decentralized fashion. The proposed algorithm can be viewed as a Bayesian and peer-to-peer variant of federated learning in which each agent keeps a "posterior probability distribution" over the global model parameters. Each agent updates its "posterior" based on 1) the local training data and 2) the asynchronous communication and model aggregation with its 1-hop neighbors. This Bayesian formulation allows for a systematic treatment of model aggregation over any arbitrary connected graph. Furthermore, it provides strong analytic guarantees on convergence in the realizable case as well as a closed-form characterization of the rate of convergence. We also show that our methodology can be combined with efficient Bayesian inference techniques to train Bayesian neural networks in a decentralized manner. By empirical studies we show that our theoretical analysis can guide the design of network/social interactions and data partitioning to achieve convergence.

Submitted 24 May, 2019; originally announced May 2019.

arXiv:1901.11173

Peer-to-peer Federated Learning on Graphs

Authors: Anusha Lalitha, Osman Cihan Kilinc, Tara Javidi, Farinaz Koushanfar

Abstract: We consider the problem of training a machine learning model over a network of nodes in a fully decentralized framework. The nodes take a Bayesian-like approach via the introduction of a belief over the model parameter space. We propose a distributed learning algorithm in which nodes update their belief by aggregating information from their one-hop neighbors to learn a model that best fits the observations over the entire network. In addition, we also obtain sufficient conditions to ensure that the probability of error is small for every node in the network. We discuss approximations required for applying this algorithm to train Deep Neural Networks (DNNs). Experiments on training a linear regression model and on training a DNN show that the proposed learning rule provides a significant improvement in accuracy compared to the case where nodes learn without cooperation.

Submitted 30 January, 2019; originally announced January 2019.

arXiv:1712.05865

doi 10.1109/JSTSP.2018.2850751

Improved Target Acquisition Rates with Feedback Codes

Authors: Anusha Lalitha, Nancy Ronquillo, Tara Javidi

Abstract: This paper considers the problem of acquiring an unknown target location (among a finite number of locations) via a sequence of measurements, where each measurement consists of simultaneously probing a group of locations. The resulting observation consists of a sum of an indicator of the target's presence in the probed region, and a zero-mean Gaussian noise term whose variance is a function of the measurement vector. An equivalence between the target acquisition problem and channel coding over a binary-input additive white Gaussian noise (BAWGN) channel with state and feedback is established. Utilizing this information-theoretic perspective, a two-stage adaptive target search strategy based on the sorted Posterior Matching channel coding strategy is proposed. Furthermore, using information-theoretic converses, the fundamental limits on the target acquisition rate for adaptive and non-adaptive strategies are characterized. As a corollary to the non-asymptotic upper bound on the expected number of measurements under the proposed two-stage strategy, and to the non-asymptotic lower bound on the expected number of measurements for the optimal non-adaptive search strategy, a lower bound on the adaptivity gain is obtained. The adaptivity gain is further investigated in different asymptotic regimes of interest.

Submitted 15 December, 2017; originally announced December 2017.

Showing 1–7 of 7 results for author: Lalitha, A