Search | arXiv e-print repository

Bandits with Abstention under Expert Advice

Authors: Stephen Pasteris, Alberto Rumi, Maximilian Thiessen, Shota Saito, Atsushi Miyauchi, Fabio Vitale, Mark Herbster

Abstract: We study the classic problem of prediction with expert advice under bandit feedback. Our model assumes that one action, corresponding to the learner's abstention from play, has no reward or loss on every trial. We propose the CBA algorithm, which exploits this assumption to obtain reward bounds that can significantly improve those of the classical Exp4 algorithm. We can view our problem as the agg… ▽ More We study the classic problem of prediction with expert advice under bandit feedback. Our model assumes that one action, corresponding to the learner's abstention from play, has no reward or loss on every trial. We propose the CBA algorithm, which exploits this assumption to obtain reward bounds that can significantly improve those of the classical Exp4 algorithm. We can view our problem as the aggregation of confidence-rated predictors when the learner has the option of abstention from play. Importantly, we are the first to achieve bounds on the expected cumulative reward for general confidence-rated predictors. In the special case of specialists we achieve a novel reward bound, significantly improving previous bounds of SpecialistExp (treating abstention as another action). As an example application, we discuss learning unions of balls in a finite metric space. In this contextual setting, we devise an efficient implementation of CBA, reducing the runtime from quadratic to almost linear in the number of contexts. Preliminary experiments show that CBA improves over existing bandit algorithms. △ Less

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2312.15433 [pdf, ps, other]

Best-of-Both-Worlds Algorithms for Linear Contextual Bandits

Authors: Yuko Kuroki, Alberto Rumi, Taira Tsuchiya, Fabio Vitale, Nicolò Cesa-Bianchi

Abstract: We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits. Our algorithms deliver near-optimal regret bounds in both the adversarial and stochastic regimes, without prior knowledge about the environment. In the stochastic regime, we achieve the polylogarithmic rate $\frac{(dK)^2\mathrm{poly}\log(dKT)}{Δ_{\min}}$, where $Δ_{\min}$ is the minimum suboptimality gap over the $d$-… ▽ More We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits. Our algorithms deliver near-optimal regret bounds in both the adversarial and stochastic regimes, without prior knowledge about the environment. In the stochastic regime, we achieve the polylogarithmic rate $\frac{(dK)^2\mathrm{poly}\log(dKT)}{Δ_{\min}}$, where $Δ_{\min}$ is the minimum suboptimality gap over the $d$-dimensional context space. In the adversarial regime, we obtain either the first-order $\widetilde{O}(dK\sqrt{L^*})$ bound, or the second-order $\widetilde{O}(dK\sqrt{Λ^*})$ bound, where $L^*$ is the cumulative loss of the best action and $Λ^*$ is a notion of the cumulative second moment for the losses incurred by the algorithm. Moreover, we develop an algorithm based on FTRL with Shannon entropy regularizer that does not require the knowledge of the inverse of the covariance matrix, and achieves a polylogarithmic regret in the stochastic regime while obtaining $\widetilde{O}\big(dK\sqrt{T}\big)$ regret bounds in the adversarial regime. △ Less

Submitted 19 February, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

Comments: Accepted at AISTATS2024

arXiv:2311.05975 [pdf, other]

Sum-max Submodular Bandits

Authors: Stephen Pasteris, Alberto Rumi, Fabio Vitale, Nicolò Cesa-Bianchi

Abstract: Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$-bandits, combinatorial bandits, and the bandit versions on facility location, $M$-medians, and hitting sets. We show that all functions in this class… ▽ More Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$-bandits, combinatorial bandits, and the bandit versions on facility location, $M$-medians, and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds for bandit feedback in the nonstochastic setting of the order of $\sqrt{MKT}$ (ignoring log factors), where $T$ is the time horizon and $M$ is a cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{O}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback. △ Less

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2111.08249 [pdf, other]

Bengali Handwritten Grapheme Classification: Deep Learning Approach

Authors: Tarun Roy, Hasib Hasan, Kowsar Hossain, Masuma Akter Rumi

Abstract: Despite being one of the most spoken languages in the world ($6^{th}$ based on population), research regarding Bengali handwritten grapheme (smallest functional unit of a writing system) classification has not been explored widely compared to other prominent languages. Moreover, the large number of combinations of graphemes in the Bengali language makes this classification task very challenging. W… ▽ More Despite being one of the most spoken languages in the world ($6^{th}$ based on population), research regarding Bengali handwritten grapheme (smallest functional unit of a writing system) classification has not been explored widely compared to other prominent languages. Moreover, the large number of combinations of graphemes in the Bengali language makes this classification task very challenging. With an effort to contribute to this research problem, we participate in a Kaggle competition \cite{kaggle_link} where the challenge is to separately classify three constituent elements of a Bengali grapheme in the image: grapheme root, vowel diacritics, and consonant diacritics. We explore the performances of some existing neural network models such as Multi-Layer Perceptron (MLP) and state of the art ResNet50. To further improve the performance we propose our own convolution neural network (CNN) model for Bengali grapheme classification with validation root accuracy 95.32\%, vowel accuracy 98.61\%, and consonant accuracy 98.76\%. We also explore Region Proposal Network (RPN) using VGGNet with a limited setting that can be a potential future direction to improve the performance. △ Less

Submitted 16 November, 2021; originally announced November 2021.

Comments: 8 pages, 15 figures, pre-print

arXiv:2101.07706 [pdf, other]

Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks

Authors: Peng Jiang, Masuma Akter Rumi

Abstract: Training Graph Convolutional Networks (GCNs) is expensive as it needs to aggregate data recursively from neighboring nodes. To reduce the computation overhead, previous works have proposed various neighbor sampling methods that estimate the aggregation result based on a small number of sampled neighbors. Although these methods have successfully accelerated the training, they mainly focus on the si… ▽ More Training Graph Convolutional Networks (GCNs) is expensive as it needs to aggregate data recursively from neighboring nodes. To reduce the computation overhead, previous works have proposed various neighbor sampling methods that estimate the aggregation result based on a small number of sampled neighbors. Although these methods have successfully accelerated the training, they mainly focus on the single-machine setting. As real-world graphs are large, training GCNs in distributed systems is desirable. However, we found that the existing neighbor sampling methods do not work well in a distributed setting. Specifically, a naive implementation may incur a huge amount of communication of feature vectors among different machines. To address this problem, we propose a communication-efficient neighbor sampling method in this work. Our main idea is to assign higher sampling probabilities to the local nodes so that remote nodes are accessed less frequently. We present an algorithm that determines the local sampling probabilities and makes sure our skewed neighbor sampling does not affect much the convergence of the training. Our experiments with node classification benchmarks show that our method significantly reduces the communication overhead for distributed GCN training with little accuracy loss. △ Less

Submitted 19 January, 2021; originally announced January 2021.

Showing 1–5 of 5 results for author: Rumi, A