Search | arXiv e-print repository

Price of Safety in Linear Best Arm Identification

Authors: Xuedong Shang, Igor Colin, Merwan Barlier, Hamza Cherkaoui

Abstract: We introduce the safe best-arm identification framework with linear feedback, where the agent is subject to some stage-wise safety constraint that linearly depends on an unknown parameter vector. The agent must take actions in a conservative way so as to ensure that the safety constraint is not violated with high probability at each round. Ways of leveraging the linear structure to ensure safety have been studied for regret minimization but, to the best of our knowledge, not for best-arm identification. We propose a gap-based algorithm that achieves meaningful sample complexity while ensuring stage-wise safety. We show that we pay an extra term in the sample complexity due to the forced exploration phase incurred by the additional safety constraint. Experimental illustrations are provided to justify the design of our algorithm.

Submitted 15 September, 2023; originally announced September 2023.

Comments: 20 pages, 1 figure
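For the safe linear best-arm identification entry above, the following is a minimal sketch of a pessimistic stage-wise safety check, not the authors' algorithm. It assumes the constraint has the form ⟨a, μ⟩ ≤ threshold with μ unknown, estimates μ by ridge regression on past (action, safety cost) pairs, and treats an action as safe only if the upper confidence bound ⟨a, μ̂⟩ + β‖a‖_{V⁻¹} stays below the threshold. The names `is_safe`, `solve`, `beta`, `threshold` and `reg` are hypothetical, and the confidence radius `beta` is supplied by the caller rather than derived.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def solve(matrix: Matrix, rhs: Sequence[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's data is left untouched.
    aug = [list(row) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def is_safe(action: Sequence[float],
            past_actions: Sequence[Sequence[float]],
            past_costs: Sequence[float],
            threshold: float,
            beta: float,
            reg: float = 1.0) -> bool:
    """Hypothetical pessimistic check: UCB of <action, mu> must stay <= threshold."""
    d = len(action)
    # Regularized Gram matrix V = reg * I + sum_s x_s x_s^T and b = sum_s x_s * y_s.
    gram = [[reg if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for x, y in zip(past_actions, past_costs):
        for i in range(d):
            b[i] += x[i] * y
            for j in range(d):
                gram[i][j] += x[i] * x[j]
    mu_hat = solve(gram, b)          # ridge estimate of the unknown safety parameter
    v_inv_a = solve(gram, action)    # V^{-1} a, used for the norm ||a||_{V^{-1}}
    mean = sum(a * m for a, m in zip(action, mu_hat))
    width = math.sqrt(sum(a * v for a, v in zip(action, v_inv_a)))
    return mean + beta * width <= threshold
```

With no past data the estimate is zero and the confidence width equals ‖a‖, so most actions fail the check. This hints at why some initial exploration from actions already known to be safe is needed before the check becomes informative.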

arXiv:2209.13203 [pdf, ps, other]

doi 10.1109/MSP.2022.3177872

Monte-Carlo Sampling Approach to Model Selection: A Primer

Authors: Petre Stoica, Xiaolei Shang, Yuanbo Cheng

Abstract: Any data modeling exercise has two main components: parameter estimation and model selection. The latter will be the topic of this lecture note. More concretely, we will introduce several Monte-Carlo sampling-based rules for model selection using the maximum a posteriori (MAP) approach. Model selection problems are omnipresent in signal processing applications: examples include selecting the order of an autoregressive predictor, the length of the impulse response of a communication channel, the number of source signals impinging on an array of sensors, the order of a polynomial trend, the number of components of an NMR signal, and so on.

Submitted 27 September, 2022; originally announced September 2022.

Journal ref: IEEE Signal Processing Magazine, Vol. 39, no. 5, pp. 85--2, 2022
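For the model-selection primer above, the following is a generic illustration of a Monte-Carlo MAP rule. It is not one of the specific rules in the lecture note. The sketch estimates each model's evidence p(y | M) by averaging the likelihood over parameters drawn from that model's prior, then picks the model maximizing log p(M) + log p(y | M). The two candidate models (a zero-mean Gaussian and a Gaussian with an N(0, 1) prior on its mean), the unit noise variance, and all function names are hypothetical choices made for the example.

```python
import math
import random
from typing import Sequence


def gaussian_loglik(data: Sequence[float], mean: float, sigma: float = 1.0) -> float:
    """Log-likelihood of i.i.d. N(mean, sigma^2) observations."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (y - mean) ** 2 / (2 * sigma ** 2) for y in data)


def log_evidence_mc(data, sample_prior, n_samples, rng) -> float:
    """Monte-Carlo estimate of log p(data | M) = log E_{theta ~ prior}[p(data | theta)]."""
    logs = [gaussian_loglik(data, sample_prior(rng)) for _ in range(n_samples)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs) / n_samples)


def map_select(data, models, n_samples=2000, seed=0):
    """Return the model maximizing log p(M) + log p(data | M), plus all scores."""
    rng = random.Random(seed)
    scores = {name: log_prior + log_evidence_mc(data, sampler, n_samples, rng)
              for name, (log_prior, sampler) in models.items()}
    return max(scores, key=scores.get), scores


# Two hypothetical candidate models with equal prior probability.
MODELS = {
    "zero-mean": (math.log(0.5), lambda rng: 0.0),                     # mean fixed at 0
    "unknown-mean": (math.log(0.5), lambda rng: rng.gauss(0.0, 1.0)),  # mean ~ N(0, 1)
}
```

For data clustered around 2, `map_select` prefers `"unknown-mean"`. For data clustered around 0 it prefers `"zero-mean"`, because the evidence automatically penalizes the extra free parameter.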

arXiv:2103.01312 [pdf, other]

UCB Momentum Q-learning: Correcting the bias without forgetting

Authors: Pierre Menard, Omar Darwiche Domingues, Xuedong Shang, Michal Valko

Abstract: We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new algorithm for reinforcement learning in tabular and possibly stage-dependent, episodic Markov decision processes. UCBMQ is based on Q-learning where we add a momentum term and rely on the principle of optimism in the face of uncertainty to deal with exploration. Our new technical ingredient of UCBMQ is the use of momentum to correct the bias that Q-learning suffers while, at the same time, limiting the impact it has on the second-order term of the regret. For UCBMQ, we are able to guarantee a regret of at most $O(\sqrt{H^3SAT}+ H^4 S A )$ where $H$ is the length of an episode, $S$ the number of states, $A$ the number of actions, $T$ the number of episodes and ignoring terms in poly-$\log(SAHT)$. Notably, UCBMQ is the first algorithm that simultaneously matches the lower bound of $\Omega(\sqrt{H^3SAT})$ for large enough $T$ and has a second-order term (with respect to the horizon $T$) that scales only linearly with the number of states $S$.

Submitted 18 March, 2022; v1 submitted 1 March, 2021; originally announced March 2021.
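For the UCBMQ entry above, "large enough $T$" can be made concrete with a short calculation that ignores constants and poly-log factors. The leading term of the regret bound dominates the second-order term exactly when:

```latex
\sqrt{H^3 S A T} \;\ge\; H^4 S A
\;\iff\; H^3 S A T \;\ge\; H^8 S^2 A^2
\;\iff\; T \;\ge\; H^5 S A .
```

Beyond roughly $H^5 S A$ episodes, the bound therefore behaves like the $\sqrt{H^3SAT}$ lower bound.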

arXiv:2010.08061 [pdf, ps, other]

Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses

Authors: Xuedong Shang, Han Shao, Jian Qian

Abstract: Multi-armed bandits are widely applied in scenarios like recommender systems, for which the goal is to maximize the click rate. However, more factors should be considered, e.g., user stickiness, user growth rate, user experience assessment, etc. In this paper, we model this situation as a $K$-armed bandit problem with multiple losses. We define the relative loss vector of an arm, whose $i$-th entry compares the arm and the optimal arm with respect to the $i$-th loss. We study two goals: (a) finding the arm with the minimum $\ell^\infty$-norm of relative losses with a given confidence level (i.e., fixed-confidence best-arm identification); (b) minimizing the $\ell^\infty$-norm of cumulative relative losses (i.e., regret minimization). For goal (a), we derive a problem-dependent sample complexity lower bound and discuss how to achieve matching algorithms. For goal (b), we provide a regret lower bound of $\Omega(T^{2/3})$ and provide a matching algorithm.

Submitted 15 October, 2020; originally announced October 2020.

Comments: 14 pages
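For the vector-loss entry above, the following sketch shows the target of goal (a) when the mean losses are known. It assumes, as a hypothetical reading of "compares the arm and the optimal arm", that entry $i$ of an arm's relative loss vector is its mean $i$-th loss minus the smallest mean $i$-th loss over all arms. The paper may define the comparison differently. The function names are illustrative only.

```python
from typing import List, Sequence, Tuple


def relative_loss_vectors(mean_losses: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry i of arm k: arm k's mean i-th loss minus the smallest mean i-th loss."""
    m = len(mean_losses[0])
    best = [min(arm[i] for arm in mean_losses) for i in range(m)]
    return [[arm[i] - best[i] for i in range(m)] for arm in mean_losses]


def linf_best_arm(mean_losses: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Index of the arm minimizing the l-infinity norm of its relative loss vector."""
    # Entries are non-negative by construction, so the l-infinity norm is the max entry.
    norms = [max(row) for row in relative_loss_vectors(mean_losses)]
    return min(range(len(norms)), key=norms.__getitem__), norms
```

With mean losses `[[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]`, the relative loss norms are about `[0.8, 0.4, 0.8]`, so the balanced arm 1 is selected even though it is optimal for neither loss individually.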

arXiv:2007.00953 [pdf, other]

Gamification of Pure Exploration for Linear Bandits

Authors: Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko

Abstract: We investigate an active pure-exploration setting, which includes best-arm identification, in the context of linear stochastic bandits. While asymptotically optimal algorithms exist for standard multi-armed bandits, the existence of such algorithms for best-arm identification in linear bandits has been elusive despite several attempts to address it. First, we provide a thorough comparison and new insight into different notions of optimality in the linear case, including G-optimality, transductive optimality from optimal experimental design, and asymptotic optimality. Second, we design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits. As a consequence, our algorithm naturally bypasses the pitfall caused by a simple but difficult instance that most prior algorithms had to be engineered to deal with explicitly. Finally, we avoid the need to fully solve an optimal design problem by providing an approach that entails an efficient implementation.

Submitted 2 July, 2020; originally announced July 2020.

Comments: 11+25 pages. To be published in the proceedings of ICML 2020

arXiv:1910.10945 [pdf, other]

Fixed-Confidence Guarantees for Bayesian Best-Arm Identification

Authors: Xuedong Shang, Rianne de Heide, Emilie Kaufmann, Pierre Ménard, Michal Valko

Abstract: We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS). In particular, we justify its use for fixed-confidence best-arm identification. We further propose a variant of TTTS called Top-Two Transportation Cost (T3C), which disposes of the computational burden of TTTS. As our main contribution, we provide the first sample complexity analysis of TTTS and T3C when coupled with a very natural Bayesian stopping rule, for bandits with Gaussian rewards, solving one of the open questions raised by Russo (2016). We also provide new posterior convergence results for TTTS under two models that are commonly used in practice: bandits with Gaussian and Bernoulli rewards and conjugate priors.

Submitted 28 October, 2019; v1 submitted 24 October, 2019; originally announced October 2019.
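For the Bayesian best-arm identification entry above, here is a minimal sketch of the TTTS sampling rule for Gaussian rewards. It assumes known noise standard deviation `sigma`, a flat prior (so arm k's posterior mean is N(empirical mean, sigma²/n_k)), and at least one pull per arm. One posterior sample picks a leader. With probability `beta` the leader is played. Otherwise the posterior is resampled until a different arm comes out on top, and that challenger is played. The `max_resample` cap and its fallback are hypothetical practical additions, not part of the rule as described. This resampling loop is the computational burden that T3C is designed to remove.

```python
import math
import random
from typing import Optional, Sequence


def ttts_choose_arm(means: Sequence[float], counts: Sequence[int],
                    sigma: float = 1.0, beta: float = 0.5,
                    rng: Optional[random.Random] = None,
                    max_resample: int = 10_000) -> int:
    """Sketch of Top-Two Thompson Sampling for Gaussian arms (flat prior, known sigma)."""
    if rng is None:
        rng = random.Random()

    def posterior_argmax() -> int:
        # One Thompson sample per arm from N(empirical mean, sigma^2 / n_k).
        draws = [rng.gauss(m, sigma / math.sqrt(n)) for m, n in zip(means, counts)]
        return max(range(len(draws)), key=draws.__getitem__)

    leader = posterior_argmax()
    if rng.random() < beta:
        return leader  # play the leader with probability beta
    for _ in range(max_resample):
        challenger = posterior_argmax()
        if challenger != leader:
            return challenger  # first resample whose argmax differs from the leader
    # Hypothetical fallback: posterior too concentrated to produce a challenger.
    return leader
```

The parameter `beta` controls how often the current leader is sampled. The T3C variant mentioned in the abstract keeps the leader step and replaces only the challenger search.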

arXiv:1809.02394 [pdf, other]

Deep Feature Learning of Multi-Network Topology for Node Classification

Authors: Hansheng Xue, Jiajie Peng, Xuequn Shang

Abstract: Networks are ubiquitous structures that describe complex relationships between different entities in the real world. As a critical component of prediction tasks over nodes in networks, learning feature representations of nodes has recently become one of the most active research areas. Network embedding, which aims to learn non-linear, low-dimensional feature representations based on network topology, has proven helpful for network analysis tasks, especially node classification. In many real-world systems, multiple types of relations are naturally represented by multiple networks. However, existing network embedding methods mainly focus on single-network embedding and neglect the information shared among different networks. In this paper, we propose a novel multiple-network embedding method based on a semi-supervised autoencoder, named DeepMNE, which captures complex topological structures of multiple networks and takes the correlation among them into account. We evaluate DeepMNE on the task of node classification with two real-world datasets. The experimental results demonstrate the superior performance of our method over four state-of-the-art algorithms.

Submitted 7 September, 2018; originally announced September 2018.

arXiv:1510.08692 [pdf, ps, other]

Covariance-Controlled Adaptive Langevin Thermostat for Large-Scale Bayesian Sampling

Authors: Xiaocheng Shang, Zhanxing Zhu, Benedict Leimkuhler, Amos J. Storkey

Abstract: Monte Carlo sampling for Bayesian posterior inference is a common approach used in machine learning. The Markov Chain Monte Carlo procedures that are used are often discrete-time analogues of associated stochastic differential equations (SDEs). These SDEs are guaranteed to leave invariant the required posterior distribution. An area of current research addresses the computational benefits of stochastic gradient methods in this setting. Existing techniques rely on estimating the variance or covariance of the subsampling error, and typically assume constant variance. In this article, we propose a covariance-controlled adaptive Langevin thermostat that can effectively dissipate parameter-dependent noise while maintaining a desired target distribution. The proposed method achieves a substantial speedup over popular alternative schemes for large-scale machine learning applications.

Submitted 12 February, 2020; v1 submitted 29 October, 2015; originally announced October 2015.

Journal ref: Advances in Neural Information Processing Systems, 28, 37-45, (2015)

Showing 1–8 of 8 results for author: Shang, X