Search | arXiv e-print repository

Auditing Privacy Mechanisms via Label Inference Attacks

Authors: Róbert István Busa-Fekete, Travis Dick, Claudio Gentile, Andrés Muñoz Medina, Adam Smith, Marika Swanberg

Abstract: We propose reconstruction advantage measures to audit label privatization mechanisms. A reconstruction advantage measure quantifies the increase in an attacker's ability to infer the true label of an unlabeled example when provided with a private version of the labels in a dataset (e.g., aggregate of labels from different users or noisy labels output by randomized response), compared to an attacker that only observes the feature vectors, but may have prior knowledge of the correlation between features and labels. We consider two such auditing measures: one additive, and one multiplicative. These incorporate previous approaches taken in the literature on empirical auditing and differential privacy. The measures allow us to place a variety of proposed privatization schemes -- some differentially private, some not -- on the same footing. We analyze these measures theoretically under a distributional model which encapsulates reasonable adversarial settings. We also quantify their behavior empirically on real and simulated prediction tasks. Across a range of experimental settings, we find that differentially private schemes dominate or match the privacy-utility tradeoff of more heuristic approaches.

Submitted 4 June, 2024; originally announced June 2024.
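
Illustration (not from the paper): the abstract above names randomized response as one label privatization mechanism. A minimal sketch of the textbook K-ary randomized response is given below, assuming integer labels in `0..num_classes-1`; the function name `randomized_response` and its parameters are hypothetical, and the paper's own mechanisms and auditing measures may differ.

```python
import math
import random


def randomized_response(label, num_classes, epsilon, rng=random):
    """Privatize one label with standard K-ary randomized response.

    Assumption: labels are integers in range(num_classes). The true label is
    kept with probability e^eps / (e^eps + K - 1); otherwise one of the other
    K - 1 labels is returned uniformly at random.
    """
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + num_classes - 1)
    if rng.random() < keep_prob:
        return label
    # Draw uniformly from the K - 1 labels different from the true one.
    other = rng.randrange(num_classes - 1)
    return other if other < label else other + 1


if __name__ == "__main__":
    rng = random.Random(0)
    true_labels = [0, 1, 2, 1, 0]
    noisy_labels = [randomized_response(y, 3, epsilon=1.0, rng=rng) for y in true_labels]
    print(noisy_labels)
```

Smaller `epsilon` makes the noisy label less informative about the true one, which is the kind of privacy-utility tradeoff the abstract's measures are meant to audit.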

arXiv:2401.01329 [pdf, other]

Self-Supervised Millimeter Wave Indoor Localization using Tiny Neural Networks

Authors: Anish Shastri, Steve Blandino, Camillo Gentile, Chiehping Lai, Paolo Casari

Abstract: The quasi-optical propagation of millimeter-wave signals enables high-accuracy localization algorithms that employ geometric approaches or machine learning models. However, most algorithms require information on the indoor environment, may entail the collection of large training datasets, or bear an infeasible computational burden for commercial off-the-shelf (COTS) devices. In this work, we propose to use tiny neural networks (NNs) to learn the relationship between angle difference-of-arrival (ADoA) measurements and locations of a receiver in an indoor environment. To relieve training data collection efforts, we resort to a self-supervised approach by bootstrapping the training of our neural network through location estimates obtained from a state-of-the-art localization algorithm. We evaluate our scheme via mmWave measurements from indoor 60-GHz double-directional channel sounding. We process the measurements to yield dominant multipath components, use the corresponding angles to compute ADoA values, and finally obtain location fixes. Results show that the tiny NN achieves sub-meter errors in 74% of the cases, thus performing as good as or even better than the state-of-the-art algorithm, with significantly lower computational complexity.

Submitted 2 January, 2024; originally announced January 2024.

Comments: 13 pages, 11 figures

arXiv:2306.15371 [pdf, ps, other]

A New Mathematical Optimization-Based Method for the m-invariance Problem

Authors: Adrian Tobar, Jordi Castro, Claudio Gentile

Abstract: The issue of ensuring privacy for users who share their personal information has been a growing priority in a business and scientific environment where the use of different types of data and the laws that protect it have increased in tandem. Different technologies have been widely developed for static publications, i.e., where the information is published only once, such as k-anonymity and ε-differential privacy. In the case where microdata information is published dynamically, although established notions such as m-invariance and τ-safety already exist, developments for improving utility remain superficial. We propose a new heuristic approach for the NP-hard combinatorial problem of m-invariance and τ-safety, which is based on a mathematical optimization column generation scheme. The quality of a solution to m-invariance and τ-safety can be measured by the Information Loss (IL), a value in [0,100], the closer to 0 the better. We show that our approach improves by far current heuristics, providing in some instances solutions with ILs of 1.87, 8.5 and 1.93, while the state-of-the-art methods reported ILs of 39.03, 51.84 and 57.97, respectively.

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.04828 [pdf, other]

Fast and Effective GNN Training with Linearized Random Spanning Trees

Authors: Francesco Bonchi, Claudio Gentile, Francesco Paolo Nerini, André Panisson, Fabio Vitale

Abstract: We present a new effective and scalable framework for training GNNs in node classification tasks, based on the effective resistance, a powerful tool solidly rooted in graph theory. Our approach progressively refines the GNN weights on an extensive sequence of random spanning trees, suitably transformed into path graphs that retain essential topological and node information of the original graph. The sparse nature of these path graphs substantially lightens the computational burden of GNN training. This not only enhances scalability but also effectively addresses common issues like over-squashing, over-smoothing, and performance deterioration caused by overfitting in small training set regimes. We carry out an extensive experimental investigation on a number of real-world graph benchmarks, where we apply our framework to graph convolutional networks, showing simultaneous improvement of both training speed and test accuracy over a wide pool of representative baselines.

Submitted 14 February, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.02869 [pdf, other]

Data-Driven Online Model Selection With Regret Guarantees

Authors: Aldo Pacchiano, Christoph Dann, Claudio Gentile

Abstract: We consider model selection for sequential decision making in stochastic environments with bandit feedback, where a meta-learner has at its disposal a pool of base learners, and decides on the fly which action to take based on the policies recommended by each base learner. Model selection is performed by regret balancing but, unlike the recent literature on this subject, we do not assume any prior knowledge about the base learners like candidate regret guarantees; instead, we uncover these quantities in a data-driven manner. The meta-learner is therefore able to leverage the realized regret incurred by each base learner for the learning environment at hand (as opposed to the expected regret), and single out the best such regret. We design two model selection algorithms operating with this more ambitious notion of regret and, besides proving model selection guarantees via regret balancing, we experimentally demonstrate the compelling practical benefits of dealing with actual regrets instead of candidate regret bounds.

Submitted 23 January, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

arXiv:2305.17544 [pdf, ps, other]

Faster Margin Maximization Rates for Generic and Adversarially Robust Optimization Methods

Authors: Guanghui Wang, Zihao Hu, Claudio Gentile, Vidya Muthukumar, Jacob Abernethy

Abstract: First-order optimization methods tend to inherently favor certain solutions over others when minimizing an underdetermined training objective that has multiple global optima. This phenomenon, known as implicit bias, plays a critical role in understanding the generalization capabilities of optimization algorithms. Recent research has revealed that in separable binary classification tasks gradient-descent-based methods exhibit an implicit bias for the $\ell_2$-maximal margin classifier. Similarly, generic optimization methods, such as mirror descent and steepest descent, have been shown to converge to maximal margin classifiers defined by alternative geometries. While gradient-descent-based algorithms provably achieve fast implicit bias rates, corresponding rates in the literature for generic optimization methods are relatively slow. To address this limitation, we present a series of state-of-the-art implicit bias rates for mirror descent and steepest descent algorithms. Our primary technique involves transforming a generic optimization algorithm into an online optimization dynamic that solves a regularized bilinear game, providing a unified framework for analyzing the implicit bias of various optimization methods. Our accelerated rates are derived by leveraging the regret bounds of online learning algorithms within this game framework. We then show the flexibility of this framework by analyzing the implicit bias in adversarial training, and again obtain significantly improved convergence rates.

Submitted 7 April, 2024; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: Updated version: New results for implicit bias in adversarial training

arXiv:2302.05765 [pdf, other]

Adversarial Online Collaborative Filtering

Authors: Stephen Pasteris, Fabio Vitale, Mark Herbster, Claudio Gentile, André Panisson

Abstract: We investigate the problem of online collaborative filtering under no-repetition constraints, whereby users need to be served content in an online fashion and a given user cannot be recommended the same content item more than once. We start by designing and analyzing an algorithm that works under biclustering assumptions on the user-item preference matrix, and show that this algorithm exhibits an optimal regret guarantee, while being fully adaptive, in that it is oblivious to any prior knowledge about the sequence of users, the universe of items, as well as the biclustering parameters of the preference matrix. We then propose a more robust version of this algorithm which operates with general matrices. Also this algorithm is parameter free, and we prove regret guarantees that scale with the amount by which the preference matrix deviates from a biclustered structure. To our knowledge, these are the first results on online collaborative filtering that hold at this level of generality and adaptivity under no-repetition constraints. Finally, we complement our theoretical findings with simple experiments on real-world datasets aimed at both validating the theory and empirically comparing to standard baselines. This comparison shows the competitive advantage of our approach over these baselines.

Submitted 29 December, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

arXiv:2302.03784 [pdf, ps, other]

Leveraging User-Triggered Supervision in Contextual Bandits

Authors: Alekh Agarwal, Claudio Gentile, Teodor V. Marinov

Abstract: We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context. Such an interaction arises, for example, in text prediction or autocompletion settings, where a poor suggestion is simply ignored and the user enters the desired text instead. Crucially, this extra feedback is user-triggered on only a subset of the contexts. We develop a new framework to leverage such signals, while being robust to their biased nature. We also augment standard CB algorithms to leverage the signal, and show improved regret guarantees for the resulting algorithms under a variety of conditions on the helpfulness of and bias inherent in this feedback.

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2302.03115 [pdf, other]

Easy Learning from Label Proportions

Authors: Robert Istvan Busa-Fekete, Heejin Choi, Travis Dick, Claudio Gentile, Andres Munoz Medina

Abstract: We consider the problem of Learning from Label Proportions (LLP), a weakly supervised classification setup where instances are grouped into "bags", and only the frequency of class labels at each bag is available. Albeit, the objective of the learner is to achieve low task loss at an individual instance level. Here we propose Easyllp: a flexible and simple-to-implement debiasing approach based on aggregate labels, which operates on arbitrary loss functions. Our technique allows us to accurately estimate the expected loss of an arbitrary model at an individual level. We showcase the flexibility of our approach by applying it to popular learning frameworks, like Empirical Risk Minimization (ERM) and Stochastic Gradient Descent (SGD) with provable guarantees on instance level performance. More concretely, we exhibit a variance reduction technique that makes the quality of LLP learning deteriorate only by a factor of k (k being bag size) in both ERM and SGD setups, as compared to full supervision. Finally, we validate our theoretical results on multiple datasets demonstrating our algorithm performs as well or better than previous LLP approaches in spite of its simplicity.

Submitted 13 February, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

arXiv:2211.16309 [pdf, other]

A Contextual Bandit Approach for Learning to Plan in Environments with Probabilistic Goal Configurations

Authors: Sohan Rudra, Saksham Goel, Anirban Santara, Claudio Gentile, Laurent Perron, Fei Xia, Vikas Sindhwani, Carolina Parada, Gaurav Aggarwal

Abstract: Object-goal navigation (Object-nav) entails searching, recognizing and navigating to a target object. Object-nav has been extensively studied by the Embodied-AI community, but most solutions are often restricted to considering static objects (e.g., television, fridge, etc.). We propose a modular framework for object-nav that is able to efficiently search indoor environments for not just static objects but also movable objects (e.g. fruits, glasses, phones, etc.) that frequently change their positions due to human intervention. Our contextual-bandit agent efficiently explores the environment by showing optimism in the face of uncertainty and learns a model of the likelihood of spotting different objects from each navigable location. The likelihoods are used as rewards in a weighted minimum latency solver to deduce a trajectory for the robot. We evaluate our algorithms in two simulated environments and a real-world setting, to demonstrate high sample efficiency and reliability.

Submitted 29 November, 2022; originally announced November 2022.

Comments: Shorter version accepted at NeurIPS 2022 Workshop on Robot Learning: Trustworthy Robotics

arXiv:2206.14912 [pdf, ps, other]

Best of Both Worlds Model Selection

Authors: Aldo Pacchiano, Christoph Dann, Claudio Gentile

Abstract: We study the problem of model selection in bandit scenarios in the presence of nested policy classes, with the goal of obtaining simultaneous adversarial and stochastic ("best of both worlds") high-probability regret guarantees. Our approach requires that each base learner comes with a candidate regret bound that may or may not hold, while our meta algorithm plays each base learner according to a schedule that keeps the base learner's candidate regret bounds balanced until they are detected to violate their guarantees. We develop careful mis-specification tests specifically designed to blend the above model selection criterion with the ability to leverage the (potentially benign) nature of the environment. We recover the model selection guarantees of the CORRAL algorithm for adversarial environments, but with the additional benefit of achieving high probability regret bounds, specifically in the case of nested adversarial linear bandits. More importantly, our model selection results also hold simultaneously in stochastic environments under gap assumptions. These are the first theoretical results that achieve best of both world (stochastic and adversarial) guarantees while performing model selection in (linear) bandit scenarios.

Submitted 29 June, 2022; originally announced June 2022.

Comments: 10 pages in main, 43 pages appendix

arXiv:2202.05448 [pdf, ps, other]

Fast Rates in Pool-Based Batch Active Learning

Authors: Claudio Gentile, Zhilei Wang, Tong Zhang

Abstract: We consider a batch active learning scenario where the learner adaptively issues batches of points to a labeling oracle. Sampling labels in batches is highly desirable in practice due to the smaller number of interactive rounds with the labeling oracle (often human beings). However, batch active learning typically pays the price of a reduced adaptivity, leading to suboptimal results. In this paper we propose a solution which requires a careful trade off between the informativeness of the queried points and their diversity. We theoretically investigate batch active learning in the practically relevant scenario where the unlabeled pool of data is available beforehand (pool-based active learning). We analyze a novel stage-wise greedy algorithm and show that, as a function of the label complexity, the excess risk of this algorithm matches the known minimax rates in standard statistical learning settings. Our results also exhibit a mild dependence on the batch size. These are the first theoretical results that employ careful trade offs between informativeness and diversity to rigorously quantify the statistical performance of batch active learning in the pool-based scenario.

Submitted 13 June, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

Comments: This is an extended version of arXiv:2202.05448v1, which has title "Achieving Minimax Rates in Pool-Based Batch Active Learning" and was accepted by ICML 2022 https://icml.cc/virtual/2022/poster/16505

arXiv:2112.02866 [pdf, ps, other]

Nonstochastic Bandits with Composite Anonymous Feedback

Authors: Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Claudio Gentile, Yishay Mansour

Abstract: We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way. The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions. This setting encompasses as a special case the easier task of bandits with delayed feedback, a well-studied framework where the player observes the delayed losses individually. Our first contribution is a general reduction transforming a standard bandit algorithm into one that can operate in the harder setting: We bound the regret of the transformed algorithm in terms of the stability and regret of the original algorithm. Then, we show that the transformation of a suitably tuned FTRL with Tsallis entropy has a regret of order $\sqrt{(d+1)KT}$, where $d$ is the maximum delay, $K$ is the number of arms, and $T$ is the time horizon. Finally, we show that our results cannot be improved in general by exhibiting a matching (up to a log factor) lower bound on the regret of any algorithm operating in this setting.

Submitted 24 September, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

arXiv:2109.06131 [pdf, other]

A Framework for Developing Algorithms for Estimating Propagation Parameters from Measurements

Authors: Akbar Sayeed, Peter Vouras, Camillo Gentile, Alec Weiss, Jeanne Quimby, Zihang Cheng, Bassel Modad, Yuning Zhang, Chethan Anjinappa, Fatih Erden, Ozgur Ozdemir, Robert Muller, Diego Dupleich, Han Niu, David Michelson, Aidan Hughes

Abstract: A framework is proposed for developing and evaluating algorithms for extracting multipath propagation components (MPCs) from measurements collected by sounders at millimeter-wave (mmW) frequencies. To focus on algorithmic performance, an idealized model is proposed for the spatial frequency response of the propagation environment measured by a sounder. The input to the sounder model is a pre-determined set of MPC parameters that serve as the "ground truth." A three-dimensional angle-delay (beamspace) representation of the measured spatial frequency response serves as a natural domain for implementing and analyzing MPC extraction algorithms. Metrics for quantifying the error in estimated MPC parameters are introduced. Initial results are presented for a greedy matching pursuit algorithm that performs a least-squares (LS) reconstruction of the MPC path gains within the iterations. The results indicate that the simple greedy-LS algorithm has the ability to extract MPCs over a large dynamic range, and suggest several avenues for further performance improvement through extensions of the greedy-LS algorithm as well as by incorporating features of other algorithms, such as SAGE and RIMAX.

Submitted 13 September, 2021; originally announced September 2021.

Journal ref: IEEE Globecom 2020

arXiv:2107.14263 [pdf, other]

Batch Active Learning at Scale

Authors: Gui Citovsky, Giulia DeSalvo, Claudio Gentile, Lazaros Karydas, Anand Rajagopalan, Afshin Rostamizadeh, Sanjiv Kumar

Abstract: The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources. Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for addressing this problem. The practical benefits of batch sampling come with the downside of less adaptivity and the risk of sampling redundant examples within a batch -- a risk that grows with the batch size. In this work, we analyze an efficient active learning algorithm, which focuses on the large batch setting. In particular, we show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than used in previous studies and provides significant improvements in model training efficiency compared to recent baselines. Finally, we provide an initial theoretical analysis, proving label complexity guarantees for a related sampling method, which we show is approximately equivalent to our sampling method in specific settings.

Submitted 29 July, 2021; originally announced July 2021.

arXiv:2107.05745 [pdf, ps, other]

Adapting to Misspecification in Contextual Bandits

Authors: Dylan J. Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert

Abstract: A major research direction in contextual bandits is to develop algorithms that are computationally efficient, yet support flexible, general-purpose function approximation. Algorithms based on modeling rewards have shown strong empirical performance, but typically require a well-specified model, and can fail when this assumption does not hold. Can we design algorithms that are efficient and flexible, yet degrade gracefully in the face of model misspecification? We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits that adapt to unknown model misspecification -- both for finite and infinite action settings. Given access to an online oracle for square loss regression, our algorithm attains optimal regret and -- in particular -- optimal dependence on the misspecification level, with no prior knowledge. Specializing to linear contextual bandits with infinite actions in $d$ dimensions, we obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown misspecification level $\varepsilon$. On a conceptual level, our results are enabled by a new optimization-based perspective on the regression oracle reduction framework of Foster and Rakhlin, which we anticipate will find broader use.

Submitted 12 July, 2021; originally announced July 2021.

Comments: Appeared at NeurIPS 2020

arXiv:2106.03546 [pdf, other]

On Learning to Rank Long Sequences with Contextual Bandits

Authors: Anirban Santara, Claudio Gentile, Gaurav Aggarwal, Shuai Li

Abstract: Motivated by problems of learning to rank long item sequences, we introduce a variant of the cascading bandit model that considers flexible length sequences with varying rewards and losses. We formulate two generative models for this problem within the generalized linear setting, and design and analyze upper confidence algorithms for it. Our analysis delivers tight regret bounds which, when specialized to vanilla cascading bandits, results in sharper guarantees than previously available in the literature. We evaluate our algorithms on a number of real-world datasets, and show significantly improved empirical performance as compared to known cascading bandit baselines.

Submitted 7 June, 2021; originally announced June 2021.

Report number: PMLR 151:767-797

Journal ref: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:767-797, 2022

arXiv:2106.03243 [pdf, ps, other]

Neural Active Learning with Performance Guarantees

Authors: Pranjal Awasthi, Christoph Dann, Claudio Gentile, Ayush Sekhari, Zhilei Wang

Abstract: We investigate the problem of active learning in the streaming setting in non-parametric regimes, where the labels are stochastically generated from a class of functions on which we make no assumptions whatsoever. We rely on recently proposed Neural Tangent Kernel (NTK) approximation tools to construct a suitable neural embedding that determines the feature space the algorithm operates on and the learned model computed atop. Since the shape of the label requesting threshold is tightly related to the complexity of the function to be learned, which is a-priori unknown, we also derive a version of the algorithm which is agnostic to any prior knowledge. This algorithm relies on a regret balancing scheme to solve the resulting online model selection problem, and is computationally efficient. We prove joint guarantees on the cumulative regret and number of requested labels which depend on the complexity of the labeling function at hand. In the linear case, these guarantees recover known minimax results of the generalization error as a function of the label complexity in a standard statistical learning setting.

Submitted 6 June, 2021; originally announced June 2021.

Comments: 30 pages

arXiv:2012.13045 [pdf, ps, other]

Regret Bound Balancing and Elimination for Model Selection in Bandits and RL

Authors: Aldo Pacchiano, Christoph Dann, Claudio Gentile, Peter Bartlett

Abstract: We propose a simple model selection approach for algorithms in stochastic bandit and reinforcement learning problems. As opposed to prior work that (implicitly) assumes knowledge of the optimal regret, we only require that each base algorithm comes with a candidate regret bound that may or may not hold during all rounds. In each round, our approach plays a base algorithm to keep the candidate regret bounds of all remaining base algorithms balanced, and eliminates algorithms that violate their candidate bound. We prove that the total regret of this approach is bounded by the best valid candidate regret bound times a multiplicative factor. This factor is reasonably small in several applications, including linear bandits and MDPs with nested function classes, linear bandits with unknown misspecification, and LinUCB applied to linear bandits with different confidence parameters. We further show that, under a suitable gap-assumption, this factor only scales with the number of base algorithms and not their complexity when the number of rounds is large enough. Finally, unlike recent efforts in model selection for linear stochastic bandits, our approach is versatile enough to also cover cases where the context information is generated by an adversarial environment, rather than a stochastic one.

Submitted 23 December, 2020; originally announced December 2020.

Comments: 57 pages

arXiv:2012.03522 [pdf, ps, other]

Online Model Selection: a Rested Bandit Formulation

Authors: Leonardo Cella, Claudio Gentile, Massimiliano Pontil

Abstract: Motivated by a natural problem in online model selection with bandit information, we introduce and analyze a best arm identification problem in the rested bandit setting, wherein arm expected losses decrease with the number of times the arm has been played. The shape of the expected loss functions is similar across arms, and is assumed to be available up to unknown parameters that have to be learned on the fly. We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game. We analyze an arm elimination algorithm whose regret vanishes as the time horizon increases. The actual rate of convergence depends in a detailed way on the postulated functional form of the expected losses. Unlike known model selection efforts in the recent bandit literature, our algorithm exploits the specific structure of the problem to learn the unknown parameters of the expected loss function so as to identify the best arm as quickly as possible. We complement our analysis with a lower bound, indicating strengths and limitations of the proposed solution.

Submitted 7 December, 2020; originally announced December 2020.

arXiv:2006.01235 [pdf, ps, other]

doi 10.1109/GLOBECOM42002.2020.9322374

Quasi-Deterministic Channel Model for mmWaves: Mathematical Formalization and Validation

Authors: Mattia Lecci, Michele Polese, Chiehping Lai, Jian Wang, Camillo Gentile, Nada Golmie, Michele Zorzi

Abstract: 5G and beyond networks will use, for the first time ever, the millimeter wave (mmWave) spectrum for mobile communications. Accurate performance evaluation is fundamental to the design of reliable mmWave networks, with accuracy rooted in the fidelity of the channel models. At mmWaves, the model must account for the spatial characteristics of propagation since networks will employ highly directional antennas to counter the much greater pathloss. In this regard, Quasi-Deterministic (QD) models are highly accurate channel models, which characterize the propagation in terms of clusters of multipath components, given by a reflected ray and multiple diffuse components of any given Computer Aided Design (CAD) scenario. This paper introduces a detailed mathematical formulation for QD models at mmWaves, that can be used as a reference for their implementation and development. Moreover, it compares channel instances obtained with an open source NIST QD model implementation against real measurements at 60 GHz, substantiating the accuracy of the model. Results show that, when comparing the proposed model and deterministic rays alone with a measurement campaign, the Kolmogorov-Smirnov (KS) test of the QD model improves by up to 0.537.

Submitted 9 February, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: 6 pages, 5 figures, 1 table, presented at IEEE GLOBECOM 2020. Please cite it as: M. Lecci, M. Polese, C. Lai, J. Wang, C. Gentile, N. Golmie, M. Zorzi, "Quasi-Deterministic Channel Model for mmWaves: Mathematical Formalization and Validation," IEEE Global Communications Conference (GLOBECOM), Dec. 2020, Taipei, Taiwan

arXiv:2002.09179 [pdf, other]

doi 10.1109/ITA50056.2020.9244950

Simplified Ray Tracing for the Millimeter Wave Channel: A Performance Evaluation

Authors: Mattia Lecci, Paolo Testolina, Marco Giordani, Michele Polese, Tanguy Ropitault, Camillo Gentile, Neeraj Varshney, Anuraag Bodi, Michele Zorzi

Abstract: Millimeter-wave (mmWave) communication is one of the cornerstone innovations of fifth-generation (5G) wireless networks, thanks to the massive bandwidth available in these frequency bands. To correctly assess the performance of such systems, however, it is essential to have reliable channel models, based on a deep understanding of the propagation characteristics of the mmWave signal. In this respect, ray tracers can provide high accuracy, at the expense of a significant computational complexity, which limits the scalability of simulations. To address this issue, in this paper we present possible simplifications that can reduce the complexity of ray tracing in the mmWave environment, without significantly affecting the accuracy of the model. We evaluate the effect of such simplifications on link-level metrics, testing different configuration parameters and propagation scenarios.

Submitted 21 February, 2020; originally announced February 2020.

Comments: 6 pages, 6 figures, 1 table. This paper has been accepted for presentation at ITA 2020. (c) 2020 IEEE. Please cite it as: M. Lecci, P. Testolina, M. Giordani, M. Polese, T. Ropitault, C. Gentile, N. Varshney, A. Bodi, M. Zorzi, "Simplified Ray Tracing for the Millimeter Wave Channel: A Performance Evaluation," Information Theory and Applications Workshop (ITA), San Diego, US, 2020

arXiv:2002.07348 [pdf, other]

Adaptive Region-Based Active Learning

Authors: Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Ningshan Zhang

Abstract: We present a new active learning algorithm that adaptively partitions the input space into a finite number of regions, and subsequently seeks a distinct predictor for each region, both phases actively requesting labels. We prove theoretical guarantees for both the generalization error and the label complexity of our algorithm, and analyze the number of regions defined by the algorithm under some mild assumptions. We also report the results of an extensive suite of experiments on several real-world datasets demonstrating substantial empirical benefits over existing single-region and non-adaptive region-based active learning baselines.

Submitted 17 February, 2020; originally announced February 2020.

arXiv:1906.09458 [pdf, other]

Flattening a Hierarchical Clustering through Active Learning

Authors: Fabio Vitale, Anand Rajagopalan, Claudio Gentile

Abstract: We investigate active learning by pairwise similarity over the leaves of trees originating from hierarchical clustering procedures. In the realizable setting, we provide a full characterization of the number of queries needed to achieve perfect reconstruction of the tree cut. In the non-realizable setting, we rely on known importance-sampling procedures to obtain regret and query complexity bounds. Our algorithms come with theoretical guarantees on the statistical error and, more importantly, lend themselves to linear-time implementations in the relevant parameters of the problem. We discuss such implementations, prove running time guarantees for them, and present preliminary experiments on real-world datasets showing the compelling practical performance of our algorithms as compared to both passive learning and simple active learning baselines.

Submitted 12 October, 2019; v1 submitted 22 June, 2019; originally announced June 2019.

arXiv:1806.01182 [pdf, other]

Online Reciprocal Recommendation with Theoretical Performance Guarantees

Authors: Fabio Vitale, Nikos Parotsidis, Claudio Gentile

Abstract: A reciprocal recommendation problem is one where the goal of learning is not just to predict a user's preference towards a passive item (e.g., a book), but to recommend to the targeted user on one side another user from the other side such that a mutual interest between the two exists. The problem is thus sharply different from the more traditional items-to-users recommendation, since a good match requires meeting the preferences of both users. We initiate a rigorous theoretical investigation of the reciprocal recommendation task in a specific framework of sequential learning. We point out general limitations, formulate reasonable assumptions enabling effective learning and, under these assumptions, we design and analyze a computationally efficient algorithm that uncovers mutual likes at a pace comparable to that achieved by a clairvoyant algorithm knowing all user preferences in advance. Finally, we validate our algorithm against synthetic and real-world datasets, showing improved empirical performance over simple baselines.

Submitted 4 June, 2018; originally announced June 2018.

arXiv:1706.06474 [pdf, other]

On Pairwise Clustering with Side Information

Authors: Stephen Pasteris, Fabio Vitale, Claudio Gentile, Mark Herbster

Abstract: Pairwise clustering, in general, partitions a set of items via a known similarity function. In our treatment, clustering is modeled as a transductive prediction problem. Thus, rather than beginning with a known similarity function, the function is instead hidden and the learner only receives a random sample consisting of a subset of the pairwise similarities. An additional set of pairwise side-information may be given to the learner, which then determines the inductive bias of our algorithms. We measure performance not based on the recovery of the hidden similarity function, but instead on how well we classify each item. We give tight bounds on the number of misclassifications. We provide two algorithms. The first algorithm, SACA, is a simple agglomerative clustering algorithm which runs in near linear time, and which serves as a baseline for our analyses. The second algorithm, RGCA, enables the incorporation of side-information, which may lead to improved bounds at the cost of a longer running time.

Submitted 18 June, 2017; originally announced June 2017.

arXiv:1705.10257 [pdf, ps, other]

Boltzmann Exploration Done Right

Authors: Nicolò Cesa-Bianchi, Claudio Gentile, Gábor Lugosi, Gergely Neu

Abstract: Boltzmann exploration is a classic strategy for sequential decision-making under uncertainty, and is one of the most standard tools in Reinforcement Learning (RL). Despite its widespread use, there is virtually no theoretical understanding about the limitations or the actual benefits of this exploration scheme. Does it drive exploration in a meaningful way? Is it prone to misidentifying the optimal actions or spending too much time exploring the suboptimal ones? What is the right tuning for the learning rate? In this paper, we address several of these questions in the classic setup of stochastic multi-armed bandits. One of our main results is showing that the Boltzmann exploration strategy with any monotone learning-rate sequence will induce suboptimal behavior. As a remedy, we offer a simple non-monotone schedule that guarantees near-optimal performance, albeit only when given prior access to key problem parameters that are typically not available in practical situations (like the time horizon $T$ and the suboptimality gap $Δ$). More importantly, we propose a novel variant that uses different learning rates for different arms, and achieves a distribution-dependent regret bound of order $\frac{K\log^2 T}{Δ}$ and a distribution-independent bound of order $\sqrt{KT}\log K$ without requiring such prior knowledge. To demonstrate the flexibility of our technique, we also propose a variant that guarantees the same performance bounds even if the rewards are heavy-tailed.

Submitted 7 November, 2017; v1 submitted 29 May, 2017; originally announced May 2017.
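
Illustrative note on the entry above: the classic Boltzmann (softmax) exploration rule that the abstract analyzes can be sketched as follows. This is only a minimal sketch of the textbook scheme with a single learning rate shared by all arms, not the per-arm variant the paper proposes. The function names, the Bernoulli reward model, and the log(t) schedule in the usage note are assumptions made for illustration.

```python
import math
import random


def boltzmann_probs(means, eta):
    """Softmax distribution over arms: p_i proportional to exp(eta * mean_i)."""
    m = max(means)  # subtract the max for numerical stability
    weights = [math.exp(eta * (x - m)) for x in means]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_bandit(arm_probs, horizon, eta_schedule, seed=0):
    """Run classic Boltzmann exploration on Bernoulli arms (hypothetical setup).

    arm_probs    -- success probability of each arm (unknown to the learner)
    eta_schedule -- function mapping the round index t to a learning rate
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialize the estimates
        else:
            means = [sums[i] / counts[i] for i in range(k)]
            probs = boltzmann_probs(means, eta_schedule(t))
            arm = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

A call such as `boltzmann_bandit([0.2, 0.8], 1000, lambda t: math.log(t))` uses one monotone schedule, which is the kind of tuning the abstract reports as inducing suboptimal behavior.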

arXiv:1703.03478 [pdf, other]

Online Learning with Abstention

Authors: Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Scott Yang

Abstract: We present an extensive study of the key problem of online learning where algorithms are allowed to abstain from making predictions. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to time-varying feedback graphs, as needed in this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and is adapted to time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a possible, but limited, extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as more standard baselines.

Submitted 14 November, 2019; v1 submitted 9 March, 2017; originally announced March 2017.

arXiv:1702.08211 [pdf, ps, other]

Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning

Authors: Nicolò Cesa-Bianchi, Pierre Gaillard, Claudio Gentile, Sébastien Gerchinovitz

Abstract: We investigate contextual online learning with nonparametric (Lipschitz) comparison classes under different assumptions on losses and feedback information. For full information feedback and Lipschitz losses, we design the first explicit algorithm achieving the minimax regret rate (up to log factors). In a partial feedback model motivated by second-price auctions, we obtain algorithms for Lipschitz and semi-Lipschitz losses with regret bounds improving on the known bounds for standard bandit feedback. Our analysis combines novel results for contextual second-price auctions with a novel algorithmic approach based on chaining. When the context space is Euclidean, our chaining approach is efficient and delivers an even better regret bound.

Submitted 30 June, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

Comments: This document is the full version of an extended abstract accepted for presentation at COLT 2017

arXiv:1608.03544 [pdf, other]

On Context-Dependent Clustering of Bandits

Authors: Claudio Gentile, Shuai Li, Purushottam Kar, Alexandros Karatzoglou, Evans Etrue, Giovanni Zappella

Abstract: We investigate a novel cluster-of-bandit algorithm, CAB, for collaborative recommendation tasks that implements the underlying feedback sharing mechanism by estimating the neighborhood of users in a context-dependent manner. CAB makes sharp departures from the state of the art by incorporating collaborative effects into inference as well as learning processes in a manner that seamlessly interleaves explore-exploit tradeoffs and collaborative steps. We prove regret bounds under various assumptions on the data, which exhibit a crisp dependence on the expected number of clusters over the users, a natural measure of the statistical difficulty of the learning task. Experiments on production and real-world datasets show that CAB offers significantly increased prediction performance against a representative pool of state-of-the-art methods.

Submitted 27 February, 2017; v1 submitted 6 August, 2016; originally announced August 2016.

arXiv:1606.00182 [pdf, other]

On the Troll-Trust Model for Edge Sign Prediction in Social Networks

Authors: Géraud Le Falher, Nicolò Cesa-Bianchi, Claudio Gentile, Fabio Vitale

Abstract: In the problem of edge sign prediction, we are given a directed graph (representing a social network), and our task is to predict the binary labels of the edges (i.e., the positive or negative nature of the social relationships). Many successful heuristics for this problem are based on the troll-trust features, estimating at each node the fraction of outgoing and incoming positive/negative edges. We show that these heuristics can be understood, and rigorously analyzed, as approximators to the Bayes optimal classifier for a simple probabilistic model of the edge labels. We then show that the maximum likelihood estimator for this model approximately corresponds to the predictions of a Label Propagation algorithm run on a transformed version of the original social graph. Extensive experiments on a number of real-world datasets show that this algorithm is competitive against state-of-the-art classifiers in terms of both accuracy and scalability. Finally, we show that troll-trust features can also be used to derive online learning algorithms which have theoretical guarantees even when edges are adversarially labeled.

Submitted 28 February, 2017; v1 submitted 1 June, 2016; originally announced June 2016.

Comments: v5: accepted to AISTATS 2017

arXiv:1605.00596 [pdf, other]

Graph Clustering Bandits for Recommendation

Authors: Shuai Li, Claudio Gentile, Alexandros Karatzoglou

Abstract: We investigate an efficient context-dependent clustering technique for recommender systems based on exploration-exploitation strategies through multi-armed bandits over multiple users. Our algorithm dynamically groups users based on their observed behavioral similarity during a sequence of logged activities. In doing so, the algorithm reacts to the currently served user by shaping clusters around him/her but, at the same time, it explores the generation of clusters over users which are not currently engaged. We motivate the effectiveness of this clustering policy, and provide an extensive empirical analysis on real-world datasets, showing scalability and improved prediction performance over state-of-the-art methods for sequential clustering of users in multi-armed bandit scenarios.

Submitted 2 May, 2016; originally announced May 2016.

arXiv:1602.04741 [pdf, other]

Delay and Cooperation in Nonstochastic Bandits

Authors: Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour, Alberto Minora

Abstract: We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than $d$ hops to arrive, where $d$ is a delay parameter. We introduce \textsc{Exp3-Coop}, a cooperative version of the \textsc{Exp3} algorithm and prove that with $K$ actions and $N$ agents the average per-agent regret after $T$ rounds is at most of order $\sqrt{\bigl(d+1 + \tfrac{K}{N}α_{\le d}\bigr)(T\ln K)}$, where $α_{\le d}$ is the independence number of the $d$-th power of the connected communication graph $G$. We then show that for any connected graph, for $d=\sqrt{K}$ the regret bound is $K^{1/4}\sqrt{T}$, strictly better than the minimax regret $\sqrt{KT}$ for noncooperating agents. More informed choices of $d$ lead to bounds which are arbitrarily close to the full information minimax regret $\sqrt{T\ln K}$ when $G$ is dense. When $G$ has sparse components, we show that a variant of \textsc{Exp3-Coop}, allowing agents to choose their parameters according to their centrality in $G$, strictly improves the regret. Finally, as a by-product of our analysis, we provide the first characterization of the minimax regret for bandit learning with delay.

Submitted 1 June, 2016; v1 submitted 15 February, 2016; originally announced February 2016.

Comments: 30 pages

arXiv:1502.03473 [pdf, other]

Collaborative Filtering Bandits

Authors: Shuai Li, Alexandros Karatzoglou, Claudio Gentile

Abstract: Classical collaborative filtering and content-based filtering methods try to learn a static recommendation model given training data. These approaches are far from ideal in highly dynamic recommendation domains such as news recommendation and computational advertisement, where the set of items and users is very fluid. In this work, we investigate an adaptive clustering technique for content recommendation based on exploration-exploitation strategies in contextual multi-armed bandit settings. Our algorithm takes into account the collaborative effects that arise due to the interaction of the users with the items, by dynamically grouping users based on the items under consideration and, at the same time, grouping items based on the similarity of the clusterings induced over the users. The resulting algorithm thus takes advantage of preference patterns in the data in a way akin to collaborative filtering methods. We provide an empirical analysis on medium-size real-world datasets, showing scalability and increased prediction performance (as measured by click-through rate) over state-of-the-art methods for clustering bandits. We also provide a regret analysis within a standard linear stochastic noise setting.

Submitted 31 May, 2016; v1 submitted 11 February, 2015; originally announced February 2015.

Comments: The 39th SIGIR (SIGIR 2016)

arXiv:1409.8428 [pdf, other]

Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback

Authors: Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, Ohad Shamir

Abstract: We present and study a partial-information model of online learning, where a decision maker repeatedly chooses from a finite set of actions, and observes some subset of the associated losses. This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions. Moreover, it generalizes and interpolates between the well studied full-information setting (where all losses are revealed) and the bandit setting (where only the loss of the action chosen by the player is revealed). We provide several algorithms addressing different variants of our setting, and provide tight regret bounds depending on combinatorial properties of the information feedback structure.

Submitted 30 September, 2014; originally announced September 2014.

Comments: Preliminary versions of parts of this paper appeared in [1,20], and also as arXiv papers arXiv:1106.2436 and arXiv:1307.4564

arXiv:1401.8257 [pdf, other]

Online Clustering of Bandits

Authors: Claudio Gentile, Shuai Li, Giovanni Zappella

Abstract: We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation ("bandit") strategies. We provide a sharp regret analysis of this algorithm in a standard stochastic noise setting, demonstrate its scalability properties, and prove its effectiveness on a number of artificial and real-world datasets. Our experiments show a significant increase in prediction performance over state-of-the-art methods for bandit problems.

Submitted 6 June, 2014; v1 submitted 31 January, 2014; originally announced January 2014.

Comments: In E. Xing and T. Jebara (Eds.), Proceedings of 31st International Conference on Machine Learning, Journal of Machine Learning Research Workshop and Conference Proceedings, Vol.32 (JMLR W&CP-32), Beijing, China, Jun. 21-26, 2014 (ICML 2014), Submitted by Shuai Li (https://sites.google.com/site/shuailidotsli)

arXiv:1307.4564 [pdf, ps, other]

From Bandits to Experts: A Tale of Domination and Independence

Authors: Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour

Abstract: We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph. We also show that in the undirected case, the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner.

Submitted 17 July, 2013; originally announced July 2013.

arXiv:1306.0811 [pdf, other]

A Gang of Bandits

Authors: Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella

Abstract: Multi-armed bandit problems are receiving a great deal of attention because they adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithm could lead to a dramatic performance increase. For instance, we may want to serve content to a group of users by taking advantage of an underlying network of social relationships among them. In this paper, we introduce novel algorithmic approaches to the solution of such networked bandit problems. More specifically, we design and analyze a global strategy which allocates a bandit algorithm to each network node (user) and allows it to "share" signals (contexts and payoffs) with the neighboring nodes. We then derive two more scalable variants of this strategy based on different ways of clustering the graph nodes. We experimentally compare the algorithm and its variants to state-of-the-art methods for contextual bandits that do not use the relational information. Our experiments, carried out on synthetic and real-world datasets, show a marked increase in prediction performance obtained by exploiting the network structure.

Submitted 4 November, 2013; v1 submitted 4 June, 2013; originally announced June 2013.

Comments: NIPS 2013

arXiv:1302.7263 [pdf, other]

Online Similarity Prediction of Networked Data from Known and Unknown Graphs

Authors: Claudio Gentile, Mark Herbster, Stephen Pasteris

Abstract: We consider online similarity prediction problems over networked data. We begin by relating this task to the more standard class prediction problem, showing that, given an arbitrary algorithm for class prediction, we can construct an algorithm for similarity prediction with "nearly" the same mistake bound, and vice versa. After noticing that this general construction is computationally infeasible, we target our study to feasible similarity prediction algorithms on networked data. We initially assume that the network structure is known to the learner. Here we observe that Matrix Winnow \cite{w07} has a near-optimal mistake guarantee, at the price of cubic prediction time per round. This motivates our effort for an efficient implementation of a Perceptron algorithm with a weaker mistake guarantee but with only poly-logarithmic prediction time. Our focus then turns to the challenging case of networks whose structure is initially unknown to the learner. In this novel setting, where the network structure is only incrementally revealed, we obtain a mistake-bounded algorithm with a quadratic prediction time per round.

Submitted 15 March, 2013; v1 submitted 28 February, 2013; originally announced February 2013.

arXiv:1301.5160 [pdf, other]

See the Tree Through the Lines: The Shazoo Algorithm -- Full Version --

Authors: Fabio Vitale, Nicolo Cesa-Bianchi, Claudio Gentile, Giovanni Zappella

Abstract: Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. We fill this hole and introduce an efficient node predictor, Shazoo, which is nearly optimal on any weighted tree. Moreover, we show that Shazoo can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. Experiments on real-world datasets confirm that Shazoo performs well in that it fully exploits the structure of the input tree, and gets very close to (and sometimes better than) less scalable energy minimization methods.

Submitted 28 February, 2013; v1 submitted 22 January, 2013; originally announced January 2013.

arXiv:1301.5112 [pdf, ps, other]

Active Learning on Trees and Graphs

Authors: Nicolo Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We investigate the problem of active learning on a given tree whose nodes are assigned binary labels in an adversarial way. Inspired by recent results by Guillory and Bilmes, we characterize (up to constant factors) the optimal placement of queries so as to minimize the mistakes made on the non-queried nodes. Our query selection algorithm is extremely efficient, and the optimal number of mistakes on the non-queried nodes is achieved by a simple and efficient mincut classifier. Through a simple modification of the query selection algorithm we also show optimality (up to constant factors) with respect to the trade-off between number of queries and number of mistakes on non-queried nodes. By using spanning trees, our algorithms can be efficiently applied to general graphs, although the problem of finding optimal and efficient active learning algorithms for general graphs remains open. Towards this end, we provide a lower bound on the number of mistakes made on arbitrary graphs by any active learning algorithm using a number of queries which is up to a constant fraction of the graph size.

Submitted 22 January, 2013; originally announced January 2013.

arXiv:1301.4769 [pdf, other]

A Correlation Clustering Approach to Link Classification in Signed Networks -- Full Version --

Authors: Nicolo Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: Motivated by social balance theory, we develop a theory of link classification in signed networks using the correlation clustering index as a measure of label regularity. We derive learning bounds in terms of correlation clustering within three fundamental transductive learning settings: online, batch and active. Our main algorithmic contribution is in the active setting, where we introduce a new family of efficient link classifiers based on covering the input graph with small circuits. These are the first active algorithms for link classification with mistake bounds that hold for arbitrary signed networks.

Submitted 28 February, 2013; v1 submitted 21 January, 2013; originally announced January 2013.

arXiv:1301.4767 [pdf, other]

A Linear Time Active Learning Algorithm for Link Classification -- Full Version --

Authors: Nicolo Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of an initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to within a constant factor) number of mistakes on any graph G = (V,E) such that |E| = Ω(|V|^{3/2}) by querying O(|V|^{3/2}) edge labels. More generally, we show an algorithm that achieves optimality to within a factor of O(k) by querying at most on the order of |V| + (|V|/k)^{3/2} edge labels. The running time of this algorithm is at most on the order of |E| + |V| log |V|.

Submitted 28 February, 2013; v1 submitted 21 January, 2013; originally announced January 2013.

arXiv:1212.5637 [pdf, other]

Random Spanning Trees and the Prediction of Weighted Graphs

Authors: Nicolò Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We investigate the problem of sequentially predicting the binary labels on the nodes of an arbitrary weighted graph. We show that, under a suitable parametrization of the problem, the optimal number of prediction mistakes can be characterized (up to logarithmic factors) by the cutsize of a random spanning tree of the graph. The cutsize is induced by the unknown adversarial labeling of the graph nodes. In deriving our characterization, we obtain a simple randomized algorithm achieving in expectation the optimal mistake bound on any polynomially connected weighted graph. Our algorithm draws a random spanning tree of the original graph and then predicts the nodes of this tree in constant expected amortized time and linear space. Experiments on real-world datasets show that our method compares well to both global (Perceptron) and local (label propagation) methods, while being generally faster in practice.

Submitted 21 December, 2012; originally announced December 2012.

Comments: Appeared in ICML 2010

arXiv:1207.0166 [pdf, other]

On Multilabel Classification and Ranking with Partial Feedback

Authors: Claudio Gentile, Francesco Orabona

Abstract: We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd-order descent methods, and relies on upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multilabel probabilities are ruled by (generalized) linear models. We show O(T^{1/2} log T) regret bounds, which improve in several ways on the existing results. We test the effectiveness of our upper-confidence scheme by contrasting against full-information baselines on real-world multilabel datasets, often obtaining comparable performance.

Submitted 16 January, 2013; v1 submitted 30 June, 2012; originally announced July 2012.

arXiv:1109.2296 [pdf, other]

Bandits with an Edge

Authors: Dotan Di Castro, Claudio Gentile, Shie Mannor

Abstract: We consider a bandit problem over a graph where the rewards are not directly observed. Instead, the decision maker can compare two nodes and receive (stochastic) information pertaining to the difference in their value. The graph structure describes the set of possible comparisons. Consequently, comparing two nodes that are relatively far apart requires estimating the difference between every pair of nodes on the path between them. We analyze this problem from the perspective of sample complexity: How many queries are needed to find an approximately optimal node with probability more than $1-δ$ in the PAC setup? We show that the topology of the graph plays a crucial role in defining the sample complexity: graphs with a low diameter have a much better sample complexity.

Submitted 11 September, 2011; originally announced September 2011.

arXiv:1105.2550

A Maximal Large Deviation Inequality for Sub-Gaussian Variables

Authors: Dotan Di Castro, Claudio Gentile, Shie Mannor

Abstract: In this short note we prove a maximal concentration lemma for sub-Gaussian random variables stating that for independent sub-Gaussian random variables we have \[P\left(\max_{1\le i\le N}S_{i}>ε\right) \le\exp\left(-\frac{1}{N^2}\sum_{i=1}^{N}\frac{ε^{2}}{2σ_{i}^{2}}\right), \] where $S_i$ is the sum of $i$ zero mean independent sub-Gaussian random variables and $σ_i$ is the variance of the $i$th random variable.

Submitted 25 July, 2011; v1 submitted 12 May, 2011; originally announced May 2011.

Comments: This paper has been withdrawn by the authors due to a crucial error in the last sentence of the proof of Theorem 1: "we can take the infimum of the r.h.s. over s, which yields (1)." This statement is only true if a single value of s yields the supremum of (ε_i s - ρ_i(s)) simultaneously for every i

Showing 1–47 of 47 results for author: Gentile, C