
Showing 1–47 of 47 results for author: Cesa-Bianchi, N

Searching in archive stat.
  1. arXiv:2406.16802  [pdf, other]

    cs.LG stat.ML

    Improved Regret Bounds for Bandits with Expert Advice

    Authors: Nicolò Cesa-Bianchi, Khaled Eldowa, Emmanuel Esposito, Julia Olkhovskaya

    Abstract: In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N>K$ the number of experts, and $T$ the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.10529  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    A Theory of Interpretable Approximations

    Authors: Marco Bressan, Nicolò Cesa-Bianchi, Emmanuel Esposito, Yishay Mansour, Shay Moran, Maximilian Thiessen

    Abstract: Can a deep neural network be approximated by a small decision tree based on simple features? This question and its variants are behind the growing demand for machine learning models that are *interpretable* by humans. In this work we study such questions by introducing *interpretable approximations*, a notion that captures the idea of approximating a target concept $c$ by a small aggregation of co…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: To appear at COLT 2024

  3. arXiv:2406.01192  [pdf, other]

    cs.LG stat.ML

    Sparsity-Agnostic Linear Bandits with Adaptive Adversaries

    Authors: Tianyuan **, Kyoungseok Jang, Nicolò Cesa-Bianchi

    Abstract: We study stochastic linear bandits where, in each round, the learner receives a set of actions (i.e., feature vectors), from which it chooses an element and obtains a stochastic reward. The expected reward is a fixed but unknown linear function of the chosen action. We study sparse regret bounds that depend on the number $S$ of non-zero coefficients in the linear reward function. Previous works f…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 25 pages

  4. arXiv:2402.10282  [pdf, other]

    cs.LG stat.ML

    Information Capacity Regret Bounds for Bandits with Mediator Feedback

    Authors: Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli

    Abstract: This work addresses the mediator feedback problem, a bandit game where the decision set consists of a number of policies, each associated with a probability distribution over a common space of outcomes. Upon choosing a policy, the learner observes an outcome sampled from its distribution and incurs the loss assigned to this outcome in the present round. We introduce the policy set capacity as an i…

    Submitted 15 February, 2024; originally announced February 2024.

  5. arXiv:2312.15433  [pdf, ps, other]

    cs.LG stat.ML

    Best-of-Both-Worlds Algorithms for Linear Contextual Bandits

    Authors: Yuko Kuroki, Alberto Rumi, Taira Tsuchiya, Fabio Vitale, Nicolò Cesa-Bianchi

    Abstract: We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits. Our algorithms deliver near-optimal regret bounds in both the adversarial and stochastic regimes, without prior knowledge about the environment. In the stochastic regime, we achieve the polylogarithmic rate $\frac{(dK)^2\mathrm{poly}\log(dKT)}{Δ_{\min}}$, where $Δ_{\min}$ is the minimum suboptimality gap over the $d$-…

    Submitted 19 February, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

    Comments: Accepted at AISTATS 2024

  6. arXiv:2310.09597  [pdf, other]

    econ.EM cs.LG stat.ML

    Adaptive maximization of social welfare

    Authors: Nicolo Cesa-Bianchi, Roberto Colomboni, Maximilian Kasy

    Abstract: We consider the problem of repeatedly choosing policies to maximize social welfare. Welfare is a weighted sum of private utility and public revenue. Earlier outcomes inform later policies. Utility is not observed, but indirectly inferred. Response functions are learned through experimentation. We derive a lower bound on regret, and a matching adversarial upper bound for a variant of the Exp3 alg…

    Submitted 14 October, 2023; originally announced October 2023.

  7. arXiv:2307.00836  [pdf, other]

    stat.ML cs.LG

    Trading-Off Payments and Accuracy in Online Classification with Paid Stochastic Experts

    Authors: Dirk van der Hoeven, Ciara Pike-Burke, Hao Qiu, Nicolo Cesa-Bianchi

    Abstract: We investigate online classification with paid stochastic experts. Here, before making their prediction, each expert must be paid. The amount that we pay each expert directly influences the accuracy of their prediction through some unknown Lipschitz "productivity" function. In each round, the learner must decide how much to pay each expert and then make a prediction. They incur a cost equal to a w…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: ICML 2023

  8. arXiv:2303.08102  [pdf, ps, other]

    cs.LG stat.ML

    Information-Theoretic Regret Bounds for Bandits with Fixed Expert Advice

    Authors: Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli

    Abstract: We investigate the problem of bandits with expert advice when the experts are fixed and known distributions over the actions. Improving on previous analyses, we show that the regret in this setting is controlled by information-theoretic quantities that measure the similarity between experts. In some natural special cases, this allows us to obtain the first regret bound for EXP4 that can get arbitr…

    Submitted 15 March, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

  9. arXiv:2206.02656  [pdf, ps, other]

    cs.LG stat.ML

    A Regret-Variance Trade-Off in Online Learning

    Authors: Dirk van der Hoeven, Nikita Zhivotovskiy, Nicolò Cesa-Bianchi

    Abstract: We consider prediction with expert advice for strongly convex and bounded losses, and investigate trade-offs between regret and "variance" (i.e., squared difference of learner's predictions and best expert predictions). With $K$ experts, the Exponentially Weighted Average (EWA) algorithm is known to achieve $O(\log K)$ regret. We prove that a variant of EWA either achieves a negative regret (i.e.,…

    Submitted 6 June, 2022; originally announced June 2022.

  10. arXiv:2106.04982  [pdf, other]

    cs.LG stat.ML

    Cooperative Online Learning with Feedback Graphs

    Authors: Nicolò Cesa-Bianchi, Tommaso R. Cesari, Riccardo Della Vecchia

    Abstract: We study the interplay between feedback and communication in a cooperative online learning setting where a network of agents solves a task in which the learners' feedback is determined by an arbitrary graph. We characterize regret in terms of the independence number of the strong product between the feedback graph and the communication network. Our analysis recovers as special cases many previousl…

    Submitted 24 September, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

  11. arXiv:2106.04913  [pdf, ps, other]

    cs.LG stat.ML

    On Margin-Based Cluster Recovery with Oracle Queries

    Authors: Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

    Abstract: We study an active cluster recovery problem where, given a set of $n$ points and an oracle answering queries like "are these two points in the same cluster?", the task is to recover exactly all clusters using as few queries as possible. We begin by introducing a simple but general notion of margin between clusters that captures, as special cases, the margins used in previous work, the classic SVM…

    Submitted 9 June, 2021; originally announced June 2021.

  12. arXiv:2102.09864  [pdf, other]

    cs.LG stat.ML

    An Algorithm for Stochastic and Adversarial Bandits with Switching Costs

    Authors: Chloé Rouyer, Yevgeny Seldin, Nicolò Cesa-Bianchi

    Abstract: We propose an algorithm for stochastic and adversarial multiarmed bandits with switching costs, where the algorithm pays a price $λ$ every time it switches the arm being played. Our algorithm is based on adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon. In the oblivious adversarial setting it achieves the minimax opt…

    Submitted 19 February, 2021; originally announced February 2021.

  13. arXiv:2102.08754  [pdf, ps, other]

    cs.LG econ.TH stat.ML

    A Regret Analysis of Bilateral Trade

    Authors: Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Federico Fusco, Stefano Leonardi

    Abstract: Bilateral trade, a fundamental topic in economics, models the problem of intermediating between two strategic agents, a seller and a buyer, willing to trade a good for which they hold private valuations. Despite the simplicity of this problem, a classical result by Myerson and Satterthwaite (1983) affirms the impossibility of designing a mechanism which is simultaneously efficient, incentive compa…

    Submitted 16 February, 2021; originally announced February 2021.

    Journal ref: EC '21: Proceedings of the 22nd ACM Conference on Economics and Computation (2021)

  14. arXiv:2102.00504  [pdf, other]

    cs.LG stat.ML

    Exact Recovery of Clusters in Finite Metric Spaces Using Oracle Queries

    Authors: Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

    Abstract: We investigate the problem of exact cluster recovery using oracle queries. Previous results show that clusters in Euclidean spaces that are convex and separated with a margin can be reconstructed exactly using only $O(\log n)$ same-cluster queries, where $n$ is the number of input points. In this work, we study this problem in the more challenging non-convex setting. We introduce a structural char…

    Submitted 13 July, 2021; v1 submitted 31 January, 2021; originally announced February 2021.

    Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2021

  15. arXiv:2006.04675  [pdf, other]

    cs.LG stat.ML

    Exact Recovery of Mangled Clusters with Same-Cluster Queries

    Authors: Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

    Abstract: We study the cluster recovery problem in the semi-supervised active clustering framework. Given a finite set of input points, and an oracle revealing whether any two points lie in the same cluster, our goal is to recover all clusters exactly using as few queries as possible. To this end, we relax the spherical $k$-means cluster assumption of Ashtiani et al. to allow for arbitrary ellipsoidal clus…

    Submitted 30 October, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: To appear at NeurIPS 2020 (oral)

  16. arXiv:2002.01882  [pdf, other]

    cs.LG stat.ML

    Locally-Adaptive Nonparametric Online Learning

    Authors: Ilja Kuzborskij, Nicolò Cesa-Bianchi

    Abstract: One of the main strengths of online algorithms is their ability to adapt to arbitrary data sequences. This is especially important in nonparametric settings, where performance is measured against rich classes of comparator functions that are able to fit complex environments. Although such hard comparators and complex environments may exhibit local regularities, efficient algorithms, which can prov…

    Submitted 1 November, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

  17. arXiv:1910.02757  [pdf, other]

    stat.ML cs.LG

    Stochastic Bandits with Delay-Dependent Payoffs

    Authors: Leonardo Cella, Nicolò Cesa-Bianchi

    Abstract: Motivated by recommendation problems in music streaming platforms, we propose a nonstationary stochastic bandit model in which the expected reward of an arm depends on the number of rounds that have passed since the arm was last pulled. After proving that finding an optimal policy is NP-hard even when all model parameters are known, we introduce a class of ranking policies provably approximating,…

    Submitted 19 February, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

  18. arXiv:1906.00670  [pdf, other]

    cs.LG stat.ML

    Nonstochastic Multiarmed Bandits with Unrestricted Delays

    Authors: Tobias Sommer Thune, Nicolò Cesa-Bianchi, Yevgeny Seldin

    Abstract: We investigate multiarmed bandits with delayed feedback, where the delays need be neither identical nor bounded. We first prove that "delayed" Exp3 achieves the $O(\sqrt{(KT + D)\ln K} )$ regret bound conjectured by Cesa-Bianchi et al. [2019] in the case of variable, but bounded delays. Here, $K$ is the number of actions and $D$ is the total delay over $T$ rounds. We then introduce a new algorithm…

    Submitted 19 November, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: 9 pages, NeurIPS camera ready

  19. arXiv:1905.11902  [pdf, other]

    cs.LG stat.ML

    Correlation Clustering with Adaptive Similarity Queries

    Authors: Marco Bressan, Nicolò Cesa-Bianchi, Andrea Paudice, Fabio Vitale

    Abstract: In correlation clustering, we are given $n$ objects together with a binary similarity score between each pair of them. The goal is to partition the objects into clusters so as to minimise the disagreements with the scores. In this work we investigate correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disag…

    Submitted 14 January, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

  20. arXiv:1905.11797  [pdf, ps, other]

    cs.LG stat.ML

    ROI Maximization in Stochastic Online Decision-Making

    Authors: Nicolò Cesa-Bianchi, Tommaso Cesari, Yishay Mansour, Vianney Perchet

    Abstract: We introduce a novel theoretical framework for Return On Investment (ROI) maximization in repeated decision-making. Our setting is motivated by the use case of companies that regularly receive proposals for technological innovations and want to quickly decide whether they are worth implementing. We design an algorithm for learning ROI-maximizing decision-making policies over a sequence of innovati…

    Submitted 22 December, 2021; v1 submitted 28 May, 2019; originally announced May 2019.

  21. arXiv:1902.01846  [pdf, other]

    cs.LG stat.ML

    Distribution-Dependent Analysis of Gibbs-ERM Principle

    Authors: Ilja Kuzborskij, Nicolò Cesa-Bianchi, Csaba Szepesvári

    Abstract: Gibbs-ERM learning is a natural idealized model of learning with stochastic optimization algorithms (such as Stochastic Gradient Langevin Dynamics and, to some extent, Stochastic Gradient Descent), while it also arises in other contexts, including PAC-Bayesian theory, and sampling mechanisms. In this work we study the excess risk suffered by a Gibbs-ERM learner that uses non-convex, regularize…

    Submitted 5 February, 2019; originally announced February 2019.

  22. arXiv:1901.08082  [pdf, ps, other]

    cs.LG stat.ML

    Cooperative Online Learning: Keeping your Neighbors Updated

    Authors: Nicolò Cesa-Bianchi, Tommaso R. Cesari, Claire Monteleoni

    Abstract: We study an asynchronous online learning setting with a network of agents. At each time step, some of the agents are activated, requested to make a prediction, and pay the corresponding loss. The loss function is then revealed to these agents and also to their neighbors in the network. Our results characterize how much knowing the network structure affects the regret as a function of the model of…

    Submitted 15 January, 2020; v1 submitted 23 January, 2019; originally announced January 2019.

  23. arXiv:1809.11033  [pdf, other]

    cs.LG stat.ML

    Efficient Linear Bandits through Matrix Sketching

    Authors: Ilja Kuzborskij, Leonardo Cella, Nicolò Cesa-Bianchi

    Abstract: We prove that two popular linear contextual bandit algorithms, OFUL and Thompson Sampling, can be made efficient using Frequent Directions, a deterministic online sketching technique. More precisely, we show that a sketch of size $m$ allows a $\mathcal{O}(md)$ update time for both algorithms, as opposed to $Ω(d^2)$ required by their non-sketched versions in general (where $d$ is the dimension of c…

    Submitted 21 March, 2022; v1 submitted 28 September, 2018; originally announced September 2018.

  24. arXiv:1807.03288  [pdf, ps, other]

    cs.LG stat.ML

    Dynamic Pricing with Finitely Many Unknown Valuations

    Authors: Nicolò Cesa-Bianchi, Tommaso Cesari, Vianney Perchet

    Abstract: Motivated by posted price auctions where buyers are grouped in an unknown number of latent types characterized by their private values for the good on sale, we investigate revenue maximization in stochastic dynamic pricing when the distribution of buyers' private values is supported on an unknown set of points in $[0,1]$ of unknown cardinality $K$. This setting can be viewed as an instance of a stoc…

    Submitted 5 March, 2019; v1 submitted 9 July, 2018; originally announced July 2018.

  25. arXiv:1805.07331  [pdf, other]

    cs.LG q-bio.QM stat.ML

    Positive and Unlabeled Learning through Negative Selection and Imbalance-aware Classification

    Authors: Marco Frasca, Nicolò Cesa-Bianchi

    Abstract: Motivated by applications in protein function prediction, we consider a challenging supervised classification setting in which positive labels are scarce and there are no explicit negative labels. The learning algorithm must thus select which unlabeled examples to use as negative training points, possibly ending up with an unbalanced learning problem. We address these issues by proposing an algori…

    Submitted 25 January, 2019; v1 submitted 18 May, 2018; originally announced May 2018.

  26. arXiv:1705.10257  [pdf, ps, other]

    cs.LG stat.ML

    Boltzmann Exploration Done Right

    Authors: Nicolò Cesa-Bianchi, Claudio Gentile, Gábor Lugosi, Gergely Neu

    Abstract: Boltzmann exploration is a classic strategy for sequential decision-making under uncertainty, and is one of the most standard tools in Reinforcement Learning (RL). Despite its widespread use, there is virtually no theoretical understanding about the limitations or the actual benefits of this exploration scheme. Does it drive exploration in a meaningful way? Is it prone to misidentifying the optima…

    Submitted 7 November, 2017; v1 submitted 29 May, 2017; originally announced May 2017.

  27. arXiv:1705.05091  [pdf, ps, other]

    cs.LG stat.ML

    Bandit Regret Scaling with the Effective Loss Range

    Authors: Nicolò Cesa-Bianchi, Ohad Shamir

    Abstract: We study how the regret guarantees of nonstochastic multi-armed bandits can be improved, if the effective range of the losses in each round is small (e.g. the maximal difference between two losses in a given round). Despite a recent impossibility result, we show how this can be made possible under certain mild additional assumptions, such as availability of rough estimates of the losses, or advanc…

    Submitted 2 January, 2020; v1 submitted 15 May, 2017; originally announced May 2017.

    Comments: The results in Section 4 are incorrect as stated; we have added an erratum at the beginning of the document. The results in the other sections are still valid. We thank Étienne de Montbrun for locating the error.

  28. arXiv:1702.08211  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning

    Authors: Nicolò Cesa-Bianchi, Pierre Gaillard, Claudio Gentile, Sébastien Gerchinovitz

    Abstract: We investigate contextual online learning with nonparametric (Lipschitz) comparison classes under different assumptions on losses and feedback information. For full information feedback and Lipschitz losses, we design the first explicit algorithm achieving the minimax regret rate (up to log factors). In a partial feedback model motivated by second-price auctions, we obtain algorithms for Lipschitz…

    Submitted 30 June, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

    Comments: This document is the full version of an extended abstract accepted for presentation at COLT 2017

  29. arXiv:1604.02855  [pdf, other]

    stat.ML cs.CV cs.LG

    Active Learning for Online Recognition of Human Activities from Streaming Videos

    Authors: Rocco De Rosa, Ilaria Gori, Fabio Cuzzolin, Barbara Caputo, Nicolò Cesa-Bianchi

    Abstract: Recognising human activities from streaming videos poses unique challenges to learning algorithms: predictive models need to be scalable, incrementally trainable, and must remain bounded in size even when the data stream is arbitrarily long. Furthermore, as parameter tuning is problematic in a streaming setting, suitable approaches should be parameterless, and make no assumptions on what class lab…

    Submitted 11 April, 2016; originally announced April 2016.

  30. arXiv:1508.04912  [pdf, other]

    stat.ML cs.LG

    The ABACOC Algorithm: a Novel Approach for Nonparametric Classification of Data Streams

    Authors: Rocco De Rosa, Francesco Orabona, Nicolò Cesa-Bianchi

    Abstract: Stream mining poses unique challenges to machine learning: predictive models are required to be scalable, incrementally trainable, must remain bounded in size (even when the data stream is arbitrarily long), and be nonparametric in order to achieve high accuracy even in complex and dynamic environments. Moreover, the learning system must be parameterless: traditional tuning methods are problemat…

    Submitted 20 August, 2015; originally announced August 2015.

  31. arXiv:1411.1158  [pdf, ps, other]

    cs.LG stat.ML

    On the Complexity of Learning with Kernels

    Authors: Nicolò Cesa-Bianchi, Yishay Mansour, Ohad Shamir

    Abstract: A well-recognized limitation of kernel learning is the requirement to handle a kernel matrix, whose size is quadratic in the number of training examples. Many methods have been proposed to reduce this computational cost, mostly by using a subset of the kernel matrix entries, or some form of low-rank matrix approximation, or a random projection method. In this paper, we study lower bounds on the er…

    Submitted 5 November, 2014; originally announced November 2014.

  32. arXiv:1409.8428  [pdf, other]

    cs.LG stat.ML

    Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback

    Authors: Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, Ohad Shamir

    Abstract: We present and study a partial-information model of online learning, where a decision maker repeatedly chooses from a finite set of actions, and observes some subset of the associated losses. This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions. Moreover, it generalizes and i…

    Submitted 30 September, 2014; originally announced September 2014.

    Comments: Preliminary versions of parts of this paper appeared in [1,20], and also as arXiv papers arXiv:1106.2436 and arXiv:1307.4564

  33. arXiv:1307.4564  [pdf, ps, other]

    cs.LG stat.ML

    From Bandits to Experts: A Tale of Domination and Independence

    Authors: Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour

    Abstract: We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph. We also show that in the undirected case, the learner can achieve optimal regret without even accessing the observability graph before…

    Submitted 17 July, 2013; originally announced July 2013.

  34. arXiv:1306.0811  [pdf, other]

    cs.LG cs.SI stat.ML

    A Gang of Bandits

    Authors: Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella

    Abstract: Multi-armed bandit problems are receiving a great deal of attention because they adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithm could lead t…

    Submitted 4 November, 2013; v1 submitted 4 June, 2013; originally announced June 2013.

    Comments: NIPS 2013

  35. arXiv:1302.4387  [pdf, ps, other]

    cs.LG stat.ML

    Online Learning with Switching Costs and Other Adaptive Adversaries

    Authors: Nicolo Cesa-Bianchi, Ofer Dekel, Ohad Shamir

    Abstract: We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player's performance using a new notion of regret, also known as policy regret, which better captures the adversary's adaptiveness to the player's behavior. In a setting where losses are allowed to drift, we ch…

    Submitted 1 June, 2013; v1 submitted 18 February, 2013; originally announced February 2013.

  36. arXiv:1301.5112  [pdf, ps, other]

    cs.LG stat.ML

    Active Learning on Trees and Graphs

    Authors: Nicolo Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

    Abstract: We investigate the problem of active learning on a given tree whose nodes are assigned binary labels in an adversarial way. Inspired by recent results by Guillory and Bilmes, we characterize (up to constant factors) the optimal placement of queries so as to minimize the mistakes made on the non-queried nodes. Our query selection algorithm is extremely efficient, and the optimal number of mistakes on…

    Submitted 22 January, 2013; originally announced January 2013.

  37. arXiv:1301.4769  [pdf, other]

    cs.LG cs.DS stat.ML

    A Correlation Clustering Approach to Link Classification in Signed Networks -- Full Version --

    Authors: Nicolo Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

    Abstract: Motivated by social balance theory, we develop a theory of link classification in signed networks using the correlation clustering index as a measure of label regularity. We derive learning bounds in terms of correlation clustering within three fundamental transductive learning settings: online, batch and active. Our main algorithmic contribution is in the active setting, where we introduce a new fa…

    Submitted 28 February, 2013; v1 submitted 21 January, 2013; originally announced January 2013.

  38. arXiv:1301.4767  [pdf, other]

    cs.LG cs.SI stat.ML

    A Linear Time Active Learning Algorithm for Link Classification -- Full Version --

    Authors: Nicolo Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

    Abstract: We present very efficient active learning algorithms for link classification in signed networks. Our algorithms are motivated by a stochastic model in which edge labels are obtained through perturbations of an initial sign assignment consistent with a two-clustering of the nodes. We provide a theoretical analysis within this model, showing that we can achieve an optimal (to within a constant facto…

    Submitted 28 February, 2013; v1 submitted 21 January, 2013; originally announced January 2013.

  39. arXiv:1212.5637  [pdf, other]

    cs.LG stat.ML

    Random Spanning Trees and the Prediction of Weighted Graphs

    Authors: Nicolo' Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

    Abstract: We investigate the problem of sequentially predicting the binary labels on the nodes of an arbitrary weighted graph. We show that, under a suitable parametrization of the problem, the optimal number of prediction mistakes can be characterized (up to logarithmic factors) by the cutsize of a random spanning tree of the graph. The cutsize is induced by the unknown adversarial labeling of the graph no…

    Submitted 21 December, 2012; originally announced December 2012.

    Comments: Appeared in ICML 2010

  40. arXiv:1209.1727  [pdf, ps, other]

    stat.ML cs.LG

    Bandits with heavy tail

    Authors: Sébastien Bubeck, Nicolò Cesa-Bianchi, Gábor Lugosi

    Abstract: The stochastic multi-armed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper we examine the bandit problem under the weaker assumption that the distributions have moments of order $1+ε$, for some $ε\in (0,1]$. Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as under sub-Gaussian reward dis…

    Submitted 8 September, 2012; originally announced September 2012.

  41. arXiv:1204.5721  [pdf, ps, other]

    cs.LG stat.ML

    Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

    Authors: Sébastien Bubeck, Nicolò Cesa-Bianchi

    Abstract: Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs aris…

    Submitted 3 November, 2012; v1 submitted 25 April, 2012; originally announced April 2012.

    Comments: To appear in Foundations and Trends in Machine Learning

  42. arXiv:1202.3323  [pdf, ps, other]

    cs.LG stat.ML

    Mirror Descent Meets Fixed Share (and feels no regret)

    Authors: Nicolò Cesa-Bianchi, Pierre Gaillard, Gabor Lugosi, Gilles Stoltz

    Abstract: Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension. This is done using either a carefully designed projection or by a weight sharing technique. Via a novel unified analysis, we show that these two approaches deliver essentially equivalent bounds on a notion of regret generalizing shifting, adaptive, discounted, and other rel…

    Submitted 27 September, 2012; v1 submitted 15 February, 2012; originally announced February 2012.

    Journal ref: NIPS 2012, Lake Tahoe, United States (2012)

  43. arXiv:1202.3079  [pdf, ps, other]

    cs.LG stat.ML

    Towards minimax policies for online linear optimization with bandit feedback

    Authors: Sébastien Bubeck, Nicolò Cesa-Bianchi, Sham M. Kakade

    Abstract: We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order $\sqrt{d n \log N}$ for any finite action set with $N$ actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous $\sqrt{d}$ factor compared to previous works, and give…

    Submitted 14 February, 2012; originally announced February 2012.

  44. arXiv:1110.6886  [pdf, other]

    cs.LG cs.IT stat.ML

    PAC-Bayesian Inequalities for Martingales

    Authors: Yevgeny Seldin, François Laviolette, Nicolò Cesa-Bianchi, John Shawe-Taylor, Peter Auer

    Abstract: We present a set of high-probability inequalities that control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. Our results extend the PAC-Bayesian analysis in learning theory from the i.i.d. setting to martingales, opening the way for its application to importance weighted sampling, reinforcement learning, and ot…

    Submitted 30 July, 2012; v1 submitted 31 October, 2011; originally announced October 2011.

  45. arXiv:1110.4322   

    cs.LG stat.ML

    An Optimal Algorithm for Linear Bandits

    Authors: Nicolò Cesa-Bianchi, Sham Kakade

    Abstract: We provide the first algorithm for online bandit linear optimization whose regret after $T$ rounds is of order $\sqrt{Td \ln N}$ on any finite class $X$ of $N$ actions in $d$ dimensions, and of order $d\sqrt{T}$ (up to log factors) when $X$ is infinite. These bounds are not improvable in general. The basic idea utilizes tools from convex geometry to construct what is essentially an optimal exploration basis. We a…

    Submitted 14 February, 2012; v1 submitted 19 October, 2011; originally announced October 2011.

    Comments: This paper is superseded by S. Bubeck, N. Cesa-Bianchi, and S.M. Kakade, "Towards minimax policies for online linear optimization with bandit feedback"

  46. arXiv:1106.2429  [pdf, ps, other]

    cs.LG stat.ML

    Efficient Transductive Online Learning via Randomized Rounding

    Authors: Nicolò Cesa-Bianchi, Ohad Shamir

    Abstract: Most traditional online learning algorithms are based on variants of mirror descent or follow-the-leader. In this paper, we present an online algorithm based on a completely different approach, tailored for transductive settings, which combines "random playout" and randomized rounding of loss subgradients. As an application of our approach, we present the first computationally efficient online alg…

    Submitted 11 September, 2013; v1 submitted 13 June, 2011; originally announced June 2011.

    Comments: To appear in a Festschrift in honor of V.N. Vapnik. Preliminary version presented at NIPS 2011

  47. arXiv:1105.4585  [pdf, ps, other]

    cs.LG stat.ML

    PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off

    Authors: Yevgeny Seldin, Nicolò Cesa-Bianchi, François Laviolette, Peter Auer, John Shawe-Taylor, Jan Peters

    Abstract: We develop a coherent framework for integrative simultaneous analysis of the exploration-exploitation and model order selection trade-offs. We improve over our preceding results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with a Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolvin…

    Submitted 23 May, 2011; originally announced May 2011.

    Comments: On-line Trading of Exploration and Exploitation 2 - ICML-2011 workshop. http://explo.cs.ucl.ac.uk/workshop/
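Several abstracts in this listing build on Exp3 (entry 6 analyzes "a variant of the Exp3 algorithm", entry 18 a "delayed" Exp3). For readers unfamiliar with the base algorithm, here is a minimal sketch of standard Exp3 in its loss-based form, assuming an oblivious adversary, losses in $[0,1]$, and no explicit uniform-exploration mixing. It is not the delayed or welfare-maximizing variant from those papers. The function name `exp3` and the tuning $\eta = \sqrt{2 \ln K / (KT)}$ are illustrative assumptions; that tuning targets regret of order $\sqrt{KT \ln K}$, which is what the bound in entry 18 reduces to when the total delay $D$ is zero.

```python
import math
import random


def exp3(losses, eta, rng=None):
    """Sketch of Exp3 (loss-based, exponential weights on importance-weighted estimates).

    losses[t][i] is the loss in [0, 1] of arm i at round t, fixed in advance
    (oblivious adversary). Returns the arms played and the learner's total loss.
    """
    rng = rng if rng is not None else random.Random(0)
    k = len(losses[0])
    cum_est = [0.0] * k  # cumulative importance-weighted loss estimates
    played = []
    total = 0.0
    for row in losses:
        # Exponential weights over estimated cumulative losses; subtracting
        # the minimum does not change the distribution but avoids underflow.
        m = min(cum_est)
        w = [math.exp(-eta * (c - m)) for c in cum_est]
        s = sum(w)
        p = [x / s for x in w]
        arm = rng.choices(range(k), weights=p)[0]
        loss = row[arm]  # bandit feedback: only the played arm's loss is seen
        total += loss
        played.append(arm)
        # Dividing by p[arm] makes the estimate unbiased for every arm.
        cum_est[arm] += loss / p[arm]
    return played, total
```

For instance, with $T = 2000$ rounds and three arms where arm 2 always has loss 0 and the others loss 1, `exp3(losses, math.sqrt(2 * math.log(3) / (3 * 2000)))` should concentrate play on arm 2. The estimates are updated only for the played arm, which is the defining difference from the full-information forecaster.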
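Entry 9 recalls that the Exponentially Weighted Average (EWA) forecaster achieves $O(\log K)$ regret with $K$ experts. A minimal sketch of the classical forecaster, assuming the square loss $(p - y)^2$ with predictions and outcomes in $[0,1]$, is shown below; it is not the variant studied in that paper. The learning rate $\eta = 1/2$ is an assumption chosen because this loss is $\eta$-exp-concave for $\eta \le 1/2$ on $[0,1]$, which gives the deterministic bound $\hat L_T - \min_i L_{i,T} \le \ln K / \eta = 2 \ln K$ for every sequence. The function name `ewa_square_loss` is hypothetical.

```python
import math


def ewa_square_loss(expert_preds, outcomes, eta=0.5):
    """Sketch of the exponentially weighted average forecaster for the square loss.

    expert_preds[t][i] in [0, 1] is expert i's prediction at round t and
    outcomes[t] in [0, 1] is the revealed outcome. Returns the forecaster's
    cumulative loss and the cumulative loss of each expert.
    """
    k = len(expert_preds[0])
    expert_loss = [0.0] * k
    learner_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        # Weights proportional to exp(-eta * cumulative loss); the shift by
        # the minimum only improves numerical stability.
        m = min(expert_loss)
        w = [math.exp(-eta * (l - m)) for l in expert_loss]
        s = sum(w)
        # Predict the weighted average of the experts' predictions.
        p = sum(wi * xi for wi, xi in zip(w, preds)) / s
        learner_loss += (p - y) ** 2
        # Full information: every expert's loss is observed and updated.
        for i, x in enumerate(preds):
            expert_loss[i] += (x - y) ** 2
    return learner_loss, expert_loss
```

Because the bound does not depend on $T$, the regret `learner_loss - min(expert_loss)` stays below $2 \ln K$ however long the sequence is, which is the $O(\log K)$ behavior mentioned in entry 9.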