Showing 51–86 of 86 results for author: Valko, M

  1. arXiv:2006.10459  [pdf, other]

    stat.ML cs.LG

    Stochastic bandits with arm-dependent delays

    Authors: Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko

    Abstract: Significant work has been recently dedicated to the stochastic delayed bandit setting because of its relevance in applications. The applicability of existing algorithms is however restricted by the fact that strong assumptions are often made on the delay distributions, such as full observability, restrictive shape constraints, or uniformity over arms. In this work, we weaken them significantly and…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: 19 Pages, 4 figures

    MSC Class: 62L10

  2. arXiv:2006.07733  [pdf, other]

    cs.LG cs.CV stat.ML

    Bootstrap your own latent: A new approach to self-supervised Learning

    Authors: Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko

    Abstract: We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the…

    Submitted 10 September, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

  3. arXiv:2006.06613  [pdf, ps, other]

    stat.ML cs.LG

    Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits

    Authors: Pierre Perrault, Etienne Boursier, Vianney Perchet, Michal Valko

    Abstract: We investigate stochastic combinatorial multi-armed bandit with semi-bandit feedback (CMAB). In CMAB, the question of the existence of an efficient policy with an optimal asymptotic regret (up to a factor poly-logarithmic with the action size) is still open for many families of distributions, including mutually independent outcomes, and more generally the multivariate sub-Gaussian family. We propo…

    Submitted 3 January, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: accepted to NeurIPS 2020

  4. arXiv:2006.06294  [pdf, other]

    cs.LG stat.ML

    Adaptive Reward-Free Exploration

    Authors: Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko

    Abstract: Reward-free exploration is a reinforcement learning setting studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel. In our work, we instead give a more natural adaptive approach for reward-free exploration which directly reduces upper bounds on the maximum MDP estimation error. We show that, interestingly, our reward-free UCRL algorithm can be…

    Submitted 7 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

  5. arXiv:2006.05879  [pdf, other]

    cs.LG stat.ML

    Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

    Authors: Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko

    Abstract: We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support. We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent sample complexity result is expressed in terms of the sub-optima…

    Submitted 10 June, 2020; originally announced June 2020.

  6. arXiv:2004.06248  [pdf, other]

    cs.LG stat.ML

    Improved Sleeping Bandits with Stochastic Actions Sets and Adversarial Rewards

    Authors: Aadirupa Saha, Pierre Gaillard, Michal Valko

    Abstract: In this paper, we consider the problem of sleeping bandits with stochastic action sets and adversarial rewards. In this setting, in contrast to most work in bandits, the actions may not be available at all times. For instance, some products might be out of stock in item recommendation. The best existing efficient (i.e., polynomial-time) algorithms for this problem only guarantee an $O(T^{2/3})$ up…

    Submitted 8 August, 2020; v1 submitted 13 April, 2020; originally announced April 2020.

    Comments: Accepted to ICML 2020

  7. arXiv:2004.05599  [pdf, other]

    cs.LG stat.ML

    Kernel-Based Reinforcement Learning: A Finite-Time Analysis

    Authors: Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

    Abstract: We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric. We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards and transitions to efficiently balance exploration and exploitation. For problems with $K$ epi…

    Submitted 23 March, 2022; v1 submitted 12 April, 2020; originally announced April 2020.

    Comments: Update following the publication in ICML 2021, including fixed typos

  8. arXiv:2003.06259  [pdf, other]

    cs.LG stat.ML

    Taylor Expansion Policy Optimization

    Authors: Yunhao Tang, Michal Valko, Rémi Munos

    Abstract: In this work, we investigate the application of Taylor expansions in reinforcement learning. In particular, we propose Taylor expansion policy optimization, a policy optimization formalism that generalizes prior work (e.g., TRPO) as a first-order special case. We also show that Taylor expansions intimately relate to off-policy evaluation. Finally, we show that this new formulation entails modifica…

    Submitted 13 March, 2020; originally announced March 2020.

  9. Fast sampling from $β$-ensembles

    Authors: Guillaume Gautier, Rémi Bardenet, Michal Valko

    Abstract: We study sampling algorithms for $β$-ensembles with time complexity less than cubic in the cardinality of the ensemble. Following Dumitriu & Edelman (2002), we see the ensemble as the eigenvalues of a random tridiagonal matrix, namely a random Jacobi matrix. First, we provide a unifying and elementary treatment of the tridiagonal models associated to the three classical Hermite, Laguerre and Jacob…

    Submitted 4 March, 2020; originally announced March 2020.

    Comments: 37 pages, 8 figures, code at https://github.com/guilgautier/DPPy

    MSC Class: 60K35 (Primary) 65C40; 60B20; 33C45 (Secondary)

    Journal ref: Stat. Comput. 31 (2021) 7

  10. arXiv:2002.09954  [pdf, other]

    stat.ML cs.LG

    Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification

    Authors: Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco

    Abstract: Gaussian processes (GP) are one of the most successful frameworks to model uncertainty. However, GP optimization (e.g., GP-UCB) suffers from major scalability issues. Experimental time grows linearly with the number of evaluations, unless candidates are selected in batches (e.g., using GP-BUCB) and evaluated in parallel. Furthermore, computational cost is often prohibitive since algorithms such as…

    Submitted 26 February, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

  11. arXiv:1912.03517  [pdf, other]

    stat.ML cs.LG

    No-Regret Exploration in Goal-Oriented Reinforcement Learning

    Authors: Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric

    Abstract: Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the episodic setting under its stochastic shortest path (SSP) formulation, where an agent has to achieve a goal state while minimizing the cumulative cost. Despite the popularity of this setting, the exploration-exploitation dilemma has been sparsely studied in general SSP pro…

    Submitted 17 August, 2020; v1 submitted 7 December, 2019; originally announced December 2019.

    Journal ref: International Conference on Machine Learning (ICML 2020)

  12. arXiv:1910.10945  [pdf, other]

    cs.LG stat.ML

    Fixed-Confidence Guarantees for Bayesian Best-Arm Identification

    Authors: Xuedong Shang, Rianne de Heide, Emilie Kaufmann, Pierre Ménard, Michal Valko

    Abstract: We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS). In particular, we justify its use for fixed-confidence best-arm identification. We further propose a variant of TTTS called Top-Two Transportation Cost (T3C), which disposes of the computational burden of TTTS. As our main contribution, we provide the first sample complexity analysis of TTTS and T…

    Submitted 28 October, 2019; v1 submitted 24 October, 2019; originally announced October 2019.

  13. arXiv:1910.04034  [pdf, ps, other]

    cs.LG stat.ML

    Derivative-Free & Order-Robust Optimisation

    Authors: Victor Gabillon, Rasul Tutunov, Michal Valko, Haitham Bou Ammar

    Abstract: In this paper, we formalise order-robust optimisation as an instance of online learning minimising simple regret, and propose Vroom, a zero'th order optimisation algorithm capable of achieving vanishing regret in non-stationary environments, while recovering favorable rates under stochastic reward-generating processes. Our results are the first to target simple regret definitions in adversarial sc…

    Submitted 22 October, 2019; v1 submitted 9 October, 2019; originally announced October 2019.

  14. arXiv:1909.09849  [pdf, other]

    cs.MA cs.AI cs.LG

    Multiagent Evaluation under Incomplete Information

    Authors: Mark Rowland, Shayegan Omidshafiei, Karl Tuyls, Julien Perolat, Michal Valko, Georgios Piliouras, Remi Munos

    Abstract: This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents. Traditionally, researchers have relied on Elo ratings for this purpose, with recent works also using methods based on Nash equilibria. Unfortunately, Elo is unable to handle intransitive agent interactions, and other technique…

    Submitted 10 January, 2020; v1 submitted 21 September, 2019; originally announced September 2019.

  15. arXiv:1906.08509  [pdf, other]

    stat.ML cs.LG math.OC

    Online A-Optimal Design and Active Linear Regression

    Authors: Xavier Fontaine, Pierre Perrault, Michal Valko, Vianney Perchet

    Abstract: We consider in this paper the problem of optimal experiment design where a decision maker can choose which points to sample to obtain an estimate $\hatβ$ of the hidden parameter $β^{\star}$ of an underlying linear model. The key challenge of this work lies in the heteroscedasticity assumption that we make, meaning that each covariate has a different and unknown variance. The goal of the decision m…

    Submitted 30 December, 2020; v1 submitted 20 June, 2019; originally announced June 2019.

    Comments: 29 pages, 5 figures

  16. arXiv:1905.13476  [pdf, other]

    cs.LG stat.ML

    Exact sampling of determinantal point processes with sublinear time preprocessing

    Authors: Michał Dereziński, Daniele Calandriello, Michal Valko

    Abstract: We study the complexity of sampling from a distribution over all index subsets of the set $\{1,...,n\}$ with the probability of a subset $S$ proportional to the determinant of the submatrix $\mathbf{L}_S$ of some $n\times n$ p.s.d. matrix $\mathbf{L}$, where $\mathbf{L}_S$ corresponds to the entries of $\mathbf{L}$ indexed by $S$. Known as a determinantal point process, this distribution is used i…

    Submitted 8 July, 2019; v1 submitted 31 May, 2019; originally announced May 2019.

  17. arXiv:1903.05594  [pdf, other]

    stat.ML cs.LG

    Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

    Authors: Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco

    Abstract: Gaussian processes (GP) are a well studied Bayesian approach for the optimization of black-box functions. Despite their effectiveness in simple problems, GP-based algorithms hardly scale to high-dimensional functions, as their per-iteration time and space cost is at least quadratic in the number of dimensions $d$ and iterations $t$. Given a set of $A$ alternatives to choose from, the overall runti…

    Submitted 27 August, 2019; v1 submitted 13 March, 2019; originally announced March 2019.

    Comments: Accepted at COLT 2019. Corrected typos and improved comparison with existing methods

    Journal ref: Proceedings of Machine Learning Research, vol. 99 (COLT 2019)

  18. arXiv:1902.03794  [pdf, other]

    stat.ML cs.LG

    Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits

    Authors: Pierre Perrault, Vianney Perchet, Michal Valko

    Abstract: We improve the efficiency of algorithms for stochastic \emph{combinatorial semi-bandits}. In most interesting problems, state-of-the-art algorithms take advantage of structural properties of rewards, such as \emph{independence}. However, while being optimal in terms of asymptotic regret, these algorithms are inefficient. In our paper, we first reduce their implementation to a specific \emph{submod…

    Submitted 20 June, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

    Comments: Accepted to ICML 2019, Long Beach

  19. arXiv:1901.04884  [pdf, other]

    stat.ML cs.LG stat.AP stat.CO

    Optimistic optimization of a Brownian

    Authors: Jean-Bastien Grill, Michal Valko, Rémi Munos

    Abstract: We address the problem of optimizing a Brownian motion. We consider a (random) realization $W$ of a Brownian motion with input space in $[0,1]$. Given $W$, our goal is to return an $ε$-approximation of its maximum using the smallest possible number of function evaluations, the sample complexity of the algorithm. We provide an algorithm with sample complexity of order $\log^2(1/ε)$. This improves o…

    Submitted 15 January, 2019; originally announced January 2019.

    Comments: 10 pages, 2 figures

    Journal ref: Neural Information Processing Systems (NeurIPS 2018)

  20. arXiv:1811.11043  [pdf, other]

    stat.ML cs.LG

    Rotting bandits are not harder than stochastic ones

    Authors: Julien Seznec, Andrea Locatelli, Alexandra Carpentier, Alessandro Lazaric, Michal Valko

    Abstract: In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be stationary. This assumption is often violated in practice (e.g., in recommendation systems), where the reward of an arm may change whenever it is selected, i.e., rested bandit setting. In this paper, we consider the non-parametric rotting bandit setting, where rewards can only decrease. We introduce the filtering…

    Submitted 9 May, 2020; v1 submitted 27 November, 2018; originally announced November 2018.

    Journal ref: International Conference on Artificial Intelligence and Statistics (AISTATS 2019)

  21. arXiv:1810.00997  [pdf, other]

    cs.LG stat.ML

    A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption

    Authors: Peter L. Bartlett, Victor Gabillon, Michal Valko

    Abstract: We study the problem of optimizing a function under a \emph{budgeted number of evaluations}. We only assume that the function is \emph{locally} smooth around one of its global optima. The difficulty of optimization is measured in terms of 1) the amount of \emph{noise} $b$ of the function evaluation and 2) the local smoothness, $d$, of the function. A smaller $d$ results in smaller optimization err…

    Submitted 23 February, 2019; v1 submitted 1 October, 2018; originally announced October 2018.

  22. Compressing the Input for CNNs with the First-Order Scattering Transform

    Authors: Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko, Michal Valko

    Abstract: We study the first-order scattering transform as a candidate for reducing the signal processed by a convolutional neural network (CNN). We show theoretical and empirical evidence that in the case of natural images and sufficiently small translation invariance, this transform preserves most of the signal information needed for classification while substantially reducing the spatial resolution and t…

    Submitted 27 September, 2018; originally announced September 2018.

    Journal ref: ECCV 2018

  23. arXiv:1809.07258  [pdf, other]

    cs.LG stat.ML

    DPPy: Sampling DPPs with Python

    Authors: Guillaume Gautier, Guillermo Polito, Rémi Bardenet, Michal Valko

    Abstract: Determinantal point processes (DPPs) are specific probability distributions over clouds of points that are used as models and computational tools across physics, probability, statistics, and more recently machine learning. Sampling from DPPs is a challenge and therefore we present DPPy, a Python toolbox that gathers known exact and approximate sampling algorithms for both finite and continuous DPP…

    Submitted 12 August, 2019; v1 submitted 19 September, 2018; originally announced September 2018.

    Comments: Code at http://github.com/guilgautier/DPPy/ Documentation at http://dppy.readthedocs.io/

    Journal ref: Journal of Machine Learning Research 20 (2019) 1-7

  24. arXiv:1806.02282  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Finding the bandit in a graph: Sequential search-and-stop

    Authors: Pierre Perrault, Vianney Perchet, Michal Valko

    Abstract: We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In this paper, we address a learning setting where we allow the agent to stop before having found the object and resta…

    Submitted 22 April, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

    Comments: in International Conference on Artificial Intelligence and Statistics (AISTATS 2019), April 2019, Naha, Okinawa, Japan

  25. arXiv:1803.10172  [pdf, other]

    stat.ML cs.DS cs.LG

    Distributed Adaptive Sampling for Kernel Matrix Approximation

    Authors: Daniele Calandriello, Alessandro Lazaric, Michal Valko

    Abstract: Most kernel-based methods, such as kernel or Gaussian process regression, kernel PCA, ICA, or $k$-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $\mathbf{K}_n$ requires at least $\mathcal{O}(n^2)$ time and space for $n$ samples. Recent works show that sampling points with replacement according to their ridge leverage scores (RLS) generates smal…

    Submitted 27 March, 2018; originally announced March 2018.

    Comments: Presented at AISTATS 2017

  26. arXiv:1706.04892  [pdf, ps, other]

    stat.ML cs.LG

    Second-Order Kernel Online Convex Optimization with Adaptive Sketching

    Authors: Daniele Calandriello, Alessandro Lazaric, Michal Valko

    Abstract: Kernel online convex optimization (KOCO) is a framework combining the expressiveness of non-parametric kernel models with the regret guarantees of online learning. First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$…

    Submitted 15 June, 2017; originally announced June 2017.

  27. arXiv:1705.10498  [pdf, other]

    stat.ML cs.LG stat.CO

    Zonotope hit-and-run for efficient sampling from projection DPPs

    Authors: Guillaume Gautier, Rémi Bardenet, Michal Valko

    Abstract: Determinantal point processes (DPPs) are distributions over sets of items that model diversity using kernels. Their applications in machine learning include summary extraction and recommendation systems. Yet, the cost of sampling from a DPP is prohibitive in large-scale applications, which has triggered an effort towards efficient approximate samplers. We build a novel MCMC sampler that combines i…

    Submitted 15 June, 2017; v1 submitted 30 May, 2017; originally announced May 2017.

    Comments: 12 pages, 12 figures, 2 columns, accepted to ICML 2017

    Journal ref: Proceedings of the 34th International Conference on Machine Learning 70 (2017) 1223-1232

  28. arXiv:1611.06800  [pdf, other]

    stat.ML

    MDL-motivated compression of GLM ensembles increases interpretability and retains predictive power

    Authors: Boris Hayete, Matthew Valko, Alex Greenfield, Raymond Yan

    Abstract: Over the years, ensemble methods have become a staple of machine learning. Similarly, generalized linear models (GLMs) have become very popular for a wide variety of statistical inference tasks. The former have been shown to enhance out-of-sample predictive power and the latter possess easy interpretability. Recently, ensembles of GLMs have been proposed as a possibility. On the downside, this ap…

    Submitted 21 November, 2016; originally announced November 2016.

    Comments: The authors would like to acknowledge Leon Furchtgott and Fred Gruber for their invaluable feedback on the manuscript, and Fred Gruber for his help with LATEX. Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

  29. arXiv:1609.03769  [pdf, other]

    stat.ML cs.DS cs.LG

    Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting

    Authors: Daniele Calandriello, Alessandro Lazaric, Michal Valko

    Abstract: We derive a new proof to show that the incremental resparsification algorithm proposed by Kelner and Levin (2013) produces a spectral sparsifier in high probability. We rigorously take into account the dependencies across subsequent resparsifications using martingale inequalities, fixing a flaw in the original analysis.

    Submitted 13 September, 2016; originally announced September 2016.

  30. arXiv:1605.06593  [pdf, other]

    cs.LG cs.AI cs.SI math.OC stat.ML

    Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

    Authors: Zheng Wen, Branislav Kveton, Michal Valko, Sharan Vaswani

    Abstract: We study the online influence maximization problem in social networks under the independent cascade model. Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it. We address the challenges of (i) combinatorial action space, since the number of feasible influencer sets grows exponentially with the maximum number of influencers, an…

    Submitted 19 June, 2018; v1 submitted 21 May, 2016; originally announced May 2016.

    Comments: Compared with the previous version, this version has fixed a mistake. This version is also consistent with the NIPS camera-ready version

    Journal ref: Z. Wen, B. Kveton, M. Valko, and S. Vaswani, "Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback", Advances in Neural Information Processing Systems 30 Proceedings, 2017

  31. arXiv:1601.05675  [pdf, other]

    stat.ML cs.LG

    Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning

    Authors: Daniele Calandriello, Alessandro Lazaric, Michal Valko, Ioannis Koutis

    Abstract: While the harmonic function solution performs well in many semi-supervised learning (SSL) tasks, it is known to scale poorly with the number of samples. Recent successful and scalable methods, such as the eigenfunction method focus on efficiently approximating the whole spectrum of the graph Laplacian constructed from the data. This is in contrast to various subsampling and quantization methods pr…

    Submitted 21 January, 2016; originally announced January 2016.

  32. arXiv:1506.04782  [pdf, other]

    cs.LG

    Cheap Bandits

    Authors: Manjesh Kumar Hanawal, Venkatesh Saligrama, Michal Valko, Rémi Munos

    Abstract: We consider stochastic sequential learning problems where the learner can observe the \textit{average reward of several actions}. Such a setting is interesting in many applications involving monitoring and surveillance, where the set of the actions to observe represent some (geographical) area. The importance of this setting is that in these applications, it is actually \textit{cheaper} to observe…

    Submitted 18 June, 2015; v1 submitted 15 June, 2015; originally announced June 2015.

    Comments: To be presented at ICML 2015

  33. arXiv:1505.04627  [pdf, other]

    cs.LG stat.ML

    Simple regret for infinitely many armed bandits

    Authors: Alexandra Carpentier, Michal Valko

    Abstract: We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples only to a certain number of arms. All previous algorithms for this setting were designed for minimizing the cumulative regret of the learner. In this paper, we propose an algorithm aiming at minimizing the s…

    Submitted 18 May, 2015; originally announced May 2015.

    Comments: in 32th International Conference on Machine Learning (ICML 2015)

  34. arXiv:1405.7752  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning to Act Greedily: Polymatroid Semi-Bandits

    Authors: Branislav Kveton, Zheng Wen, Azin Ashkan, Michal Valko

    Abstract: Many important optimization problems, such as the minimum spanning tree and minimum-cost flow, can be solved optimally by a greedy method. In this work, we study a learning variant of these problems, where the model of the problem is unknown and has to be learned by interacting repeatedly with the environment in the bandit setting. We formalize our learning problem quite generally, as learning how…

    Submitted 21 November, 2014; v1 submitted 29 May, 2014; originally announced May 2014.

  35. arXiv:1309.6869  [pdf]

    cs.LG stat.ML

    Finite-Time Analysis of Kernelised Contextual Bandits

    Authors: Michal Valko, Nathaniel Korda, Remi Munos, Ilias Flaounas, Nelo Cristianini

    Abstract: We tackle the problem of online reward maximisation over a large finite set of actions described by their contexts. We focus on the case when the number of actions is too big to sample all of them even once. However we assume that we have access to the similarities between actions' contexts and that the expected reward is an arbitrary linear function of the contexts' images in the related reproduc…

    Submitted 26 September, 2013; originally announced September 2013.

    Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

    Report number: UAI-P-2013-PG-654-663

  36. arXiv:1203.3522  [pdf]

    cs.LG stat.ML

    Online Semi-Supervised Learning on Quantized Graphs

    Authors: Michal Valko, Branislav Kveton, Ling Huang, Daniel Ting

    Abstract: In this paper, we tackle the problem of online semi-supervised learning (SSL). When data arrive in a stream, the dual problems of computation and data storage arise for any SSL method. We propose a fast approximate online SSL algorithm that solves for the harmonic solution on an approximate graph. We show, both empirically and theoretically, that good behavior can be achieved by collapsing nearby…

    Submitted 15 March, 2012; originally announced March 2012.

    Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

    Report number: UAI-P-2010-PG-606-614