Showing 1–31 of 31 results for author: Neu, G

Searching in archive stat.

  1. arXiv:2406.04056  [pdf, other]

    cs.LG math.OC stat.ML

    Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently

    Authors: Sergio Calo, Anders Jonsson, Gergely Neu, Ludovic Schwartz, Javier Segovia-Aguas

    Abstract: We propose a new framework for formulating optimal transport distances between Markov chains. Previously known formulations studied couplings between the entire joint distribution induced by the chains, and derived solutions via a reduction to dynamic programming (DP) in an appropriately defined Markov decision process. This formulation has, however, not led to particularly efficient algorithms so…

    Submitted 11 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2402.13903  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Dealing with unbounded gradients in stochastic saddle-point optimization

    Authors: Gergely Neu, Nneka Okolo

    Abstract: We study the performance of stochastic first-order methods for finding saddle points of convex-concave functions. A notorious challenge faced by such methods is that the gradients can grow arbitrarily large during optimization, which may result in instability and divergence. In this paper, we propose a simple and effective regularization technique that stabilizes the iterates and yields meaningful…

    Submitted 7 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: 23 pages

  3. arXiv:2310.01609  [pdf, ps, other]

    stat.ML cs.LG

    Adversarial Contextual Bandits Go Kernelized

    Authors: Gergely Neu, Julia Olkhovskaya, Sattar Vakili

    Abstract: We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space, which allows for a more flexible modeling of complex decision-making scenarios. We propose a computationally efficient algorithm that makes use of a new optimistically biased estimator for the loss functions and achi…

    Submitted 2 October, 2023; originally announced October 2023.

  4. arXiv:2309.15771  [pdf, other]

    cs.LG stat.ML

    Importance-Weighted Offline Learning Done Right

    Authors: Germano Gabbianelli, Gergely Neu, Matteo Papini

    Abstract: We study the problem of offline policy optimization in stochastic contextual bandit problems, where the goal is to learn a near-optimal policy based on a dataset of decision data collected by a suboptimal behavior policy. Rather than making any structural assumptions on the reward function, we assume access to a given policy class and aim to compete with the best comparator policy within this clas…

    Submitted 27 September, 2023; originally announced September 2023.

  5. arXiv:2305.19674  [pdf, other]

    stat.ML cs.LG

    Online-to-PAC Conversions: Generalization Bounds via Regret Analysis

    Authors: Gábor Lugosi, Gergely Neu

    Abstract: We present a new framework for deriving bounds on the generalization error of statistical learning algorithms from the perspective of online learning. Specifically, we construct an online learning game called the "generalization game", where an online learner is trying to compete with a fixed statistical learning algorithm in predicting the sequence of generalization gaps on a training set of i.i.…

    Submitted 31 May, 2023; originally announced May 2023.

  6. arXiv:2305.00832  [pdf, ps, other]

    cs.LG stat.ML

    First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

    Authors: Julia Olkhovskaya, Jack Mayo, Tim van Erven, Gergely Neu, Chen-Yu Wei

    Abstract: We consider the adversarial linear contextual bandit setting, which allows for the loss functions associated with each of $K$ arms to change over time without restriction. Assuming the $d$-dimensional contexts are drawn from a fixed known distribution, the worst-case expected regret over the course of $T$ rounds is known to scale as $\tilde O(\sqrt{Kd T})$. Under the additional assumption that the…

    Submitted 24 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

  7. arXiv:2302.14004  [pdf, other]

    cs.LG stat.ML

    Optimistic Planning by Regularized Dynamic Programming

    Authors: Antoine Moulin, Gergely Neu

    Abstract: We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particula…

    Submitted 14 June, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

  8. arXiv:2207.08956  [pdf, other]

    cs.LG stat.ML

    Online Learning with Off-Policy Feedback

    Authors: Germano Gabbianelli, Matteo Papini, Gergely Neu

    Abstract: We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead sees the ones obtained by another unknown policy run in parallel (behavior policy). Instead of a standard exploration-exploitation dilemma, the learner has to f…

    Submitted 18 July, 2022; originally announced July 2022.

  9. arXiv:2202.04985  [pdf, other]

    stat.ML cs.LG

    Generalization Bounds via Convex Analysis

    Authors: Gábor Lugosi, Gergely Neu

    Abstract: Since the celebrated works of Russo and Zou (2016, 2019) and Xu and Raginsky (2017), it has been well known that the generalization error of supervised learning algorithms can be bounded in terms of the mutual information between their input and output, given that the loss of any fixed hypothesis has a subgaussian tail. In this work, we generalize this result beyond the standard choice of Shann…

    Submitted 19 July, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

  10. arXiv:2102.00931  [pdf, other]

    cs.LG stat.ML

    Information-Theoretic Generalization Bounds for Stochastic Gradient Descent

    Authors: Gergely Neu, Gintare Karolina Dziugaite, Mahdi Haghifam, Daniel M. Roy

    Abstract: We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the generalization error that depend on local statistics of the stochastic gradients evaluated along the path of iterates calculated by SGD. The key factors our bounds dep…

    Submitted 15 August, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: COLT 2021

  11. arXiv:2010.11151  [pdf, other]

    cs.LG cs.AI stat.ML

    Logistic Q-Learning

    Authors: Joan Bas-Serrano, Sebastian Curi, Andreas Krause, Gergely Neu

    Abstract: We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs. The method is closely related to the classic Relative Entropy Policy Search (REPS) algorithm of Peters et al. (2010), with the key difference that our method introduces a Q-function that enables efficient exact model-free implementation. The main feature of our al…

    Submitted 25 February, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

  12. arXiv:2007.01891  [pdf, ps, other]

    cs.LG stat.ML

    A Unifying View of Optimism in Episodic Reinforcement Learning

    Authors: Gergely Neu, Ciara Pike-Burke

    Abstract: The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms. In this paper we provide a general framework for designing, analyzing and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality, and demonstrates that every model-optimistic algorithm that constructs a…

    Submitted 3 July, 2020; originally announced July 2020.

  13. arXiv:2007.01612  [pdf, ps, other]

    cs.LG stat.ML

    Online learning in MDPs with linear function approximation and bandit feedback

    Authors: Gergely Neu, Julia Olkhovskaya

    Abstract: We consider an online learning problem where the learner interacts with a Markov decision process in a sequence of episodes, where the reward function is allowed to change between episodes in an adversarial manner and the learner only gets to observe the rewards associated with its actions. We allow the state space to be arbitrarily large, but we assume that all action-value functions can be repre…

    Submitted 12 June, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

  14. arXiv:2002.00287  [pdf, ps, other]

    cs.LG stat.ML

    Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits

    Authors: Gergely Neu, Julia Olkhovskaya

    Abstract: We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where the sequence of loss functions associated with each arm is allowed to change without restriction over time. Under the assumption that the $d$-dimensional contexts are generated i.i.d. at random from a known distribution, we develop computationally efficient algorithms based on the classic Exp3 algo…

    Submitted 24 May, 2022; v1 submitted 1 February, 2020; originally announced February 2020.

  15. arXiv:2001.10623  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Fast Rates for Online Prediction with Abstention

    Authors: Gergely Neu, Nikita Zhivotovskiy

    Abstract: In the setting of sequential prediction of individual $\{0, 1\}$-sequences with expert advice, we show that by allowing the learner to abstain from the prediction by paying a cost marginally smaller than $\frac 12$ (say, $0.49$), it is possible to achieve expected regret bounds that are independent of the time horizon $T$. We exactly characterize the dependence on the abstention cost $c$ and the n…

    Submitted 20 June, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

    Comments: 19 pages, minor corrections, to appear in COLT

  16. arXiv:1909.10904  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Faster saddle-point optimization for solving large-scale Markov decision processes

    Authors: Joan Bas-Serrano, Gergely Neu

    Abstract: We consider the problem of computing optimal policies in average-reward Markov decision processes. This classical problem can be formulated as a linear program directly amenable to saddle-point optimization methods, albeit with a number of variables that is linear in the number of states. To address this issue, recent work has considered a linearly relaxed version of the resulting saddle-point pro…

    Submitted 10 January, 2020; v1 submitted 22 September, 2019; originally announced September 2019.

  17. arXiv:1906.07987  [pdf, other]

    cs.LG cs.AI stat.ML

    Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

    Authors: Hugo Penedones, Carlos Riquelme, Damien Vincent, Hartmut Maennel, Timothy Mann, Andre Barreto, Sylvain Gelly, Gergely Neu

    Abstract: We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two methods are known to achieve complementary bias-variance trade-off properties, with TD tending to achieve lower variance but potentially higher bias. In this pa…

    Submitted 19 June, 2019; originally announced June 2019.

  18. arXiv:1902.08668  [pdf, other]

    stat.ML cs.LG

    Beating SGD Saturation with Tail-Averaging and Minibatching

    Authors: Nicole Mücke, Gergely Neu, Lorenzo Rosasco

    Abstract: While stochastic gradient descent (SGD) is one of the major workhorses in machine learning, the learning properties of many practically used variants are poorly understood. In this paper, we consider least squares learning in a nonparametric setting and contribute to filling this gap by focusing on the effect and interplay of multiple passes, mini-batching and averaging, and in particular tail ave…

    Submitted 26 May, 2019; v1 submitted 22 February, 2019; originally announced February 2019.

  19. arXiv:1902.03035  [pdf, ps, other]

    cs.LG stat.ML

    Bandit Principal Component Analysis

    Authors: Wojciech Kotłowski, Gergely Neu

    Abstract: We consider a partial-feedback variant of the well-studied online PCA problem where a learner attempts to predict a sequence of $d$-dimensional vectors in terms of a quadratic loss, while only having limited feedback about the environment's choices. We focus on a natural notion of bandit feedback where the learner only observes the loss associated with its own prediction. Based on the classical ob…

    Submitted 8 February, 2019; originally announced February 2019.

  20. arXiv:1805.11022  [pdf, ps, other]

    cs.LG stat.ML

    Online Influence Maximization with Local Observations

    Authors: Julia Olkhovskaya, Gergely Neu, Gábor Lugosi

    Abstract: We consider an online influence maximization problem in which a decision maker selects a node among a large number of possibilities and places a piece of information at the node. The node transmits the information to some others that are in the same connected component in a random graph. The goal of the decision maker is to reach as many nodes as possible, with the added complication that feedback…

    Submitted 28 May, 2018; originally announced May 2018.

  21. arXiv:1802.08009  [pdf, ps, other]

    cs.LG stat.ML

    Iterate averaging as regularization for stochastic gradient descent

    Authors: Gergely Neu, Lorenzo Rosasco

    Abstract: We propose and analyze a variant of the classic Polyak-Ruppert averaging scheme, broadly used in stochastic gradient methods. Rather than a uniform average of the iterates, we consider a weighted average, with weights decaying in a geometric fashion. In the context of linear least squares regression, we show that this averaging scheme has the same regularizing effect, and indeed is asymptoticall…

    Submitted 22 February, 2018; originally announced February 2018.

  22. arXiv:1710.05739  [pdf, ps, other]

    cs.LG math.OC stat.ML

    On the Hardness of Inventory Management with Censored Demand Data

    Authors: Gábor Lugosi, Mihalis G. Markakis, Gergely Neu

    Abstract: We consider a repeated newsvendor problem where the inventory manager has no prior information about the demand, and can access only censored/sales data. In analogy to multi-armed bandit problems, the manager needs to simultaneously "explore" and "exploit" with her inventory decisions, in order to minimize the cumulative cost. We make no probabilistic assumptions; importantly, independence or tim…

    Submitted 16 October, 2017; originally announced October 2017.

  23. arXiv:1705.10257  [pdf, ps, other]

    cs.LG stat.ML

    Boltzmann Exploration Done Right

    Authors: Nicolò Cesa-Bianchi, Claudio Gentile, Gábor Lugosi, Gergely Neu

    Abstract: Boltzmann exploration is a classic strategy for sequential decision-making under uncertainty, and is one of the most standard tools in Reinforcement Learning (RL). Despite its widespread use, there is virtually no theoretical understanding of the limitations or the actual benefits of this exploration scheme. Does it drive exploration in a meaningful way? Is it prone to misidentifying the optima…

    Submitted 7 November, 2017; v1 submitted 29 May, 2017; originally announced May 2017.

  24. arXiv:1705.07798  [pdf, other]

    cs.LG cs.AI stat.ML

    A unified view of entropy-regularized Markov decision processes

    Authors: Gergely Neu, Anders Jonsson, Vicenç Gómez

    Abstract: We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization in MDPs to accommodate convex regularization functions. Our key result is showing that using the conditional entropy of the joint state-action distributions as regularization yi…

    Submitted 22 May, 2017; originally announced May 2017.

  25. arXiv:1702.08712  [pdf, ps, other]

    stat.ML cs.LG

    Algorithmic stability and hypothesis complexity

    Authors: Tongliang Liu, Gábor Lugosi, Gergely Neu, Dacheng Tao

    Abstract: We introduce a notion of algorithmic stability of learning algorithms, termed argument stability, that captures stability of the hypothesis output by the learning algorithm in the normed space of functions from which hypotheses are selected. The main result of the paper bounds the generalization error of any learning algorithm in terms of its argument stability. The bounds are based…

    Submitted 3 August, 2017; v1 submitted 28 February, 2017; originally announced February 2017.

  26. arXiv:1702.06341  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Fast rates for online learning in Linearly Solvable Markov Decision Processes

    Authors: Gergely Neu, Vicenç Gómez

    Abstract: We study the problem of online learning in a class of Markov decision processes known as linearly solvable MDPs. In the stationary version of this problem, a learner interacts with its environment by directly controlling the state transitions, attempting to balance a fixed state-dependent cost and a certain smooth cost penalizing extreme control inputs. In the current paper, we consider an online…

    Submitted 6 June, 2017; v1 submitted 21 February, 2017; originally announced February 2017.

  27. arXiv:1506.03271  [pdf, other]

    cs.LG stat.ML

    Explore no more: Improved high-probability regret bounds for non-stochastic bandits

    Authors: Gergely Neu

    Abstract: This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature since proving them requires a great deal of technical effort and significant modifications to the standard, more intuitive algorithms that come only with guarantees that hold on exp…

    Submitted 3 November, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

    Comments: To appear at NIPS 2015

  28. arXiv:1503.05087  [pdf, ps, other]

    cs.LG stat.ML

    Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits

    Authors: Gergely Neu, Gábor Bartók

    Abstract: We propose a sample-efficient alternative for importance weighting for situations where one only has sample access to the probability distribution that generates the observations. Our new method, called Geometric Resampling (GR), is described and analyzed in the context of online combinatorial optimization under semi-bandit feedback, where a learner sequentially selects its actions from a combinat…

    Submitted 31 August, 2016; v1 submitted 17 March, 2015; originally announced March 2015.

    Comments: To appear in JMLR

  29. arXiv:1502.06354  [pdf, ps, other]

    cs.LG stat.ML

    First-order regret bounds for combinatorial semi-bandits

    Authors: Gergely Neu

    Abstract: We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions. After making each decision, the learner observes the losses associated with its action, but not other losses. For this problem, there are several learning algori…

    Submitted 10 June, 2015; v1 submitted 23 February, 2015; originally announced February 2015.

    Comments: To appear at COLT 2015

  30. arXiv:1406.6812  [pdf, other]

    cs.LG stat.ML

    Online learning in MDPs with side information

    Authors: Yasin Abbasi-Yadkori, Gergely Neu

    Abstract: We study online learning of finite Markov decision process (MDP) problems when a side information vector is available. The problem is motivated by applications such as clinical trials, recommendation systems, etc. Such applications have an episodic structure, where each episode corresponds to a patient/customer. Our objective is to compete with the optimal dynamic policy that can take side informa…

    Submitted 26 June, 2014; originally announced June 2014.

  31. arXiv:1206.5264  [pdf]

    cs.LG stat.ML

    Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods

    Authors: Gergely Neu, Csaba Szepesvari

    Abstract: In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The algorithm's aim is to find a reward function such that the resulting optimal policy matches the expert's observed behavior well. The main difficulty is that the mapping f…

    Submitted 20 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI2007)

    Report number: UAI-P-2007-PG-295-302
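Entry 9 (arXiv:2202.04985) starts from the mutual-information generalization bound of Xu and Raginsky (2017). For reference, the standard form of that bound is restated here; the notation is introduced only for this note. Let $S = (Z_1, \dots, Z_n)$ be a training set of $n$ i.i.d. samples from $\mu$, let $W$ be the hypothesis output by the learning algorithm, and let $L_\mu(w) = \mathbb{E}_{Z \sim \mu}[\ell(w, Z)]$ be the population loss. If $\ell(w, Z)$ is $\sigma$-subgaussian under $\mu$ for every fixed $w$, then

$$
\left| \mathbb{E}\left[ L_\mu(W) - L_S(W) \right] \right| \le \sqrt{\frac{2\sigma^2}{n}\, I(S; W)},
\qquad
L_S(w) = \frac{1}{n} \sum_{i=1}^{n} \ell(w, Z_i),
$$

where $I(S; W)$ is the Shannon mutual information between the training set and the output. The generalization beyond Shannon mutual information described in the abstract is not reproduced here.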
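Entry 14 (arXiv:2002.00287) builds its contextual algorithms on the classic Exp3 algorithm. As background only, here is a minimal sketch of the textbook loss-based Exp3 for a plain $K$-armed adversarial bandit, assuming losses in [0, 1] and no explicit uniform-exploration mixing; the class name `Exp3`, the learning-rate parameter `eta`, and the toy adversary are illustrative choices, not the paper's contextual method.

```python
import math
import random


class Exp3:
    """Textbook loss-based Exp3 for adversarial K-armed bandits (losses in [0, 1]).

    Plays arm i with probability proportional to exp(-eta * L_i), where L_i is an
    importance-weighted estimate of arm i's cumulative loss.
    """

    def __init__(self, n_arms, eta, seed=0):
        self.eta = eta
        self.cum_loss_est = [0.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        lowest = min(self.cum_loss_est)  # shift estimates for numerical stability
        weights = [math.exp(-self.eta * (l - lowest)) for l in self.cum_loss_est]
        total = sum(weights)
        return [w / total for w in weights]

    def select(self):
        """Sample an arm; return it together with the probability it was played with."""
        probs = self.probabilities()
        arm = self.rng.choices(range(len(probs)), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, loss, prob):
        # Only the played arm's loss is observed; dividing by its probability
        # makes the estimate of every arm's loss unbiased.
        self.cum_loss_est[arm] += loss / prob


# Toy adversary (hypothetical): arm 0 always has loss 1, arm 1 always has loss 0.
learner = Exp3(n_arms=2, eta=0.1)
for _ in range(500):
    arm, prob = learner.select()
    loss = 1.0 if arm == 0 else 0.0
    learner.update(arm, loss, prob)
final_probs = learner.probabilities()
```

Because the estimate for arm 0 grows by $1/p$ every time it is played, its expected growth per round is exactly its loss, and the learner's distribution shifts toward arm 1.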
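Entry 17 (arXiv:1906.07987) contrasts Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation from a batch of trajectories. The sketch below shows only the textbook tabular versions of both estimators, not the adaptive method proposed in the paper; the episode format (a list of `(state, reward, next_state)` transitions with `next_state = None` at termination) and the function names are hypothetical.

```python
from collections import defaultdict


def monte_carlo_values(episodes, gamma):
    """Every-visit Monte Carlo: average the observed discounted returns per state."""
    totals = defaultdict(float)
    visits = defaultdict(int)
    for episode in episodes:
        g = 0.0
        for state, reward, _ in reversed(episode):
            g = reward + gamma * g  # discounted return from this step onward
            totals[state] += g
            visits[state] += 1
    return {s: totals[s] / visits[s] for s in totals}


def td0_values(episodes, gamma, alpha, sweeps):
    """Batch TD(0): replay all transitions, bootstrapping from the current V(next)."""
    values = defaultdict(float)
    for _ in range(sweeps):
        for episode in episodes:
            for state, reward, next_state in episode:
                bootstrap = 0.0 if next_state is None else values[next_state]
                values[state] += alpha * (reward + gamma * bootstrap - values[state])
    return dict(values)


# A two-step deterministic chain A -> B -> end, with reward 1 on the final step.
episodes = [[("A", 0.0, "B"), ("B", 1.0, None)]]
mc = monte_carlo_values(episodes, gamma=1.0)
td = td0_values(episodes, gamma=1.0, alpha=0.5, sweeps=200)
```

On this deterministic chain both estimators agree (every value is 1); they differ once rewards or transitions are noisy, where MC uses full returns (unbiased, higher variance) and TD bootstraps from its own estimates (lower variance, potentially biased), which is the trade-off the abstract refers to.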
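Entry 21 (arXiv:1802.08009) replaces the uniform Polyak-Ruppert average of SGD iterates with a weighted average whose weights decay geometrically. A minimal sketch, assuming a scalar least-squares model, a single pass over the data in a fixed order, and weight `decay ** t` for the t-th iterate; the parameter name `decay` and this exact weighting are assumptions of the sketch, and the paper's precise scheme and step sizes may differ.

```python
def sgd_with_geometric_averaging(samples, step_size, decay, w0=0.0):
    """Scalar least-squares SGD with geometrically weighted iterate averaging.

    Takes one SGD step on 0.5 * (w * x - y) ** 2 per (x, y) pair, in order.
    The t-th iterate (t = 0, 1, ...) gets weight decay ** t, so the weights
    decay geometrically and early iterates dominate the average.
    Returns (last_iterate, weighted_average).
    """
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must lie in (0, 1)")
    if not samples:
        raise ValueError("need at least one sample")
    w = w0
    weight = 1.0        # decay ** t for the current iterate index t
    weighted_sum = 0.0  # running sum of decay ** t * w_t
    total_weight = 0.0  # running sum of decay ** t (normalizer)
    for x, y in samples:
        w -= step_size * (w * x - y) * x  # gradient step on the squared loss
        weighted_sum += weight * w
        total_weight += weight
        weight *= decay
    return w, weighted_sum / total_weight


# Noise-free data with true slope 2, starting from w0 = 0.
last, avg = sgd_with_geometric_averaging([(1.0, 2.0)] * 50, step_size=0.1, decay=0.9)
```

Here the last iterate is close to 2, while the average sits well below it: early iterates stay near the starting point `w0`, so weighting them heavily shrinks the estimate toward `w0`, which is the intuition behind the regularizing effect mentioned in the abstract.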
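Entry 23 (arXiv:1705.10257) studies Boltzmann exploration. The standard rule, sketched below, plays each arm with probability proportional to $\exp(\hat\mu_i / \tau)$, where $\hat\mu_i$ is the arm's estimated mean reward and $\tau > 0$ is a temperature; this shows only the classic rule the abstract questions, not any variant proposed in the paper, and the function names are hypothetical.

```python
import math
import random


def boltzmann_probabilities(estimated_means, temperature):
    """Boltzmann (softmax) distribution: P(arm i) ∝ exp(estimated_means[i] / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [m / temperature for m in estimated_means]
    top = max(scaled)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_select(estimated_means, temperature, rng):
    """Sample an arm index from the Boltzmann distribution."""
    probs = boltzmann_probabilities(estimated_means, temperature)
    return rng.choices(range(len(probs)), weights=probs)[0]


probs = boltzmann_probabilities([0.0, math.log(3.0)], temperature=1.0)  # [0.25, 0.75]
arm = boltzmann_select([0.2, 0.8], temperature=0.1, rng=random.Random(0))
```

The temperature controls the trade-off: as $\tau \to 0$ the rule becomes greedy on the current estimates, and as $\tau \to \infty$ it approaches uniform sampling.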
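Entry 28 (arXiv:1503.05087) proposes Geometric Resampling for settings with only sample access to the distribution that generates the observations. The sketch below illustrates the elementary fact the name points to: if each fresh draw hits an event independently with unknown probability $p$, the number of draws until the first hit is geometric with mean $1/p$, so multiplying an observed loss by that count estimates loss$/p$ without computing $p$. The cap `max_draws` and the function name are assumptions of this sketch; how the paper embeds the idea in its semi-bandit algorithm is not reproduced.

```python
import random


def geometric_resampling_count(sample_event, rng, max_draws):
    """Count fresh draws until `sample_event(rng)` first returns True.

    With success probability p per draw, the uncapped count is geometric with
    mean 1/p. Capping at `max_draws` (an assumption of this sketch) bounds the
    cost; the capped count has mean (1 - (1 - p) ** max_draws) / p.
    """
    for draws in range(1, max_draws + 1):
        if sample_event(rng):
            return draws
    return max_draws


p = 0.25


def event(r):
    # Stand-in for "a fresh sample from the learner's distribution hits the
    # component of interest"; p is used only to simulate, never by the estimator.
    return r.random() < p


rng = random.Random(42)
counts = [geometric_resampling_count(event, rng, max_draws=1000) for _ in range(20000)]
estimate_of_inverse_p = sum(counts) / len(counts)  # close to 1 / p = 4
```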