Structured Prediction in Online Learning

Authors: Pierre Boudart, Alessandro Rudi, Pierre Gaillard

Abstract: We study a theoretical and algorithmic framework for structured prediction in the online learning setting. The problem of structured prediction, i.e. estimating functions where the output space lacks a vectorial structure, is well studied in the literature of supervised statistical learning. We show that our algorithm is a generalisation of optimal algorithms from the supervised learning setting, and achieves the same excess risk upper bound also when data are not i.i.d. Moreover, we consider a second algorithm designed especially for non-stationary data distributions, including adversarial data. We bound its stochastic regret as a function of the variation of the data distributions.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 29 pages

arXiv:2405.19807 [pdf, ps, other]

MetaCURL: Non-stationary Concave Utility Reinforcement Learning

Authors: Bianca Marin Moreno, Margaux Brégère, Pierre Gaillard, Nadia Oudjane

Abstract: We explore online learning in episodic loop-free Markov decision processes on non-stationary environments (changing losses and probability transitions). Our focus is on the Concave Utility Reinforcement Learning problem (CURL), an extension of classical RL for handling convex performance criteria in state-action distributions induced by agent policies. While various machine learning problems can be written as CURL, its non-linearity invalidates traditional Bellman equations. Despite recent solutions to classical CURL, none address non-stationary MDPs. This paper introduces MetaCURL, the first CURL algorithm for non-stationary MDPs. It employs a meta-algorithm running multiple black-box algorithm instances over different intervals, aggregating outputs via a sleeping expert framework. The key hurdle is partial information due to MDP uncertainty. Under partial information on the probability transitions (uncertainty and non-stationarity coming only from external noise, independent of agent state-action pairs), we achieve optimal dynamic regret without prior knowledge of MDP changes. Unlike approaches for RL, MetaCURL handles full adversarial losses, not just stochastic ones. We believe our approach for managing non-stationarity with experts can be of interest to the RL community.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2403.07460 [pdf, other]

Experimental Comparison of Ensemble Methods and Time-to-Event Analysis Models Through Integrated Brier Score and Concordance Index

Authors: Camila Fernandez, Chung Shue Chen, Pierre Gaillard, Alonso Silva

Abstract: Time-to-event analysis is a branch of statistics that has increased in popularity during the last decades due to its many application fields, such as predictive maintenance, customer churn prediction and population lifetime estimation. In this paper, we review and compare the performance of several prediction models for time-to-event analysis. These consist of semi-parametric and parametric statistical models, in addition to machine learning approaches. Our study is carried out on three datasets and evaluated with two different scores (the integrated Brier score and the concordance index). Moreover, we show how ensemble methods, which surprisingly have not yet been much studied in time-to-event analysis, can improve the prediction accuracy and enhance the robustness of the prediction performance. We conclude the analysis with a simulation experiment in which we evaluate the factors influencing the performance ranking of the methods using both scores.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.18917 [pdf, other]

Stop Relying on No-Choice and Do not Repeat the Moves: Optimal, Efficient and Practical Algorithms for Assortment Optimization

Authors: Aadirupa Saha, Pierre Gaillard

Abstract: We address the problem of active online assortment optimization with preference feedback, which is a framework for modeling user choices and subsetwise utility maximization. The framework is useful in various real-world applications including ad placement, online retail, recommender systems, and fine-tuning language models, among many others. Although the problem has been studied in the past, it lacks an intuitive and practical solution approach that combines an efficient algorithm with an optimal regret guarantee. For example, popularly used assortment selection algorithms often require the presence of a 'strong reference' which is always included in the choice sets; furthermore, they are designed to offer the same assortments repeatedly until the reference item gets selected. All such requirements are quite unrealistic for practical applications. In this paper, we design efficient algorithms for the problem of regret minimization in assortment selection with Plackett Luce (PL) based user choices. We design a novel concentration guarantee for estimating the score parameters of the PL model using 'Pairwise Rank-Breaking', which builds the foundation of our proposed algorithms. Moreover, our methods are practical, provably optimal, and devoid of the aforementioned limitations of the existing methods. Empirical evaluations corroborate our findings and show that our methods outperform the existing baselines.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.15171 [pdf, ps, other]

Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits

Authors: Julien Zhou, Pierre Gaillard, Thibaud Rahier, Houssam Zenati, Julyan Arbel

Abstract: We address the problem of stochastic combinatorial semi-bandits, where a player selects among $P$ actions from the power set of a set containing $d$ base items. Adaptivity to the problem's structure is essential in order to obtain optimal regret upper bounds. As estimating the coefficients of a covariance matrix can be manageable in practice, leveraging them should improve the regret. We design "optimistic" covariance-adaptive algorithms relying on online estimations of the covariance structure, called OLSUCBC and COSV (only the variances for the latter). They both yield improved gap-free regret. Although COSV can be slightly suboptimal, it improves on computational complexity by taking inspiration from Thompson Sampling approaches. It is the first sampling-based algorithm satisfying a $\sqrt{T}$ gap-free regret (up to poly-logs). We also show that in some cases, our approach efficiently leverages the semi-bandit feedback and outperforms bandit feedback approaches, not only in exponential regimes where $P\gg d$ but also when $P\leq d$, which is not covered by existing analyses.

Submitted 3 July, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.05145 [pdf, other]

Online Learning Approach for Survival Analysis

Authors: Camila Fernandez, Pierre Gaillard, Joseph de Vilmarest, Olivier Wintenberger

Abstract: We introduce an online mathematical framework for survival analysis, allowing real-time adaptation to dynamic environments and censored data. This framework enables the estimation of event time distributions through an optimal second-order online convex optimization algorithm, Online Newton Step (ONS). This approach, previously unexplored, presents substantial advantages, including explicit algorithms with non-asymptotic convergence guarantees. Moreover, we analyze the selection of ONS hyperparameters, which depends on the exp-concavity property and has a significant influence on the regret bound. We propose a stochastic approach that guarantees logarithmic stochastic regret for ONS. Additionally, we introduce an adaptive aggregation method that ensures robustness in hyperparameter selection while maintaining fast regret bounds. The findings of this paper can extend beyond the survival analysis field, and are relevant for any case characterized by poor exp-concavity and unstable ONS. Finally, these assertions are illustrated by simulation experiments.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2311.18346 [pdf, other]

Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent

Authors: Bianca Marin Moreno, Margaux Brégère, Pierre Gaillard, Nadia Oudjane

Abstract: Many machine learning tasks can be solved by minimizing a convex function of an occupancy measure over the policies that generate it. These include reinforcement learning and imitation learning, among others. This more general paradigm is called the Concave Utility Reinforcement Learning problem (CURL). Since CURL invalidates classical Bellman equations, it requires new algorithms. We introduce MD-CURL, a new algorithm for CURL in a finite horizon Markov decision process. MD-CURL is inspired by mirror descent and uses a non-standard regularization to achieve convergence guarantees and a simple closed-form solution, eliminating the need for computationally expensive projection steps typically found in mirror descent approaches. We then extend CURL to an online learning scenario and present Greedy MD-CURL, a new method adapting MD-CURL to an online, episode-based setting with partially unknown dynamics. Like MD-CURL, the online version Greedy MD-CURL benefits from low computational complexity, while guaranteeing sub-linear or even logarithmic regret, depending on the level of information available on the underlying dynamics.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2309.07530 [pdf, other]

Adaptive approximation of monotone functions

Authors: Pierre Gaillard, Sébastien Gerchinovitz, Étienne de Montbrun

Abstract: We study the classical problem of approximating a non-decreasing function $f: \mathcal{X} \to \mathcal{Y}$ in $L^p(μ)$ norm by sequentially querying its values, for known compact real intervals $\mathcal{X}$, $\mathcal{Y}$ and a known probability measure $μ$ on $\mathcal{X}$. For any function $f$ we characterize the minimum number of evaluations of $f$ that algorithms need to guarantee an approximation $\hat{f}$ with an $L^p(μ)$ error below $ε$ after stopping. Unlike worst-case results that hold uniformly over all $f$, our complexity measure is dependent on each specific function $f$. To address this problem, we introduce GreedyBox, a generalization of an algorithm originally proposed by Novak (1992) for numerical integration. We prove that GreedyBox achieves an optimal sample complexity for any function $f$, up to logarithmic factors. Additionally, we uncover results regarding piecewise-smooth functions. Perhaps as expected, the $L^p(μ)$ error of GreedyBox decreases much faster for piecewise-$C^2$ functions than predicted by the algorithm (without any knowledge on the smoothness of $f$). A simple modification even achieves optimal minimax approximation rates for such functions, which we compute explicitly. In particular, our findings highlight multiple performance gaps between adaptive and non-adaptive algorithms, smooth and piecewise-smooth functions, as well as monotone or non-monotone functions. Finally, we provide numerical experiments to support our theoretical results.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2302.12120 [pdf, other]

Sequential Counterfactual Risk Minimization

Authors: Houssam Zenati, Eustache Diemert, Matthieu Martin, Julien Mairal, Pierre Gaillard

Abstract: Counterfactual Risk Minimization (CRM) is a framework for dealing with the logged bandit feedback problem, where the goal is to improve a logging policy using offline data. In this paper, we explore the case where it is possible to deploy learned policies multiple times and acquire new data. We extend the CRM principle and its theory to this scenario, which we call "Sequential Counterfactual Risk Minimization (SCRM)." We introduce a novel counterfactual estimator and identify conditions that can improve the performance of CRM in terms of excess risk and regret rates, by using an analysis similar to restart strategies in accelerated optimization methods. We also provide an empirical evaluation of our method in both discrete and continuous action settings, and demonstrate the benefits of multiple deployments of CRM.

Submitted 25 May, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: To appear at ICML23

arXiv:2302.08190 [pdf, other]

Reimagining Demand-Side Management with Mean Field Learning

Authors: Bianca Marin Moreno, Margaux Brégère, Pierre Gaillard, Nadia Oudjane

Abstract: Integrating renewable energy into the power grid while balancing supply and demand is a complex issue, given its intermittent nature. Demand side management (DSM) offers solutions to this challenge. We propose a new method for DSM, in particular the problem of controlling a large population of electrical devices to follow a desired consumption signal. We model it as a finite horizon Markovian mean field control problem. We develop a new algorithm, MD-MFC, which provides theoretical guarantees for convex and Lipschitz objective functions. What distinguishes MD-MFC from the existing load control literature is its effectiveness in directly solving the target tracking problem without resorting to regularization techniques on the main problem. A non-standard Bregman divergence on a mirror descent scheme allows dynamic programming to be used to obtain simple closed-form solutions. In addition, we show that general mean-field game algorithms can be applied to this problem, which expands the possibilities for addressing load control problems. We illustrate our claims with experiments on a realistic data set.

Submitted 25 May, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

arXiv:2210.14998 [pdf, other]

One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits

Authors: Pierre Gaillard, Aadirupa Saha, Soham Dan

Abstract: We address the problem of 'Internal Regret' in Sleeping Bandits in the fully adversarial setup, and draw connections between different existing notions of sleeping regret in the multi-armed bandits (MAB) literature and analyze their implications. Our first contribution is to propose the new notion of Internal Regret for sleeping MAB. We then propose an algorithm that yields sublinear regret in that measure, even for a completely adversarial sequence of losses and availabilities. We further show that a low sleeping internal regret always implies a low external regret, as well as a low policy regret for i.i.d. sequences of losses. The main contribution of this work lies precisely in unifying different existing notions of regret in sleeping bandits and understanding the implications of one for another. Finally, we also extend our results to the setting of Dueling Bandits (DB), a preference feedback variant of MAB, and propose a reduction to MAB to design a low regret algorithm for sleeping dueling bandits with stochastic preferences and adversarial availabilities. The efficacy of our algorithms is justified through empirical evaluations.

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2209.13932 [pdf, ps, other]

Efficient and Near-Optimal Online Portfolio Selection

Authors: Rémi Jézéquel, Dmitrii M. Ostrovskii, Pierre Gaillard

Abstract: In the problem of online portfolio selection as formulated by Cover (1991), the trader repeatedly distributes her capital over $d$ assets in each of $T > 1$ rounds, with the goal of maximizing the total return. Cover proposed an algorithm, termed Universal Portfolios, that performs nearly as well as the best (in hindsight) static assignment of a portfolio, with an $O(d\log(T))$ regret in terms of the logarithmic return. Without imposing any restrictions on the market this guarantee is known to be worst-case optimal, and no other algorithm attaining it has been discovered so far. Unfortunately, Cover's algorithm crucially relies on computing a certain $d$-dimensional integral which must be approximated in any implementation; this results in a prohibitive $\tilde O(d^4(T+d)^{14})$ per-round runtime for the fastest known implementation due to Kalai and Vempala (2002). We propose an algorithm for online portfolio selection that admits essentially the same regret guarantee as Universal Portfolios -- up to a constant factor and replacement of $\log(T)$ with $\log(T+d)$ -- yet has a drastically reduced runtime of $\tilde O(d^2(T+d))$ per round. The selected portfolio minimizes the current logarithmic loss regularized by the log-determinant of its Hessian -- equivalently, the hybrid logarithmic-volumetric barrier of the polytope specified by the asset return vectors. As such, our work reveals surprising connections of online portfolio selection with two classical topics in optimization theory: cutting-plane and interior-point algorithms.

Submitted 28 September, 2022; originally announced September 2022.

arXiv:2202.06694 [pdf, other]

Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences

Authors: Aadirupa Saha, Pierre Gaillard

Abstract: We study the problem of $K$-armed dueling bandits for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences of pairs of decision points queried in an online sequential manner. We first propose a novel reduction from any (general) dueling bandits to multi-armed bandits and, despite its simplicity, it allows us to improve many existing results in dueling bandits. In particular, we give the first best-of-both-worlds result for the dueling bandits regret minimization problem -- a unified framework that is guaranteed to perform optimally for both stochastic and adversarial preferences simultaneously. Moreover, our algorithm is also the first to achieve an optimal $O(\sum_{i = 1}^K \frac{\log T}{Δ_i})$ regret bound against the Condorcet-winner benchmark, which scales optimally both in terms of the arm-size $K$ and the instance-specific suboptimality gaps $\{Δ_i\}_{i = 1}^K$. This resolves the long-standing problem of designing an instancewise gap-dependent order optimal regret algorithm for dueling bandits (with matching lower bounds up to small constant factors). We further justify the robustness of our proposed algorithm by proving its optimal regret rate under adversarially corrupted preferences -- this outperforms the existing state-of-the-art corrupted dueling results by a large margin. In summary, we believe our reduction idea will find a broader scope in solving a diverse class of dueling bandits settings, which are otherwise studied separately from multi-armed bandits with often more complex solutions and worse guarantees. The efficacy of our proposed algorithms is empirically corroborated against the existing dueling bandit methods.

Submitted 14 February, 2022; originally announced February 2022.

arXiv:2202.05638 [pdf, other]

Efficient Kernel UCB for Contextual Bandits

Authors: Houssam Zenati, Alberto Bietti, Eustache Diemert, Julien Mairal, Matthieu Martin, Pierre Gaillard

Abstract: In this paper, we tackle the computational efficiency of kernelized UCB algorithms in contextual bandits. While standard methods require an $O(CT^3)$ complexity, where $T$ is the horizon and the constant $C$ is related to optimizing the UCB rule, we propose an efficient contextual algorithm for large-scale problems. Specifically, our method relies on incremental Nyström approximations of the joint kernel embedding of contexts and actions. This allows us to achieve a complexity of $O(CTm^2)$, where $m$ is the number of Nyström points. To recover the same regret as the standard kernelized UCB algorithm, $m$ needs to be of the order of the effective dimension of the problem, which is at most $O(\sqrt{T})$ and nearly constant in some cases.

Submitted 11 February, 2022; originally announced February 2022.

Comments: To appear at AISTATS2022

arXiv:2110.09133 [pdf, other]

Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits

Authors: Reda Ouhamma, Rémy Degenne, Pierre Gaillard, Vianney Perchet

Abstract: In the fixed budget thresholding bandit problem, an algorithm sequentially allocates a budgeted number of samples to different distributions. It then predicts whether the mean of each distribution is larger or smaller than a given threshold. We introduce a large family of algorithms (containing most existing relevant ones), inspired by the Frank-Wolfe algorithm, and provide a thorough yet generic analysis of their performance. This allowed us to construct new explicit algorithms, for a broad class of problems, whose losses are within a small constant factor of the non-adaptive oracle ones. Quite interestingly, we observed that adaptive methods empirically greatly outperform non-adaptive oracles, an uncommon behavior in standard online learning settings, such as regret minimization. We explain this surprising phenomenon on an insightful toy problem.

Submitted 18 October, 2021; originally announced October 2021.

Comments: 10+15 pages. To be published in the proceedings of NeurIPS 2021

arXiv:2110.03960 [pdf, other]

Mixability made efficient: Fast online multiclass logistic regression

Authors: Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi

Abstract: Mixability has been shown to be a powerful tool to obtain algorithms with optimal regret. However, the resulting methods often suffer from high computational complexity which has reduced their practical applicability. For example, in the case of multiclass logistic regression, the aggregating forecaster (Foster et al. (2018)) achieves a regret of $O(\log(Bn))$ whereas Online Newton Step achieves $O(e^B\log(n))$ obtaining a double exponential gain in $B$ (a bound on the norm of comparative functions). However, this high statistical performance is at the price of a prohibitive computational complexity $O(n^{37})$.

Submitted 8 October, 2021; originally announced October 2021.

arXiv:2107.02274 [pdf, other]

Dueling Bandits with Adversarial Sleeping

Authors: Aadirupa Saha, Pierre Gaillard

Abstract: We introduce the problem of sleeping dueling bandits with stochastic preferences and adversarial availabilities (DB-SPAA). In almost all dueling bandit applications, the decision space often changes over time; e.g., retail store management, online shopping, restaurant recommendation, search engine optimization, etc. Surprisingly, this 'sleeping aspect' of dueling bandits has never been studied in the literature. Like dueling bandits, the goal is to compete with the best arm by sequentially querying the preference feedback of item pairs. The difficulty, however, stems from the non-stationary item spaces, which allow arbitrary subsets of items to become unavailable in each round. The goal is to find an optimal 'no-regret' policy that can identify the best available item at each round, as opposed to the standard 'fixed best-arm regret objective' of dueling bandits. We first derive an instance-specific lower bound for DB-SPAA $Ω( \sum_{i =1}^{K-1}\sum_{j=i+1}^K \frac{\log T}{Δ(i,j)})$, where $K$ is the number of items and $Δ(i,j)$ is the gap between items $i$ and $j$. This indicates that the sleeping problem with preference feedback is inherently more difficult than that for classical multi-armed bandits (MAB). We then propose two algorithms, with near optimal regret guarantees. Our results are corroborated empirically.

Submitted 5 July, 2021; originally announced July 2021.

arXiv:2106.07644 [pdf, other]

A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip

Authors: Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor

Abstract: We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; and a discretization of the continuized process can be computed exactly with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters. We provide continuized Nesterov acceleration under deterministic as well as stochastic gradients, with either additive or multiplicative noise. Finally, using our continuized framework and expressing the gossip averaging problem as the stochastic minimization of a certain energy function, we provide the first rigorous acceleration of asynchronous gossip algorithms.

Submitted 27 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2102.06035

arXiv:2102.06035 [pdf, other]

A Continuized View on Nesterov Acceleration

Authors: Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Adrien Taylor

Abstract: We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; but a discretization of the continuized process can be computed exactly with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters.

Submitted 11 February, 2021; originally announced February 2021.

arXiv:2102.03594 [pdf, other]

Online nonparametric regression with Sobolev kernels

Authors: Oleksandr Zadorozhnyi, Pierre Gaillard, Sebastien Gerschinovitz, Alessandro Rudi

Abstract: In this work we investigate the variation of the online kernelized ridge regression algorithm in the setting of $d$-dimensional adversarial nonparametric regression. We derive the regret upper bounds on the classes of Sobolev spaces $W_{p}^β(\mathcal{X})$, $p\geq 2, β>\frac{d}{p}$. The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $β> \frac{d}{2}$ or $p=\infty$ these rates are (essentially) optimal. Finally, we compare the performance of the kernelized ridge regression forecaster to the known non-parametric forecasters in terms of the regret rates and their computational complexity as well as to the excess risk rates in the setting of statistical (i.i.d.) nonparametric regression.

Submitted 13 July, 2021; v1 submitted 6 February, 2021; originally announced February 2021.

Comments: 40 pages, 5 figures, 3 tables (version 2)

arXiv:2011.06957 [pdf, other]

Non-stationary Online Regression

Authors: Anant Raj, Pierre Gaillard, Christophe Saad

Abstract: Online forecasting under a changing environment has been a problem of increasing importance in many real-world applications. In this paper, we consider the meta-algorithm presented in Zhang et al. (2017) combined with different subroutines. We show that an expected cumulative error of order $\tilde{O}(n^{1/3} C_n^{2/3})$ can be obtained for non-stationary online linear regression where the total variation of the parameter sequence is bounded by $C_n$. Our paper extends the result of online forecasting of one-dimensional time-series as proposed in Baby and Wang (2019) to general $d$-dimensional non-stationary linear regression. We improve the rate $O(\sqrt{n C_n})$ obtained by Zhang et al. 2017 and Besbes et al. 2015. We further extend our analysis to non-stationary online kernel regression. Similar to the non-stationary online regression case, we use the meta-procedure of Zhang et al. 2017 combined with Kernel-AWV (Jezequel et al. 2020) to achieve an expected cumulative error controlled by the effective dimension of the RKHS and the total variation of the sequence. To the best of our knowledge, this work is the first extension of non-stationary online regression to non-stationary kernel regression. Lastly, we evaluate our method empirically with several existing benchmarks and also compare it with the theoretical bound obtained in this paper.

Submitted 13 November, 2020; originally announced November 2020.

arXiv:2006.08212 [pdf, other]

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Authors: Raphaël Berthier, Francis Bach, Pierre Gaillard

Abstract: In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle θ_*, Φ(U) \rangle$ between the random output $Y$ and the random feature vector $Φ(U)$, a potentially non-linear transformation of the inputs $U$. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-square risk under this model. The convergence of the iterates to the optimum $θ_*$ and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum $θ_*$ and of the feature vectors $Φ(u)$. We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph depending on its spectral dimension.

Submitted 27 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

arXiv:2004.11722 [pdf, other]

Counterfactual Learning of Stochastic Policies with Continuous Actions: from Models to Offline Evaluation

Authors: Houssam Zenati, Alberto Bietti, Matthieu Martin, Eustache Diemert, Pierre Gaillard, Julien Mairal

Abstract: Counterfactual reasoning from logged data has become increasingly important for many applications such as web advertising or healthcare. In this paper, we address the problem of learning stochastic policies with continuous actions from the viewpoint of counterfactual risk minimization (CRM). While the CRM framework is appealing and well studied for discrete actions, the continuous action case raises new challenges about modelization, optimization, and offline model selection with real data, which turns out to be particularly challenging. Our paper contributes to these three aspects of the CRM estimation pipeline. First, we introduce a modelling strategy based on a joint kernel embedding of contexts and actions, which overcomes the shortcomings of previous discretization approaches. Second, we empirically show that the optimization aspect of counterfactual learning is important, and we demonstrate the benefits of proximal point algorithms and differentiable estimators. Finally, we propose an evaluation protocol for offline policies in real-world logged systems, which is challenging since policies cannot be replayed on test data, and we release a new large-scale dataset along with multiple synthetic, yet realistic, evaluation setups.

Submitted 14 December, 2022; v1 submitted 22 April, 2020; originally announced April 2020.

arXiv:2004.06248 [pdf, other]

Improved Sleeping Bandits with Stochastic Actions Sets and Adversarial Rewards

Authors: Aadirupa Saha, Pierre Gaillard, Michal Valko

Abstract: In this paper, we consider the problem of sleeping bandits with stochastic action sets and adversarial rewards. In this setting, in contrast to most work in bandits, the actions may not be available at all times. For instance, some products might be out of stock in item recommendation. The best existing efficient (i.e., polynomial-time) algorithms for this problem only guarantee an $O(T^{2/3})$ upper-bound on the regret. Yet, inefficient algorithms based on EXP4 can achieve $O(\sqrt{T})$. In this paper, we provide a new computationally efficient algorithm inspired by EXP3 satisfying a regret of order $O(\sqrt{T})$ when the availabilities of each action $i \in \mathcal{A}$ are independent. We then study the most general version of the problem where at each round available sets are generated from some unknown arbitrary distribution (i.e., without the independence assumption) and propose an efficient algorithm with $O(\sqrt{2^K T})$ regret guarantee. Our theoretical results are corroborated with experimental evaluations.

Submitted 8 August, 2020; v1 submitted 13 April, 2020; originally announced April 2020.

Comments: Accepted to ICML 2020

arXiv:2003.08820 [pdf, other]

Experimental Comparison of Semi-parametric, Parametric, and Machine Learning Models for Time-to-Event Analysis Through the Concordance Index

Authors: Camila Fernandez, Chung Shue Chen, Pierre Gaillard, Alonso Silva

Abstract: In this paper, we make an experimental comparison of semi-parametric (Cox proportional hazards model, Aalen's additive regression model), parametric (Weibull AFT model), and machine learning models (Random Survival Forest, Gradient Boosting with Cox Proportional Hazards Loss, DeepSurv) through the concordance index on two different datasets (PBC and GBCSG2). We present two comparisons: one with the default hyper-parameters of these models and one with the best hyper-parameters found by randomized search.

Submitted 13 March, 2020; originally announced March 2020.

arXiv:2003.08109 [pdf, other]

Efficient improper learning for online logistic regression

Authors: Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi

Abstract: We consider the setting of online logistic regression and consider the regret with respect to the $\ell_2$-ball of radius B. It is known (see [Hazan et al., 2014]) that any proper algorithm which has logarithmic regret in the number of samples (denoted n) necessarily suffers an exponential multiplicative constant in B. In this work, we design an efficient improper algorithm that avoids this exponential constant while preserving a logarithmic regret. Indeed, [Foster et al., 2018] showed that the lower bound does not apply to improper algorithms and proposed a strategy based on exponential weights with prohibitive computational complexity. Our new algorithm based on regularized empirical risk minimization with surrogate losses satisfies a regret scaling as O(B log(Bn)) with a per-round time-complexity of order O(d^2).

Submitted 3 November, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

Journal ref: Conference on Learning Theory 2020, Jul 2020, Graz, Austria

arXiv:1902.09917 [pdf, other]

Efficient online learning with kernels for adversarial large scale problems

Authors: Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi

Abstract: We are interested in a framework of online learning with kernels for low-dimensional but large-scale and potentially adversarial datasets. We study the computational and theoretical performance of online variations of kernel Ridge regression. Despite its simplicity, the algorithm we study is the first to achieve the optimal regret for a wide range of kernels with a per-round complexity of order $n^α$ with $α < 2$. The algorithm we consider is based on approximating the kernel with the linear span of basis functions. Our contribution is two-fold: 1) For the Gaussian kernel, we propose to build the basis beforehand (independently of the data) through Taylor expansion. For $d$-dimensional inputs, we provide a (close to) optimal regret of order $O((\log n)^{d+1})$ with per-round time complexity and space complexity $O((\log n)^{2d})$. This makes the algorithm a suitable choice as soon as $n \gg e^d$, which is likely to happen in a scenario with a low-dimensional and large-scale dataset; 2) For general kernels with low effective dimension, the basis functions are updated sequentially in a data-adaptive fashion by sampling Nyström points. In this case, our algorithm improves the computational trade-off known for online kernel regression.

Submitted 29 May, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

arXiv:1902.04931 [pdf, other]

doi 10.1016/j.nucengdes.2019.110391

Bayesian inference and non-linear extensions of the CIRCE method for quantifying the uncertainty of closure relationships integrated into thermal-hydraulic system codes

Authors: Guillaume Damblin, Pierre Gaillard

Abstract: Uncertainty Quantification of closure relationships integrated into thermal-hydraulic system codes is a critical prerequisite in applying the Best-Estimate Plus Uncertainty (BEPU) methodology for nuclear safety and licensing processes. The purpose of the CIRCE method is to estimate the (log)-Gaussian probability distribution of a multiplicative factor applied to a reference closure relationship in order to assess its uncertainty. Even though this method has been implemented with success in numerous physical scenarios, it can still suffer from substantial limitations such as the linearity assumption and the difficulty of properly taking into account the inherent statistical uncertainty. In the paper, we will extend the CIRCE method in two aspects. On the one hand, we adopt the Bayesian setting putting prior probability distributions on the parameters of the (log)-Gaussian distribution. The posterior distribution of the parameters is then computed with respect to an experimental database by means of Markov Chain Monte Carlo (MCMC) algorithms. On the other hand, we tackle the more general setting where the simulations do not move linearly against the multiplicative factor(s). MCMC algorithms then become time-prohibitive when the thermal-hydraulic simulations exceed a few minutes. This handicap is overcome by using Gaussian process (GP) emulators which can yield both reliable and fast predictions of the simulations. The GP-based MCMC algorithms will be applied to quantify the uncertainty of two condensation closure relationships at a safety injection with respect to a database of experimental tests. The thermal-hydraulic simulations will be run with the CATHARE 2 computer code.

Submitted 9 March, 2020; v1 submitted 13 February, 2019; originally announced February 2019.

Comments: 37 pages, 5 figures

MSC Class: 62F15

Journal ref: Nuclear Engineering and Design, 2020, Volume 359, 1 April 2020, 110391

arXiv:1901.09532 [pdf, other]

Target Tracking for Contextual Bandits: Application to Demand Side Management

Authors: Margaux Brégère, Pierre Gaillard, Yannig Goude, Gilles Stoltz

Abstract: We propose a contextual-bandit approach for demand side management by offering price incentives. More precisely, a target mean consumption is set at each round and the mean consumption is modeled as a complex function of the distribution of prices sent and of some contextual variables such as the temperature, weather, and so on. The performance of our strategies is measured in quadratic losses through a regret criterion. We offer $T^{2/3}$ upper bounds on this regret (up to poly-logarithmic terms), and even faster rates under stronger assumptions, for strategies inspired by standard strategies for contextual bandits (like LinUCB, see Li et al., 2010). Simulations on a real data set gathered by UK Power Networks, in which price incentives were offered, show that our strategies are effective and may indeed manage demand response by suitably picking the price levels.

Submitted 13 May, 2019; v1 submitted 28 January, 2019; originally announced January 2019.

Journal ref: ICML 2019 (Thirty-sixth International Conference on Machine Learning), Jun 2019, Long Beach, United States

arXiv:1805.11386 [pdf, ps, other]

Uniform regret bounds over $R^d$ for the sequential linear regression problem with the square loss

Authors: Pierre Gaillard, Sébastien Gerchinovitz, Malo Huard, Gilles Stoltz

Abstract: We consider the setting of online linear regression for arbitrary deterministic sequences, with the square loss. We are interested in the aim set by Bartlett et al. (2015): obtain regret bounds that hold uniformly over all competitor vectors. When the feature sequence is known at the beginning of the game, they provided closed-form regret bounds of $2d B^2 \ln T + \mathcal{O}_T(1)$, where $T$ is the number of rounds and $B$ is a bound on the observations. Instead, we derive bounds with an optimal constant of $1$ in front of the $d B^2 \ln T$ term. In the case of sequentially revealed features, we also derive an asymptotic regret bound of $d B^2 \ln T$ for any individual sequence of features and bounded observations. All our algorithms are variants of the online non-linear ridge regression forecaster, either with a data-dependent regularization or with almost no regularization.

Submitted 25 February, 2019; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: Proceedings of ALT'2019

arXiv:1805.09174 [pdf, other]

Efficient online algorithms for fast-rate regret bounds under sparsity

Authors: Pierre Gaillard, Olivier Wintenberger

Abstract: We consider the online convex optimization problem. In the setting of arbitrary sequences and finite set of parameters, we establish a new fast-rate quantile regret bound. Then we investigate the optimization into the L1-ball by discretizing the parameter space. Our algorithm is projection free and we propose an efficient solution by restarting the algorithm on adaptive discretization grids. In the adversarial setting, we develop an algorithm that achieves several rates of convergence with different dependencies on the sparsity of the objective. In the i.i.d. setting, we establish new risk bounds that are adaptive to the sparsity of the problem and to the regularity of the risk (ranging from a rate $1/\sqrt{T}$ for general convex risk to $1/T$ for strongly convex risk). These results generalize previous works on sparse online learning. They are obtained under a weak assumption on the risk (Łojasiewicz's assumption) that allows multiple optima, which is crucial when dealing with degenerate situations.

Submitted 23 May, 2018; originally announced May 2018.

arXiv:1805.08531 [pdf, other]

Accelerated Gossip in Networks of Given Dimension using Jacobi Polynomial Iterations

Authors: Raphaël Berthier, Francis Bach, Pierre Gaillard

Abstract: Consider a network of agents connected by communication links, where each agent holds a real value. The gossip problem consists in estimating the average of the values diffused in the network in a distributed manner. We develop a method solving the gossip problem that depends only on the spectral dimension of the network, that is, in the communication network set-up, the dimension of the space in which the agents live. This contrasts with previous work that required the spectral gap of the network as a parameter, or suffered from slow mixing. Our method shows an important improvement over existing algorithms in the non-asymptotic regime, i.e., when the values are far from being fully mixed in the network. Our approach stems from a polynomial-based point of view on gossip algorithms, as well as an approximation of the spectral measure of the graphs with a Jacobi measure. We show the power of the approach with simulations on various graphs, and with performance guarantees on graphs of known spectral dimension, such as grids and random percolation bonds. An extension of this work to distributed Laplacian solvers is discussed. As a side result, we also use the polynomial-based point of view to show the convergence of the message passing algorithm for gossip of Moallemi & Van Roy on regular graphs. The explicit computation of the rate of the convergence shows that message passing has a slow rate of convergence on graphs with small spectral gap.

Submitted 11 June, 2019; v1 submitted 22 May, 2018; originally announced May 2018.

arXiv:1702.08211 [pdf, ps, other]

Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning

Authors: Nicolò Cesa-Bianchi, Pierre Gaillard, Claudio Gentile, Sébastien Gerchinovitz

Abstract: We investigate contextual online learning with nonparametric (Lipschitz) comparison classes under different assumptions on losses and feedback information. For full information feedback and Lipschitz losses, we design the first explicit algorithm achieving the minimax regret rate (up to log factors). In a partial feedback model motivated by second-price auctions, we obtain algorithms for Lipschitz and semi-Lipschitz losses with regret bounds improving on the known bounds for standard bandit feedback. Our analysis combines novel results for contextual second-price auctions with a novel algorithmic approach based on chaining. When the context space is Euclidean, our chaining approach is efficient and delivers an even better regret bound.

Submitted 30 June, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

Comments: This document is the full version of an extended abstract accepted for presentation at COLT 2017

arXiv:1610.05022 [pdf, other]

Sparse Accelerated Exponential Weights

Authors: Pierre Gaillard, Olivier Wintenberger

Abstract: We consider the stochastic optimization problem where a convex function is minimized observing recursively the gradients. We introduce SAEW, a new procedure that accelerates exponential weights procedures with the slow rate $1/\sqrt{T}$ to procedures achieving the fast rate $1/T$. Under the strong convexity of the risk, we achieve the optimal rate of convergence for approximating sparse parameters in $\mathbb{R}^d$. The acceleration is achieved by using successive averaging steps in an online fashion. The procedure also produces sparse estimators thanks to additional hard threshold steps.

Submitted 17 October, 2016; originally announced October 2016.

arXiv:1503.07899 [pdf, ps, other]

Multi-parametric solutions to the NLS equation

Authors: Pierre Gaillard

Abstract: The structure of the solutions to the one-dimensional focusing nonlinear Schrödinger equation (NLS) for the order N in terms of quasi-rational functions is given here. We first give the proof that the solutions can be expressed as a ratio of two Wronskians of order 2N and then two determinants by an exponential depending on t with 2N - 2 parameters. It is also proved that for the order N, the solutions can be written as the product of an exponential depending on t by a quotient of two polynomials of degree N(N + 1) in x and t. The solutions depend on 2N - 2 parameters and give, when all these parameters are equal to 0, the analogue of the famous Peregrine breather PN. It is fundamental to note that in this representation at order N, all these solutions can be seen as deformations with 2N - 2 parameters of the famous Peregrine breather PN. With this method, we already built Peregrine breathers up to order N = 10, and their deformations depending on 2N - 2 parameters.

Submitted 26 March, 2015; originally announced March 2015.

arXiv:1502.07697 [pdf, other]

A Chaining Algorithm for Online Nonparametric Regression

Authors: Pierre Gaillard, Sébastien Gerchinovitz

Abstract: We consider the problem of online nonparametric regression with arbitrary deterministic sequences. Using ideas from the chaining technique, we design an algorithm that achieves a Dudley-type regret bound similar to the one obtained in a non-constructive fashion by Rakhlin and Sridharan (2014). Our regret bound is expressed in terms of the metric entropy in the sup norm, which yields optimal guarantees when the metric and sequential entropies are of the same order of magnitude. In particular our algorithm is the first one that achieves optimal rates for online regression over Hölder balls. In addition we show for this example how to adapt our chaining algorithm to get a reasonable computational efficiency with similar regret guarantees (up to a log factor).

Submitted 1 July, 2015; v1 submitted 26 February, 2015; originally announced February 2015.

Comments: Published in the proceedings of COLT 2015: http://jmlr.org/proceedings/papers/v40/Gaillard15.html

arXiv:1405.1533 [pdf, ps, other]

A consistent deterministic regression tree for non-parametric prediction of time series

Authors: Pierre Gaillard, Paul Baudin

Abstract: We study online prediction of bounded stationary ergodic processes. To do so, we consider the setting of prediction of individual sequences and build a deterministic regression tree that performs asymptotically as well as the best L-Lipschitz constant predictors. Then, we show why the obtained regret bound entails the asymptotical optimality with respect to the class of bounded stationary ergodic processes.

Submitted 8 May, 2014; v1 submitted 7 May, 2014; originally announced May 2014.

arXiv:1402.2044 [pdf, ps, other]

A Second-order Bound with Excess Losses

Authors: Pierre Gaillard, Gilles Stoltz, Tim Van Erven

Abstract: We study online aggregation of the predictions of experts, and first show new second-order regret bounds in the standard setting, which are obtained via a version of the Prod algorithm (and also a version of the polynomially weighted average algorithm) with multiple learning rates. These bounds are in terms of excess losses, the differences between the instantaneous losses suffered by the algorithm and the ones of a given expert. We then demonstrate the interest of these bounds in the context of experts that report their confidences as a number in the interval [0,1] using a generic reduction to the standard setting. We conclude by two other applications in the standard setting, which improve the known bounds in case of small excess losses and show a bounded regret against i.i.d. sequences of losses.

Submitted 10 February, 2014; originally announced February 2014.

arXiv:1207.1965 [pdf, other]

Forecasting electricity consumption by aggregating specialized experts

Authors: Marie Devaine, Pierre Gaillard, Yannig Goude, Gilles Stoltz

Abstract: We consider the setting of sequential prediction of arbitrary sequences based on specialized experts. We first provide a review of the relevant literature and present two theoretical contributions: a general analysis of the specialist aggregation rule of Freund et al. (1997) and an adaptation of fixed-share rules of Herbster and Warmuth (1998) in this setting. We then apply these rules to the sequential short-term (one-day-ahead) forecasting of electricity consumption; to do so, we consider two data sets, a Slovakian one and a French one, respectively concerned with hourly and half-hourly predictions. We follow a general methodology to perform the stated empirical studies and detail in particular tuning issues of the learning parameters. The introduced aggregation rules demonstrate an improved accuracy on the data sets at hand; the improvements lie in a reduced mean squared error but also in a more robust behavior with respect to large occasional errors.

Submitted 9 July, 2012; originally announced July 2012.

Comments: 33 pages

arXiv:1202.3323 [pdf, ps, other]

Mirror Descent Meets Fixed Share (and feels no regret)

Authors: Nicolò Cesa-Bianchi, Pierre Gaillard, Gabor Lugosi, Gilles Stoltz

Abstract: Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension. This is done using either a carefully designed projection or by a weight sharing technique. Via a novel unified analysis, we show that these two approaches deliver essentially equivalent bounds on a notion of regret generalizing shifting, adaptive, discounted, and other related regrets. Our analysis also captures and extends the generalized weight sharing technique of Bousquet and Warmuth, and can be refined in several ways, including improvements for small losses and adaptive tuning of parameters.

Submitted 27 September, 2012; v1 submitted 15 February, 2012; originally announced February 2012.

Journal ref: NIPS 2012, Lake Tahoe : United States (2012)

arXiv:0809.1918 [pdf, ps, other]

The Gauss-Dirichlet Orbit Number

Authors: Pierre-Yves Gaillard

Abstract: Dirichlet computed in some particular cases the number of equivalence classes of representations of a nonzero integer by a representative system for the integral binary quadratic forms of a given discriminant. We complete this computation.

Submitted 20 September, 2008; v1 submitted 11 September, 2008; originally announced September 2008.

Comments: I changed one word in the abstract

arXiv:0809.0550 [pdf, ps, other]

Hurwitz's Freeness Property

Authors: Pierre-Yves Gaillard

Abstract: The groupoid attached to the action of PSL(2,Z) on the irrational reals by linear fractional transformations is free.

Submitted 27 October, 2008; v1 submitted 3 September, 2008; originally announced September 2008.

Comments: LaTeX, 3 pages. Minor change

arXiv:math/0510369 [pdf, ps, other]

Integral Congruences

Authors: Pierre-Yves Gaillard

Abstract: To each i, j belonging to some set of integers, attach the integer a(i,j). Are there integers x(i) such that x(j)-x(i) is congruent to a(i,j) mod (i,j)? A necessary condition is that a(i,j)+a(j,k) be congruent to a(i,k) mod (i,j,k). This condition is sufficient.

Submitted 29 October, 2005; v1 submitted 18 October, 2005; originally announced October 2005.

Comments: 9 pages, LaTeX. Results have been improved

arXiv:math/0502574 [pdf, ps, other]

The functional equation of the zeta function of a global field

Authors: Pierre-Yves Gaillard

Abstract: We write down the functional equation of the zeta function of a global field. This equation is implicit in Weil's "Basic Number Theory".

Submitted 28 February, 2005; originally announced February 2005.

Comments: 2 pages, LaTeX

arXiv:math/0412133 [pdf, ps, other]

Around the Chinese Remainder Theorem

Authors: Jean-Marie Didry, Pierre-Yves Gaillard

Abstract: We prove an explicit Chinese Remainder Theorem for one variable polynomials with complex coefficients, and derive some consequences.

Submitted 24 December, 2008; v1 submitted 7 December, 2004; originally announced December 2004.

Comments: New section, titled "Wronski"; 22 pages, LaTeX. Last version available at http://www.iecn.u-nancy.fr/~gaillard/DIVERS/Chinese.Remainder.Theorem/

arXiv:math/0405053 [pdf, ps, other]

There are only countably many sets

Authors: Pierre-Yves Gaillard

Abstract: We prove that Bourbaki's mathematics is incomplete.

Submitted 4 May, 2004; originally announced May 2004.

Comments: 4 pages, LaTeX

arXiv:math/0309296 [pdf, ps, other]

Grothendieck categories and support conditions

Authors: Pierre-Yves Gaillard

Abstract: We give examples of pairs (G1,G2) where G1 is a Grothendieck category and G2 a full Grothendieck subcategory of G1, the inclusion G2 --> G1 being denoted i, for which R^+i : D^+G2 --> D^+G1 (or even Ri : DG2 --> DG1) is a full embedding. This yields generalizations of some results of Bernstein and Lunts, and of Cline, Parshall and Scott.

Submitted 16 March, 2004; v1 submitted 18 September, 2003; originally announced September 2003.

Comments: 11 pages, LaTeX, minor changes

arXiv:math/0303285 [pdf, ps, other]

About a Theorem of Cline, Parshall and Scott

Authors: Pierre-Yves Gaillard

Abstract: We give a simple proof of a Theorem of Cline, Parshall and Scott about the category O of BGG and suggest an analog for Harish-Chandra modules.

Submitted 24 March, 2003; originally announced March 2003.

Comments: 6 pages, LaTeX. Related material is available at http://www.iecn.u-nancy.fr/~gaillard

arXiv:math/0004006 [pdf, ps, other]

A naive question about quantum groups

Authors: Pierre-Yves Gaillard

Abstract: The category O of BGG can be thought of as a category of sheaves over the flag variety F in the sense that the algebra E of self-extensions of the trivial object of O is isomorphic to the cohomology algebra of the flag variety. A deformation of O' - giving rise to a "new" algebra E' - can be thought of as a (possibly noncommutative) deformation F' of F. The mythic variety F', being a deformation of F, should have the same homotopy type as F, and E' should therefore be isomorphic to E.

Submitted 22 July, 2009; v1 submitted 2 April, 2000; originally announced April 2000.

Comments: 4 pages, TeX

arXiv:math/0003183 [pdf, ps, other]

A simple question about a complicated object

Authors: Pierre-Yves Gaillard

Abstract: Let n and k be positive integers with and k < n. Then of course SU(k,1) is contained into SU(n,1). Moreover, which is less clear - but proved by Khoroshkin -, the representation theory of SU(k,1) at the generalized infinitesimal character of the trivial module can be fully (and even Ext-fully) embedded into that of SU(n,1). Here is the obvious bet: This embedding is implemented by the cohomological induction functor. I conjecture that a similar phenomenon occurs whenever SU(k,1) is a Levi factor of a theta stable parabolic subalgebra of a reductive group.

Submitted 22 July, 2009; v1 submitted 28 March, 2000; originally announced March 2000.

Comments: 4 pages, TeX

Showing 1–50 of 55 results for author: Gaillard, P