-
Median Clipping for Zeroth-order Non-Smooth Convex Optimization and Multi Arm Bandit Problem with Heavy-tailed Symmetric Noise
Authors:
Nikita Kornilov,
Yuriy Dorn,
Aleksandr Lobanov,
Nikolay Kutuzov,
Innokentiy Shibaev,
Eduard Gorbunov,
Alexander Gasnikov,
Alexander Nazin
Abstract:
In this paper, we consider non-smooth convex optimization with a zeroth-order oracle corrupted by symmetric stochastic noise. Unlike the existing high-probability results requiring the noise to have bounded $κ$-th moment with $κ\in (1,2]$, our results allow even heavier noise with any $κ> 0$, e.g., the noise distribution can have unbounded $1$-st moment. Moreover, our results match the best-known ones for the case of the bounded variance. To achieve this, we use the mini-batched median estimate of the sampled gradient differences, apply gradient clipping to the result, and plug in the final estimate into the accelerated method. We apply this technique to the stochastic multi-armed bandit problem with heavy-tailed distribution of rewards and achieve $O(\sqrt{Td})$ regret by incorporating noise symmetry.
Submitted 25 May, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
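The median-then-clip idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-point estimator, the coordinate-wise median, and the names `median_clip_estimate`, `tau`, `batch`, and `lam` are assumptions made for illustration, and the Cauchy-noise oracle is a hypothetical test function.

```python
import math
import random
import statistics


def clip(vec, lam):
    """Rescale vec so that its Euclidean norm is at most lam."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= lam or norm == 0.0:
        return list(vec)
    return [v * lam / norm for v in vec]


def median_clip_estimate(noisy_f, x, tau, batch, lam, rng):
    """Coordinate-wise median of `batch` two-point estimates, then clipped.

    Assumption: each sample is d * (f(x + tau*e) - f(x - tau*e)) / (2*tau) * e
    for a random unit direction e; the median is taken per coordinate.
    """
    d = len(x)
    samples = []
    for _ in range(batch):
        e = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in e))
        e = [v / norm for v in e]
        plus = noisy_f([xi + tau * ei for xi, ei in zip(x, e)])
        minus = noisy_f([xi - tau * ei for xi, ei in zip(x, e)])
        scale = d * (plus - minus) / (2.0 * tau)
        samples.append([scale * ei for ei in e])
    med = [statistics.median(s[j] for s in samples) for j in range(d)]
    return clip(med, lam)


def make_cauchy_oracle(rng):
    """Hypothetical oracle: f(x) = ||x||_1 plus symmetric Cauchy noise."""
    def noisy_f(x):
        noise = math.tan(math.pi * (rng.random() - 0.5))
        return sum(abs(v) for v in x) + noise
    return noisy_f


rng = random.Random(0)
g = median_clip_estimate(make_cauchy_oracle(rng), [1.0, -2.0, 0.5],
                         tau=0.1, batch=15, lam=5.0, rng=rng)
```

The Cauchy noise has no finite first moment, which is the regime the abstract targets; the median tames the individual samples and the clip bounds the norm of whatever is passed on to the optimizer.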
-
Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits
Authors:
Yuriy Dorn,
Nikita Kornilov,
Nikolay Kutuzov,
Alexander Nazin,
Eduard Gorbunov,
Alexander Gasnikov
Abstract:
The Implicitly Normalized Forecaster (INF) algorithm is considered to be an optimal solution for adversarial multi-armed bandit (MAB) problems. However, most of the existing complexity results for INF rely on restrictive assumptions, such as bounded rewards. Recently, a related algorithm was proposed that works for both adversarial and stochastic heavy-tailed MAB settings. However, this algorithm fails to fully exploit the available data.
In this paper, we propose a new version of INF called the Implicitly Normalized Forecaster with clipping (INF-clip) for MAB problems with heavy-tailed reward distributions. We establish convergence results under mild assumptions on the rewards distribution and demonstrate that INF-clip is optimal for linear heavy-tailed stochastic MAB problems and works well for non-linear ones. Furthermore, we show that INF-clip outperforms the best-of-both-worlds algorithm in cases where it is difficult to distinguish between different arms.
Submitted 26 December, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Algorithms of Robust Stochastic Optimization Based on Mirror Descent Method
Authors:
Anatoli Juditsky,
Alexander Nazin,
Arkadi Nemirovsky,
Alexandre Tsybakov
Abstract:
We propose an approach to construction of robust non-Euclidean iterative algorithms for convex composite stochastic optimization based on truncation of stochastic gradients. For such algorithms, we establish sub-Gaussian confidence bounds under weak assumptions about the tails of the noise distribution in convex and strongly convex settings. Robust estimates of the accuracy of general stochastic algorithms are also proposed.
Submitted 5 July, 2019;
originally announced July 2019.
-
Two-Armed Bandit Problem, Data Processing, and Parallel Version of the Mirror Descent Algorithm
Authors:
Alexander Kolnogorov,
Alexander Nazin,
Dmitry Shiyan
Abstract:
We consider the minimax setup for the two-armed bandit problem as applied to data processing when two alternative processing methods are available with different a priori unknown efficiencies. One should determine the most effective method and provide its predominant application. To this end we use the mirror descent algorithm (MDA). It is well known that the corresponding minimax risk has the order $N^{1/2}$, with $N$ being the number of processed data. We significantly improve the theoretical estimate of the factor using Monte-Carlo simulations. Then we propose a parallel version of the MDA which allows processing of data by packets in a number of stages. The use of the parallel version of the MDA ensures that the total time of data processing depends mostly on the number of packets rather than on the total number of data. Quite unexpectedly, the parallel version behaves unlike the ordinary one even if the number of packets is large. Moreover, the parallel version considerably improves control performance because it provides a significantly smaller value of the minimax risk. We explain this result by considering another parallel modification of the MDA whose behavior is close to that of the ordinary version. Our estimates are based on invariant descriptions of the algorithms. All estimates are obtained by Monte-Carlo simulations. It is worth noting that the parallel version performs well only for methods with close efficiencies. If the efficiencies differ significantly, then one should use the combined algorithm, which uses the ordinary version over a sufficiently short initial control horizon and then switches to the parallel version of the MDA.
Submitted 28 May, 2017;
originally announced May 2017.
-
Algorithms of Inertial Mirror Descent in Convex Problems of Stochastic Optimization
Authors:
Alexander Nazin
Abstract:
The goal is to modify the known method of mirror descent (MD), proposed by A.S. Nemirovsky and D.B. Yudin in 1979. The paper shows the idea of a new, so-called inertial MD method with the example of a deterministic optimization problem in continuous time. In particular, in the Euclidean case, the heavy ball method by B.T. Polyak is realized. It is noted that the new method does not use additional averaging. A discrete algorithm of inertial MD is described. The theorem on the upper bound on the error in the objective function is proved.
Submitted 2 May, 2017;
originally announced May 2017.
-
L1-optimal linear programming estimator for periodic frontier functions with Holder continuous derivative
Authors:
Alexander Nazin,
Stephane Girard
Abstract:
We propose a new estimator based on a linear programming method for smooth frontiers of sample points. The derivative of the frontier function is supposed to be Holder continuous. The estimator is defined as a linear combination of kernel functions being sufficiently regular, covering all the points and whose associated support is of smallest surface. The coefficients of the linear combination are computed by solving a linear programming problem. The L1-error between the estimated and the true frontier functions is shown to be almost surely converging to zero, and the rate of convergence is proved to be optimal.
Submitted 22 September, 2014;
originally announced September 2014.
-
Linear programming problems for frontier estimation
Authors:
Guillaume Bouchard,
Stéphane Girard,
Anatoli Iouditski,
Alexander Nazin
Abstract:
We propose new estimates for the frontier of a set of points. They are defined as kernel estimates covering all the points and whose associated support is of smallest surface. The estimates are written as linear combinations of kernel functions applied to the points of the sample. The coefficients of the linear combination are then computed by solving a linear programming problem. In the general case, the solution of the optimization problem is sparse, that is, only a few coefficients are nonzero. The corresponding points play the role of support vectors in the statistical learning theory. The L_1 error between the estimated and the true frontiers is shown to be almost surely converging to zero, and the rate of convergence is provided. The behaviour of the estimates on finite sample situations is illustrated on some simulations.
Submitted 30 March, 2011;
originally announced March 2011.
-
Linear programming problems for l_1-optimal frontier estimation
Authors:
Stéphane Girard,
Anatoli Iouditski,
Alexander Nazin
Abstract:
We propose new optimal estimators for the Lipschitz frontier of a set of points. They are defined as kernel estimators being sufficiently regular, covering all the points and whose associated support is of smallest surface. The estimators are written as linear combinations of kernel functions applied to the points of the sample. The coefficients of the linear combination are then computed by solving a related linear programming problem. The L_1 error between the estimated and the true frontier function with a known Lipschitz constant is shown to be almost surely converging to zero, and the rate of convergence is proved to be optimal.
Submitted 30 March, 2011;
originally announced March 2011.
-
Recursive Aggregation of Estimators by Mirror Descent Algorithm with Averaging
Authors:
Anatoli Juditsky,
Alexander Nazin,
Alexandre Tsybakov,
Nicolas Vayatis
Abstract:
We consider a recursive algorithm to construct an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under the l1-constraint. It is defined by a stochastic version of the mirror descent algorithm (i.e., of the method which performs gradient descent in the dual space) with an additional averaging. The main result of the paper is an upper bound for the expected accuracy of the proposed estimator. This bound is of the order $\sqrt{(\log M)/t}$ with an explicit and small constant factor, where $M$ is the dimension of the problem and $t$ stands for the sample size. A similar bound is proved for a more general setting that covers, in particular, the regression model with squared loss.
Submitted 7 March, 2006; v1 submitted 16 May, 2005;
originally announced May 2005.
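The phrase "gradient descent in the dual space with an additional averaging" from the abstract above can be made concrete with a small sketch. It is a simplified illustration rather than the paper's algorithm: it assumes the probability simplex as the l1-constrained set, an entropic (softmax) mirror map, and a fixed temperature `beta`; the names `mirror_descent_averaged` and `softmax_neg` are hypothetical.

```python
import math


def softmax_neg(z, beta):
    """Map the dual vector z to the simplex with weights proportional to exp(-z_i / beta)."""
    m = min(z)  # shift for numerical stability
    w = [math.exp(-(zi - m) / beta) for zi in z]
    s = sum(w)
    return [wi / s for wi in w]


def mirror_descent_averaged(grad, M, steps, beta):
    """Accumulate gradients in the dual variable z and average the primal iterates.

    grad(theta, t) returns a (possibly stochastic) gradient of the risk at theta.
    """
    z = [0.0] * M
    avg = [0.0] * M
    for t in range(1, steps + 1):
        theta = softmax_neg(z, beta)                          # primal point from dual
        avg = [a + (th - a) / t for a, th in zip(avg, theta)]  # running average
        g = grad(theta, t)
        z = [zi + gi for zi, gi in zip(z, g)]                 # descent step in dual space
    return avg


# Usage on a linear risk with fixed per-rule costs (illustrative data only).
costs = [1.0, 0.2, 0.7]
weights = mirror_descent_averaged(lambda theta, t: costs, M=3, steps=200, beta=5.0)
```

The averaged weights stay on the simplex and concentrate on the base rule with the smallest cost, which is the aggregation behavior the abstract describes.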