
arXiv:2402.02461 [pdf, other]

Median Clipping for Zeroth-order Non-Smooth Convex Optimization and Multi Arm Bandit Problem with Heavy-tailed Symmetric Noise

Authors: Nikita Kornilov, Yuriy Dorn, Aleksandr Lobanov, Nikolay Kutuzov, Innokentiy Shibaev, Eduard Gorbunov, Alexander Gasnikov, Alexander Nazin

Abstract: In this paper, we consider non-smooth convex optimization with a zeroth-order oracle corrupted by symmetric stochastic noise. Unlike the existing high-probability results requiring the noise to have bounded $κ$-th moment with $κ\in (1,2]$, our results allow even heavier noise with any $κ> 0$, e.g., the noise distribution can have unbounded $1$-st moment. Moreover, our results match the best-known ones for the case of the bounded variance. To achieve this, we use the mini-batched median estimate of the sampled gradient differences, apply gradient clipping to the result, and plug in the final estimate into the accelerated method. We apply this technique to the stochastic multi-armed bandit problem with heavy-tailed distribution of rewards and achieve $O(\sqrt{Td})$ regret by incorporating noise symmetry.
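The median-then-clip step described in this abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' algorithm: the function name `median_clipped_gradient`, the parameters `noisy_f`, `tau`, `m`, and `clip_level`, the use of a single random direction per call, and the batch size of `2 * m + 1` are all assumptions made for illustration.

```python
import numpy as np

def median_clipped_gradient(noisy_f, x, tau, m, clip_level, rng):
    """Hypothetical sketch: median of noisy two-point differences, then norm clipping."""
    d = x.shape[0]
    # Draw one random direction on the unit sphere (assumed sampling scheme).
    e = rng.standard_normal(d)
    e /= np.linalg.norm(e)
    # Query the noisy zeroth-order oracle 2m+1 times along the same direction.
    diffs = np.array([
        noisy_f(x + tau * e) - noisy_f(x - tau * e)
        for _ in range(2 * m + 1)
    ])
    # The median damps heavy-tailed symmetric noise in the differences.
    g = (d / (2.0 * tau)) * np.median(diffs) * e
    # Clip the Euclidean norm of the estimate to clip_level.
    norm = np.linalg.norm(g)
    if norm > clip_level:
        g = g * (clip_level / norm)
    return g
```

The returned vector would then be passed to an optimizer in place of a gradient; the choice of optimizer, smoothing radius, and clipping level is left open here.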

Submitted 25 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2305.06743 [pdf, other]

Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits

Authors: Yuriy Dorn, Nikita Kornilov, Nikolay Kutuzov, Alexander Nazin, Eduard Gorbunov, Alexander Gasnikov

Abstract: The Implicitly Normalized Forecaster (INF) algorithm is considered to be an optimal solution for adversarial multi-armed bandit (MAB) problems. However, most of the existing complexity results for INF rely on restrictive assumptions, such as bounded rewards. Recently, a related algorithm was proposed that works for both adversarial and stochastic heavy-tailed MAB settings. However, this algorithm fails to fully exploit the available data. In this paper, we propose a new version of INF called the Implicitly Normalized Forecaster with clipping (INF-clip) for MAB problems with heavy-tailed reward distributions. We establish convergence results under mild assumptions on the rewards distribution and demonstrate that INF-clip is optimal for linear heavy-tailed stochastic MAB problems and works well for non-linear ones. Furthermore, we show that INF-clip outperforms the best-of-both-worlds algorithm in cases where it is difficult to distinguish between different arms.

Submitted 26 December, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

arXiv:1907.02707 [pdf, ps, other]

Algorithms of Robust Stochastic Optimization Based on Mirror Descent Method

Authors: Anatoli Juditsky, Alexander Nazin, Arkadi Nemirovsky, Alexandre Tsybakov

Abstract: We propose an approach to construction of robust non-Euclidean iterative algorithms for convex composite stochastic optimization based on truncation of stochastic gradients. For such algorithms, we establish sub-Gaussian confidence bounds under weak assumptions about the tails of the noise distribution in convex and strongly convex settings. Robust estimates of the accuracy of general stochastic algorithms are also proposed.

Submitted 5 July, 2019; originally announced July 2019.

Comments: Automation and Remote Control / Avtomatika i Telemekhanika, MAIK Nauka/Interperiodica, In press

arXiv:1705.09977 [pdf, other]

Two-Armed Bandit Problem, Data Processing, and Parallel Version of the Mirror Descent Algorithm

Authors: Alexander Kolnogorov, Alexander Nazin, Dmitry Shiyan

Abstract: We consider the minimax setup for the two-armed bandit problem as applied to data processing if there are two alternative processing methods available with different a priori unknown efficiencies. One should determine the most effective method and provide its predominant application. To this end we use the mirror descent algorithm (MDA). It is well-known that corresponding minimax risk has the order $N^{1/2}$ with $N$ being the number of processed data. We improve significantly the theoretical estimate of the factor using Monte-Carlo simulations. Then we propose a parallel version of the MDA which allows processing of data by packets in a number of stages. The usage of parallel version of the MDA ensures that total time of data processing depends mostly on the number of packets but not on the total number of data. It is quite unexpectedly that the parallel version behaves unlike the ordinary one even if the number of packets is large. Moreover, the parallel version considerably improves control performance because it provides significantly smaller value of the minimax risk. We explain this result by considering another parallel modification of the MDA which behavior is close to behavior of the ordinary version. Our estimates are based on invariant descriptions of the algorithms. All estimates are obtained by Monte-Carlo simulations. It's worth noting that parallel version performs well only for methods with close efficiencies. If efficiencies differ significantly then one should use the combined algorithm which at initial sufficiently short control horizon uses ordinary version and then switches to the parallel version of the MDA.

Submitted 28 May, 2017; originally announced May 2017.

Comments: 11 pages, 13 figures

arXiv:1705.01073 [pdf, ps, other]

Algorithms of Inertial Mirror Descent in Convex Problems of Stochastic Optimization

Authors: Alexander Nazin

Abstract: The goal is to modify the known method of mirror descent (MD), proposed by A.S. Nemirovsky and D.B. Yudin in 1979. The paper shows the idea of a new, so-called inertial MD method with the example of a deterministic optimization problem in continuous time. In particular, in the Euclidean case, the heavy ball method by B.T. Polyak is realized. It is noted that the new method does not use additional averaging. A discrete algorithm of inertial MD is described. The theorem on the upper bound on the error in the objective function is proved.

Submitted 2 May, 2017; originally announced May 2017.

Comments: Poster presented at Workshop "Optimization and Statistical Learning", April 10-14, 2017, Les Houches, France

arXiv:1409.6230 [pdf, ps, other]

L1-optimal linear programming estimator for periodic frontier functions with Holder continuous derivative

Authors: Alexander Nazin, Stephane Girard

Abstract: We propose a new estimator based on a linear programming method for smooth frontiers of sample points. The derivative of the frontier function is supposed to be Holder continuous. The estimator is defined as a linear combination of kernel functions being sufficiently regular, covering all the points and whose associated support is of smallest surface. The coefficients of the linear combination are computed by solving a linear programming problem. The L1-error between the estimated and the true frontier functions is shown to be almost surely converging to zero, and the rate of convergence is proved to be optimal.

Submitted 22 September, 2014; originally announced September 2014.

Comments: arXiv admin note: text overlap with arXiv:1103.5913

arXiv:1103.5925 [pdf, ps, other]

Linear programming problems for frontier estimation

Authors: Guillaume Bouchard, Stéphane Girard, Anatoli Iouditski, Alexander Nazin

Abstract: We propose new estimates for the frontier of a set of points. They are defined as kernel estimates covering all the points and whose associated support is of smallest surface. The estimates are written as linear combinations of kernel functions applied to the points of the sample. The coefficients of the linear combination are then computed by solving a linear programming problem. In the general case, the solution of the optimization problem is sparse, that is, only a few coefficients are non zero. The corresponding points play the role of support vectors in the statistical learning theory. The L_1 error between the estimated and the true frontiers is shown to be almost surely converging to zero, and the rate of convergence is provided. The behaviour of the estimates on finite sample situations is illustrated on some simulations.

Submitted 30 March, 2011; originally announced March 2011.

Journal ref: Automation and Remote Control, 65(1), 58-64, 2004

arXiv:1103.5913 [pdf, ps, other]

Linear programming problems for l_1-optimal frontier estimation

Authors: Stéphane Girard, Anatoli Iouditski, Alexander Nazin

Abstract: We propose new optimal estimators for the Lipschitz frontier of a set of points. They are defined as kernel estimators being sufficiently regular, covering all the points and whose associated support is of smallest surface. The estimators are written as linear combinations of kernel functions applied to the points of the sample. The coefficients of the linear combination are then computed by solving a related linear programming problem. The L_1 error between the estimated and the true frontier function with a known Lipschitz constant is shown to be almost surely converging to zero, and the rate of convergence is proved to be optimal.

Submitted 30 March, 2011; originally announced March 2011.

Journal ref: S. Girard, A. Iouditski & A. Nazin. "L1-optimal frontier estimation via linear programming", Automation and Remote Control, 66(12), 2000-2018, 2005

arXiv:math/0505333 [pdf, ps, other]

Recursive Aggregation of Estimators by Mirror Descent Algorithm with Averaging

Authors: Anatoli Juditsky, Alexander Nazin, Alexandre Tsybakov, Nicolas Vayatis

Abstract: We consider a recursive algorithm to construct an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under the l1-constraint. It is defined by a stochastic version of the mirror descent algorithm (i.e., of the method which performs gradient descent in the dual space) with an additional averaging. The main result of the paper is an upper bound for the expected accuracy of the proposed estimator. This bound is of the order $\sqrt{(\log M)/t}$ with an explicit and small constant factor, where $M$ is the dimension of the problem and $t$ stands for the sample size. A similar bound is proved for a more general setting that covers, in particular, the regression model with squared loss.
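The phrase "gradient descent in the dual space with an additional averaging" can be made concrete with a minimal sketch of entropic mirror descent over the probability simplex. This is a textbook-style illustration under stated assumptions, not the estimator from the paper: the function name `mirror_descent_with_averaging`, the constant `step_size`, the entropy-based (softmax) mirror map, and the restriction to the simplex rather than a general l1-constraint are all choices made here for illustration.

```python
import numpy as np

def mirror_descent_with_averaging(grad_fn, M, steps, step_size):
    """Hypothetical sketch: entropic mirror descent on the simplex with iterate averaging."""
    zeta = np.zeros(M)             # dual variable that accumulates (stochastic) gradients
    theta = np.full(M, 1.0 / M)    # primal iterate, starting at uniform weights
    theta_avg = np.zeros(M)        # running average of the primal iterates
    for t in range(1, steps + 1):
        # Update the running average with the current iterate.
        theta_avg += (theta - theta_avg) / t
        # Gradient step in the dual space.
        zeta -= step_size * grad_fn(theta)
        # Map back to the simplex via a numerically stable softmax.
        w = np.exp(zeta - zeta.max())
        theta = w / w.sum()
    return theta_avg
```

Here `grad_fn` would return a stochastic gradient of the risk at the current weights; the averaged weights, rather than the last iterate, serve as the aggregated estimate.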

Submitted 7 March, 2006; v1 submitted 16 May, 2005; originally announced May 2005.

Comments: 29 pages; May 2005

MSC Class: 62G99

Showing 1–9 of 9 results for author: Nazin, A