-
Optimal spanning tree reconstruction in symbolic regression
Authors:
Radoslav G. Neychev,
Innokentiy A. Shibaev,
Vadim V. Strijov
Abstract:
This paper investigates the problem of regression model generation. A model is a superposition of primitive functions. The model structure is described by a weighted colored graph. Each graph vertex corresponds to some primitive function. An edge assigns a superposition of two functions. The weight of an edge equals the probability of superposition. To generate an optimal model one has to reconstruct its structure from its graph adjacency matrix. The proposed algorithm reconstructs the minimum spanning tree from the weighted colored graph. This paper presents a novel solution based on the prize-collecting Steiner tree algorithm. This algorithm is compared with its alternatives.
Submitted 25 June, 2024;
originally announced June 2024.
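For orientation, the spanning-tree reconstruction mentioned in the abstract above can be illustrated with a classical baseline. The sketch below is Prim's algorithm on a symmetric weight matrix; it is not the prize-collecting Steiner tree method the paper proposes, and the function name `prim_mst`, the use of `None` for a missing edge, and the treatment of weights as costs to minimize are all assumptions made for illustration.

```python
def prim_mst(weights):
    """Return MST edges (parent, child, weight) for a symmetric weight matrix.

    weights[i][j] is the edge cost, or None when there is no edge (assumed
    convention). The graph is assumed connected; otherwise only the component
    containing vertex 0 is joined by the returned edges.
    """
    n = len(weights)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known edge into each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, weights[u][parent[u]]))
        # Relax the edges leaving the newly added vertex.
        for v in range(n):
            w = weights[u][v]
            if w is not None and not in_tree[v] and w < best[v]:
                best[v] = w
                parent[v] = u
    return edges
```

If edge weights are probabilities of superposition, as the abstract states, a caller would presumably transform them into costs first (for example a negative log-probability); that transformation is likewise an assumption and is not taken from the paper.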
-
Median Clipping for Zeroth-order Non-Smooth Convex Optimization and Multi Arm Bandit Problem with Heavy-tailed Symmetric Noise
Authors:
Nikita Kornilov,
Yuriy Dorn,
Aleksandr Lobanov,
Nikolay Kutuzov,
Innokentiy Shibaev,
Eduard Gorbunov,
Alexander Gasnikov,
Alexander Nazin
Abstract:
In this paper, we consider non-smooth convex optimization with a zeroth-order oracle corrupted by symmetric stochastic noise. Unlike the existing high-probability results requiring the noise to have bounded $κ$-th moment with $κ\in (1,2]$, our results allow even heavier noise with any $κ> 0$, e.g., the noise distribution can have unbounded $1$-st moment. Moreover, our results match the best-known ones for the case of the bounded variance. To achieve this, we use the mini-batched median estimate of the sampled gradient differences, apply gradient clipping to the result, and plug in the final estimate into the accelerated method. We apply this technique to the stochastic multi-armed bandit problem with heavy-tailed distribution of rewards and achieve $O(\sqrt{Td})$ regret by incorporating noise symmetry.
Submitted 25 May, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
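The median-then-clip estimate described in the abstract above can be sketched roughly as follows. This is one plausible reading rather than the authors' exact construction: the two-point scaling `dim / (2 * tau)`, the reuse of a single random direction per batch, and all names (`median_clipped_gradient`, `make_cauchy_oracle`, `tau`, `batch`, `max_norm`) are assumptions introduced for illustration.

```python
import math
import random
import statistics


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit Euclidean sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def clip(vec, max_norm):
    """Rescale vec so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(c * c for c in vec))
    if norm <= max_norm:
        return list(vec)
    return [c * max_norm / norm for c in vec]


def median_clipped_gradient(noisy_diff, x, tau, batch, max_norm, rng):
    """Median of a batch of noisy two-point differences, then norm clipping.

    noisy_diff(a, b) is a hypothetical oracle returning f(a) - f(b) plus
    symmetric noise; an odd batch size gives a single middle sample.
    """
    dim = len(x)
    e = random_unit_vector(dim, rng)
    x_plus = [xi + tau * ei for xi, ei in zip(x, e)]
    x_minus = [xi - tau * ei for xi, ei in zip(x, e)]
    diffs = [noisy_diff(x_plus, x_minus) for _ in range(batch)]
    med = statistics.median(diffs)  # robust to heavy symmetric tails
    grad = [dim / (2 * tau) * med * ei for ei in e]
    return clip(grad, max_norm)


def make_cauchy_oracle(f, rng, scale=1.0):
    """Hypothetical oracle whose noise is Cauchy, so it has no finite mean."""
    def noisy_diff(a, b):
        noise = scale * math.tan(math.pi * (rng.random() - 0.5))
        return f(a) - f(b) + noise
    return noisy_diff
```

The median is used here because, for symmetric noise, it stays centered on the true difference even when the mean does not exist, while clipping bounds the size of any single update.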
-
Accelerated Zeroth-order Method for Non-Smooth Stochastic Convex Optimization Problem with Infinite Variance
Authors:
Nikita Kornilov,
Ohad Shamir,
Aleksandr Lobanov,
Darina Dvinskikh,
Alexander Gasnikov,
Innokentiy Shibaev,
Eduard Gorbunov,
Samuel Horváth
Abstract:
In this paper, we consider non-smooth stochastic convex optimization with two function evaluations per round under infinite noise variance. In the classical setting when noise has finite variance, an optimal algorithm, built upon the batched accelerated gradient method, was proposed in (Gasnikov et al., 2022). This optimality is defined in terms of iteration and oracle complexity, as well as the maximal admissible level of adversarial noise. However, the assumption of finite variance is burdensome and it might not hold in many practical scenarios. To address this, we demonstrate how to adapt a refined clipped version of the accelerated gradient (Stochastic Similar Triangles) method from (Sadiev et al., 2023) for a two-point zero-order oracle. This adaptation entails extending the batching technique to accommodate infinite variance -- a non-trivial task that stands as a distinct contribution of this paper.
Submitted 28 October, 2023;
originally announced October 2023.
-
Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise
Authors:
Eduard Gorbunov,
Marina Danilova,
Innokentiy Shibaev,
Pavel Dvurechensky,
Alexander Gasnikov
Abstract:
Stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for non-smooth stochastic convex optimization have complexity bounds with the dependence on the confidence level that is either negative-power or logarithmic but under an additional assumption of sub-Gaussian (light-tailed) noise distribution that may not hold in practice. In our paper, we resolve this issue and derive the first high-probability convergence results with logarithmic dependence on the confidence level for non-smooth convex stochastic optimization problems with non-sub-Gaussian (heavy-tailed) noise. To derive our results, we propose novel stepsize rules for two stochastic methods with gradient clipping. Moreover, our analysis works for generalized smooth objectives with Hölder-continuous gradients, and for both methods, we provide an extension for strongly convex problems. Finally, our results imply that the first (accelerated) method we consider also has optimal iteration and oracle complexity in all the regimes, and the second one is optimal in the non-smooth setting.
Submitted 1 July, 2022; v1 submitted 10 June, 2021;
originally announced June 2021.
-
Recent Theoretical Advances in Non-Convex Optimization
Authors:
Marina Danilova,
Pavel Dvurechensky,
Alexander Gasnikov,
Eduard Gorbunov,
Sergey Guminov,
Dmitry Kamzolov,
Innokentiy Shibaev
Abstract:
Motivated by recent increased interest in optimization algorithms for non-convex optimization in application to training deep neural networks and other optimization problems in data analysis, we give an overview of recent theoretical results on global performance guarantees of optimization algorithms for non-convex optimization. We start with classical arguments showing that general non-convex problems cannot be solved efficiently in a reasonable time. Then we give a list of problems that can be solved efficiently to find the global minimizer by exploiting the structure of the problem as much as possible. Another way to deal with non-convexity is to relax the goal from finding the global minimum to finding a stationary point or a local minimum. For this setting, we first present known results for the convergence rates of deterministic first-order methods, which are then followed by a general theoretical analysis of optimal stochastic and randomized gradient schemes, and an overview of the stochastic first-order methods. After that, we discuss quite general classes of non-convex problems, such as minimization of $α$-weakly-quasi-convex functions and functions that satisfy the Polyak-Łojasiewicz condition, which still allow obtaining theoretical convergence guarantees of first-order methods. Then we consider higher-order and zeroth-order/derivative-free methods and their convergence rates for non-convex optimization problems.
Submitted 26 November, 2021; v1 submitted 11 December, 2020;
originally announced December 2020.
-
Zeroth-order methods for noisy Hölder-gradient functions
Authors:
Innokentiy Shibaev,
Pavel Dvurechensky,
Alexander Gasnikov
Abstract:
In this paper, we prove new complexity bounds for zeroth-order methods in non-convex optimization with inexact observations of the objective function values. We use the Gaussian smoothing approach of Nesterov and Spokoiny [2015] and extend their results, obtained for optimization methods for smooth zeroth-order non-convex problems, to the setting of minimization of functions with Hölder-continuous gradient with a noisy zeroth-order oracle, obtaining noise upper bounds as well. We consider a finite-difference gradient approximation based on normally distributed random vectors and prove that a gradient descent scheme based on this approximation converges to the stationary point of the smoothed function. We also consider convergence to the stationary point of the original (not smoothed) function and obtain bounds on the number of steps of the algorithm for making the norm of its gradient small. Additionally, we provide bounds for the level of noise in the zeroth-order oracle for which it is still possible to guarantee that the above bounds hold. We also consider separately the case of $ν = 1$ and show that in this case the dependence of the obtained bounds on the dimension can be improved.
Submitted 13 January, 2021; v1 submitted 21 June, 2020;
originally announced June 2020.
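The finite-difference scheme described in the abstract above can be sketched in a few lines. The forward-difference form below follows the standard Gaussian smoothing estimator of Nesterov and Spokoiny; the function names, the fixed step size, and the noiseless oracle in the usage are assumptions for illustration and do not reproduce the paper's step-size rules or noise model.

```python
import random


def gaussian_fd_gradient(f, x, mu, rng):
    """Forward-difference estimate along a standard normal direction u.

    Returns ((f(x + mu*u) - f(x)) / mu) * u, whose expectation is the
    gradient of the Gaussian-smoothed version of f.
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_shift = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = (f(x_shift) - f(x)) / mu
    return [scale * ui for ui in u]


def zeroth_order_descent(f, x0, step, mu, iters, seed=0):
    """Plain gradient descent driven only by function evaluations."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = gaussian_fd_gradient(f, x, mu, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x
```

A call such as `zeroth_order_descent(lambda x: sum(c * c for c in x), [1.0, -2.0], step=0.05, mu=1e-4, iters=2000)` would exercise the sketch on a smooth quadratic; replacing `f` with a version that adds bounded error to each evaluation gives a way to experiment with the noisy-oracle setting the abstract refers to.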
-
Accelerator probes for new stable quarks
Authors:
Konstantin M. Belotsky,
Maxim Yu. Khlopov,
Konstantin I. Shibaev
Abstract:
The nonbaryonic dark matter of the Universe can consist of new stable doubly charged particles $O^{--}$, bound with primordial helium in heavy neutral O-helium (OHe) "atoms" by ordinary Coulomb interaction. O-helium dark atoms can play the role of specific nuclear interacting dark matter and provide a solution for the puzzles of dark matter searches. The successful development of composite dark matter scenarios appeals to experimental search for the charged constituents of dark atoms. If $O^{--}$ is a "heavy quark cluster" $\bar U \bar U \bar U$, its production at accelerators is virtually impossible and the strategy of heavy quark search is reduced to search for heavy stable hadrons, containing only a single heavy quark (or antiquark). Estimates of the production cross section of such particles at the LHC are presented and the experimental signatures for new stable quarks are outlined.
Submitted 15 November, 2011;
originally announced November 2011.
-
Composite Dark Matter and its Charged Constituents
Authors:
K. M. Belotsky,
M. Yu. Khlopov,
K. I. Shibaev
Abstract:
Stable charged heavy leptons and quarks can exist and hide in elusive atoms, bound by Coulomb attraction and playing the role of dark matter. However, in the expanding Universe it is not possible to recombine all the charged particles into such atoms, and the positively charged particles, which escape this recombination, bind with electrons in atoms of anomalous isotopes with pregalactic abundance, exceeding substantially the terrestrial upper limits. This abundance cannot be reduced in the dense matter bodies, if negatively charged particles have charge -1. Therefore composite dark matter can involve only negatively charged particles with charge -2, while stable heavy particles with charge -1 should be excluded. Realistic scenarios of composite dark matter, avoiding this problem of anomalous isotope over-production, inevitably predict the existence of primordial "atoms", in which primordial helium traps all the free negatively charged heavy constituents with charge -2. Study of the possibility for such a primordial heavy $α$ particle with compensated charge to exist, as well as the search for the stable charged constituents in cosmic rays and accelerators, provide a crucial test for composite dark matter scenarios.
Submitted 25 April, 2006;
originally announced April 2006.
-
Effects of new long-range interaction: Recombination of relic Heavy neutrinos and antineutrinos
Authors:
K. M. Belotsky,
M. Yu. Khlopov,
S. V. Legonkov,
K. I. Shibaev
Abstract:
If stable Heavy neutrinos of the 4th generation possess their own Coulomb-like interaction, recombination of pairs of Heavy neutrinos and antineutrinos can play an important role in their cosmological evolution and lead to observable consequences. In particular, the effect of this new interaction in the annihilation of neutrino-antineutrino pairs can account for the $γ$-flux observed by EGRET.
Submitted 27 April, 2005;
originally announced April 2005.
-
May Heavy hadrons of the 4th generation be hidden in our Universe while close to detection?
Authors:
K. Belotsky,
D. Fargion,
M. Yu. Khlopov,
R. V. Konoplich,
M. G. Ryskin,
K. I. Shibaev
Abstract:
Metastable quarks of the 4th generation are predicted in the framework of heterotic string phenomenology. Their presence in heavy stable hadrons is usually strongly constrained; however, their hidden compositions in the Heavy doubly charged baryons considered here are found to be still allowable: we studied their primordial quark production in the early Universe, their freezing into cosmic Heavy hadrons, their later annihilation into cosmic rays, as well as their relic presence in our Universe and among us on Earth. We also discuss their possible production in present or future accelerators. Indeed, if the lightest quarks and antiquarks of the 4th generation are stored in doubly charged baryons and neutral mesons, their lifetime can exceed the age of the Universe; the existence of such anomalous Helium-like (and neutral Pion-like) stable particles may escape present experimental limits, while being close to present and future experimental tests. On the contrary, the primordial abundance of the lightest hadrons of the 4th generation with charge +1 cannot decrease below the experimental upper limits on anomalous hydrogen and therefore (if stable) it is excluded. While 4th quark hadrons are rare, their presence may play a different and surprising role in cosmic rays, muon and neutrino fluxes and cosmic electromagnetic spectra. Most of these traces are tiny, but just nearly detectable.
Submitted 20 November, 2004;
originally announced November 2004.