-
Long-Tail Theory under Gaussian Mixtures
Authors:
Arman Bolatov,
Maxat Tezekbayev,
Igor Melnykov,
Artur Pak,
Vassilina Nikoulina,
Zhenisbek Assylbekov
Abstract:
We suggest a simple Gaussian mixture model for data generation that complies with Feldman's long tail theory (2020). We demonstrate that a linear classifier cannot decrease the generalization error below a certain level in the proposed model, whereas a nonlinear classifier with a memorization capacity can. This confirms that for long-tailed distributions, rare training examples must be considered for optimal generalization to new data. Finally, we show that the performance gap between linear and nonlinear models can be lessened as the tail becomes shorter in the subpopulation frequency distribution, as confirmed by experiments on synthetic and real data.
Submitted 24 July, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
-
Optimal Algorithms for the Inhomogeneous Spiked Wigner Model
Authors:
Aleksandr Pak,
Justin Ko,
Florent Krzakala
Abstract:
In this paper, we study a spiked Wigner problem with an inhomogeneous noise profile. Our aim in this problem is to recover the signal passed through an inhomogeneous low-rank matrix channel. While the information-theoretic performance is well known, we focus on the algorithmic problem. We derive an approximate message-passing (AMP) algorithm for the inhomogeneous problem and show that its rigorous state evolution coincides with the information-theoretic optimal Bayes fixed-point equations. In particular, we identify a statistical-to-computational gap in which known algorithms require a signal-to-noise ratio larger than the information-theoretic threshold to perform better than random. Finally, from the adapted AMP iteration we deduce a simple and efficient spectral method that can be used to recover the transition for matrices with general variance profiles. This spectral method matches the conjectured optimal computational phase transition.
Submitted 13 February, 2023;
originally announced February 2023.
-
Inference on P(Y<X) in Bivariate Rayleigh Distribution
Authors:
Abbas Pak,
Nayereh Bagheri Khoolenjani,
Ali Akbar Jafari
Abstract:
This paper deals with the estimation of the reliability $R=P(Y<X)$ when $X$ is the random strength of a component subjected to a random stress $Y$ and $(X,Y)$ follows a bivariate Rayleigh distribution. The maximum likelihood estimator of $R$ and its asymptotic distribution are obtained. An asymptotic confidence interval for $R$ is constructed using the asymptotic distribution. In addition, two confidence intervals are proposed, based on the bootstrap method and on a computational approach. Testing of the reliability based on the asymptotic distribution of $R$ is discussed. A simulation study has been carried out to investigate the performance of the confidence intervals and tests. A numerical example is also given to illustrate the proposed approaches.
Submitted 18 May, 2014;
originally announced May 2014.