Search | arXiv e-print repository

Statistical Properties of the log-cosh Loss Function Used in Machine Learning

Authors: Resve A. Saleh, A. K. Md. Ehsanes Saleh

Abstract: This paper analyzes a popular loss function used in machine learning called the log-cosh loss function. A number of papers have been published using this loss function but, to date, no statistical analysis has been presented in the literature. In this paper, we present the distribution function from which the log-cosh loss arises. We compare it to a similar distribution, called the Cauchy distribution, and carry out various statistical procedures that characterize its properties. In particular, we examine its associated pdf, cdf, likelihood function and Fisher information. Side-by-side we consider the Cauchy and Cosh distributions as well as the MLE of the location parameter with asymptotic bias, asymptotic variance, and confidence intervals. We also provide a comparison of robust estimators from several other loss functions, including the Huber loss function and the rank dispersion function. Further, we examine the use of the log-cosh function for quantile regression. In particular, we identify a quantile distribution function from which a maximum likelihood estimator for quantile regression can be derived. Finally, we compare a quantile M-estimator based on log-cosh with robust monotonicity against another approach to quantile regression based on convolutional smoothing.

Submitted 15 March, 2024; v1 submitted 9 August, 2022; originally announced August 2022.

Comments: 10 pages, 17 figures

arXiv:2111.04805 [pdf, other]

Solution to the Non-Monotonicity and Crossing Problems in Quantile Regression

Authors: Resve A. Saleh, A. K. Md. Ehsanes Saleh

Abstract: This paper proposes a new method to address the long-standing problem of lack of monotonicity in the estimation of the conditional and structural quantile function, also known as the quantile crossing problem. Quantile regression is a very powerful tool in data science in general and econometrics in particular. Unfortunately, the crossing problem has been confounding researchers and practitioners alike for over 4 decades. Numerous attempts have been made to find a simple and general solution. This paper describes a unique and elegant solution to the problem based on a flexible check function that is easy to understand and implement in R and Python, while greatly reducing or even eliminating the crossing problem entirely. It will be very important in all areas where quantile regression is routinely used and may also find application in robust regression, especially in the context of machine learning. From this perspective, we also utilize the flexible check function to provide insights into the root causes of the crossing problem.

Submitted 24 November, 2021; v1 submitted 8 November, 2021; originally announced November 2021.

Comments: 8 pages, 14 figures, IEEE conference format

arXiv:1505.02913 [pdf, other]

Restricted LASSO and Double Shrinking

Authors: M. Norouzirad, M. Arashi, A. K. Md. Ehsanes Saleh

Abstract: In the context of the multiple regression model, suppose that the vector parameter of interest β is subjected to lie in the subspace hypothesis Hβ = h, where this restriction is based on either additional information or prior knowledge. Then, the restricted estimator performs better than the ordinary least squares one. In addition, when the number of variables is relatively large with respect to observations, the use of the least absolute shrinkage and selection operator (LASSO) estimator is suggested for variable selection purposes. In this paper, we define a restricted LASSO estimator and configure three classes of LASSO-type estimators to fulfill both variable selection and restricted estimation. The asymptotic performance of the proposed estimators is studied and a simulation is conducted to analyze asymptotic relative efficiencies. The application of our result is considered for the prostate dataset where the expected prediction errors and risks are compared. It has been shown that the proposed shrunken LASSO estimators, resulting from the double shrinking methodology, perform better than the classical LASSO.

Submitted 12 May, 2015; originally announced May 2015.

Comments: 20 pages; 4 figures, 5 tables

MSC Class: Primary: 62J07; 62F12; Secondary: 62F30

arXiv:1503.06910 [pdf, ps, other]

Penalty, Shrinkage, and Preliminary Test Estimators under Full Model Hypothesis

Authors: Enayetur Raheem, A. K. Md. Ehsanes Saleh

Abstract: This paper considers a multiple regression model and compares, under the full model hypothesis, analytically as well as by simulation, the performance characteristics of some popular penalty estimators such as ridge regression, LASSO, adaptive LASSO, SCAD, and elastic net versus the Least Squares Estimator, restricted estimator, preliminary test estimator, and Stein-type estimators when the dimension of the parameter space is smaller than the sample space dimension. We find that RR uniformly dominates LSE, RE, PTE, SE and PRSE while LASSO, aLASSO, SCAD, and EN uniformly dominate LSE only. Further, it is observed that neither the penalty estimators nor the Stein-type estimators dominate one another.

Submitted 24 March, 2015; originally announced March 2015.

Comments: 28 pages, 4 figures, 10 tables. arXiv admin note: text overlap with arXiv:1503.05160

arXiv:1503.05160 [pdf, ps, other]

Improved LASSO

Authors: A. K. Md. Ehsanes Saleh, Enayetur Raheem

Abstract: We propose an improved LASSO estimation technique based on the Stein rule. We shrink the classical LASSO estimator using the preliminary test, shrinkage, and positive-rule shrinkage principles. Simulations have been carried out for various configurations of correlation coefficients ($r$), size of the parameter vector ($β$), error variance ($σ^2$) and number of non-zero coefficients ($k$) in the model parameter vector. Several real data examples have been used to demonstrate the practical usefulness of the proposed estimators. Our study shows that the risk ordering given by LSE $>$ LASSO $>$ Stein-type LASSO $>$ Stein-type positive rule LASSO remains the same uniformly in the divergence parameter $Δ^2$ as in the traditional case.

Submitted 17 March, 2015; originally announced March 2015.

Comments: 17 pages, 12 figures, 24 tables

arXiv:1203.4427 [pdf, ps, other]

Regression Model With Elliptically Contoured Errors

Authors: M. Arashi, A. K. Md E. Saleh, S. M. M. Tabatabaey

Abstract: For the regression model where the errors follow the elliptically contoured distribution (ECD), we consider the least squares (LS), restricted LS (RLS), preliminary test (PT), Stein-type shrinkage (S) and positive-rule shrinkage (PRS) estimators for the regression parameters. We compare the quadratic risks of the estimators to determine the relative dominance properties of the five estimators.

Submitted 20 March, 2012; originally announced March 2012.

Comments: final version will be published in Statistics: A Journal of Theoretical and Applied Statistics

Showing 1–6 of 6 results for author: Saleh, A K M E