-
Robust Nonparametric Regression for Compositional Data: the Simplicial--Real case
Authors:
Ana M. Bianco,
Graciela Boente,
Wenceslao González--Manteiga,
Francisco Gude Sampedro,
Ana Pérez--González
Abstract:
Statistical analysis of compositional data has gained a lot of attention due to its great potential for applications. A feature of these data is that they are multivariate vectors that lie in the simplex, that is, the components of each vector are positive and sum up to a constant value. This fact poses a challenge to the analyst due to the internal dependency of the components, which exhibit a spurious negative correlation. Since classical multivariate techniques are not appropriate in this scenario, it is necessary to endow the simplex with a suitable algebraic-geometrical structure, which is a starting point to develop adequate methodology and strategies to handle compositions. We center our attention on regression problems with real responses and compositional covariates, and we adopt a nonparametric approach due to the flexibility it provides. Aware of the potential damage that outliers may produce, we introduce a robust estimator in the framework of nonparametric regression for compositional data. The performance of the estimators is investigated by means of a numerical study where different contamination schemes are simulated. Through a real data analysis, the advantages of using a robust procedure are illustrated.
Submitted 21 May, 2024;
originally announced May 2024.
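The simplex constraint and the log-ratio geometry mentioned in the abstract above can be illustrated with a short sketch. It assumes the centered log-ratio (clr) transform and the Aitchison distance as the geometric structure on the simplex, and it uses a kernel-weighted median as a simple stand-in for a robust local estimator. The function names, the Gaussian kernel and the bandwidth `h` are hypothetical illustrative choices, not the estimator proposed in the paper.

```python
import math

def closure(x, kappa=1.0):
    """Rescale positive parts so they sum to kappa (the simplex constraint)."""
    total = sum(x)
    return [kappa * xi / total for xi in x]

def clr(x):
    """Centered log-ratio: log of each part minus the mean log (log geometric mean)."""
    logs = [math.log(xi) for xi in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Aitchison distance = Euclidean distance between clr coordinates."""
    return math.dist(clr(x), clr(y))

def weighted_median(values, weights):
    """A minimizer of sum_i w_i * |v_i - m| (a local L1, hence robust, fit)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]

def local_median_fit(x0, X, y, h):
    """Hypothetical robust smoother: kernel-weighted median of responses near x0."""
    weights = [math.exp(-0.5 * (aitchison_distance(x0, xi) / h) ** 2) for xi in X]
    return weighted_median(y, weights)
```

Because the clr transform removes the overall scale, `aitchison_distance([1, 2, 3], [2, 4, 6])` is zero up to rounding: both vectors represent the same composition. Replacing the weighted median with a weighted mean would give a Nadaraya–Watson type fit, which a single large response can pull arbitrarily far.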
-
Robust estimation of heteroscedastic regression models: a brief overview and new proposals
Authors:
Conceição Amado,
Ana M. Bianco,
Graciela Boente,
Isabel M. Rodrigues
Abstract:
We collect robust proposals given in the field of regression models with heteroscedastic errors. Our motivation stems from the fact that the practitioner frequently faces the confluence of two phenomena in the context of data analysis: non--linearity and heteroscedasticity. The impact of heteroscedasticity on the precision of the estimators is well--known; however, the conjunction of these two phenomena makes handling outliers more difficult.
An iterative procedure to estimate the parameters of a heteroscedastic non--linear model is considered. The studied estimators combine weighted $MM-$regression estimators, to control the impact of high leverage points, and a robust method to estimate the parameters of the variance function.
Submitted 7 November, 2023; v1 submitted 5 November, 2023;
originally announced November 2023.
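The alternating idea described in the abstract above (a robust regression step weighted by the current variance function, followed by a robust step for the variance parameters) can be sketched for a simple linear model $y = a + b x + \sigma e^{\gamma x}\varepsilon$. The exponential variance function is an assumption made for illustration. The sketch also plainly swaps in Huber IRLS for the weighted $MM-$regression estimator and a Theil–Sen fit of log absolute residuals for the variance step. It is not the authors' procedure, and unlike the estimators described above it has no leverage control.

```python
import math
import statistics

def huber_weight(u, c=1.345):
    """IRLS weight psi(u)/u for the Huber psi function."""
    return 1.0 if abs(u) <= c else c / abs(u)

def weighted_ls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def theil_sen_slope(x, z):
    """Median of pairwise slopes: a robust slope estimate."""
    slopes = [(z[j] - z[i]) / (x[j] - x[i])
              for i in range(len(x)) for j in range(i + 1, len(x)) if x[j] != x[i]]
    return statistics.median(slopes)

def fit_heteroscedastic_line(x, y, n_outer=5, n_inner=20):
    """Alternate a variance-weighted Huber fit with a robust fit of gamma."""
    gamma = 0.0
    a, b = weighted_ls_line(x, y, [1.0] * len(x))
    for _ in range(n_outer):
        v = [math.exp(gamma * xi) for xi in x]  # assumed variance function
        for _ in range(n_inner):
            r = [(yi - a - b * xi) / vi for xi, yi, vi in zip(x, y, v)]
            s = statistics.median(abs(ri) for ri in r) / 0.6745 or 1e-12
            # Huber weights on standardized residuals times 1/v^2 (heteroscedasticity)
            w = [huber_weight(ri / s) / vi ** 2 for ri, vi in zip(r, v)]
            a, b = weighted_ls_line(x, y, w)
        res = [yi - a - b * xi for xi, yi in zip(x, y)]
        # log|r_i| = log(sigma) + gamma * x_i + log|eps_i|, so the slope estimates gamma
        z = [math.log(abs(ri) + 1e-12) for ri in res]
        gamma = theil_sen_slope(x, z)
    return a, b, gamma
```

Regressing $\log|r_i|$ on $x_i$ is a common device for exponential variance functions, since the error term then enters additively. Using a median-based slope keeps a few huge residuals from dominating the variance step.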
-
Asymptotic behaviour of penalized robust estimators in logistic regression when dimension increases
Authors:
Ana M. Bianco,
Graciela Boente,
Gonzalo Chebi
Abstract:
Penalized $M-$estimators for logistic regression models have been previously studied for fixed dimension in order to obtain sparse statistical models and automatic variable selection. In this paper, we derive asymptotic results for penalized $M-$estimators when the dimension $p$ grows to infinity with the sample size $n$. Specifically, we obtain consistency and rates of convergence results for some choices of the penalty function. Moreover, we prove that these estimators consistently select variables with probability tending to 1 and derive their asymptotic distribution.
Submitted 4 August, 2023; v1 submitted 28 January, 2022;
originally announced January 2022.
-
Estimators for covariate-adjusted ROC curves with missing biomarkers values
Authors:
Ana M. Bianco,
Graciela Boente,
Wenceslao González-Manteiga,
Ana Pérez-González
Abstract:
In this paper, we present three estimators of the ROC curve when missing observations arise among the biomarkers. Two of the procedures assume that covariates are available that allow the propensity to be estimated, and the estimators are obtained using an inverse probability weighting method or a smoothed version of it. The third one assumes that the covariates are related to the biomarkers through a regression model, which enables us to construct convolution--based estimators of the distribution and quantile functions. Consistency results are obtained under mild conditions. Through a numerical study, we evaluate the finite sample performance of the different proposals. A real data set is also analysed.
Submitted 17 January, 2022;
originally announced January 2022.
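The inverse probability weighting idea in the abstract above can be sketched as follows. Observed biomarker values are weighted by $1/\pi_i$ to form an empirical distribution function, and the ROC curve is read off as $\mathrm{ROC}(p) = 1 - F_D\{F_H^{-1}(1-p)\}$, where larger values indicate disease. The propensities are assumed to be already known or estimated. The helper names are hypothetical, and the smoothed and convolution-based variants are not shown.

```python
import bisect

def ipw_cdf(values, observed, propensity):
    """Return (F, F_inverse) for the IPW empirical distribution of observed values."""
    pts = sorted((v, 1.0 / p) for v, o, p in zip(values, observed, propensity) if o)
    total = sum(w for _, w in pts)
    xs, cum, acc = [], [], 0.0
    for v, w in pts:
        acc += w
        xs.append(v)
        cum.append(acc / total)

    def F(t):
        k = bisect.bisect_right(xs, t)  # number of observed values <= t
        return cum[k - 1] if k > 0 else 0.0

    def F_inverse(u):
        k = bisect.bisect_left(cum, u)  # first index with F >= u
        return xs[min(k, len(xs) - 1)]

    return F, F_inverse

def roc(p, F_diseased, F_inverse_healthy):
    """ROC(p) = 1 - F_D(F_H^{-1}(1 - p)) at false positive rate p."""
    return 1.0 - F_diseased(F_inverse_healthy(1.0 - p))
```

A missing value contributes nothing directly. Instead, each observed value stands in for $1/\pi_i$ subjects, which removes the bias under a missing at random assumption when the propensities are correct.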
-
A robust approach for ROC curves with covariates
Authors:
Ana M. Bianco,
Graciela Boente,
Wenceslao Gonzalez-Manteiga
Abstract:
The Receiver Operating Characteristic (ROC) curve is a useful tool that measures the discriminating power of a continuous variable or the accuracy of a pharmaceutical or medical test to distinguish between two conditions or classes. In certain situations, the practitioner may be able to measure some covariates related to the diagnostic variable, which can increase the discriminating power of the ROC curve. To protect against atypical data among the observations, a procedure to obtain robust estimators for the ROC curve in the presence of covariates is introduced. The proposal follows a semiparametric approach, which fits a location-scale regression model to the diagnostic variable and considers empirical estimators of the distributions of the regression residuals. Robust parametric estimators are combined with adaptive weighted empirical distribution estimators to down-weight the influence of outliers. The uniform consistency of the proposal is derived under mild assumptions. A Monte Carlo study is carried out to compare the performance of the proposed robust estimators with the classical ones, both in clean and contaminated samples. A real data set is also analysed.
Submitted 23 July, 2022; v1 submitted 30 June, 2020;
originally announced July 2020.
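Under a location-scale regression model of the kind described in the abstract above, the covariate-specific ROC curve can be written in terms of the residual distributions. The notation below ($\mu_D$, $\sigma_D$, $G_D$ for the diseased population and $\mu_H$, $\sigma_H$, $G_H$ for the healthy one) is assumed for illustration, with larger diagnostic values indicating disease.

```latex
Y_D = \mu_D(\mathbf{x}) + \sigma_D(\mathbf{x})\,\varepsilon_D, \qquad
Y_H = \mu_H(\mathbf{x}) + \sigma_H(\mathbf{x})\,\varepsilon_H, \qquad
\varepsilon_D \sim G_D, \quad \varepsilon_H \sim G_H,

\mathrm{ROC}_{\mathbf{x}}(p)
  = 1 - F_{D,\mathbf{x}}\{F_{H,\mathbf{x}}^{-1}(1-p)\}
  = 1 - G_D\!\left( \frac{\mu_H(\mathbf{x}) - \mu_D(\mathbf{x}) + \sigma_H(\mathbf{x})\, G_H^{-1}(1-p)}{\sigma_D(\mathbf{x})} \right).
```

The second equality uses $F_{H,\mathbf{x}}^{-1}(1-p) = \mu_H(\mathbf{x}) + \sigma_H(\mathbf{x})\,G_H^{-1}(1-p)$ and $F_{D,\mathbf{x}}(t) = G_D\{(t - \mu_D(\mathbf{x}))/\sigma_D(\mathbf{x})\}$. In a plug-in estimator, robust fits of $\mu$ and $\sigma$ enter the numerator and denominator, and $G_D$ and $G_H$ are estimated from standardized residuals. A weighted empirical distribution at that stage is where outlying residuals can be down-weighted.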
-
Robust location estimators in regression models with covariates and responses missing at random
Authors:
Ana M. Bianco,
Graciela Boente,
Wenceslao González-Manteiga,
Ana Pérez-González
Abstract:
This paper deals with robust marginal estimation under a general regression model when missing data occur in the response and also in some of the covariates. The target is a marginal location parameter, which is given through an $M-$functional. To obtain robust Fisher--consistent estimators, properly defined marginal distribution function estimators are considered. These estimators avoid the bias due to missing values by assuming a missing at random condition. Three methods are considered to estimate the marginal distribution function from which the $M-$location of interest is obtained: the well-known inverse probability weighting, a convolution--based method that makes use of the regression model, and an augmented inverse probability weighting procedure that protects against misspecification. The robust proposed estimators and the classical ones are compared through a numerical study under different missing models, including clean and contaminated samples. We illustrate the estimators' behaviour under a nonlinear model. A real data set is also analysed.
Submitted 7 May, 2020;
originally announced May 2020.
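The step in the abstract above from an estimated marginal distribution to an $M-$location can be sketched with the inverse probability weighting option. The location $\mu$ solves $\sum_i (\delta_i/\pi_i)\,\psi\{(y_i - \mu)/s\} = 0$ with a Huber $\psi$, computed here by iteratively reweighted averaging. The propensities $\pi_i$ are assumed known or already estimated. The preliminary scale $s$ (the MAD of the observed responses) and the function name are illustrative assumptions. The convolution and augmented IPW estimators are not shown.

```python
import statistics

def ipw_m_location(y, observed, propensity, c=1.345, tol=1e-10, max_iter=200):
    """Huber M-location of the IPW-weighted distribution of the observed responses."""
    ys = [yi for yi, o in zip(y, observed) if o]
    ws = [1.0 / p for p, o in zip(propensity, observed) if o]  # IPW weights
    mu = statistics.median(ys)
    # Preliminary robust scale (assumption): normalized MAD of observed responses
    s = statistics.median(abs(v - mu) for v in ys) / 0.6745 or 1.0
    for _ in range(max_iter):
        num = den = 0.0
        for v, w in zip(ys, ws):
            u = (v - mu) / s
            wt = w * (1.0 if abs(u) <= c else c / abs(u))  # IPW weight * psi(u)/u
            num += wt * v
            den += wt
        new_mu = num / den
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

With all propensities equal to one, this reduces to an ordinary Huber location estimate, so a single gross outlier such as `1000` in `[1, 2, 3, 4, 1000]` leaves the estimate near 3.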
-
Penalized robust estimators in logistic regression with applications to sparse models
Authors:
Ana M. Bianco,
Graciela Boente,
Gonzalo Chebi
Abstract:
Sparse covariates are frequent in classification and regression problems, and in these settings the task of variable selection is usually of interest. As is well known, sparse statistical models correspond to situations where there are only a small number of non--zero parameters, and for that reason they are much easier to interpret than dense ones. In this paper, we focus on the logistic regression model, and our aim is to address robust and penalized estimation for the regression parameter. We introduce a family of penalized weighted $M-$type estimators for the logistic regression parameter that are stable against atypical data. We explore different penalization functions and introduce the so--called Sign penalization. This new penalty has the advantage that it depends on only one penalty parameter, avoiding arbitrary tuning constants. We discuss the variable selection capability of the given proposals as well as their asymptotic behaviour. Through a numerical study, we compare the finite sample performance of the proposal for different penalized estimators, either robust or classical, under different scenarios. A robust cross--validation criterion is also presented. The analysis of two real data sets enables us to investigate the stability of the penalized estimators in the presence of outliers.
Submitted 12 February, 2020; v1 submitted 1 November, 2019;
originally announced November 2019.
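The general shape of a penalized weighted objective for logistic regression, $\frac{1}{n}\sum_i w(\mathbf{x}_i)\,\phi(y_i, \mathbf{x}_i^\top\boldsymbol\beta) + \lambda \|\boldsymbol\beta\|_1$, can be sketched as follows. This is deliberately simpler than the proposal described above. It uses the plain deviance as $\phi$, a hypothetical leverage weight $w(\mathbf{x}) = \min(1, c/\|\mathbf{x}\|)$, and the LASSO penalty rather than the Sign penalty. A robust version would replace the deviance with a bounded $\rho$ applied to it, which typically needs a correction term for Fisher consistency. The objective is minimized by proximal gradient descent with soft-thresholding. There is no intercept handling.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function F(t)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def leverage_weights(X, c=3.0):
    """Hypothetical weights that shrink the contribution of large-norm covariates."""
    return [min(1.0, c / math.sqrt(sum(v * v for v in x))) if any(x) else 1.0 for x in X]

def soft_threshold(z, t):
    """Proximal operator of t * |z|."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def penalized_weighted_logistic(X, y, lam, step=0.1, n_iter=2000):
    """Proximal gradient for (1/n) sum w_i * deviance_i + lam * ||beta||_1."""
    n, p = len(X), len(X[0])
    w = leverage_weights(X)
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for xi, yi, wi in zip(X, y, w):
            t = sum(b * v for b, v in zip(beta, xi))
            r = wi * (sigmoid(t) - yi) / n  # derivative of the deviance in t
            for j in range(p):
                grad[j] += r * xi[j]
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta
```

The soft-thresholding step sets coefficients exactly to zero whenever the gradient of the weighted loss is smaller than $\lambda$ in absolute value. This mechanism is what produces automatic variable selection.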
-
Robust estimation in single index models when the errors have a unimodal density with unknown nuisance parameter
Authors:
Claudio Agostinelli,
Ana M. Bianco,
Graciela Boente
Abstract:
In this paper, we propose a robust profile estimation method for the parametric and nonparametric components of a single index model when the errors have a strongly unimodal density with unknown nuisance parameter. Under regularity conditions, we derive consistency results for the link function estimators as well as consistency and asymptotic distribution results for the single index parameter estimators. Under a log--Gamma model, the sensitivity to anomalous observations is studied by means of the empirical influence curve. We also discuss a robust $K-$fold procedure to select the smoothing parameters involved. A numerical study is conducted to compare the small sample performance of the robust proposal with that of its classical relatives, both for errors following a log--Gamma model and under contamination schemes. The numerical experiment shows the good robustness properties of the proposed estimators and the advantages of a robust approach over the classical one.
Submitted 24 January, 2018; v1 submitted 15 September, 2017;
originally announced September 2017.
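The idea behind a robust $K-$fold procedure for smoothing parameters, mentioned in the abstract above, can be illustrated in a one-dimensional setting. Out-of-fold prediction errors are summarized by their median rather than their mean, so a few outlying responses cannot drive the bandwidth choice. The local kernel-weighted median smoother, the median absolute error criterion and the fold assignment by index are all assumptions for illustration. This is not the single index procedure of the paper.

```python
import math
import statistics

def kernel_weighted_median(x0, xs, ys, h):
    """Local L1 fit at x0 with Gaussian kernel weights of bandwidth h."""
    w = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    pairs = sorted(zip(ys, w))
    half = sum(w) / 2.0
    acc = 0.0
    for v, wi in pairs:
        acc += wi
        if acc >= half:
            return v
    return pairs[-1][0]

def robust_kfold_bandwidth(xs, ys, bandwidths, k=5):
    """Pick the bandwidth minimizing the median absolute out-of-fold error."""
    n = len(xs)
    best_h, best_score = None, math.inf
    for h in bandwidths:
        errors = []
        for fold in range(k):
            train = [i for i in range(n) if i % k != fold]
            tx = [xs[i] for i in train]
            ty = [ys[i] for i in train]
            for i in range(fold, n, k):  # held-out indices of this fold
                errors.append(abs(ys[i] - kernel_weighted_median(xs[i], tx, ty, h)))
        score = statistics.median(errors)  # robust summary of prediction errors
        if score < best_score:
            best_h, best_score = h, score
    return best_h
```

With a mean squared error criterion, one response of size 50 would add a term of order 2500 for every bandwidth and could swamp the comparison. The median of the absolute errors is essentially unaffected.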
-
Conditional tests for elliptical symmetry using robust estimators
Authors:
Ana M. Bianco,
Graciela Boente,
Isabel M. Rodrigues
Abstract:
This paper presents a procedure for testing the hypothesis that the underlying distribution of the data is elliptical when robust location and scatter estimators are used instead of the sample mean and covariance matrix. Under mild assumptions that include elliptical distributions without first moments, we derive the asymptotic behaviour of the test statistic under the null hypothesis and under special alternatives. Numerical experiments allow us to compare the behaviour of the tests based on the sample mean and covariance matrix with that of the tests based on robust estimators, under various elliptical distributions and different alternatives. This comparison considers not only the observed level and power but also the size-corrected relative exact power, which provides a tool to assess the ability of the test statistic to detect alternatives. We also provide a numerical comparison with other competing tests.
Submitted 19 February, 2015;
originally announced February 2015.
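The size correction mentioned in the abstract above can be illustrated with Monte Carlo replicates of a test statistic. The critical value is taken from the simulated null distribution instead of the asymptotic one, so tests with different actual levels are compared on an equal footing. The sketch assumes larger statistics lead to rejection and computes only size-corrected power. The "relative exact power" refinement is not reproduced, since its precise definition is not given here.

```python
import math

def empirical_quantile(values, q):
    """Smallest sample value v with (fraction of values <= v) >= q."""
    s = sorted(values)
    k = min(len(s) - 1, max(0, math.ceil(q * len(s)) - 1))
    return s[k]

def size_corrected_power(null_stats, alt_stats, alpha=0.05):
    """Rejection rate under the alternative using a simulated null critical value."""
    crit = empirical_quantile(null_stats, 1.0 - alpha)
    return sum(t > crit for t in alt_stats) / len(alt_stats)
```

Because the critical value comes from the null replicates themselves, applying `size_corrected_power` to the null replicates returns at most `alpha`. This is the property that makes power comparisons across tests meaningful.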