-
Tests for categorical data beyond Pearson: A distance covariance and energy distance approach
Authors:
Fernando Castro-Prado,
Wenceslao González-Manteiga,
Javier Costas,
Fernando Facal,
Dominic Edelmann
Abstract:
Categorical variables are of utmost importance in biomedical research. When two of them are considered, it is often the case that one wants to test whether or not they are statistically dependent. We show weaknesses of classical methods -- such as Pearson's and the G-test -- and we propose testing strategies based on distances that lack those drawbacks. We first develop this theory for classical two-dimensional contingency tables, within the context of distance covariance, an association measure that characterises general statistical independence of two variables. We then apply the same fundamental ideas to one-dimensional tables, namely to the testing for goodness of fit to a discrete distribution, for which we resort to an analogous statistic called energy distance. We prove that our methodology has desirable theoretical properties, and we show how we can calibrate the null distribution of our test statistics without resorting to any resampling technique. We illustrate all this in simulations, as well as with some real data examples, demonstrating the adequate performance of our approach for biostatistical practice.
Submitted 19 March, 2024;
originally announced March 2024.
-
Testing for linearity in scalar-on-function regression with responses missing at random
Authors:
Manuel Febrero-Bande,
Pedro Galeano,
Eduardo García-Portugués,
Wenceslao González-Manteiga
Abstract:
A goodness-of-fit test for the Functional Linear Model with Scalar Response (FLMSR) with responses Missing at Random (MAR) is proposed in this paper. The test statistic relies on a marked empirical process indexed by the projected functional covariate and its distribution under the null hypothesis is calibrated using a wild bootstrap procedure. The computation and performance of the test rely on having an accurate estimator of the functional slope of the FLMSR when the sample has MAR responses. Three estimation methods based on the Functional Principal Components (FPCs) of the covariate are considered. First, the simplified method estimates the functional slope by simply discarding observations with missing responses. Second, the imputed method estimates the functional slope by imputing the missing responses using the simplified estimator. Third, the inverse probability weighted method incorporates the missing response generation mechanism when imputing. Furthermore, both cross-validation and LASSO regression are used to select the FPCs used by each estimator. Several Monte Carlo experiments are conducted to analyze the behavior of the testing procedure in combination with the functional slope estimators. Results indicate that estimators performing missing-response imputation achieve the highest power. The testing procedure is applied to check for linear dependence between the average number of sunny days per year and the mean curve of daily temperatures at weather stations in Spain.
Submitted 22 March, 2024; v1 submitted 10 April, 2023;
originally announced April 2023.
-
A Comparative Review of Specification Tests for Diffusion Models
Authors:
Alejandra López-Pérez,
Manuel Febrero-Bande,
Wenceslao González-Manteiga
Abstract:
Diffusion models play an essential role in modeling continuous-time stochastic processes in the financial field. Therefore, several proposals have been developed in the last decades to test the specification of stochastic differential equations. We provide a survey to collect some developments on goodness-of-fit tests for diffusion models and implement these methods to illustrate their finite sample behavior, regarding size and power, by means of a simulation study. We also apply the ideas of distance correlation for testing independence to propose a test for the parametric specification of diffusion models, comparing its performance with the other methods and analyzing the effect of the curse of dimensionality. As real data examples, treasury securities with different maturities are considered.
Submitted 17 August, 2022;
originally announced August 2022.
-
Estimation and Specification Test for Diffusion Models with Stochastic Volatility
Authors:
Alejandra López-Pérez,
Manuel Febrero-Bande,
Wenceslao González-Manteiga
Abstract:
Given the importance of continuous-time stochastic volatility models to describe the dynamics of interest rates, we propose a goodness-of-fit test for the parametric form of the drift and diffusion functions, based on a marked empirical process of the residuals. The test statistics are constructed using a continuous functional (Kolmogorov-Smirnov and Cramér-von Mises) over the empirical processes. In order to evaluate the proposed tests, we implement a simulation study, where a bootstrap method is considered for the calibration of the tests. As the estimation of diffusion models with stochastic volatility based on discretely sampled data has proven difficult, we address this issue by means of a Monte Carlo study for different estimation procedures. Finally, an application of the procedures to real data is provided.
Submitted 17 August, 2022;
originally announced August 2022.
-
Novel specification tests for additive concurrent model formulation based on martingale difference divergence
Authors:
Laura Freijeiro-González,
Manuel Febrero-Bande,
Wenceslao González-Manteiga
Abstract:
Novel significance tests are proposed for the quite general additive concurrent model formulation, without the need for preliminary estimation of the model or error structure, or for tuning parameters. Making use of the martingale difference divergence coefficient, we propose new tests to measure conditional mean independence in the concurrent model framework, taking into consideration all observed time instants. In particular, we introduce global dependence tests to quantify the effect of a group of covariates on the response, as well as partial ones for covariate selection. Their asymptotic distributions are obtained in each case, and a bootstrap algorithm is proposed to compute their p-values in practice. These new procedures are tested by means of simulation studies and the analysis of some real datasets.
Submitted 1 August, 2022;
originally announced August 2022.
-
A goodness-of-fit test for functional time series with applications to Ornstein-Uhlenbeck processes
Authors:
J. Álvarez-Liébana,
A. López-Pérez,
W. González-Manteiga,
M. Febrero-Bande
Abstract:
High-frequency financial data can be collected as a sequence of curves over time; for example, as intra-day price, currently one of the topics of greatest interest in finance. The Functional Data Analysis framework provides a suitable tool to extract the information contained in the shape of the daily paths, often unavailable from classical statistical methods. In this paper, a novel goodness-of-fit test for autoregressive Hilbertian (ARH) models, with unknown and general order, is proposed. The test imposes just the Hilbert-Schmidt assumption on the functional form of the autocorrelation operator, and the test statistic is formulated in terms of a Cramér-von Mises norm. A wild bootstrap resampling procedure is used for calibration, and the finite sample behavior of the test, regarding power and size, is checked via a simulation study. Furthermore, we also provide a new specification test for diffusion models, such as Ornstein-Uhlenbeck processes, illustrated with an application to intra-day currency exchange rates. In particular, a two-stage methodology is proposed: firstly, we check if functional samples and their past values are related via an ARH(1) model; secondly, under linearity, we perform a functional F-test.
Submitted 26 June, 2022;
originally announced June 2022.
-
Functional Classification of Bitcoin Addresses
Authors:
Manuel Febrero-Bande,
Wenceslao González-Manteiga,
Brenda Prallon,
Yuri F. Saporito
Abstract:
This paper proposes a classification model for predicting the main activity of bitcoin addresses based on their balances. Since the balances are functions of time, we apply methods from functional data analysis; more specifically, the features of the proposed classification model are the functional principal components of the data. Classifying bitcoin addresses is a relevant problem for two main reasons: to understand the composition of the bitcoin market, and to identify addresses used for illicit activities. Although other bitcoin classifiers have been proposed, they focus primarily on network analysis rather than curve behavior. Our approach, on the other hand, does not require any network information for prediction. Furthermore, functional features have the advantage of being straightforward to build, unlike expert-built features. Results show improvement when combining functional features with scalar features, and similar accuracy for the models using those features separately, which points to the functional model being a good alternative when domain-specific knowledge is not available.
Submitted 17 July, 2022; v1 submitted 24 February, 2022;
originally announced February 2022.
-
Estimators for covariate-adjusted ROC curves with missing biomarkers values
Authors:
Ana M. Bianco,
Graciela Boente,
Wenceslao González-Manteiga,
Ana Pérez-González
Abstract:
In this paper, we present three estimators of the ROC curve when missing observations arise among the biomarkers. Two of the procedures assume that we have covariates that allow us to estimate the propensity, and the estimators are obtained using an inverse probability weighting method or a smoothed version of it. The other one assumes that the covariates are related to the biomarkers through a regression model, which enables us to construct convolution-based estimators of the distribution and quantile functions. Consistency results are obtained under mild conditions. Through a numerical study we evaluate the finite sample performance of the different proposals. A real data set is also analysed.
Submitted 17 January, 2022;
originally announced January 2022.
-
A review of goodness-of-fit tests for models involving functional data
Authors:
Wenceslao González-Manteiga,
Rosa M. Crujeiras,
Eduardo García-Portugués
Abstract:
A sizable number of goodness-of-fit tests involving functional data have appeared in the last decade. We provide a relatively compact review of most of these contributions, within the independent and identically distributed framework, covering goodness-of-fit tests for distribution and regression models with functional predictor and either scalar or functional response.
Submitted 27 May, 2021;
originally announced May 2021.
-
A test for comparing conditional ROC curves with multidimensional covariates
Authors:
Arís Fanjul-Hevia,
Juan Carlos Pardo-Fernández,
Ingrid Van Keilegom,
Wenceslao González-Manteiga
Abstract:
The comparison of Receiver Operating Characteristic (ROC) curves is frequently used in the literature to compare the discriminatory capability of different classification procedures based on diagnostic variables. The performance of these variables can sometimes be influenced by the presence of other covariates, and thus they should be taken into account when making the comparison. A new non-parametric test is proposed here for testing the equality of two or more dependent ROC curves conditioned on the value of a multidimensional covariate. Projections are used to transform the problem into a one-dimensional one that is easier to handle. Simulations are carried out to study the practical performance of the new methodology. A real data set of patients with pleural effusion is analysed to illustrate this procedure.
Submitted 8 February, 2021;
originally announced February 2021.
-
A critical review of LASSO and its derivatives for variable selection under dependence among covariates
Authors:
Laura Freijeiro-González,
Manuel Febrero-Bande,
Wenceslao González-Manteiga
Abstract:
We study the limitations of the well-known LASSO regression as a variable selector when dependence structures exist among covariates. We analyze both the classic situation with $n\geq p$ and the high-dimensional framework with $p>n$. The restrictive conditions this methodology requires to guarantee optimality, as well as its inconveniences in practice, are analyzed. Examples of these drawbacks are shown by means of an extensive simulation study, making use of different dependence scenarios. In order to search for improvements, a broad comparison with LASSO derivatives and alternatives is carried out. Finally, we give some guidance on which procedures are best depending on the nature of the data.
Submitted 21 December, 2020;
originally announced December 2020.
-
Testing for genetic interactions in complex disease with distance correlation
Authors:
Fernando Castro-Prado,
Javier Costas,
Dominic Edelmann,
Wenceslao González-Manteiga,
David R. Penas
Abstract:
Understanding epistasis (genetic interaction) may shed some light on the genomic basis of common diseases, including disorders of maximum interest due to their high socioeconomic burden, like schizophrenia. Distance correlation is an association measure that characterises general statistical independence between random variables, not only the linear one. Here, we propose distance correlation as a novel tool for the detection of epistasis from case-control data of single-nucleotide polymorphisms (SNPs). On the methodological side, we highlight the derivation of the explicit asymptotic distribution of the test statistic. We show that this is the only way to obtain enough computational speed for the method to be used in practice, in a scenario where the resampling techniques found in the literature are impractical. Our simulations show satisfactory calibration of significance, as well as comparable or better power than existing methodology. We conclude with the application of our technique to a schizophrenia genetics dataset, obtaining biologically sound insights.
Submitted 27 April, 2023; v1 submitted 9 December, 2020;
originally announced December 2020.
-
Nonparametric independence tests in metric spaces: What is known and what is not
Authors:
Fernando Castro-Prado,
Wenceslao González-Manteiga
Abstract:
Distance correlation is a recent extension of Pearson's correlation that characterises general statistical independence between Euclidean-space-valued random variables, not only linear relations. This review delves into how and when distance correlation can be extended to metric spaces, combining the information that is available in the literature with some original remarks and proofs, in a way that is comprehensible for any mathematical statistician.
Submitted 29 September, 2020;
originally announced September 2020.
-
Goodness-of-fit tests for functional linear models based on integrated projections
Authors:
Eduardo García-Portugués,
Javier Álvarez-Liébana,
Gonzalo Álvarez-Pérez,
Wenceslao González-Manteiga
Abstract:
Functional linear models are one of the most fundamental tools to assess the relation between two random variables of a functional or scalar nature. This contribution proposes a goodness-of-fit test for the functional linear model with functional response that neatly adapts to functional/scalar responses/predictors. In particular, the new goodness-of-fit test extends a previous proposal for scalar response. The test statistic is based on a convenient regularized estimator, is easy to compute, and is calibrated through an efficient bootstrap resampling. A graphical diagnostic tool, useful to visualize the deviations from the model, is introduced and illustrated with a novel data application. The R package goffda implements the proposed methods and allows for the reproducibility of the data application.
Submitted 22 August, 2020;
originally announced August 2020.
-
A robust approach for ROC curves with covariates
Authors:
Ana M. Bianco,
Graciela Boente,
Wenceslao González-Manteiga
Abstract:
The Receiver Operating Characteristic (ROC) curve is a useful tool that measures the discriminating power of a continuous variable or the accuracy of a pharmaceutical or medical test to distinguish between two conditions or classes. In certain situations, the practitioner may be able to measure some covariates related to the diagnostic variable which can increase the discriminating power of the ROC curve. To protect against the existence of atypical data among the observations, a procedure to obtain robust estimators for the ROC curve in the presence of covariates is introduced. The considered proposal focusses on a semiparametric approach which fits a location-scale regression model to the diagnostic variable and considers empirical estimators of the regression residual distributions. Robust parametric estimators are combined with adaptive weighted empirical distribution estimators to down-weight the influence of outliers. The uniform consistency of the proposal is derived under mild assumptions. A Monte Carlo study is carried out to compare the performance of the proposed robust estimators with the classical ones, in both clean and contaminated samples. A real data set is also analysed.
Submitted 23 July, 2022; v1 submitted 30 June, 2020;
originally announced July 2020.
-
Robust location estimators in regression models with covariates and responses missing at random
Authors:
Ana M. Bianco,
Graciela Boente,
Wenceslao González-Manteiga,
Ana Pérez-González
Abstract:
This paper deals with robust marginal estimation under a general regression model when missing data occur in the response and also in some of the covariates. The target is a marginal location parameter which is given through an $M$-functional. To obtain robust Fisher-consistent estimators, properly defined marginal distribution function estimators are considered. These estimators avoid the bias due to missing values by assuming a missing at random condition. Three methods are considered to estimate the marginal distribution function, which allows us to obtain the $M$-location of interest: the well-known inverse probability weighting, a convolution-based method that makes use of the regression model, and an augmented inverse probability weighting procedure that protects against misspecification. The proposed robust estimators and the classical ones are compared through a numerical study under different missing models, including clean and contaminated samples. We illustrate the estimators' behaviour under a nonlinear model. A real data set is also analysed.
Submitted 7 May, 2020;
originally announced May 2020.
-
A goodness-of-fit test for the functional linear model with functional response
Authors:
Eduardo García-Portugués,
Javier Álvarez-Liébana,
Gonzalo Álvarez-Pérez,
Wenceslao González-Manteiga
Abstract:
The Functional Linear Model with Functional Response (FLMFR) is one of the most fundamental models to assess the relation between two functional random variables. In this paper, we propose a novel goodness-of-fit test for the FLMFR against a general, unspecified, alternative. The test statistic is formulated in terms of a Cramér-von Mises norm over a doubly-projected empirical process which, using geometrical arguments, yields an easy-to-compute weighted quadratic norm. A resampling procedure calibrates the test through a wild bootstrap on the residuals and the use of convenient computational procedures. As a sideways contribution, and since the statistic requires a reliable estimator of the FLMFR, we discuss and compare several regularized estimators, providing a new one specifically convenient for our test. The finite sample behavior of the test is illustrated via a simulation study. Also, the new proposal is compared with previous significance tests. Two novel real datasets illustrate the application of the new test.
Submitted 21 September, 2020; v1 submitted 17 September, 2019;
originally announced September 2019.
-
Smoothing-based tests with directional random variables
Authors:
Eduardo García-Portugués,
Rosa M. Crujeiras,
Wenceslao González-Manteiga
Abstract:
Testing procedures for assessing specific parametric model forms, or for checking the plausibility of simplifying assumptions, play a central role in the mathematical treatment of the uncertain. No certain answers are obtained by testing methods, but at least the uncertainty of these answers is properly quantified. This is the case for tests designed on the two most general data generating mechanisms in practice: distribution/density and regression models. Testing proposals are usually formulated on the Euclidean space, but important challenges arise in non-Euclidean settings, such as when directional variables (i.e., random vectors on the hypersphere) are involved. This work reviews some of the smoothing-based testing procedures for density and regression models that comprise directional variables. The asymptotic distributions of the reviewed proposals are presented, jointly with some numerical illustrations justifying the need for resampling mechanisms for effective test calibration.
Submitted 21 September, 2020; v1 submitted 31 March, 2018;
originally announced April 2018.
-
Variable selection in Functional Additive Regression Models
Authors:
Manuel Febrero-Bande,
Wenceslao González-Manteiga,
Manuel Oviedo de la Fuente
Abstract:
This paper considers the problem of variable selection in regression models in the case of functional variables that may be mixed with other types of variables (scalar, multivariate, directional, etc.). Our proposal begins with a simple null model and sequentially selects a new variable to be incorporated into the model based on the use of distance correlation proposed by Székely et al. (2007). For the sake of simplicity, this paper only uses additive models. However, the proposed algorithm may assess the type of contribution (linear, nonlinear, ...) of each variable. The algorithm has shown quite promising results when applied to simulations and real data sets.
Submitted 11 April, 2018; v1 submitted 2 January, 2018;
originally announced January 2018.
-
Testing first-order intensity model in non-homogeneous Poisson point processes with covariates
Authors:
M. I. Borrajo,
W. González-Manteiga,
M. D. Martínez-Miranda
Abstract:
Modelling the first-order intensity function is one of the main aims in point process theory, and it has been approached so far from different perspectives. One appealing model describes the intensity as a function of a spatial covariate. In the recent literature, estimation theory and several applications have been developed assuming this model, but without formally checking this assumption. In this paper we address this problem for a non-homogeneous Poisson point process, by proposing a new test based on an $L^2$-distance. We also prove the asymptotic normality of the statistic and we suggest a bootstrap procedure to accomplish the calibration. Two applications with real data are presented, and a simulation study is carried out to better understand the performance of our proposals. Finally, some possible extensions of the present work to non-Poisson processes and to a multi-dimensional covariate context are detailed.
Submitted 2 July, 2018; v1 submitted 22 September, 2017;
originally announced September 2017.
-
Bootstrapping kernel intensity estimation for nonhomogeneous point processes depending on spatial covariates
Authors:
M. I. Borrajo,
W. González-Manteiga,
M. D. Martínez-Miranda
Abstract:
In the spatial point process context, kernel intensity estimation has been mainly restricted to exploratory analysis due to its lack of consistency. Different methods have been analysed to overcome this problem, and the inclusion of covariates has proved to be one possible solution. In this paper we focus on defining a theoretical framework to derive a consistent kernel intensity estimator using covariates, as well as a consistent smooth bootstrap procedure. We define two new data-driven bandwidth selectors specifically designed for our estimator: a rule-of-thumb and a plug-in bandwidth based on our consistent bootstrap method. A simulation study is carried out to understand the performance of our proposals in finite samples. Finally, we describe an application to a real data set consisting of the wildfires in Canada during June 2015, using meteorological information as covariates.
Submitted 18 May, 2018; v1 submitted 9 March, 2017;
originally announced March 2017.
-
Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes
Authors:
Juan A. Cuesta-Albertos,
Eduardo García-Portugués,
Manuel Febrero-Bande,
Wenceslao González-Manteiga
Abstract:
We consider marked empirical processes indexed by a randomly projected functional covariate to construct goodness-of-fit tests for the functional linear model with scalar response. The test statistics are built from continuous functionals over the projected process, resulting in computationally efficient tests that exhibit root-n convergence rates and circumvent the curse of dimensionality. The weak convergence of the empirical process is obtained conditionally on a random direction, whilst the almost sure equivalence between testing for significance on the original and on the projected functional covariate is proved. The computation of the test in practice involves calibration by wild bootstrap resampling and the combination of several p-values, arising from different projections, by means of the false discovery rate method. The finite sample properties of the tests are illustrated in a simulation study for a variety of linear models, underlying processes, and alternatives. The software provided implements the tests and allows the replication of simulations and data applications.
Submitted 21 September, 2020; v1 submitted 29 January, 2017;
originally announced January 2017.
-
Bandwidth selection for kernel density estimation with length-biased data
Authors:
María Isabel Borrajo,
Wenceslao González-Manteiga,
María Dolores Martínez-Miranda
Abstract:
Length-biased data are a particular case of weighted data, which arise in many situations: biomedicine, quality control or epidemiology, among others. In this paper we study the theoretical properties of kernel density estimation in the context of length-biased data, proposing two consistent bootstrap methods that we use for bandwidth selection. Apart from the bootstrap bandwidth selectors, we suggest a rule-of-thumb. These bandwidth selection proposals are compared with a least-squares cross-validation method. A simulation study is carried out to understand the behaviour of the procedures in finite samples.
Submitted 13 December, 2016; v1 submitted 17 June, 2016;
originally announced June 2016.
-
A lack-of-fit test for quantile regression models with high-dimensional covariates
Authors:
Mercedes Conde-Amboage,
César Sánchez-Sellero,
Wenceslao González-Manteiga
Abstract:
We propose a new lack-of-fit test for quantile regression models that is suitable even with high-dimensional covariates. The test is based on the cumulative sum of residuals with respect to unidimensional linear projections of the covariates. The test adapts concepts proposed by Escanciano (Econometric Theory, 22, 2006) to cope with many covariates to the test proposed by He and Zhu (Journal of the American Statistical Association, 98, 2003). To approximate the critical values of the test, a wild bootstrap mechanism is used, similar to that proposed by Feng et al. (Biometrika, 98, 2011). An extensive simulation study was undertaken that shows the good performance of the new test, particularly when the dimension of the covariate is high. The test can also be applied and performs well under heteroscedastic regression models. The test is illustrated with real data about the economic growth of 161 countries.
Submitted 20 February, 2015;
originally announced February 2015.
-
Testing parametric models in linear-directional regression
Authors:
Eduardo García-Portugués,
Ingrid Van Keilegom,
Rosa M. Crujeiras,
Wenceslao González-Manteiga
Abstract:
This paper presents a goodness-of-fit test for parametric regression models with scalar response and directional predictor, that is, a vector on a sphere of arbitrary dimension. The testing procedure is based on the weighted squared distance between a smooth and a parametric regression estimator, where the smooth regression estimator is obtained by a projected local approach. Asymptotic behavior of the test statistic under the null hypothesis and local alternatives is provided, jointly with a consistent bootstrap algorithm for application in practice. A simulation study illustrates the performance of the test in finite samples. The procedure is applied to test a linear model in text mining.
Submitted 20 September, 2020; v1 submitted 1 September, 2014;
originally announced September 2014.
-
Central limit theorems for directional and linear random variables with applications
Authors:
Eduardo García-Portugués,
Rosa M. Crujeiras,
Wenceslao González-Manteiga
Abstract:
A central limit theorem for the integrated squared error of the directional-linear kernel density estimator is established. The result enables the construction and analysis of two testing procedures based on squared loss: a nonparametric independence test for directional and linear random variables and a goodness-of-fit test for parametric families of directional-linear densities. Limit distributions for both test statistics, and a consistent bootstrap strategy for the goodness-of-fit test, are developed for the directional-linear case and adapted to the directional-directional setting. Finite sample performance for the goodness-of-fit test is illustrated in a simulation study. This test is also applied to datasets from biology and environmental sciences.
Submitted 20 September, 2020; v1 submitted 27 February, 2014;
originally announced February 2014.
-
A comparative simulation study of data-driven methods for estimating density level sets
Authors:
Paula Saavedra-Nieves,
Wenceslao González-Manteiga,
Alberto Rodríguez-Casal
Abstract:
Density level sets are mainly estimated using one of three methodologies: plug-in, excess mass, or a hybrid approach. The plug-in methods are based on replacing the unknown density by some nonparametric estimator, usually the kernel. Thus, the bandwidth selection is a fundamental problem from a practical point of view. Recently, specific selectors for level sets have been proposed. However, if some a priori information about the geometry of the level set is available, then excess mass algorithms can be useful. In this case, a density estimator is not necessary, and the problem of bandwidth selection can be avoided. The third methodology is a hybrid of the others. As in the excess mass method, it assumes a mild geometric restriction on the level set and, like the plug-in approach, requires a pilot nonparametric estimator of the density. One interesting open question concerns the practical performance of these methods. In this work, existing methods are reviewed, and two new hybrid algorithms are proposed. Their practical behaviour is compared through extensive simulations.
Submitted 5 March, 2014; v1 submitted 3 February, 2014;
originally announced February 2014.
-
A test for directional-linear independence, with applications to wildfire orientation and size
Authors:
Eduardo García-Portugués,
Ana M. G. Barros,
Rosa M. Crujeiras,
Wenceslao González-Manteiga,
J. M. C. Pereira
Abstract:
The relation between wildfire orientation and size is analyzed by means of a nonparametric test for directional-linear independence. The test statistic is designed for assessing the independence between two random variables of different nature, specifically directional (fire orientation, circular or spherical, as particular cases) and linear (fire size measured as burnt area, scalar), based on a directional-linear nonparametric kernel density estimator. In order to apply the proposed methodology in practice, a resampling procedure based on permutations and bootstrap is provided. The finite sample performance of the test is assessed by a simulation study, comparing its behavior with other classical tests for the circular-linear case. Finally, the test is applied to analyze wildfire data from Portugal.
Submitted 20 September, 2020; v1 submitted 11 January, 2013;
originally announced January 2013.
-
Kernel density estimation for directional-linear data
Authors:
Eduardo García-Portugués,
Rosa M. Crujeiras,
Wenceslao González-Manteiga
Abstract:
A nonparametric kernel density estimator for directional-linear data is introduced. The proposal is based on a product kernel accounting for the different nature of both (directional and linear) components of the random vector. Expressions for bias, variance and Mean Integrated Squared Error (MISE) are derived, jointly with an asymptotic normality result for the proposed estimator. For some particular distributions, an explicit formula for the MISE is obtained and compared with its asymptotic version, both for directional and directional-linear kernel density estimators. In this same setting a closed expression for the bootstrap MISE is also derived.
Submitted 20 September, 2020; v1 submitted 11 October, 2012;
originally announced October 2012.
-
Bootstrap independence test for functional linear models
Authors:
Wenceslao González-Manteiga,
Gil González-Rodríguez,
Adela Martínez-Calvo,
Eduardo García-Portugués
Abstract:
Functional data have been the subject of many research works in recent years. Functional regression is one of the most discussed issues. Specifically, significant advances have been made for functional linear regression models with scalar response. Let $(\mathcal{H},\langle\cdot,\cdot\rangle)$ be a separable Hilbert space. We focus on the model $Y=\langle\Theta,X\rangle+b+\varepsilon$, where $Y$ and $\varepsilon$ are real random variables, $X$ is an $\mathcal{H}$-valued random element, and the model parameters $b$ and $\Theta$ are in $\mathbb{R}$ and $\mathcal{H}$, respectively. Furthermore, the error satisfies $E(\varepsilon|X)=0$ and $E(\varepsilon^2|X)=\sigma^2<\infty$. A consistent bootstrap method to calibrate the distribution of statistics for testing $H_0: \Theta=0$ versus $H_1: \Theta\neq 0$ is developed. The asymptotic theory, as well as a simulation study and a real data application illustrating the usefulness of our proposed bootstrap in practice, is presented.
Submitted 20 September, 2020; v1 submitted 3 October, 2012;
originally announced October 2012.
-
Exploring wind direction and SO2 concentration by circular-linear density estimation
Authors:
Eduardo García-Portugués,
Rosa M. Crujeiras,
Wenceslao González-Manteiga
Abstract:
The study of environmental problems usually requires the description of variables of different nature and the assessment of relations between them. In this work, an algorithm for flexible estimation of the joint density of a circular-linear variable is proposed. The method is applied to explore the relation between wind direction and SO2 concentration at a monitoring station close to a power plant located in Galicia (NW-Spain), in order to compare the effectiveness of precautionary measures for pollutant reduction in two different years.
Submitted 20 September, 2020; v1 submitted 23 August, 2012;
originally announced August 2012.
-
A goodness-of-fit test for the functional linear model with scalar response
Authors:
Eduardo García-Portugués,
Wenceslao González-Manteiga,
Manuel Febrero-Bande
Abstract:
In this work, a goodness-of-fit test for the null hypothesis of a functional linear model with scalar response is proposed. The test is based on a generalization to the functional framework of a previous one, designed for the goodness-of-fit of regression models with multivariate covariates using random projections. The test statistic is easy to compute using geometrical and matrix arguments, and simple to calibrate in its distribution by a wild bootstrap on the residuals. The finite sample properties of the test are illustrated by a simulation study for several types of basis and under different alternatives. Finally, the test is applied to two datasets for checking the assumption of the functional linear model and a graphical tool is introduced. Supplementary materials are available online.
Submitted 20 September, 2020; v1 submitted 28 May, 2012;
originally announced May 2012.