-
Proposal of a general framework to categorize continuous predictor variables
Authors:
Irantzu Barrio,
Javier Roca-Pardiñas,
Cristobal Esteban,
Maria Durban
Abstract:
The use of discretized variables in the development of prediction models is a common practice, in part because the decision-making process is more natural when it is based on rules created from segmented models. Although this practice is perhaps more common in medicine, it extends to any area of knowledge where a predictive model helps in decision-making. Therefore, providing researchers with a useful and valid categorization method could be a relevant issue when developing prediction models. In this paper, we propose a new general methodology that can be applied to categorize a predictor variable in any regression model where the distribution of the response variable belongs to the exponential family. Furthermore, it can be applied in any multivariate context, allowing more than one continuous covariate to be categorized simultaneously. In addition, a computationally very efficient method is proposed to obtain the optimal number of categories, based on a pseudo-BIC proposal. Several simulation studies have been conducted in which the efficiency of the method with respect to both the location and the number of estimated cut-off points is shown. Finally, the categorization proposal has been applied to a real data set of 543 patients with chronic obstructive pulmonary disease from Galdakao Hospital's five outpatient respiratory clinics, who were followed up for 10 years. We applied the proposed methodology to jointly categorize the continuous variables six-minute walking test and forced expiratory volume in one second in a multiple Poisson generalized additive model for the response variable rate of the number of hospital admissions by years of follow-up. The location and number of cut-off points obtained were clinically validated as being in line with the categorizations used in the literature.
Submitted 18 March, 2024;
originally announced March 2024.
-
Modelling physical activity profiles in COPD patients: a fully functional approach to variable domain functional regression models
Authors:
Pavel Hernandez-Amaro,
Maria Durban,
M. Carmen Aguilera-Morillo,
Cristobal Esteban Gonzalez,
Inmaculada Arostegui
Abstract:
Physical activity plays a significant role in the well-being of individuals with chronic obstructive pulmonary disease (COPD). Specifically, it has been directly associated with changes in hospitalization rates for these patients. However, previous investigations have primarily been conducted in a cross-sectional or longitudinal manner and have not considered a continuous perspective. Using telemonitoring data from the telEPOC program, we analyze the impact of physical activity adopting a functional data approach. Traditional functional data methods, including functional regression models, typically assume that all data are observed on the same domain. The data in the telEPOC program, however, exhibit variable domains, which presents a challenge for the majority of these methods. To address this challenge, we introduce a novel fully functional methodology tailored to variable domain functional data, eliminating the need for data alignment, which can be computationally taxing. Although models designed for variable domain data are relatively scarce and may have inherent limitations in their estimation methods, our approach circumvents these issues. We substantiate the effectiveness of our methodology through a simulation study, comparing our results with those obtained using established methodologies. Finally, we apply our methodology to analyze the impact of physical activity in COPD patients using the telEPOC program's data. Software for our method is available in the form of R code on request at https://github.com/Pavel-Hernadez-Amaro/V.D.F.R.M-new-estimation-approach.git.
Submitted 11 January, 2024;
originally announced January 2024.
-
Multidimensional Adaptive Penalised Splines with Application to Neurons' Activity Studies
Authors:
María Xosé Rodríguez-Álvarez,
María Durbán,
Paul H. C. Eilers,
Dae-Jin Lee,
Francisco Gonzalez
Abstract:
P-spline models have achieved great popularity both in statistical and in applied research. A possible drawback of P-splines is that they assume a smooth transition of the covariate effect across its whole domain. In some practical applications, however, it is desirable and necessary to adapt smoothness locally to the data, and adaptive P-splines have been suggested. Yet, the extra flexibility afforded by adaptive P-spline models is obtained at the cost of a high computational burden, especially in a multidimensional setting. Furthermore, to the best of our knowledge, the literature lacks proposals for adaptive P-splines in more than two dimensions. Motivated by the need for analysing data derived from experiments conducted to study neurons' activity in the visual cortex, this work presents a novel locally adaptive anisotropic P-spline model in two (e.g., space) and three (space and time) dimensions. Estimation is based on the recently proposed SOP (Separation of Overlapping Precision matrices) method, which provides the speed we look for. The practical performance of the proposal is evaluated through simulations, and comparisons with alternative methods are reported. In addition to the spatio-temporal analysis of the data that motivated this work, we also discuss an application in two dimensions on the absenteeism of workers.
Submitted 1 December, 2020; v1 submitted 8 October, 2020;
originally announced October 2020.
-
On the estimation of variance parameters in non-standard generalised linear mixed models: Application to penalised smoothing
Authors:
María Xosé Rodríguez-Álvarez,
Maria Durban,
Dae-Jin Lee,
Paul H. C. Eilers
Abstract:
We present a novel method for the estimation of variance parameters in generalised linear mixed models. The method has its roots in the work of Harville (1977), but it is able to deal with models that have a precision matrix for the random-effect vector that is linear in the inverse of the variance parameters (i.e., the precision parameters). We call the method SOP (Separation of Overlapping Precision matrices). SOP is based on applying the method of successive approximations to easy-to-compute estimate updates of the variance parameters. These estimate updates have an appealing form: they are the ratio of a (weighted) sum of squares to a quantity related to effective degrees of freedom. We provide the sufficient and necessary conditions for these estimates to be strictly positive. An important application field of SOP is penalised regression estimation of models where multiple quadratic penalties act on the same regression coefficients. We discuss in detail two of those models: penalised splines for locally adaptive smoothness and for hierarchical curve data. Several data examples in these settings are presented.
Submitted 12 June, 2018; v1 submitted 22 January, 2018;
originally announced January 2018.
-
Fast estimation of multidimensional adaptive P-spline models
Authors:
María Xosé Rodríguez-Álvarez,
María Durbán,
Dae-Jin Lee,
Paul H. C. Eilers
Abstract:
A fast and stable algorithm for estimating multidimensional adaptive P-spline models is presented. We call it Separation of Overlapping Penalties (SOP), as it is an extension of the Separation of Anisotropic Penalties (SAP) algorithm. SAP was originally derived for the estimation of the smoothing parameters of a multidimensional tensor product P-spline model with anisotropic penalties.
Submitted 21 October, 2016;
originally announced October 2016.
-
Spatio-temporal adaptive penalized splines with application to Neuroscience
Authors:
María Xosé Rodríguez-Álvarez,
María Durbán,
Dae-Jin Lee,
Paul H. C. Eilers
Abstract:
The data analysed here derive from experiments conducted to study neurons' activity in the visual cortex of behaving monkeys. We consider a spatio-temporal adaptive penalized spline (P-spline) approach for modelling the firing rate of visual neurons. To the best of our knowledge, this is the first attempt in the statistical literature at locally adaptive smoothing in three dimensions. Estimation is based on the Separation of Overlapping Penalties (SOP) algorithm, which provides the stability and speed we look for.
Submitted 30 December, 2016; v1 submitted 21 October, 2016;
originally announced October 2016.