-
On the use of splines for representing ordered factors
Authors:
Adelchi Azzalini
Abstract:
In the context of regression-type statistical models, the inclusion of some ordered factors among the explanatory variables requires the conversion of qualitative levels to numeric components of the linear predictor. The present note represents a follow-up of a methodology proposed by Azzalini (2023) for constructing numeric scores assigned to the factor levels. The aim of the present supplement is to allow additional flexibility in the mapping from ordered levels to numeric scores.
Submitted 22 June, 2024;
originally announced June 2024.
-
The information matrix of the bivariate extended skew-normal distribution
Authors:
Stefano Franco,
Adelchi Azzalini
Abstract:
For the extended skew-normal distribution, which represents an extension of the normal (or Gaussian) distribution, we focus on the properties of the log-likelihood function and derived quantities in the bivariate case. Specifically, we derive explicit expressions for the score function and the information matrix, in the observed and the expected form; these do not appear to have been examined before in the literature. Corresponding computing code in the R language is provided, which implements the formal expressions.
Submitted 19 September, 2023;
originally announced September 2023.
-
On the use of ordered factors as explanatory variables
Authors:
Adelchi Azzalini
Abstract:
Consider a regression or some regression-type model for a certain response variable where the linear predictor includes an ordered factor among the explanatory variables. The inclusion of a factor of this type can take place in a few different ways, discussed in the pertaining literature. The present contribution proposes a different way of tackling this problem, by constructing a numeric variable in an alternative way with respect to the current methodology. The proposed technique appears to retain the data fitting capability of the existing methodology, but with a simpler interpretation of the model components.
Submitted 22 November, 2023; v1 submitted 5 May, 2023;
originally announced May 2023.
-
On the non-identifiability of unified skew-normal distributions
Authors:
Kesen Wang,
Reinaldo B. Arellano-Valle,
Adelchi Azzalini,
Marc G. Genton
Abstract:
In this note, we investigate the non-identifiability of the multivariate unified skew-normal distribution under permutation of its latent variables. We show that the non-identifiability issue also holds with other parametrizations and extends to the family of unified skew-elliptical distributions and more generally to selection distributions. We provide several suggestions to make the unified skew-normal model identifiable and describe various sub-models that are identifiable.
Submitted 22 June, 2023; v1 submitted 20 April, 2023;
originally announced April 2023.
-
Some properties of the unified skew-normal distribution
Authors:
Reinaldo B. Arellano-Valle,
Adelchi Azzalini
Abstract:
For the family of multivariate probability distributions variously denoted as unified skew-normal, closed skew-normal and other names, a number of properties are already known, but many others are not, even some basic ones. The present contribution aims at filling some of these gaps. Specifically, the moments up to the fourth order are obtained, and from here the expressions of Mardia's measures of multivariate skewness and kurtosis. Other results concern the property of log-concavity of the distribution, and closure with respect to conditioning on intervals.
Submitted 12 November, 2020;
originally announced November 2020.
-
A formulation for continuous mixtures of multivariate normal distributions
Authors:
Reinaldo B. Arellano-Valle,
Adelchi Azzalini
Abstract:
Several formulations have long existed in the literature in the form of continuous mixtures of normal variables where a mixing variable operates on the mean or on the variance or on both the mean and the variance of a multivariate normal variable, by changing the nature of these basic constituents from constants to random quantities. More recently, other mixture-type constructions have been introduced, where the core random component, on which the mixing operation operates, is not necessarily normal. The main aim of the present work is to show that many existing constructions can be encompassed by a formulation where normal variables are mixed using two univariate random variables. For this formulation, we derive various general properties. Within the proposed framework, it is also simpler to formulate new proposals of parametric families and we provide a few such instances. At the same time, the exposition provides a review of the theme of normal mixtures.
Submitted 29 March, 2020;
originally announced March 2020.
-
Some computational aspects of maximum likelihood estimation of the skew-$t$ distribution
Authors:
Adelchi Azzalini,
Mahdi Salehi
Abstract:
Since its introduction, the skew-$t$ distribution has received much attention in the literature both for the study of theoretical properties and as a model for data fitting in empirical work. A major motivation for this interest is the high degree of flexibility of the distribution as the parameters span their admissible range, with ample variation of the associated measures of skewness and kurtosis. While this high flexibility allows one to adapt a member of the parametric family to a wide range of data patterns, it also implies that parameter estimation is a more delicate operation than with less flexible parametric families, given that a small variation of the parameters can have a substantial effect on the selected distribution. In this context, the aim of the present contribution is to deal with some computational aspects of maximum likelihood estimation. A problem of interest is the possible presence of multiple local maxima of the log-likelihood function. Another one, to which most of our attention is dedicated, is the development of a quick and reliable initialization method for the subsequent numerical maximization of the log-likelihood function, both in the univariate and the multivariate context.
Submitted 24 July, 2019;
originally announced July 2019.
-
Yet another skew-elliptical family but of a different kind: return to Lemma 1
Authors:
Adelchi Azzalini,
Giuliana Regoli
Abstract:
In the context of modulated-symmetry distributions, there exist various forms of skew-elliptical families. We present yet another one, but with an unusual feature: the modulation factor of the baseline elliptical density is represented by a distribution function with an argument which is not an odd function, as it occurs instead with the overwhelming majority of similar formulations, not only with other skew-elliptical families. The proposal is obtained by going back to Lemma 1 of Azzalini and Capitanio (1999), which can be seen as the general frame for a vast number of existing formulations, and using it along a different route. The broader target is to show that this 'mother lemma' can still generate novel progeny.
Submitted 10 October, 2017;
originally announced October 2017.
-
Combining local and global smoothing in multivariate density estimation
Authors:
Adelchi Azzalini
Abstract:
Non-parametric estimation of a multivariate density is tackled via a method which combines traditional local smoothing with a form of global smoothing but without imposing a rigid structure. Simulation work delivers encouraging indications on the effectiveness of the method. An application to density-based clustering illustrates a possible usage.
Submitted 7 October, 2016;
originally announced October 2016.
-
Sample selection models for discrete and other non-Gaussian response variables
Authors:
Adelchi Azzalini,
Hyoung-Moon Kim,
Hea-Jung Kim
Abstract:
Consider observation of a phenomenon of interest subject to selective sampling due to a censoring mechanism regulated by some other variable. In this context, an extensive literature exists linked to the so-called Heckman selection model. A great deal of this work has been developed under a Gaussian assumption for the underlying probability distributions; considerably less work has dealt with other distributions. We examine a general construction which encompasses a variety of distributions and allows various options for the selection mechanism, focusing especially on the case of discrete response. Inferential methods based on the pertaining likelihood function are developed.
Submitted 13 September, 2016;
originally announced September 2016.
-
On nomenclature for, and the relative merits of, two formulations of skew distributions
Authors:
Adelchi Azzalini,
Ryan P. Browne,
Marc G. Genton,
Paul D. McNicholas
Abstract:
We examine some distributions used extensively within the model-based clustering literature in recent years, paying special attention to claims that have been made about their relative efficacy. Theoretical arguments are provided as well as real data examples.
Submitted 3 December, 2015; v1 submitted 21 February, 2014;
originally announced February 2014.
-
Clustering Via Nonparametric Density Estimation: the R Package pdfCluster
Authors:
Adelchi Azzalini,
Giovanna Menardi
Abstract:
The R package pdfCluster performs cluster analysis based on a nonparametric estimate of the density of the observed variables. After summarizing the main aspects of the methodology, we describe the features and the usage of the package, and finally illustrate its working with the aid of two datasets.
Submitted 28 January, 2013;
originally announced January 2013.
-
On the spatial correlation between areas of high coseismic slip and aftershock clusters of the Maule earthquake Mw=8.8
Authors:
Javier E. Contreras-Reyes,
Adelchi Azzalini
Abstract:
We study the spatial distribution of clusters associated with the aftershocks of the megathrust Maule earthquake (Mw 8.8) of 27 February 2010. We used a recent clustering method which hinges on a nonparametric estimation of the underlying probability density function to detect subsets of points forming clusters associated with high density areas. In addition, we estimate the probability density function using a nonparametric kernel method for each of these clusters. This allows us to identify a set of regions where there is an association between frequency of events and coseismic slip. Our results suggest that high coseismic slip spatially correlates with high aftershock frequency.
Submitted 26 January, 2013; v1 submitted 7 August, 2012;
originally announced August 2012.
-
Maximum penalized likelihood estimation for skew-normal and skew-$t$ distributions
Authors:
Adelchi Azzalini,
Reinaldo B. Arellano-Valle
Abstract:
The skew-normal and the skew-$t$ distributions are parametric families which are currently under intense investigation since they provide a more flexible formulation compared to the classical normal and $t$ distributions by introducing a parameter which regulates their skewness. While these families enjoy attractive formal properties from the probability viewpoint, a practical problem with their usage in applications is the possibility that the maximum likelihood estimate of the parameter which regulates skewness diverges. This situation has vanishing probability for increasing sample size, but for finite samples it occurs with non-negligible probability, and its occurrence has unpleasant effects on the inferential process. Methods for overcoming this problem have been put forward both in the classical and in the Bayesian formulation, but their applicability is restricted to simple situations. We formulate a proposal based on the idea of penalized likelihood, which has connections with some of the existing methods but applies more generally, including in the multivariate case.
Submitted 11 March, 2012;
originally announced March 2012.
-
Some properties of skew-symmetric distributions
Authors:
Adelchi Azzalini,
Giuliana Regoli
Abstract:
The family of skew-symmetric distributions is a wide set of probability density functions obtained by combining in a suitable form a few components which are selectable quite freely provided some simple requirements are satisfied. Intense recent work has produced several results for specific sub-families of this construction, but much less is known in general terms. The present paper explores some questions within this framework, and provides conditions on the above-mentioned components to ensure that the final distribution enjoys specific properties.
Submitted 21 December, 2010;
originally announced December 2010.
-
Selection models under generalized symmetry settings
Authors:
Adelchi Azzalini
Abstract:
An active stream of literature has followed up the idea of skew-elliptical densities initiated by Azzalini and Capitanio (1999). Their original formulation was based on a general lemma which is however of broader applicability than usually perceived. This note examines new directions of its use, and illustrates them with the construction of some probability distributions falling outside the family of the so-called skew-symmetric densities.
Submitted 5 April, 2010; v1 submitted 29 December, 2009;
originally announced December 2009.
-
Distributions generated by perturbation of symmetry with emphasis on a multivariate skew $t$ distribution
Authors:
Adelchi Azzalini,
Antonella Capitanio
Abstract:
A fairly general procedure is studied to perturb a multivariate density satisfying a weak form of multivariate symmetry, and to generate a whole set of non-symmetric densities. The approach is general enough to encompass a number of recent proposals in the literature, variously related to the skew normal distribution. The special case of skew elliptical densities is examined in detail, establishing connections with existing similar work. The final part of the paper specializes further to a form of multivariate skew $t$ density. Likelihood inference for this distribution is examined, and it is illustrated with numerical examples.
Submitted 12 November, 2009;
originally announced November 2009.
-
Statistical applications of the multivariate skew-normal distribution
Authors:
Adelchi Azzalini,
Antonella Capitanio
Abstract:
Azzalini & Dalla Valle (1996) have recently discussed the multivariate skew-normal distribution which extends the class of normal distributions by the addition of a shape parameter. The first part of the present paper examines further probabilistic properties of the distribution, with special emphasis on aspects of statistical relevance. Inferential and other statistical issues are discussed in the following part, with applications to some multivariate statistics problems, illustrated by numerical examples. Finally, a further extension is described which introduces a skewing factor of an elliptical density.
Submitted 11 November, 2009;
originally announced November 2009.