-
Edgewise outliers of network indexed signals
Authors:
Christopher Rieser,
Anne Ruiz-Gazen,
Christine Thomas-Agnan
Abstract:
We consider models for network indexed multivariate data involving a dependence between variables as well as across graph nodes.
In the framework of these models, we focus on outlier detection and introduce the concept of edgewise outliers. For this purpose, we first derive the distribution of some sums of squares, in particular squared Mahalanobis distances, which can be used to set detection rules and thresholds for outlier detection. We then propose a robust version of the deterministic MCD algorithm that we call edgewise MCD. An application to simulated data shows the benefit of taking the dependence structure into account. We also illustrate the utility of the proposed method with a real data set.
Submitted 20 July, 2023;
originally announced July 2023.
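As a point of reference for the detection rule mentioned in the abstract above, here is a minimal sketch of the classical squared Mahalanobis distance test for independent bivariate Gaussian observations. It is not the authors' edgewise MCD and it ignores any dependence across graph nodes. The function names, the restriction to two variables, and the 0.975 level are assumptions made for illustration.

```python
import math


def squared_mahalanobis(x, mean, cov):
    """Squared Mahalanobis distance of a 2-dimensional point x."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Inverse of a 2x2 matrix is (1/det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def chi2_quantile_2df(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom.

    With 2 degrees of freedom the distribution is exponential with mean 2,
    so the quantile has the closed form -2 * log(1 - p).
    """
    return -2.0 * math.log(1.0 - p)


def flag_outliers(points, mean, cov, level=0.975):
    """Flag points whose squared distance exceeds the chi-square threshold."""
    threshold = chi2_quantile_2df(level)
    return [squared_mahalanobis(p, mean, cov) > threshold for p in points]
```

With known mean and covariance, the squared distance of a Gaussian observation follows a chi-square distribution, which is why a chi-square quantile serves as the cutoff in this simplified setting.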
-
Spatial Point Pattern Analysis of the Unidentified Aerial Phenomena in France
Authors:
Thibault Laurent,
Christine Thomas-Agnan,
Michaël Vaillant
Abstract:
We model the unidentified aerial phenomena observed in France during the last 60 years as a spatial point pattern. We use public information such as population density, rate of moisture, or presence of airports to model the intensity of the unidentified aerial phenomena. Spatial exploratory data analysis is a first approach to assess the link between the intensity of the unidentified aerial phenomena and the covariates. We then fit an inhomogeneous spatial Poisson process model with covariates. We find that the significant variables are the population density, the presence of factories with a nuclear risk and contaminated land, and the rate of moisture. The analysis of the residuals shows that some parts of France (the Belgian border, the tip of Brittany, some parts of the southeast, the Picardie and Haute-Normandie regions, the Loiret and Corrèze departments) present high values of local intensity that are not explained by our model.
Submitted 2 September, 2015;
originally announced September 2015.
-
Accuracy of areal interpolation methods for count data
Authors:
Van Huyen Do,
Christine Thomas-Agnan,
Anne Vanhems
Abstract:
Combining several socio-economic databases that originate from different administrative sources and are collected on different partitions of a geographic zone of interest into administrative units gives rise to the so-called areal interpolation problem. This problem is that of allocating the data from a set of source spatial units to a set of target spatial units. A particular case of that problem is the re-allocation to a single target partition which is a regular grid. At the European level, for example, the EU directive 'INSPIRE', or INfrastructure for SPatial InfoRmation, encourages the states to provide socio-economic data on a common grid to facilitate economic studies across states. In the literature, there are three main types of such techniques: proportional weighting schemes, smoothing techniques, and regression-based interpolation. We propose a stochastic model based on Poisson point patterns to study the statistical accuracy of these techniques for regular grid targets in the case of count data. The error depends on the nature of the target variable and its correlation with the auxiliary variable. For simplicity, we restrict attention to proportional weighting schemes and Poisson regression-based methods. Our conclusion is that no technique always dominates.
Submitted 29 January, 2015;
originally announced January 2015.
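To make the simplest proportional weighting scheme from the abstract above concrete, here is a minimal sketch of plain areal weighting, in which each source count is split across target cells in proportion to the area of overlap. It is an illustration, not the authors' model: the function name and the dictionary layout are hypothetical, and it assumes the target cells jointly cover every source unit.

```python
def areal_weighting(source_counts, overlap_areas):
    """Allocate source counts to target units in proportion to overlap area.

    source_counts: dict mapping source id -> count observed on that source.
    overlap_areas: dict mapping (source id, target id) -> intersection area.
    Assumes the listed overlaps of each source add up to its full area.
    """
    # Total area of each source unit, recovered from its overlaps.
    source_area = {}
    for (source, _target), area in overlap_areas.items():
        source_area[source] = source_area.get(source, 0.0) + area

    # Each target receives the share of the count matching its share of area.
    target_counts = {}
    for (source, target), area in overlap_areas.items():
        share = area / source_area[source]
        allocated = source_counts[source] * share
        target_counts[target] = target_counts.get(target, 0.0) + allocated
    return target_counts


# Hypothetical example: two source units spread over three grid cells.
counts = {"A": 100, "B": 40}
overlaps = {("A", "g1"): 3.0, ("A", "g2"): 1.0, ("B", "g2"): 2.0, ("B", "g3"): 2.0}
grid = areal_weighting(counts, overlaps)
```

This scheme preserves the total count, since the shares of each source sum to one. It implicitly treats the count as uniformly spread within each source unit, which is the assumption that weighting by an auxiliary variable relaxes.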