-
A new adaptive local polynomial density estimation procedure on complicated domains
Authors:
Karine Bertin,
Nicolas Klutchnikoff,
Frédéric Ouimet
Abstract:
This paper presents a novel approach for pointwise estimation of multivariate density functions on known domains of arbitrary dimensions using nonparametric local polynomial estimators. Our method is highly flexible, as it applies to both simple domains, such as open connected sets, and more complicated domains that are not star-shaped around the point of estimation. This enables us to handle domains with sharp concavities, holes, and local pinches, such as polynomial sectors. Additionally, we introduce a data-driven selection rule based on the general ideas of Goldenshluger and Lepski. Our results demonstrate that the local polynomial estimators are minimax under an $L^2$ risk across a wide range of Hölder-type functional classes. In the adaptive case, we provide oracle inequalities and explicitly determine the convergence rate of our statistical procedure. Simulations on polynomial sectors show that our oracle estimates outperform those of the most popular alternative method, found in the sparr package for the R software. Our statistical procedure is implemented in an online R package which is readily accessible.
Submitted 15 May, 2024; v1 submitted 2 August, 2023;
originally announced August 2023.
-
Learning the regularity of multivariate functional data
Authors:
Omar Kassi,
Nicolas Klutchnikoff,
Valentin Patilea
Abstract:
Combining information both within and between sample realizations, we propose a simple estimator for the local regularity of surfaces in the functional data framework. The independently generated surfaces are measured with errors at possibly random discrete times. Non-asymptotic exponential bounds for the concentration of the regularity estimators are derived. An indicator for anisotropy is proposed and an exponential bound of its risk is derived. Two applications are proposed. We first consider the class of multi-fractional, bi-dimensional, Brownian sheets with domain deformation, and study the nonparametric estimation of the deformation. As a second application, we build minimax optimal, bivariate kernel estimators for the reconstruction of the surfaces.
Submitted 2 October, 2023; v1 submitted 26 July, 2023;
originally announced July 2023.
-
Adaptive functional principal components analysis
Authors:
Sunny G. W. Wang,
Valentin Patilea,
Nicolas Klutchnikoff
Abstract:
Functional data analysis almost always involves smoothing discrete observations into curves, because they are never observed in continuous time and rarely without error. Although smoothing parameters affect the subsequent inference, data-driven methods for selecting these parameters are not well-developed, frustrated by the difficulty of using all the information shared by curves while being computationally efficient. On the one hand, smoothing individual curves in an isolated, albeit sophisticated way, ignores useful signals present in other curves. On the other hand, bandwidth selection by automatic procedures such as cross-validation after pooling all the curves together quickly becomes computationally unfeasible due to the large number of data points. In this paper we propose a new data-driven, adaptive kernel smoothing, specifically tailored for functional principal components analysis through the derivation of sharp, explicit risk bounds for the eigen-elements. The minimization of these quadratic risk bounds provides refined, yet computationally efficient bandwidth rules for each eigen-element separately. Both common and independent design cases are allowed. Rates of convergence for the estimators are derived. An extensive simulation study, designed in a versatile manner to closely mimic the characteristics of real data sets, supports our methodological contribution. An illustration on a real data application is provided.
Submitted 16 April, 2024; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Optimal 1-Wasserstein Distance for WGANs
Authors:
Arthur Stéphanovitch,
Ugo Tanielian,
Benoît Cadre,
Nicolas Klutchnikoff,
Gérard Biau
Abstract:
The mathematical forces at work behind Generative Adversarial Networks raise challenging theoretical issues. Motivated by the important question of characterizing the geometrical properties of the generated distributions, we provide a thorough analysis of Wasserstein GANs (WGANs) in both the finite sample and asymptotic regimes. We study the specific case where the latent space is univariate and derive results valid regardless of the dimension of the output space. We show in particular that for a fixed sample size, the optimal WGANs are closely linked with connected paths minimizing the sum of the squared Euclidean distances between the sample points. We also highlight the fact that WGANs are able to approach (for the 1-Wasserstein distance) the target distribution as the sample size tends to infinity, at a given convergence rate and provided the family of generative Lipschitz functions grows appropriately. We derive in passing new results on optimal transport theory in the semi-discrete setting.
Submitted 5 October, 2023; v1 submitted 8 January, 2022;
originally announced January 2022.
-
Clustering multivariate functional data using unsupervised binary trees
Authors:
Steven Golovkine,
Nicolas Klutchnikoff,
Valentin Patilea
Abstract:
We propose a model-based clustering algorithm for a general class of functional data for which the components could be curves or images. The random functional data realizations could be measured with error at discrete, and possibly random, points in the definition domain. The idea is to build a set of binary trees by recursive splitting of the observations. The number of groups is determined in a data-driven way. The new algorithm provides easily interpretable results and fast predictions for online data sets. Results on simulated datasets reveal good performance in various complex settings. The methodology is applied to the analysis of vehicle trajectories on a German roundabout.
Submitted 24 September, 2021; v1 submitted 10 December, 2020;
originally announced December 2020.