Error Bounds for a Kernel-Based Constrained Optimal Smoothing Approximation

Authors: Laurence Grammont, François Bachoc, Andrés F. López-Lopera

Abstract: This paper establishes error bounds for the convergence of a piecewise linear approximation of the constrained optimal smoothing problem posed in a reproducing kernel Hilbert space (RKHS). This problem can be reformulated as a Bayesian estimation problem involving a Gaussian process related to the kernel of the RKHS. Consequently, error bounds can be interpreted as a quantification of the maximum a posteriori (MAP) accuracy. To our knowledge, no error bounds have been proposed for this type of problem so far. The convergence results are provided as a function of the grid size, the regularity of the kernel, and the distance from the kernel interpolant of the approximation to the set of constraints. Inspired by the MaxMod algorithm from recent literature, which sequentially allocates knots for the piecewise linear approximation, we conduct our analysis for non-equispaced knots. These knots are even allowed to be non-dense, which impacts the definition of the optimal smoothing solution and our error bound quantifiers. Finally, we illustrate our theorems through several numerical experiments involving constraints such as boundedness and monotonicity.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2404.17222

Asymptotic analysis for covariance parameter estimation of Gaussian processes with functional inputs

Authors: Lucas Reding, Andrés F. López-Lopera, François Bachoc

Abstract: We consider covariance parameter estimation for Gaussian processes with functional inputs. From an increasing-domain asymptotics perspective, we prove the asymptotic consistency and normality of the maximum likelihood estimator. We extend these theoretical guarantees to encompass scenarios accounting for approximation errors in the inputs, which ensures the robustness of practical implementations relying on conventional sampling methods or projections onto a functional basis. Loosely speaking, both consistency and normality hold when the approximation error becomes negligible, a condition that is often achieved as the number of samples or basis functions becomes large. These latter asymptotic properties are illustrated through analytical examples, including one that covers the case of non-randomly perturbed grids, as well as several numerical illustrations.

Submitted 15 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

arXiv:2403.03540

Contraction rates and projection subspace estimation with Gaussian process priors in high dimension

Authors: Elie Odin, François Bachoc, Agnès Lagnoux

Abstract: This work explores the dimension reduction problem for Bayesian nonparametric regression and density estimation. More precisely, we are interested in estimating a functional parameter $f$ over the unit ball in $\mathbb{R}^d$, which depends only on a $d_0$-dimensional subspace of $\mathbb{R}^d$, with $d_0 < d$. It is well known that rescaled Gaussian process priors over the function space achieve smoothness adaptation and posterior contraction with near minimax-optimal rates. Moreover, hierarchical extensions of this approach, equipped with subspace projection, can also adapt to the intrinsic dimension $d_0$ (\cite{Tokdar2011DimensionAdapt}). When the ambient dimension $d$ does not vary with $n$, the minimax rate remains of the order $n^{-\beta/(2\beta+d_0)}$. However, this holds only up to multiplicative constants that can become prohibitively large when $d$ grows.
The dependence between the contraction rate and the ambient dimension has not been fully explored yet, and this work provides a first insight: we let the dimension $d$ grow with $n$ and, by combining the arguments of \cite{Tokdar2011DimensionAdapt} and \cite{Jiang2021VariableSelection}, we derive a growth rate for $d$ that still leads to posterior consistency with minimax rate. The optimality of this growth rate is then discussed. Additionally, we provide a set of assumptions under which consistent estimation of $f$ leads to a correct estimation of the subspace projection, assuming that $d_0$ is known.
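As a worked illustration of why adapting to $d_0$ matters, compare the rate $n^{-\beta/(2\beta+d_0)}$ with the classical minimax rate $n^{-\beta/(2\beta+d)}$ for $\beta$-smooth functions of all $d$ coordinates, ignoring constants. The values $\beta = 2$, $d_0 = 2$, $d = 20$ are arbitrary choices for illustration, not taken from the paper:

```latex
\beta = 2,\; d_0 = 2,\; d = 20:\qquad
n^{-\beta/(2\beta + d_0)} = n^{-2/6} = n^{-1/3},
\qquad
n^{-\beta/(2\beta + d)} = n^{-2/24} = n^{-1/12}.
```

For $n = 10^6$ this gives $10^{-2}$ versus $10^{-1/2} \approx 0.316$.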

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.08269

Geometry-induced Implicit Regularization in Deep ReLU Neural Networks

Authors: Joachim Bona-Pellissier, François Malgouyres, François Bachoc

Abstract: It is well known that neural networks with many more parameters than training examples do not overfit. Implicit regularization phenomena, which are still not well understood, occur during optimization and 'good' networks are favored. Thus the number of parameters is not an adequate measure of complexity if we do not consider all possible networks but only the 'good' ones. To better understand which networks are favored during optimization, we study the geometry of the output set as parameters vary. When the inputs are fixed, we prove that the dimension of this set changes and that the local dimension, called batch functional dimension, is almost surely determined by the activation patterns in the hidden layers. We prove that the batch functional dimension is invariant to the symmetries of the network parameterization: neuron permutations and positive rescalings. Empirically, we establish that the batch functional dimension decreases during optimization. As a consequence, optimization leads to parameters with low batch functional dimensions. We call this phenomenon geometry-induced implicit regularization. The batch functional dimension depends on both the network parameters and inputs. To understand the impact of the inputs, we study, for fixed parameters, the largest attainable batch functional dimension when the inputs vary. We prove that this quantity, called computable full functional dimension, is also invariant to the symmetries of the network's parameterization, and is determined by the achievable activation patterns.
We also provide a sampling theorem, showing a fast convergence of the estimation of the computable full functional dimension for a random input of increasing size. Empirically, we find that the computable full functional dimension remains close to the number of parameters, which is related to the notion of local identifiability. This differs from the observed values for the batch functional dimension computed on training inputs and test inputs. The latter are influenced by geometry-induced implicit regularization.
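A minimal NumPy sketch of a batch functional dimension computation, assuming it is the rank of the Jacobian of $\theta \mapsto (f_\theta(x_1), \dots, f_\theta(x_n))$ at fixed inputs. For brevity it uses a one-hidden-layer network $f(x) = v^\top \mathrm{relu}(Wx + b) + c$; this architecture and the helper names `batch_jacobian` and `batch_functional_dimension` are illustrative assumptions, whereas the paper treats deep networks. The Jacobian depends on the parameters only through the activation pattern `pre > 0` and the hidden outputs, in line with the abstract's statement that activation patterns determine this dimension.

```python
import numpy as np


def batch_jacobian(W, b, v, X):
    """Jacobian of theta -> (f(x_1), ..., f(x_n)) for f(x) = v . relu(W x + b) + c.

    Columns are ordered as (W flattened row-wise, b, v, c). The output bias c
    contributes a column of ones, so its value is not needed.
    """
    n, d = X.shape
    h = W.shape[0]
    pre = X @ W.T + b                 # (n, h) pre-activations
    act = (pre > 0).astype(float)     # activation pattern = ReLU derivative
    hidden = pre * act                # relu(pre)
    dW = (act * v)[:, :, None] * X[:, None, :]   # df/dW[k, j] = v_k 1[pre_k > 0] x_j
    db = act * v                                   # df/db_k    = v_k 1[pre_k > 0]
    dv = hidden                                    # df/dv_k    = relu(pre_k)
    dc = np.ones((n, 1))                           # df/dc      = 1
    return np.concatenate([dW.reshape(n, h * d), db, dv, dc], axis=1)


def batch_functional_dimension(W, b, v, X):
    """Numerical rank of the batch Jacobian (hypothetical helper)."""
    return int(np.linalg.matrix_rank(batch_jacobian(W, b, v, X)))


rng = np.random.default_rng(0)
d, h, n = 3, 5, 200
W, b, v = rng.normal(size=(h, d)), rng.normal(size=h), rng.normal(size=h)
X = rng.normal(size=(n, d))

dim = batch_functional_dimension(W, b, v, X)
n_params = h * d + 2 * h + 1
print(dim, n_params)
```

Rescaling a hidden neuron (its incoming weights and bias by $\alpha > 0$, its outgoing weight by $1/\alpha$) leaves $f$ unchanged, so each such direction lies in the Jacobian's kernel and the rank can never exceed `n_params - h`. This is one concrete way to see the invariance to positive rescalings stated in the abstract.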

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2310.09194

Variational autoencoder with weighted samples for high-dimensional non-parametric adaptive importance sampling

Authors: Julien Demange-Chryst, François Bachoc, Jérôme Morio, Timothé Krauth

Abstract: Probability density function estimation with weighted samples is the main foundation of all adaptive importance sampling algorithms. Classically, a target distribution is approximated either by a non-parametric model or within a parametric family. However, these models suffer from the curse of dimensionality or from their lack of flexibility. In this contribution, we suggest using, as the approximating model, a distribution parameterised by a variational autoencoder. We extend the existing framework to the case of weighted samples by introducing a new objective function. The flexibility of the obtained family of distributions makes it as expressive as a non-parametric model, and despite the very high number of parameters to estimate, this family is much more efficient in high dimension than the classical Gaussian or Gaussian mixture families. Moreover, in order to add flexibility to the model and to be able to learn multimodal distributions, we consider a learnable prior distribution for the variational autoencoder latent variables. We also introduce a new pre-training procedure for the variational autoencoder to find good starting weights of the neural networks, in order to prevent the posterior collapse phenomenon as much as possible. Finally, we make explicit how the resulting distribution can be combined with importance sampling, and we exploit the proposed procedure in existing adaptive importance sampling algorithms to draw points from a target distribution and to estimate a rare event probability in high dimension on two multimodal problems.

Submitted 13 October, 2023; originally announced October 2023.

Comments: 20 pages, 5 figures

arXiv:2309.01492

Selective inference after convex clustering with $\ell_1$ penalization

Authors: François Bachoc, Cathy Maugis-Rabusseau, Pierre Neuvial

Abstract: Classical inference methods notoriously fail when applied to data-driven test hypotheses or inference targets. Instead, dedicated methodologies are required to obtain statistical guarantees for these selective inference problems. Selective inference is particularly relevant post-clustering, typically when testing a difference in mean between two clusters. In this paper, we address convex clustering with $\ell_1$ penalization, by leveraging related selective inference tools for regression, based on Gaussian vectors conditioned on polyhedral sets. In the one-dimensional case, we prove a polyhedral characterization of obtaining given clusters, which enables us to suggest a test procedure with statistical guarantees. This characterization also allows us to provide a computationally efficient regularization path algorithm. Then, we extend the above test procedure and guarantees to multi-dimensional clustering with $\ell_1$ penalization, and also to more general multi-dimensional clusterings that aggregate one-dimensional ones. With various numerical experiments, we validate our statistical guarantees and we demonstrate the power of our methods to detect differences in mean between clusters. Our methods are implemented in the R package poclin.

Submitted 4 September, 2023; originally announced September 2023.

Comments: 40 pages, 8 figures

MSC Class: 62F03; 62H30

arXiv:2308.14335

Improved learning theory for kernel distribution regression with two-stage sampling

Authors: François Bachoc, Louis Béthune, Alberto González-Sanz, Jean-Michel Loubes

Abstract: The distribution regression problem encompasses many important statistics and machine learning tasks, and arises in a large range of applications. Among various existing approaches to tackle this problem, kernel methods have become a method of choice. Indeed, kernel distribution regression is both computationally favorable, and supported by a recent learning theory. This theory also tackles the two-stage sampling setting, where only samples from the input distributions are available. In this paper, we improve the learning theory of kernel distribution regression. We address kernels based on Hilbertian embeddings, that encompass most, if not all, of the existing approaches. We introduce the novel near-unbiased condition on the Hilbertian embeddings, that enables us to provide new error bounds on the effect of the two-stage sampling, thanks to a new analysis. We show that this near-unbiased condition holds for three important classes of kernels, based on optimal transport and mean embedding. As a consequence, we strictly improve the existing convergence rates for these kernels. Our setting and results are illustrated by numerical experiments.

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2212.00568

Efficient estimation of multiple expectations with the same sample by adaptive importance sampling and control variates

Authors: Julien Demange-Chryst, François Bachoc, Jérôme Morio

Abstract: Some classical uncertainty quantification problems require the estimation of multiple expectations. Estimating all of them accurately is crucial and can have a major impact on the analysis to perform, and standard existing Monte Carlo methods can be costly to do so. We propose here a new procedure based on importance sampling and control variates for estimating multiple expectations more efficiently with the same sample. We first show that there exists a family of optimal estimators combining both importance sampling and control variates, which however cannot be used in practice because they require the knowledge of the values of the expectations to estimate. Motivated by the form of these optimal estimators and some interesting properties, we therefore propose an adaptive algorithm. The general idea is to adaptively update the parameters of the estimators so as to approach the optimal ones. We then suggest a quantitative stopping criterion that exploits the trade-off between approaching these optimal parameters and having a sufficient budget left. This remaining budget is then used to draw a new independent sample from the final sampling distribution, which yields unbiased estimators of the expectations. We show how to apply our procedure to sensitivity analysis, by estimating Sobol' indices and quantifying the impact of the input distributions. Finally, realistic test cases show the practical interest of the proposed algorithm, and its significant improvement over estimating the expectations separately.
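The abstract does not spell out the optimal estimators, so the following is only a generic NumPy sketch of the two ingredients it combines: a single sample from an auxiliary density $q$ reused to estimate several expectations under $p$, with the importance weight $w = p/q$ (whose $q$-mean is 1) serving as a control variate. The Gaussian choices $p = N(0, 1)$ and $q = N(0.5, 1.5^2)$, and the function names, are assumptions for illustration. Fitting the control-variate coefficient on the same sample used for estimation introduces a small bias, which is why the procedure described in the abstract ends with a fresh, independent sample.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def is_cv_estimates(phis, n, mu_q, sigma_q, rng):
    """Estimate E_p[phi(X)] for p = N(0, 1) and every phi in `phis`,
    reusing one sample drawn from q = N(mu_q, sigma_q**2)."""
    x = rng.normal(mu_q, sigma_q, size=n)
    w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, mu_q, sigma_q)  # importance weights
    cv = w - 1.0                                                # control variate: E_q[w - 1] = 0
    estimates = []
    for phi in phis:
        y = w * phi(x)                                          # plain importance sampling terms
        beta = np.cov(y, cv)[0, 1] / np.var(cv, ddof=1)         # variance-minimizing coefficient
        estimates.append(np.mean(y - beta * cv))
    return np.array(estimates)


rng = np.random.default_rng(0)
phis = [lambda x: x, lambda x: x ** 2, lambda x: (x > 1.0).astype(float)]
est = is_cv_estimates(phis, 200_000, mu_q=0.5, sigma_q=1.5, rng=rng)
print(est)  # exact values under N(0, 1): E[X] = 0, E[X^2] = 1, P(X > 1) ≈ 0.1587
```

Choosing $q$ wider than $p$ keeps the weights bounded here, so every estimator above has finite variance.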

Submitted 30 November, 2022; originally announced December 2022.

Comments: 28 pages, 4 Figures, 6 Tables

arXiv:2209.10176

Large-Sample Properties of Non-Stationary Source Separation for Gaussian Signals

Authors: François Bachoc, Christoph Muehlmann, Klaus Nordhausen, Joni Virta

Abstract: Non-stationary source separation is a well-established branch of blind source separation with many different methods. However, large-sample results are not available for any of these methods. To bridge this gap, we develop large-sample theory for NSS-JD, a popular method of non-stationary source separation based on the joint diagonalization of block-wise covariance matrices. We work under an instantaneous linear mixing model for independent Gaussian non-stationary source signals together with a very general set of assumptions: besides boundedness conditions, the only assumptions we make are that the sources exhibit finite dependency and that their variance functions differ sufficiently to be asymptotically separable. The consistency of the unmixing estimator and its convergence to a limiting Gaussian distribution at the standard square root rate are shown to hold under the previous conditions. Simulation experiments are used to verify the theoretical results and to study the impact of block length on the separation.
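NSS-JD jointly diagonalizes many block-wise covariance matrices, which requires an approximate joint diagonalization routine. The NumPy sketch below covers only the simplest special case, with two blocks, where exact simultaneous diagonalization is possible: whiten with the first block's covariance, then diagonalize the whitened covariance of the second block. The function name, the block split, and the toy sources are assumptions for illustration; this is not the NSS-JD estimator analyzed in the paper.

```python
import numpy as np


def nss_two_blocks(X, split):
    """Unmixing matrix W from the covariances of X[:split] and X[split:].

    X has shape (n_samples, p). W whitens block 1 (W C1 W^T = I) and
    diagonalizes block 2 (W C2 W^T diagonal).
    """
    Xc = X - X.mean(axis=0)
    C1 = np.cov(Xc[:split], rowvar=False)
    C2 = np.cov(Xc[split:], rowvar=False)
    evals, evecs = np.linalg.eigh(C1)
    V1 = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric inverse square root of C1
    _, U = np.linalg.eigh(V1 @ C2 @ V1.T)           # rotation diagonalizing whitened C2
    return U.T @ V1


rng = np.random.default_rng(1)
n = 20_000
# Two independent Gaussian sources whose variances change differently between blocks.
s1 = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(0.0, 2.0, n)])
s2 = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(0.0, 0.5, n)])
S = np.column_stack([s1, s2])
A = np.array([[1.0, 0.5], [0.3, 1.0]])   # mixing matrix
X = S @ A.T

W = nss_two_blocks(X, n)
M = W @ A
M = M / np.abs(M).max(axis=1, keepdims=True)
print(np.round(M, 3))  # close to a signed permutation matrix
```

Separation works here because the variance ratios between blocks (4 and 0.25) differ, which mirrors the abstract's requirement that the variance functions differ sufficiently.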

Submitted 10 October, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

arXiv:2209.00885

Regret Analysis of Dyadic Search

Authors: François Bachoc, Tommaso Cesari, Roberto Colomboni, Andrea Paudice

Abstract: We analyze the cumulative regret of the Dyadic Search algorithm of Bachoc et al. [2022].

Submitted 24 September, 2022; v1 submitted 2 September, 2022; originally announced September 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2208.06720

arXiv:2208.06720

A Near-Optimal Algorithm for Univariate Zeroth-Order Budget Convex Optimization

Authors: François Bachoc, Tommaso Cesari, Roberto Colomboni, Andrea Paudice

Abstract: This paper studies a natural generalization of the problem of minimizing a univariate convex function $f$ by querying its values sequentially. At each time-step $t$, the optimizer can invest a budget $b_t$ in a query point $X_t$ of their choice to obtain a fuzzy evaluation of $f$ at $X_t$ whose accuracy depends on the amount of budget invested in $X_t$ across times. This setting is motivated by the minimization of objectives whose values can only be determined approximately through lengthy or expensive computations. We design an any-time parameter-free algorithm called Dyadic Search, for which we prove near-optimal optimization error guarantees. As a byproduct of our analysis, we show that the classical dependence on the global Lipschitz constant in the error bounds is an artifact of the granularity of the budget. Finally, we illustrate our theoretical findings with numerical simulations.

Submitted 24 September, 2022; v1 submitted 13 August, 2022; originally announced August 2022.

arXiv:2206.07424

Local Identifiability of Deep ReLU Neural Networks: the Theory

Authors: Joachim Bona-Pellissier, François Malgouyres, François Bachoc

Abstract: Is a sample rich enough to determine, at least locally, the parameters of a neural network? To answer this question, we introduce a new local parameterization of a given deep ReLU neural network by fixing the values of some of its weights. This allows us to define local lifting operators whose inverses are charts of a smooth manifold of a high dimensional space. The function implemented by the deep ReLU neural network composes the local lifting with a linear operator which depends on the sample. We derive from this convenient representation a geometrical necessary and sufficient condition of local identifiability. Looking at tangent spaces, the geometrical condition provides: (1) a sharp and testable necessary condition of identifiability and (2) a sharp and testable sufficient condition of local identifiability. The validity of the conditions can be tested numerically using backpropagation and matrix rank computations.

Submitted 20 December, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

Journal ref: Advances in Neural Information Processing Systems, Nov 2022, New Orleans, United States

arXiv:2202.12679

Shapley effect estimation in reliability-oriented sensitivity analysis with correlated inputs by importance sampling

Authors: Julien Demange-Chryst, François Bachoc, Jérôme Morio

Abstract: Reliability-oriented sensitivity analysis aims at combining both reliability and sensitivity analyses by quantifying the influence of each input variable of a numerical model on a quantity of interest related to its failure. In particular, target sensitivity analysis focuses on the occurrence of the failure, and more precisely aims to determine which inputs are more likely to lead to the failure of the system. The Shapley effects are quantitative global sensitivity indices which are able to deal with correlated input variables. They have been recently adapted to the target sensitivity analysis framework. In this article, we investigate two importance-sampling-based estimation schemes of these indices which are more efficient than the existing ones when the failure probability is small. Moreover, we propose an extension to the case where only an i.i.d. input/output N-sample distributed according to the importance sampling auxiliary distribution is available. This extension makes it possible to estimate the Shapley effects only with a data set distributed according to the importance sampling auxiliary distribution stemming from a reliability analysis, without additional calls to the numerical model. In addition, we theoretically study the absence of bias of some estimators as well as the benefit of importance sampling. We also provide numerical guidelines and finally, realistic test cases show the practical interest of the proposed methods.

Submitted 24 October, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

arXiv:2202.10762

Multivariate Gaussian Random Fields over Generalized Product Spaces involving the Hypertorus

Authors: François Bachoc, Ana Peron, Emilio Porcu

Abstract: The paper deals with multivariate Gaussian random fields defined over generalized product spaces that involve the hypertorus. The assumption of Gaussianity implies that the finite-dimensional distributions are completely specified by the covariance functions, which in this case are matrix-valued mappings. We start by considering the spectral representations that in turn allow for a characterization of such covariance functions. We then provide some methods for the construction of these matrix-valued mappings. Finally, we consider strategies to evade radial symmetry (called isotropy in spatial statistics) and provide representation theorems for such a more general case.

Submitted 22 February, 2022; originally announced February 2022.

arXiv:2112.12982

Parameter identifiability of a deep feedforward ReLU neural network

Authors: Joachim Bona-Pellissier, François Bachoc, François Malgouyres

Abstract: The possibility for one to recover the parameters (weights and biases) of a neural network from the knowledge of its function on a subset of the input space can be, depending on the situation, a curse or a blessing. On one hand, recovering the parameters allows for better adversarial attacks and could also disclose sensitive information from the dataset used to construct the network. On the other hand, if the parameters of a network can be recovered, it guarantees the user that the features in the latent spaces can be interpreted. It also provides foundations to obtain formal guarantees on the performances of the network. It is therefore important to characterize the networks whose parameters can be identified and those whose parameters cannot. In this article, we provide a set of conditions on a deep fully-connected feedforward ReLU neural network under which the parameters of the network are uniquely identified (modulo permutation and positive rescaling) from the function it implements on a subset of the input space.

Submitted 12 May, 2023; v1 submitted 24 December, 2021; originally announced December 2021.

arXiv:2112.07280

Posterior contraction rates for constrained deep Gaussian processes in density estimation and classification

Authors: François Bachoc, Agnès Lagnoux

Abstract: We provide posterior contraction rates for constrained deep Gaussian processes in non-parametric density estimation and classification. The constraints are in the form of bounds on the values and on the derivatives of the Gaussian processes in the layers of the composition structure. The contraction rates are first given in a general framework, in terms of a new concentration function that we introduce and that takes the constraints into account. Then, the general framework is applied to integrated Brownian motions, Riemann-Liouville processes, and Matérn processes and to standard smoothness classes of functions. In each of these examples, we can recover known minimax rates.

Submitted 14 December, 2021; originally announced December 2021.

arXiv:2111.09721

Bounds in $L^1$ Wasserstein distance on the normal approximation of general M-estimators

Authors: François Bachoc, Max Fathi

Abstract: We derive quantitative bounds on the rate of convergence in $L^1$ Wasserstein distance of general M-estimators, with an almost sharp (up to a logarithmic term) behavior in the number of observations. We focus on situations where the estimator does not have an explicit expression as a function of the data. The general method may be applied even in situations where the observations are not independent. Our main application is a rate of convergence for cross validation estimation of covariance parameters of Gaussian processes.

Submitted 18 November, 2021; originally announced November 2021.

arXiv:2102.01977

Instance-Dependent Bounds for Zeroth-order Lipschitz Optimization with Error Certificates

Authors: François Bachoc, Tommaso R Cesari, Sébastien Gerchinovitz

Abstract: We study the problem of zeroth-order (black-box) optimization of a Lipschitz function $f$ defined on a compact subset $\mathcal X$ of $\mathbb R^d$, with the additional constraint that algorithms must certify the accuracy of their recommendations. We characterize the optimal number of evaluations of any Lipschitz function $f$ to find and certify an approximate maximizer of $f$ at accuracy $\varepsilon$. Under a weak assumption on $\mathcal X$, this optimal sample complexity is shown to be nearly proportional to the integral $\int_{\mathcal X} \mathrm{d}\boldsymbol x/( \max(f) - f(\boldsymbol x) + \varepsilon )^d$. This result, which was only (and partially) known in dimension $d=1$, solves an open problem dating back to 1991. In terms of techniques, our upper bound relies on a packing bound by Bouttier et al. (2020) for the Piyavskii-Shubert algorithm that we link to the above integral. We also show that a certified version of the computationally tractable DOO algorithm matches these packing and integral bounds. Our instance-dependent lower bound differs from traditional worst-case lower bounds in the Lipschitz setting and relies on a local worst-case analysis that could likely prove useful for other learning tasks.
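A minimal one-dimensional sketch of a certified Piyavskii-Shubert loop for maximizing an $L$-Lipschitz $f$ on $[a, b]$. The function name, the stopping rule, and the restriction to an interval are illustrative assumptions; the paper works on compact subsets of $\mathbb R^d$. The upper envelope $\min_i f(x_i) + L|x - x_i|$ bounds $\max(f)$ from above, so the gap between its highest peak and the best value observed so far certifies the accuracy of the recommended point.

```python
def certified_piyavskii_shubert(f, a, b, lipschitz, eps, max_evals=10_000):
    """Maximize an L-Lipschitz f on [a, b] (a < b) until the certified gap is <= eps.

    Returns (x_best, f(x_best), gap, n_evals); max(f) - f(x_best) <= gap
    holds whenever `lipschitz` is a valid Lipschitz constant for f.
    """
    xs, ys = [a, b], [f(a), f(b)]
    while True:
        pts = sorted(zip(xs, ys))
        # Between neighbouring points, the envelope peaks where the two cones meet.
        ub, x_next = max(
            (0.5 * (y0 + y1) + 0.5 * lipschitz * (x1 - x0),
             0.5 * (x0 + x1) + (y1 - y0) / (2.0 * lipschitz))
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])
        )
        i_best = max(range(len(ys)), key=ys.__getitem__)
        gap = ub - ys[i_best]  # certified upper bound on max(f) - f(x_best)
        if gap <= eps or len(xs) >= max_evals:
            return xs[i_best], ys[i_best], gap, len(xs)
        xs.append(x_next)
        ys.append(f(x_next))


x_best, y_best, gap, n_evals = certified_piyavskii_shubert(
    lambda x: -(x - 0.3) ** 2, 0.0, 1.0, lipschitz=2.0, eps=1e-3
)
print(x_best, y_best, gap, n_evals)
```

Only neighbouring points matter on each interval: by the Lipschitz property, cones from farther points lie above the neighbours' cones there, so the peak formula gives the exact envelope maximum.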

Submitted 22 March, 2023; v1 submitted 3 February, 2021; originally announced February 2021.

Journal ref: NeurIPS 2021, Dec 2021, Virtual conference, France. 24 p
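The Piyavskii-Shubert algorithm mentioned in the abstract above maximizes a Lipschitz function. At each step it evaluates the function where its Lipschitz upper envelope peaks. A minimal one-dimensional Python sketch follows. It assumes the Lipschitz constant `lip` is known and the domain is an interval `[a, b]`. The function name `piyavskii_shubert` and its stopping rule are illustrative choices, not the certified algorithm or code from the paper. The rule stops once the envelope certifies that the best observed value is within `eps` of the maximum.

```python
import math


def piyavskii_shubert(f, a, b, lip, eps, max_evals=10_000):
    """Maximize a `lip`-Lipschitz function f on [a, b] until the certified gap is <= eps.

    Returns (x_best, f_best, upper, n_evals), where `upper` is an upper bound on max f.
    Assumes `lip` is a valid Lipschitz constant; otherwise the bound is not guaranteed.
    """
    pts = [(a, f(a)), (b, f(b))]  # evaluated points, kept sorted by x
    while True:
        # On each interval [x0, x1], the upper envelope
        # min(y0 + lip * (x - x0), y1 + lip * (x1 - x)) peaks at height `peak`.
        upper, idx, x_next = -math.inf, 0, a
        for i in range(len(pts) - 1):
            (x0, y0), (x1, y1) = pts[i], pts[i + 1]
            peak = 0.5 * (y0 + y1) + 0.5 * lip * (x1 - x0)
            if peak > upper:
                upper, idx = peak, i
                x_next = 0.5 * (x0 + x1) + (y1 - y0) / (2 * lip)
        x_best, f_best = max(pts, key=lambda p: p[1])
        # Certificate: max f <= upper, so x_best is eps-optimal once the gap closes.
        if upper - f_best <= eps or len(pts) >= max_evals:
            return x_best, f_best, upper, len(pts)
        # x_next lies in [x0, x1] for a Lipschitz f, so insertion keeps pts sorted.
        pts.insert(idx + 1, (x_next, f(x_next)))
```

For example, `piyavskii_shubert(lambda x: -(x - 0.3) ** 2, 0.0, 1.0, lip=2.0, eps=1e-3)` returns a point near 0.3. Every evaluated value lower-bounds the maximum, and the envelope peak upper-bounds it. The returned gap `upper - f_best` therefore acts as an error certificate for the recommendation.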

arXiv:2011.01711

doi 10.5705/ss.202021.0326

Test of the Latent Dimension of a Spatial Blind Source Separation Model

Authors: Christoph Muehlmann, François Bachoc, Klaus Nordhausen, Mengxi Yi

Abstract: We assume a spatial blind source separation model in which the observed multivariate spatial data is a linear mixture of latent spatially uncorrelated Gaussian random fields containing a number of pure white noise components. We propose a test on the number of white noise components and obtain the asymptotic distribution of its statistic for a general domain. We also demonstrate how computations can be facilitated in the case of gridded observation locations. Based on this test, we obtain a consistent estimator of the true dimension. Simulation studies and an environmental application demonstrate that our test is at least comparable to and often outperforms bootstrap-based techniques, which are also introduced in this paper.

Submitted 24 August, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

Journal ref: Statistica Sinica, 34, 837-865, 2024

arXiv:2010.13405

The sample complexity of level set approximation

Authors: François Bachoc, Tommaso Cesari, Sébastien Gerchinovitz

Abstract: We study the problem of approximating the level set of an unknown function by sequentially querying its values. We introduce a family of algorithms called Bisect and Approximate through which we reduce the level set approximation problem to a local function approximation problem. We then show how this approach leads to rate-optimal sample complexity guarantees for Hölder functions, and we investigate how such rates improve when additional smoothness or other structural assumptions hold true.

Submitted 23 February, 2021; v1 submitted 26 October, 2020; originally announced October 2020.
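The bisection idea behind the level set entry above can be illustrated in its simplest form. The Python sketch below is a simplification under two assumptions. First, `f` is Lipschitz with a known constant `lip` (Hölder with exponent 1) on $[0,1]^d$. Second, the value at each cell center serves as the local approximation. The name `bisect_level_set` and its parameters are hypothetical, and the paper's Bisect and Approximate family is more general. A cell is split only when the Lipschitz bound cannot rule out that it meets the level set.

```python
import itertools
import math


def bisect_level_set(f, level, lip, dim, depth):
    """Cells (lower corner, side) of [0, 1]^dim that may intersect {x : f(x) = level}.

    Assumes f is `lip`-Lipschitz for the Euclidean norm: a cell is discarded when
    |f(center) - level| exceeds lip * (half of the cell diagonal).
    """
    def may_cross(cell):
        corner, side = cell
        center = [c + side / 2 for c in corner]
        return abs(f(center) - level) <= lip * side * math.sqrt(dim) / 2

    def split(cell):
        corner, side = cell
        half = side / 2
        for offsets in itertools.product((0.0, half), repeat=dim):
            yield tuple(c + o for c, o in zip(corner, offsets)), half

    cells = [((0.0,) * dim, 1.0)]
    for _ in range(depth):
        # Only cells that may still cross the level are refined.
        cells = [child for cell in cells if may_cross(cell) for child in split(cell)]
    return [cell for cell in cells if may_cross(cell)]
```

For example, `bisect_level_set(lambda p: p[0] + p[1], 1.0, math.sqrt(2.0), 2, 6)` keeps only cells of side 1/64 along the anti-diagonal. That is far fewer than the 4096 cells of a full grid at that resolution.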

arXiv:2009.07002

Asymptotic analysis of maximum likelihood estimation of covariance parameters for Gaussian processes: an introduction with proofs

Authors: François Bachoc

Abstract: This article provides an introduction to the asymptotic analysis of covariance parameter estimation for Gaussian processes. Maximum likelihood estimation is considered. The aim of this introduction is to be accessible to a wide audience and to present some existing results and proof techniques from the literature. The increasing-domain and fixed-domain asymptotic settings are considered. Under increasing-domain asymptotics, it is shown that in general all the components of the covariance parameter can be estimated consistently by maximum likelihood and that asymptotic normality holds. In contrast, under fixed-domain asymptotics, only some components of the covariance parameter, constituting the microergodic parameter, can be estimated consistently. Under fixed-domain asymptotics, the special case of the family of isotropic Matérn covariance functions is considered. It is shown that only a combination of the variance and spatial scale parameter is microergodic. A consistency and asymptotic normality proof is sketched for maximum likelihood estimators.

Submitted 15 September, 2020; originally announced September 2020.
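A known fixed-domain result for the isotropic Matérn family illustrates the microergodicity statement in the entry above. It is stated here under three assumptions: the smoothness $\nu$ is known, the domain $D \subset \mathbb R^d$ is bounded with $d \le 3$, and the parametrization below uses variance $\sigma^2$ and range $\rho$ (other parametrizations rescale $\rho$):

```latex
k_{\sigma^2,\rho}(h) = \sigma^2 \, \frac{2^{1-\nu}}{\Gamma(\nu)}
  \left(\frac{\|h\|}{\rho}\right)^{\nu} K_{\nu}\!\left(\frac{\|h\|}{\rho}\right),
\qquad
\frac{\sigma_1^2}{\rho_1^{2\nu}} = \frac{\sigma_2^2}{\rho_2^{2\nu}}
\;\Longrightarrow\;
\text{the Gaussian measures on } D \text{ are equivalent.}
```

Hence $\sigma^2$ and $\rho$ cannot both be estimated consistently from observations in $D$, while the combination $\sigma^2/\rho^{2\nu}$ can.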

arXiv:2009.04188

Sequential construction and dimension reduction of Gaussian processes under constraints

Authors: François Bachoc, Andrés F. López Lopera, Olivier Roustant

Abstract: Accounting for inequality constraints, such as boundedness, monotonicity or convexity, is challenging when modeling costly-to-evaluate black box functions. In this regard, finite-dimensional Gaussian process (GP) regression models bring a valuable solution, as they guarantee that the inequality constraints are satisfied everywhere. Nevertheless, these models are currently restricted to small dimensional situations (up to dimension 5). Addressing this issue, we introduce the MaxMod algorithm that sequentially inserts one-dimensional knots or adds active variables, thereby performing at the same time dimension reduction and efficient knot allocation. We prove the convergence of this algorithm. In intermediary steps of the proof, we propose the notion of multi-affine extension and study its properties. We also prove the convergence of finite-dimensional GPs, when the knots are not dense in the input space, extending the recent literature. With simulated and real data, we demonstrate that the MaxMod algorithm remains efficient in higher dimension (at least in dimension 20), and needs fewer knots than other constrained GP models from the state-of-the-art, to reach a given approximation error.

Submitted 10 March, 2022; v1 submitted 9 September, 2020; originally announced September 2020.

arXiv:2007.14684

Asymptotically Equivalent Prediction in Multivariate Geostatistics

Authors: François Bachoc, Emilio Porcu, Moreno Bevilacqua, Reinhard Furrer, Tarik Faouzi

Abstract: Cokriging is the common method of spatial interpolation (best linear unbiased prediction) in multivariate geostatistics. While best linear prediction has been well understood in univariate spatial statistics, the literature for the multivariate case has been elusive so far. The new challenges provided by modern spatial datasets, being typically multivariate, call for a deeper study of cokriging. In particular, we deal with the problem of misspecified cokriging prediction within the framework of fixed domain asymptotics. Specifically, we provide conditions for equivalence of measures associated with multivariate Gaussian random fields, with index set in a compact set of a d-dimensional Euclidean space. Such conditions have been elusive for about 50 years of spatial statistics. We then focus on the multivariate Matérn and Generalized Wendland classes of matrix-valued covariance functions, which have been very popular for having parameters that are crucial to spatial interpolation, and that control the mean square differentiability of the associated Gaussian process. We provide sufficient conditions for equivalence of Gaussian measures, relying on the covariance parameters of these two classes. This enables us to identify the parameters that are crucial to asymptotically equivalent interpolation in multivariate geostatistics. Our findings are then illustrated through simulation studies.

Submitted 29 July, 2020; originally announced July 2020.

arXiv:2006.02087

Gaussian linear approximation for the estimation of the Shapley effects

Authors: Baptiste Broto, François Bachoc, Marine Depecker, Jean-Marc Martinez

Abstract: In this paper, we address the estimation of the sensitivity indices called "Shapley effects". These sensitivity indices make it possible to handle dependent input variables. The Shapley effects are generally difficult to estimate, but they are easily computable in the Gaussian linear framework. The aim of this work is to use the values of the Shapley effects in an approximated Gaussian linear framework as estimators of the true Shapley effects corresponding to a non-linear model. First, we assume that the input variables are Gaussian with small variances. We provide rates of convergence of the estimated Shapley effects to the true Shapley effects. Then, we focus on the case where the inputs are given by a non-Gaussian empirical mean. We prove that, under some mild assumptions, when the number of terms in the empirical mean increases, the difference between the true Shapley effects and the estimated Shapley effects given by the Gaussian linear approximation converges to 0. Our theoretical results are supported by numerical studies, showing that the Gaussian linear approximation is accurate and significantly decreases the computational time.

Submitted 3 June, 2020; originally announced June 2020.
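The entry above notes that Shapley effects are easily computable in the Gaussian linear framework. Take $Y = \beta^\top X$ with $X \sim \mathcal N(\mu, \Sigma)$. The value of a subset $u$ of inputs, $\mathrm{Var}(\mathrm{E}[Y \mid X_u])$, then follows from a Schur complement, and the Shapley effects are an exact weighted sum over subsets. The Python sketch below uses the hypothetical name `gaussian_linear_shapley`. It enumerates all subsets, so it is only practical for a small number of inputs $d$, and it is not the estimation procedure of the paper.

```python
import itertools
import math

import numpy as np


def gaussian_linear_shapley(beta, cov):
    """Exact Shapley effects of Y = beta @ X for Gaussian X with covariance `cov`.

    Value function: c(u) = Var(E[Y | X_u]) = Var(Y) - Var(Y | X_u), where the
    conditional variance comes from the Schur complement of cov[u, u].
    """
    beta = np.asarray(beta, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = len(beta)
    var_y = beta @ cov @ beta

    def value(u):
        u = list(u)
        if not u:
            return 0.0
        v = [j for j in range(d) if j not in u]
        if not v:
            return var_y
        # Conditional covariance of X_v given X_u.
        cond = cov[np.ix_(v, v)] - cov[np.ix_(v, u)] @ np.linalg.solve(
            cov[np.ix_(u, u)], cov[np.ix_(u, v)])
        return var_y - beta[v] @ cond @ beta[v]

    effects = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):  # size of the coalition u not containing i
            weight = 1.0 / (d * math.comb(d - 1, k))
            for u in itertools.combinations(others, k):
                effects[i] += weight * (value(u + (i,)) - value(u))
    return effects
```

With independent inputs this reduces to $\beta_i^2 \Sigma_{ii}$. With correlated inputs, the variance share of correlated inputs is split between them. In both cases the effects sum to $\mathrm{Var}(Y)$.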

arXiv:1911.11199

Asymptotic properties of the maximum likelihood and cross validation estimators for transformed Gaussian processes

Authors: François Bachoc, José Bétancourt, Reinhard Furrer, Thierry Klein

Abstract: The asymptotic analysis of covariance parameter estimation of Gaussian processes has been subject to intensive investigation. However, this asymptotic analysis is very scarce for non-Gaussian processes. In this paper, we study a class of non-Gaussian processes obtained by regular non-linear transformations of Gaussian processes. We provide the increasing-domain asymptotic properties of the (Gaussian) maximum likelihood and cross validation estimators of the covariance parameters of a non-Gaussian process of this class. We show that these estimators are consistent and asymptotically normal, although they are defined as if the process was Gaussian. They do not need to model or estimate the non-linear transformation. Our results can thus be interpreted as a robustness of (Gaussian) maximum likelihood and cross validation towards non-Gaussianity. Our proofs rely on two technical results that are of independent interest for the increasing-domain asymptotic literature of spatial processes. First, we show that, under mild assumptions, coefficients of inverses of large covariance matrices decay at an inverse polynomial rate as a function of the corresponding observation location distances. Second, we provide a general central limit theorem for quadratic forms obtained from transformed Gaussian processes. Finally, our asymptotic results are illustrated by numerical simulations.

Submitted 25 November, 2019; originally announced November 2019.

Comments: 40 pages, 4 figures

arXiv:1910.14458

Rate of convergence for geometric inference based on the empirical Christoffel function

Authors: Mai Trang Vu, François Bachoc, Edouard Pauwels

Abstract: We consider the problem of estimating the support of a measure from a finite, independent, sample. The estimators which are considered are constructed based on the empirical Christoffel function. Such estimators have been proposed for the problem of set estimation with heuristic justifications. We carry out a detailed finite sample analysis, that allows us to select the threshold and degree parameters as a function of the sample size. We provide a convergence rate analysis of the resulting support estimation procedure. Our analysis establishes that we may obtain finite sample bounds which are comparable to existing rates for different set estimation procedures. Our results rely on concentration inequalities for the empirical Christoffel function and on estimates of the supremum of the Christoffel-Darboux kernel on sets with smooth boundaries, that can be considered of independent interest.

Submitted 19 May, 2020; v1 submitted 31 October, 2019; originally announced October 2019.

arXiv:1907.12780

Block-diagonal covariance estimation and application to the Shapley effects in sensitivity analysis

Authors: Baptiste Broto, François Bachoc, Laura Clouvel, Jean-Marc Martinez

Abstract: In this paper, we aim to estimate block-diagonal covariance matrices for Gaussian data in high dimension and in fixed dimension. We first estimate the block-diagonal structure of the covariance matrix by theoretical and practical estimators which are consistent. We deduce that the suggested estimator of the covariance matrix in high dimension converges with the same rate as if the true decomposition were known. In fixed dimension, we prove that the suggested estimator is asymptotically efficient. Then, we focus on the estimation of sensitivity indices called "Shapley effects", in the high-dimensional Gaussian linear framework. From the estimated covariance matrix, we obtain an estimator of the Shapley effects with a relative error which goes to zero at the parametric rate up to a logarithmic factor. Using the block-diagonal structure of the estimated covariance matrix, this estimator is still available for thousands of input variables, as long as the maximal block is not too large.

Submitted 13 February, 2020; v1 submitted 30 July, 2019; originally announced July 2019.

arXiv:1812.09187

doi 10.1093/biomet/asz079

Spatial Blind Source Separation

Authors: François Bachoc, Marc G. Genton, Klaus Nordhausen, Anne Ruiz-Gazen, Joni Virta

Abstract: Recently a blind source separation model was suggested for spatial data together with an estimator based on the simultaneous diagonalisation of two scatter matrices. The asymptotic properties of this estimator are derived here and a new estimator, based on the joint diagonalisation of more than two scatter matrices, is proposed. The asymptotic properties and merits of the novel estimator are verified in simulation studies. A real data example illustrates the method.

Submitted 8 August, 2019; v1 submitted 21 December, 2018; originally announced December 2018.

Journal ref: Biometrika 107: 627-646 (2020)

arXiv:1812.09168

Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution

Authors: Baptiste Broto, François Bachoc, Marine Depecker

Abstract: The Shapley effects are global sensitivity indices: they quantify the impact of each input variable on the output variable in a model. In this work, we suggest new estimators of these sensitivity indices. When the input distribution is known, we investigate the already existing estimator and suggest a new one with a lower variance. Then, when the distribution of the inputs is unknown, we extend these estimators. Finally, we provide asymptotic properties of the estimators studied in this article.

Submitted 13 February, 2020; v1 submitted 21 December, 2018; originally announced December 2018.

arXiv:1807.08988

Composite likelihood estimation for a gaussian process under fixed domain asymptotics

Authors: François Bachoc, Moreno Bevilacqua, Daira Velandia

Abstract: We study the problem of estimating the covariance parameters of a one-dimensional Gaussian process with exponential covariance function under fixed-domain asymptotics. We show that the weighted pairwise maximum likelihood estimator of the microergodic parameter can be consistent or inconsistent. This depends on the range of admissible parameter values in the likelihood optimization. On the other hand, the weighted pairwise conditional maximum likelihood estimator is always consistent. Both estimators are also asymptotically Gaussian when they are consistent. Their asymptotic variances are larger or strictly larger than that of the maximum likelihood estimator. A simulation study is presented in order to compare the finite sample behavior of the pairwise likelihood estimators with their asymptotic distributions. For more general covariance functions, an additional inconsistency result is provided, for the weighted pairwise maximum likelihood estimator of a variance parameter.

Submitted 12 July, 2019; v1 submitted 24 July, 2018; originally announced July 2018.

arXiv:1806.03135

Semi-parametric estimation of the variogram of a Gaussian process with stationary increments

Authors: Jean-Marc Azaïs, François Bachoc, Agnès Lagnoux, Thi Mong Ngoc Nguyen

Abstract: We consider the semi-parametric estimation of a scale parameter of a one-dimensional Gaussian process with known smoothness. We suggest an estimator based on quadratic variations and on the moment method. We provide asymptotic approximations of the mean and variance of this estimator, together with asymptotic normality results, for a large class of Gaussian processes. We allow for general mean functions and study the aggregation of several estimators based on various variation sequences. In extensive simulation studies, we show that the asymptotic results accurately depict the finite-sample situations already for small to moderate sample sizes. We also compare various variation sequences and highlight the efficiency of the aggregation procedure.

Submitted 20 January, 2020; v1 submitted 8 June, 2018; originally announced June 2018.
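The entry above describes an estimator "based on quadratic variations and on the moment method". The Python sketch below treats only the simplest case, and each of its conditions is our assumption rather than the paper's general setting:

- the process has zero mean and is observed at $i/n$ on $[0,1]$;
- only first-order differences are used;
- $\mathrm{E}[(X((i+1)/n) - X(i/n))^2] = \sigma^2 n^{-2H}$ with known $H$, as for fractional Brownian motion.

Matching the empirical mean of the squared increments with this moment gives $\hat\sigma^2$. The name `qv_scale_estimator` is hypothetical.

```python
import numpy as np


def qv_scale_estimator(x, hurst):
    """Moment estimator of sigma^2 from observations x[i] = X(i / n), i = 0, ..., n.

    Assumes E[(X((i + 1) / n) - X(i / n))^2] = sigma^2 * n ** (-2 * hurst), as for
    a zero-mean fractional Brownian motion with known Hurst index `hurst`.
    """
    n = len(x) - 1
    increments = np.diff(x)  # first-order differences X(t_{i+1}) - X(t_i)
    return n ** (2 * hurst) * np.mean(increments ** 2)


# Usage: Brownian motion (hurst = 1/2) with sigma^2 = 4 on [0, 1].
rng = np.random.default_rng(0)
n, sigma = 10_000, 2.0
x = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, sigma / np.sqrt(n), n))])
print(qv_scale_estimator(x, hurst=0.5))  # should be near 4
```

The paper also considers higher-order variation sequences and aggregation of several such estimators. This sketch only shows the basic moment-matching step.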

arXiv:1804.07566

doi 10.1214/18-ejs1490

On the Post Selection Inference constant under Restricted Isometry Properties

Authors: François Bachoc, Gilles Blanchard, Pierre Neuvial

Abstract: Uniformly valid confidence intervals post model selection in regression can be constructed based on Post-Selection Inference (PoSI) constants. PoSI constants are minimal for orthogonal design matrices, and can be upper bounded as a function of the sparsity of the set of models under consideration, for generic design matrices. In order to improve on these generic sparse upper bounds, we consider design matrices satisfying a Restricted Isometry Property (RIP) condition. We provide a new upper bound on the PoSI constant in this setting. This upper bound is an explicit function of the RIP constant of the design matrix, thereby giving an interpolation between the orthogonal setting and the generic sparse setting. We show that this upper bound is asymptotically optimal in many settings by constructing a matching lower bound.

Submitted 22 November, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

Comments: Electronic journal of statistics, Shaker Heights, OH : Institute of Mathematical Statistics, 2018

arXiv:1804.03378

doi 10.1214/19-EJS1587

Maximum likelihood estimation for Gaussian processes under inequality constraints

Authors: François Bachoc, Agnès Lagnoux, Andrés F. López-Lopera

Abstract: We consider covariance parameter estimation for a Gaussian process under inequality constraints (boundedness, monotonicity or convexity) in fixed-domain asymptotics. We address the estimation of the variance parameter and the estimation of the microergodic parameter of the Matérn and Wendland covariance functions. First, we show that the (unconstrained) maximum likelihood estimator has the same asymptotic distribution, unconditionally and conditionally on the event that the Gaussian process satisfies the inequality constraints. Then, we study the recently suggested constrained maximum likelihood estimator. We show that it has the same asymptotic distribution as the (unconstrained) maximum likelihood estimator. In addition, we show in simulations that the constrained maximum likelihood estimator is generally more accurate on finite samples. Finally, we provide extensions to prediction and to noisy observations.

Submitted 15 July, 2019; v1 submitted 10 April, 2018; originally announced April 2018.

arXiv:1801.04095

Sensitivity indices for independent groups of variables

Authors: Baptiste Broto, François Bachoc, Marine Depecker, Jean-Marc Martinez

Abstract: In this paper, we study sensitivity indices for independent groups of variables and we look at the particular case of block-additive models. We show in this case that most of the Sobol indices are equal to zero and that Shapley effects can be estimated more efficiently. We then apply this study to Gaussian linear models, and we provide an efficient algorithm to compute the theoretical sensitivity indices. In numerical experiments, we show that this algorithm compares favourably to other existing methods. We also use the theoretical results to improve the estimation of the Shapley effects for general models, when the inputs form independent groups of variables.

Submitted 11 December, 2018; v1 submitted 12 January, 2018; originally announced January 2018.

arXiv:1707.05708

Properties and comparison of some Kriging sub-model aggregation methods

Authors: François Bachoc, Nicolas Durrande, Didier Rullière, Clément Chevalier

Abstract: Kriging is a widely employed technique, in particular for computer experiments, in machine learning or in geostatistics. An important challenge for Kriging is the computational burden when the data set is large. This article focuses on a class of methods aiming at decreasing this computational cost, which consist of aggregating Kriging predictors based on smaller data subsets. It proves that aggregation methods that ignore the covariance between sub-models can yield an inconsistent final Kriging prediction. In contrast, a theoretical study of the nested Kriging method shows that it has additional attractive properties: first, this predictor is consistent; second, it can be interpreted as an exact conditional distribution for a modified process; and third, the conditional covariances given the observations can be computed efficiently. This article also includes a theoretical and numerical analysis of how the assignment of the observation points to the sub-models can affect the prediction ability of the aggregated model. Finally, the nested Kriging method is extended to measurement errors and to universal Kriging.

Submitted 26 February, 2021; v1 submitted 17 July, 2017; originally announced July 2017.

arXiv:1701.09055

A Gaussian Process Regression Model for Distribution Inputs

Authors: François Bachoc, Fabrice Gamboa, Jean-Michel Loubes, Nil Venet

Abstract: Monge-Kantorovich distances, otherwise known as Wasserstein distances, have received growing attention in statistics and machine learning as a powerful discrepancy measure for probability distributions. In this paper, we focus on forecasting a Gaussian process indexed by probability distributions. For this, we provide a family of positive definite kernels built using transportation-based distances. We provide a probabilistic understanding of these kernels and characterize the corresponding stochastic processes. We prove that the Gaussian processes indexed by distributions corresponding to these kernels can be efficiently forecast, opening new perspectives in Gaussian process modeling.

Submitted 29 January, 2018; v1 submitted 31 January, 2017; originally announced January 2017.
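For one-dimensional distributions, the 2-Wasserstein distance equals the $L^2$ distance between quantile functions. A Gaussian-type kernel $\exp(-W_2^2/\ell)$ is therefore positive definite. It is one example in the spirit of the transport-based kernels described in the entry above, not necessarily the exact family of the paper. The Python sketch below assumes each input distribution is represented by an empirical sample of the same size. It computes a zero-mean GP posterior mean with a small nugget for numerical stability. The names `w2_1d`, `w2_kernel`, `gp_predict`, `lengthscale` and `nugget` are hypothetical.

```python
import numpy as np


def w2_1d(a, b):
    """2-Wasserstein distance between two 1-D empirical measures with equal sample sizes."""
    # Sorted samples are the empirical quantile functions.
    return np.sqrt(np.mean((np.sort(a) - np.sort(b)) ** 2))


def w2_kernel(samples_a, samples_b, lengthscale=1.0):
    """Kernel matrix exp(-W2^2 / lengthscale) between two lists of samples."""
    return np.array([[np.exp(-w2_1d(s, t) ** 2 / lengthscale) for t in samples_b]
                     for s in samples_a])


def gp_predict(train, y, test, lengthscale=1.0, nugget=1e-8):
    """Zero-mean GP posterior mean at `test` distributions, given values y at `train`."""
    k_train = w2_kernel(train, train, lengthscale) + nugget * np.eye(len(train))
    return w2_kernel(test, train, lengthscale) @ np.linalg.solve(k_train, y)


# Usage: learn the map "distribution -> its mean" from five Gaussian samples.
rng = np.random.default_rng(0)
means = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
train = [rng.normal(m, 1.0, 200) for m in means]
print(gp_predict(train, means, [rng.normal(0.75, 1.0, 200)]))
```

In higher dimensions $W_2$ is no longer a Hilbertian distance, so positive definiteness of this kernel would need to be checked separately.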

arXiv:1611.01043

Uniformly valid confidence intervals post-model-selection

Authors: François Bachoc, David Preinerstorfer, Lukas Steinberger

Abstract: We suggest general methods to construct asymptotically uniformly valid confidence intervals post-model-selection. The constructions are based on principles recently proposed by Berk et al. (2013). In particular, the candidate models used can be misspecified, the target of inference is model-specific, and coverage is guaranteed for any data-driven model selection procedure. After developing a general theory, we apply our methods to practically important situations where the candidate set of models, from which a working model is selected, consists of fixed design homoskedastic or heteroskedastic linear models, or of binary regression models with general link functions. In an extensive simulation study, we find that the proposed confidence intervals perform remarkably well, even when compared to existing methods that are tailored only for specific model selection procedures.

Submitted 13 November, 2017; v1 submitted 3 November, 2016; originally announced November 2016.

arXiv:1610.02872

Cross-validation estimation of covariance parameters under fixed-domain asymptotics

Authors: Francois Bachoc, Agnes Lagnoux, Thi Mong Ngoc Nguyen

Abstract: We consider a one-dimensional Gaussian process having exponential covariance function. Under fixed-domain asymptotics, we prove the strong consistency and asymptotic normality of a cross validation estimator of the microergodic covariance parameter. In this setting, Ying [40] proved the same asymptotic properties for the maximum likelihood estimator. Our proof includes several original or more involved components, compared to that of Ying. Also, while the asymptotic variance of maximum likelihood does not depend on the triangular array of observation points under consideration, that of cross validation does, and is shown to be lower and upper bounded. The lower bound coincides with the asymptotic variance of maximum likelihood. We provide examples of triangular arrays of observation points achieving the lower and upper bounds. We illustrate our asymptotic results with simulations, and provide extensions to the case of an unknown mean function. To our knowledge, this work constitutes the first fixed-domain asymptotic analysis of cross validation.

Submitted 25 July, 2017; v1 submitted 10 October, 2016; originally announced October 2016.

Journal ref: Journal of Multivariate Analysis, Elsevier, 2017, 160, pp. 42-67
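The abstract does not spell out the cross validation criterion. A common building block for leave-one-out (LOO) cross validation with a zero-mean Gaussian process is that all LOO residuals and LOO predictive variances follow from a single matrix inverse: with Q = K^{-1}, the residual at point i is (Qy)_i / Q_ii and the predictive variance is 1 / Q_ii. The sketch below checks this numerically for an exponential covariance sigma2 * exp(-theta |s - t|) on [0, 1]. The parametrization, parameter values, random design and helper names (`exp_cov`, `loo_virtual`, `loo_brute_force`) are illustrative assumptions, not the estimator analyzed in the paper.

```python
import numpy as np

def exp_cov(t, sigma2=1.0, theta=1.0):
    """Exponential covariance sigma2 * exp(-theta * |t_i - t_j|) for 1-D points t."""
    return sigma2 * np.exp(-theta * np.abs(t[:, None] - t[None, :]))

def loo_virtual(K, y):
    """LOO residuals y_i - E[y_i | y_-i] and LOO variances from one inverse (zero-mean GP)."""
    Q = np.linalg.inv(K)
    q = np.diag(Q)
    return (Q @ y) / q, 1.0 / q

def loo_brute_force(K, y):
    """Same quantities obtained by conditioning on the other n - 1 values, n times."""
    n = len(y)
    res, var = np.empty(n), np.empty(n)
    for i in range(n):
        idx = np.arange(n) != i
        w = np.linalg.solve(K[np.ix_(idx, idx)], K[idx, i])  # simple kriging weights
        res[i] = y[i] - w @ y[idx]
        var[i] = K[i, i] - w @ K[idx, i]
    return res, var

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, size=30))  # fixed domain [0, 1]
K = exp_cov(t, sigma2=2.0, theta=3.0)
y = np.linalg.cholesky(K) @ rng.standard_normal(len(t))
r_fast, v_fast = loo_virtual(K, y)
r_slow, v_slow = loo_brute_force(K, y)
print(np.allclose(r_fast, r_slow), np.allclose(v_fast, v_slow))
```

The single-inverse form is what makes CV criteria affordable: it costs one O(n^3) factorization instead of n of them.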

arXiv:1608.01118 [pdf, ps, other]

A supermartingale approach to Gaussian process based sequential design of experiments

Authors: Julien Bect, François Bachoc, David Ginsbourger

Abstract: Gaussian process (GP) models have become a well-established framework for the adaptive design of costly experiments, and notably of computer experiments. GP-based sequential designs have been found practically efficient for various objectives, such as global optimization (estimating the global maximum or maximizer(s) of a function), reliability analysis (estimating a probability of failure) or the estimation of level sets and excursion sets. In this paper, we study the consistency of an important class of sequential designs, known as stepwise uncertainty reduction (SUR) strategies. Our approach relies on the key observation that the sequence of residual uncertainty measures, in SUR strategies, is generally a supermartingale with respect to the filtration generated by the observations. This observation enables us to establish generic consistency results for a broad class of SUR strategies. The consistency of several popular sequential design strategies is then obtained by means of this general result. Notably, we establish the consistency of two SUR strategies proposed by Bect, Ginsbourger, Li, Picheny and Vazquez (Stat. Comp., 2012); to the best of our knowledge, these are the first proofs of consistency for GP-based sequential design algorithms dedicated to the estimation of excursion sets and their measure. We also establish a new, more general proof of consistency for the expected improvement algorithm for global optimization which, unlike previous results in the literature, applies to any GP with continuous sample paths.

Submitted 30 August, 2018; v1 submitted 3 August, 2016; originally announced August 2016.
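The expected improvement (EI) criterion named above has a standard closed form when the GP posterior at a candidate point is Gaussian with mean m and standard deviation s: EI = (m - f*) Phi(z) + s phi(z), with z = (m - f*) / s and f* the best value observed so far (the maximization convention is assumed here). The minimal sketch below evaluates it with the standard library only. It illustrates the criterion itself, not the supermartingale argument of the paper, and the function names are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mean, sd, best):
    """E[max(Y - best, 0)] for Y ~ N(mean, sd^2), i.e. EI for maximization."""
    if sd <= 0.0:
        # Degenerate posterior: the improvement is deterministic.
        return max(mean - best, 0.0)
    z = (mean - best) / sd
    return (mean - best) * normal_cdf(z) + sd * normal_pdf(z)

print(round(expected_improvement(0.0, 1.0, 0.0), 4))  # 0.3989
```

At each step the EI algorithm evaluates the function where this quantity, computed from the current GP posterior, is largest; larger posterior uncertainty raises EI even when the posterior mean is below f*.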

arXiv:1603.09059 [pdf, ps, other]

Maximum likelihood estimation for a bivariate Gaussian process under fixed domain asymptotics

Authors: Daira Velandia, François Bachoc, Moreno Bevilacqua, Xavier Gendre, Jean-Michel Loubes

Abstract: We consider maximum likelihood estimation with data from a bivariate Gaussian process with a separable exponential covariance model under fixed-domain asymptotics. We first characterize the equivalence of Gaussian measures under this model. Then consistency and asymptotic distribution for the microergodic parameters are established. A simulation study is presented in order to compare the finite-sample behavior of the maximum likelihood estimator with the given asymptotic distribution.

Submitted 30 March, 2016; originally announced March 2016.
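To make the separable model concrete, assume (as an illustrative parametrization, not necessarily the paper's) that Cov(Z_i(s), Z_j(t)) = A_ij exp(-|s - t| / psi) for a 2 x 2 positive definite matrix A. Stacking both components observed at n points gives the covariance A kron R, so the Gaussian log-likelihood only requires n x n computations: log det(A kron R) = n log det A + 2 log det R, and the quadratic form equals tr(Y^T R^{-1} Y A^{-1}). The sketch checks this against the dense 2n x 2n computation; the names and values are hypothetical.

```python
import numpy as np

def separable_cov(t, A, psi):
    """Covariance of (Z_1(t_1..t_n), Z_2(t_1..t_n)) when C(h) = A * exp(-|h| / psi)."""
    R = np.exp(-np.abs(t[:, None] - t[None, :]) / psi)
    return np.kron(A, R), R

def nll_dense(y, Sigma):
    """Negative log-likelihood of y ~ N(0, Sigma), computed directly."""
    _, logdet = np.linalg.slogdet(Sigma)
    quad = y @ np.linalg.solve(Sigma, y)
    return 0.5 * (logdet + quad + len(y) * np.log(2.0 * np.pi))

def nll_separable(Y, A, R):
    """Same quantity via the Kronecker structure; Y is n x p (one column per component)."""
    n, p = Y.shape
    _, logdet_A = np.linalg.slogdet(A)
    _, logdet_R = np.linalg.slogdet(R)
    logdet = n * logdet_A + p * logdet_R  # det(A kron R) = det(A)^n * det(R)^p
    quad = np.trace(Y.T @ np.linalg.solve(R, Y) @ np.linalg.inv(A))
    return 0.5 * (logdet + quad + n * p * np.log(2.0 * np.pi))

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 25)  # fixed domain [0, 1]
A = np.array([[1.0, 0.5], [0.5, 2.0]])
Sigma, R = separable_cov(t, A, psi=0.3)
y = np.linalg.cholesky(Sigma) @ rng.standard_normal(2 * len(t))
Y = y.reshape(2, len(t)).T  # column k holds component k
print(nll_dense(y, Sigma), nll_separable(Y, A, R))
```

Minimizing `nll_separable` over A and psi would give the ML estimator for this assumed model at O(n^3) rather than O((2n)^3) cost per evaluation.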

arXiv:1602.02882 [pdf, ps, other]

On the smallest eigenvalues of covariance matrices of multivariate spatial processes

Authors: François Bachoc, Reinhard Furrer

Abstract: There has been growing interest in providing models for multivariate spatial processes. A majority of these models specify a parametric matrix covariance function. Based on observations, the parameters are estimated by maximum likelihood or variants thereof. While the asymptotic properties of maximum likelihood estimators for univariate spatial processes have been analyzed in detail, maximum likelihood estimators for multivariate spatial processes have not yet received the attention they deserve. In this article, we consider the classical increasing-domain asymptotic setting, in which the minimum distance between the locations is bounded away from zero. Then, one of the main components to be studied from a theoretical point of view is the asymptotic positive definiteness of the underlying covariance matrix. Under very weak assumptions on the matrix covariance function, we show that the smallest eigenvalue of the covariance matrix is asymptotically bounded away from zero. Several practical implications are discussed as well.

Submitted 9 February, 2016; originally announced February 2016.
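A toy check of this phenomenon, under assumptions chosen so that a lower bound is known in closed form: a bivariate separable exponential model A exp(-|h| / psi) observed on the integer grid {0, ..., n - 1}, so the minimum distance stays 1 while the domain grows. The correlation block R is then the Toeplitz matrix with entries rho^|i - j|, rho = exp(-1 / psi), whose eigenvalues all exceed (1 - rho) / (1 + rho), and the eigenvalues of A kron R are products of those of A and R. Hence the smallest eigenvalue stays above lambda_min(A) (1 - rho) / (1 + rho) for every n. This only illustrates the statement; it is not the paper's general argument, and A and psi are hypothetical.

```python
import numpy as np

def smallest_eigenvalue(n, A, psi):
    """Smallest eigenvalue of A kron R, R_ij = exp(-|i - j| / psi) on the grid 0..n-1."""
    t = np.arange(n, dtype=float)  # unit spacing: minimum distance stays 1
    R = np.exp(-np.abs(t[:, None] - t[None, :]) / psi)
    return np.linalg.eigvalsh(np.kron(A, R))[0]  # eigvalsh returns ascending order

A = np.array([[1.0, 0.8], [0.8, 1.0]])  # hypothetical cross-covariance matrix
psi = 2.0
rho = np.exp(-1.0 / psi)
bound = np.linalg.eigvalsh(A)[0] * (1.0 - rho) / (1.0 + rho)
eigs = {n: smallest_eigenvalue(n, A, psi) for n in (10, 50, 200)}
for n, lam in eigs.items():
    print(f"n = {n:4d}: smallest eigenvalue {lam:.4f} (bound {bound:.4f})")
```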

arXiv:1506.01833 [pdf, other]

Asymptotic properties of multivariate tapering for estimation and prediction

Authors: R. Furrer, F. Bachoc, J. Du

Abstract: Parameter estimation for, and prediction of, spatially or spatio-temporally correlated random processes are used in many areas and often require the solution of a large linear system based on the covariance matrix of the observations. In recent years, the dataset sizes to which these methods are applied have steadily increased, so that straightforward statistical tools have become computationally too expensive. In the univariate context, tapering, i.e., creating sparse approximate linear systems, has been shown to be an efficient tool in both the estimation and prediction settings. The asymptotic properties are derived under an infill asymptotic setting. In this paper, we use an increasing-domain framework for estimation and prediction using multivariate tapering. Under this asymptotic regime, we prove that tapering (one-tapered form) preserves the consistency of the untapered maximum likelihood estimator and show that the tapered predictor asymptotically has the same mean squared prediction error as the corresponding untapered predictor. The theoretical results are illustrated with simulations.

Submitted 5 June, 2015; originally announced June 2015.
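In its basic univariate form, tapering multiplies the covariance matrix elementwise by a compactly supported correlation function, so entries beyond the taper range become exact zeros while positive definiteness is preserved (Schur product theorem). The sketch below uses an exponential covariance and a Wendland-type taper (1 - h/gamma)_+^4 (1 + 4h/gamma), which is positive definite in up to three dimensions. The taper, the range gamma and the design are assumptions for illustration; the multivariate, one-tapered construction studied in the paper is not reproduced here.

```python
import numpy as np

def exponential_cov(D, sigma2=1.0, ell=1.5):
    """Exponential covariance sigma2 * exp(-D / ell) from a distance matrix D."""
    return sigma2 * np.exp(-D / ell)

def wendland_taper(D, gamma):
    """Wendland-type taper (1 - h)^4 (1 + 4h) with h = D / gamma, equal to 0 for D >= gamma."""
    h = np.clip(D / gamma, 0.0, 1.0)
    return (1.0 - h) ** 4 * (1.0 + 4.0 * h)

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 20.0, size=(400, 2))  # hypothetical design on a large domain
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
C = exponential_cov(D)
C_tap = C * wendland_taper(D, gamma=3.0)  # elementwise (Schur) product
print("fraction of nonzero entries:", np.count_nonzero(C_tap) / C_tap.size)
print("smallest eigenvalue:", np.linalg.eigvalsh(C_tap)[0])
```

In practice the tapered matrix would be stored in a sparse format so that solves exploit the zeros; dense NumPy arrays are used here only to keep the sketch short.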

arXiv:1503.00444 [pdf, other]

Optimal configurations of lines and a statistical application

Authors: François Bachoc, Martin Ehler, Manuel Gräf

Abstract: Motivated by the construction of confidence intervals in statistics, we study optimal configurations of $2^d-1$ lines in real projective space $\mathbb{R}P^{d-1}$. For small $d$, we determine line sets that numerically minimize a wide variety of potential functions among all configurations of $2^d-1$ lines through the origin. Numerical experiments verify that our findings enable an efficient assessment of the tightness of a bound arising from the statistical literature.

Submitted 2 March, 2015; originally announced March 2015.

Comments: 13 pages
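The abstract does not list the potential functions. One standard example that depends only on lines (it is unchanged when a spanning vector x_i is replaced by -x_i) is the frame potential FP = sum over all i, j of <x_i, x_j>^2 for unit vectors x_1, ..., x_n in R^d. It is the squared Frobenius norm of the Gram matrix, whose nonzero eigenvalues coincide with those of the d x d frame operator of trace n, so FP >= n^2 / d, with equality exactly for unit-norm tight frames. The sketch evaluates it for n = 2^d - 1 random lines; this choice of potential is ours, not necessarily one used in the paper.

```python
import numpy as np

def frame_potential(V):
    """Sum over all i, j of <x_i, x_j>^2, where x_i = V[i] / ||V[i]||.
    Depends only on the lines spanned by the rows of V."""
    U = V / np.linalg.norm(V, axis=1, keepdims=True)
    G = U @ U.T  # Gram matrix of the unit vectors
    return float(np.sum(G ** 2))

d = 3
n = 2 ** d - 1  # 7 lines in RP^2
rng = np.random.default_rng(4)
V = rng.standard_normal((n, d))
print(frame_potential(V), n ** 2 / d)  # value vs. lower bound n^2 / d
```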

arXiv:1412.4605 [pdf, other]

Valid confidence intervals for post-model-selection predictors

Authors: François Bachoc, Hannes Leeb, Benedikt M. Pötscher

Abstract: We consider inference post-model-selection in linear regression. In this setting, Berk et al. (2013) recently introduced a class of confidence sets, the so-called PoSI intervals, that cover a certain non-standard quantity of interest with a user-specified minimal coverage probability, irrespective of the model selection procedure that is being used. In this paper, we generalize the PoSI intervals to post-model-selection predictors.

Submitted 27 January, 2017; v1 submitted 15 December, 2014; originally announced December 2014.

Comments: Some material added. Some restructuring of the paper. Some minor errors corrected

MSC Class: 62F25; 62J05

Journal ref: Annals of Statistics 47 (2019), 1475-1504
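For orientation, the PoSI constant of Berk et al. (2013), which the abstract generalizes, can be approximated by simulation in the simplest known-variance case: K is the (1 - alpha) quantile of the maximum, over submodels M and j in M, of |<l_{j.M}, Z>|, where Z ~ N(0, I_n) and l_{j.M} is the normalized vector X_M (X_M^T X_M)^{-1} e_j whose inner product with y gives the least-squares coefficient of x_j in M. The sketch enumerates all submodels, so it is only practical for a few regressors. The design, alpha, the simulation size and the helper names are assumptions, and the paper's extension to predictors is not implemented.

```python
import itertools
import numpy as np

def posi_directions(X):
    """Columns: unit vectors l_{j.M} for every non-empty submodel M and every j in M."""
    p = X.shape[1]
    cols = []
    for size in range(1, p + 1):
        for M in itertools.combinations(range(p), size):
            XM = X[:, list(M)]
            V = XM @ np.linalg.inv(XM.T @ XM)  # column k yields coefficient k of submodel M
            cols.append(V / np.linalg.norm(V, axis=0))
    return np.hstack(cols)

def posi_constant_mc(X, alpha=0.05, n_sim=20000, seed=0):
    """Monte Carlo (1 - alpha) quantile of max |<l_{j.M}, Z>|, Z ~ N(0, I_n), sigma known."""
    L = posi_directions(X)
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_sim, X.shape[0]))
    return float(np.quantile(np.max(np.abs(Z @ L), axis=1), 1.0 - alpha))

rng = np.random.default_rng(5)
X = rng.standard_normal((50, 4))  # hypothetical design with p = 4 regressors
K = posi_constant_mc(X)
print(f"Monte Carlo PoSI constant: {K:.3f}")
```

With known sigma, the PoSI interval for coefficient j in the selected model M is then the estimate plus or minus K * sigma * ||X_M (X_M^T X_M)^{-1} e_j||, where a naive 95% interval would use 1.96 in place of K; K always lies between that naive value and the Scheffe constant.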

arXiv:1412.1926 [pdf, other]

Asymptotic analysis of covariance parameter estimation for Gaussian processes in the misspecified case

Authors: François Bachoc

Abstract: In parametric estimation of covariance function of Gaussian processes, it is often the case that the true covariance function does not belong to the parametric set used for estimation. This situation is called the misspecified case. In this case, it has been shown that, for irregular spatial sampling of observation points, Cross Validation can yield smaller prediction errors than Maximum Likelihood. Motivated by this observation, we provide a general asymptotic analysis of the misspecified case, for independent and uniformly distributed observation points. We prove that the Maximum Likelihood estimator asymptotically minimizes a Kullback-Leibler divergence, within the misspecified parametric set, while Cross Validation asymptotically minimizes the integrated square prediction error. In a Monte Carlo simulation, we show that the covariance parameters estimated by Maximum Likelihood and Cross Validation, and the corresponding Kullback-Leibler divergences and integrated square prediction errors, can be strongly contrasting. On a more technical level, we provide new increasing-domain asymptotic results for independent and uniformly distributed observation points.

Submitted 12 November, 2015; v1 submitted 5 December, 2014; originally announced December 2014.

Comments: A supplementary material (pdf) is available in the arXiv sources
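The Kullback-Leibler divergence mentioned above has a closed form for zero-mean Gaussian vectors: KL(N(0, K0) || N(0, K1)) = 0.5 [tr(K1^{-1} K0) - n + log det K1 - log det K0]. The sketch below uses it to find, at a fixed set of independent uniform points, the exponential-covariance range closest in this sense to a Matern 3/2 truth. The covariance families, the points and the range grid are illustrative assumptions, and this finite-sample computation is only a proxy for the asymptotic criterion studied in the paper.

```python
import numpy as np

def gaussian_kl(K0, K1):
    """KL( N(0, K0) || N(0, K1) ) for n x n covariance matrices."""
    n = K0.shape[0]
    _, logdet0 = np.linalg.slogdet(K0)
    _, logdet1 = np.linalg.slogdet(K1)
    return 0.5 * (np.trace(np.linalg.solve(K1, K0)) - n + logdet1 - logdet0)

def matern32(h, ell):
    a = np.sqrt(3.0) * h / ell
    return (1.0 + a) * np.exp(-a)

def exponential(h, ell):
    return np.exp(-h / ell)

rng = np.random.default_rng(6)
x = rng.uniform(0.0, 20.0, size=40)  # independent uniform points on a large domain
H = np.abs(x[:, None] - x[None, :])
K_true = matern32(H, ell=1.0)  # "true" covariance, outside the exponential family
ranges = np.linspace(0.2, 3.0, 29)
kl = np.array([gaussian_kl(K_true, exponential(H, ell)) for ell in ranges])
print("KL-closest exponential range:", ranges[np.argmin(kl)])
```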

arXiv:1411.5883 [pdf, other]

Hastings-Metropolis algorithm on Markov chains for small-probability estimation

Authors: François Bachoc, Lionel Lenôtre, Achref Bachouch

Abstract: Shielding studies in neutron transport, with Monte Carlo codes, yield challenging problems of small-probability estimation. The particularity of these studies is that the small probability to be estimated is formulated in terms of the distribution of a Markov chain, instead of that of a random vector as in more classical cases. Thus, it is not straightforward to adapt classical statistical methods for estimating small probabilities involving random vectors to these neutron-transport problems. A recent interacting-particle method for small-probability estimation, relying on the Hastings-Metropolis algorithm, is presented. It is shown how to adapt the Hastings-Metropolis algorithm to Markov chains. A convergence result is also shown. Then, the practical implementation of the resulting method for small-probability estimation is treated in detail, for a Monte Carlo shielding study. Finally, it is shown, for this study, that the proposed interacting-particle method considerably outperforms a simple Monte Carlo method when the probability to be estimated is small.

Submitted 21 November, 2014; originally announced November 2014.

Comments: 33 pages
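The interacting-particle idea can be sketched in its simplest setting, a scalar standard Gaussian rather than a Markov chain (the Markov-chain adaptation is the paper's contribution and is not reproduced). In the "last particle" form of adaptive multilevel splitting, the lowest of N particles is repeatedly replaced by a copy of another particle, which is then moved by Hastings-Metropolis steps targeting N(0, 1) restricted above the current level; after M such iterations the estimate of P(X > q) is (1 - 1/N)^M. The proposal y = (x + s xi) / sqrt(1 + s^2) is reversible with respect to N(0, 1), so a move is accepted exactly when y stays above the level. All tuning values and names are assumptions.

```python
import math
import random

def last_particle_estimate(q, n_particles=400, n_mh=20, step=0.3, seed=0):
    """Estimate P(X > q), X ~ N(0, 1), with a 'last particle' splitting scheme."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    scale = math.sqrt(1.0 + step * step)
    m = 0  # number of iterations (killed particles)
    while True:
        i_min = min(range(n_particles), key=xs.__getitem__)
        level = xs[i_min]
        if level >= q:
            break
        j = rng.randrange(n_particles - 1)  # uniform among the other particles
        if j >= i_min:
            j += 1
        x = xs[j]
        for _ in range(n_mh):
            # Proposal reversible w.r.t. N(0, 1): accept iff it stays above the level.
            y = (x + step * rng.gauss(0.0, 1.0)) / scale
            if y > level:
                x = y
        xs[i_min] = x
        m += 1
    return (1.0 - 1.0 / n_particles) ** m

p_hat = last_particle_estimate(4.0)
p_exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # about 3.17e-5
print(p_hat, p_exact)
```

A simple Monte Carlo estimate of a probability of order 3e-5 would need millions of samples for a comparable relative accuracy, whereas the number of iterations here grows only like N log(1/p).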

arXiv:1301.4321 [pdf, other]

Asymptotic analysis of the role of spatial sampling for covariance parameter estimation of Gaussian processes

Authors: François Bachoc

Abstract: Covariance parameter estimation of Gaussian processes is analyzed in an asymptotic framework. The spatial sampling is a randomly perturbed regular grid and its deviation from the perfect regular grid is controlled by a single scalar regularity parameter. Consistency and asymptotic normality are proved for the Maximum Likelihood and Cross Validation estimators of the covariance parameters. The asymptotic covariance matrices of the covariance parameter estimators are deterministic functions of the regularity parameter. By means of an exhaustive study of the asymptotic covariance matrices, it is shown that the estimation is improved when the regular grid is strongly perturbed. Hence, an asymptotic confirmation is given to the commonly accepted fact that using groups of observation points with small spacing is beneficial to covariance function estimation. Finally, the prediction error, using a consistent estimator of the covariance parameters, is analyzed in detail.

Submitted 8 December, 2014; v1 submitted 18 January, 2013; originally announced January 2013.

Comments: 47 pages. A supplementary material (pdf) is available in the arXiv sources

Journal ref: Journal of Multivariate Analysis 125 (2014) 1-35
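One way to generate the kind of design described above, assuming for illustration a regular grid {1, ..., n}^d whose points are each shifted by eps * U with U uniform on [-1, 1]^d: eps = 0 gives the regular grid, for eps < 1/2 all points stay at least 1 - 2 eps apart, and larger eps produces pairs of closely spaced points. The function names and values are hypothetical, not the paper's notation.

```python
import numpy as np

def perturbed_grid(n_per_axis, d, eps, seed=0):
    """Points of {1, ..., n_per_axis}^d, each shifted by eps * U, U ~ Uniform([-1, 1]^d)."""
    rng = np.random.default_rng(seed)
    axis = np.arange(1, n_per_axis + 1, dtype=float)
    grid = np.stack(np.meshgrid(*([axis] * d), indexing="ij"), axis=-1).reshape(-1, d)
    return grid + eps * rng.uniform(-1.0, 1.0, size=grid.shape)

def min_distance(points):
    """Smallest Euclidean distance between two distinct points."""
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    return float(D.min())

for eps in (0.0, 0.2, 0.45):
    pts = perturbed_grid(10, 2, eps)
    print(f"eps = {eps:.2f}: minimum distance {min_distance(pts):.3f} "
          f"(guaranteed >= {1 - 2 * eps:.2f})")
```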

arXiv:1301.4320 [pdf, other]

Cross Validation and Maximum Likelihood estimations of hyper-parameters of Gaussian processes with model misspecification

Authors: François Bachoc

Abstract: The Maximum Likelihood (ML) and Cross Validation (CV) methods for estimating covariance hyper-parameters are compared in the context of Kriging with a misspecified covariance structure. A two-step approach is used. First, the case of the estimation of a single variance hyper-parameter is addressed, for which the fixed correlation function is misspecified. A predictive-variance-based quality criterion is introduced and a closed-form expression of this criterion is derived. It is shown that when the correlation function is misspecified, CV performs better than ML, while ML is optimal when the model is well-specified. In the second step, the results of the first step are extended to the case when the hyper-parameters of the correlation function are also estimated from data.

Submitted 31 May, 2013; v1 submitted 18 January, 2013; originally announced January 2013.

Comments: A supplementary material (pdf) is available in the arXiv sources

Journal ref: Computational Statistics and Data Analysis (2013), pp. 55-69
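For the first step (only the variance sigma^2 estimated, correlation matrix R fixed), both estimators have closed forms. ML gives sigma2_ML = y^T R^{-1} y / n. A natural CV counterpart chooses sigma^2 so that the leave-one-out errors, standardized by their predictive standard deviations, have mean square one; with the single-inverse LOO formulas this is sigma2_CV = (1/n) sum_i (R^{-1} y)_i^2 / (R^{-1})_ii. The sketch computes both under a misspecified correlation (data from a Matern 5/2 covariance, exponential correlation used for estimation). The correlation families, ranges and design are assumptions for illustration, and the paper's exact formulation may differ.

```python
import numpy as np

def variance_ml(R, y):
    """Maximum likelihood estimate of sigma^2 for y ~ N(0, sigma^2 R), R fixed."""
    return float(y @ np.linalg.solve(R, y)) / len(y)

def variance_cv(R, y):
    """sigma^2 that makes the mean squared standardized LOO error equal to one."""
    Rinv = np.linalg.inv(R)
    return float(np.mean((Rinv @ y) ** 2 / np.diag(Rinv)))

def matern52(h, ell):
    a = np.sqrt(5.0) * h / ell
    return (1.0 + a + a ** 2 / 3.0) * np.exp(-a)

rng = np.random.default_rng(7)
t = np.linspace(0.0, 1.0, 40)
H = np.abs(t[:, None] - t[None, :])
K_true = 2.0 * matern52(H, ell=0.2)  # true covariance: sigma^2 = 2, Matern 5/2
y = np.linalg.cholesky(K_true + 1e-10 * np.eye(len(t))) @ rng.standard_normal(len(t))
R_model = np.exp(-H / 0.2)  # misspecified fixed correlation (exponential)
print("ML:", variance_ml(R_model, y), " CV:", variance_cv(R_model, y))
```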

Showing 1–48 of 48 results for author: Bachoc, F