Sliced-Wasserstein Estimation with Spherical Harmonics as Control Variates

Authors: Rémi Leluc, Aymeric Dieuleveut, François Portier, Johan Segers, Aigerim Zhuman

Abstract: The Sliced-Wasserstein (SW) distance between probability measures is defined as the average of the Wasserstein distances resulting for the associated one-dimensional projections. As a consequence, the SW distance can be written as an integral with respect to the uniform measure on the sphere and the Monte Carlo framework can be employed for calculating the SW distance. Spherical harmonics are polynomials on the sphere that form an orthonormal basis of the set of square-integrable functions on the sphere. Putting these two facts together, a new Monte Carlo method, hereby referred to as Spherical Harmonics Control Variates (SHCV), is proposed for approximating the SW distance using spherical harmonics as control variates. The resulting approach is shown to have good theoretical properties, e.g., a no-error property for Gaussian measures under a certain form of linear dependency between the variables. Moreover, an improved rate of convergence, compared to Monte Carlo, is established for general measures. The convergence analysis relies on the Lipschitz property associated to the SW integrand. Several numerical experiments demonstrate the superior performance of SHCV against state-of-the-art methods for SW distance computation.

Submitted 15 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: Accepted to ICML 2024

MSC Class: 65C05 (Primary) 65D30; 68Txx; 68Wxx (Secondary)
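
Illustrative sketch (not taken from the paper): the abstract above notes that the SW distance is an integral over the unit sphere, so plain Monte Carlo applies. The Python snippet below shows only that baseline, not the SHCV method. It assumes two empirical samples of equal size and uses the squared 2-Wasserstein distance in one dimension; the function names `random_direction` and `sliced_wasserstein2_squared` are hypothetical.

```python
import math
import random


def random_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere by normalizing a Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def sliced_wasserstein2_squared(xs, ys, n_proj=500, seed=0):
    """Plain Monte Carlo estimate of the squared SW_2 distance between two
    empirical measures with the same number of points (assumption of this sketch)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_proj):
        theta = random_direction(dim, rng)
        # One-dimensional projections of both samples onto theta, sorted.
        px = sorted(sum(t * c for t, c in zip(theta, x)) for x in xs)
        py = sorted(sum(t * c for t, c in zip(theta, y)) for y in ys)
        # For equal-size samples, the 1-D W_2^2 is the mean squared gap
        # between matching order statistics.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / n_proj
```

For a sample and a copy of it translated by a vector v, each projected distance equals the squared inner product of v with the direction, so the estimate should be close to |v|^2 / d for a large number of projections.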

arXiv:2312.09969 [pdf, other]

Nearest Neighbor Sampling for Covariate Shift Adaptation

Authors: François Portier, Lionel Truquet, Ikko Yamane

Abstract: Many existing covariate shift adaptation methods estimate sample weights given to loss values to mitigate the gap between the source and the target distribution. However, estimating the optimal weights typically involves computationally expensive matrix inversion and hyper-parameter tuning. In this paper, we propose a new covariate shift adaptation method which avoids estimating the weights. The basic idea is to directly work on unlabeled target data, labeled according to the $k$-nearest neighbors in the source dataset. Our analysis reveals that setting $k = 1$ is an optimal choice. This property removes the necessity of tuning the only hyper-parameter $k$ and leads to a running time quasi-linear in the sample size. Our results include sharp rates of convergence for our estimator, with a tight control of the mean square error and explicit constants. In particular, the variance of our estimators has the same rate of convergence as for standard parametric estimation despite their non-parametric nature. The proposed estimator shares similarities with some matching-based treatment effect estimators used, e.g., in biostatistics, econometrics, and epidemiology. Our experiments show that it achieves drastic reduction in the running time with remarkable accuracy.

Submitted 28 June, 2024; v1 submitted 15 December, 2023; originally announced December 2023.
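
Illustrative sketch (not taken from the paper): the basic idea stated in the abstract above, labeling each unlabeled target point with the label of its nearest neighbor in the source sample ($k = 1$), can be written in a few lines of Python. The helper names `nearest_source_label` and `target_risk_1nn` are hypothetical, and the brute-force search used here is quadratic; the quasi-linear running time mentioned in the abstract would need a proper nearest-neighbor data structure.

```python
import math


def nearest_source_label(x, source_x, source_y):
    """Return the label of the source point closest to x (Euclidean distance)."""
    best = min(range(len(source_x)), key=lambda i: math.dist(x, source_x[i]))
    return source_y[best]


def target_risk_1nn(predict, loss, source_x, source_y, target_x):
    """Average loss of `predict` over target inputs, each labeled by its
    1-nearest neighbor in the source sample."""
    total = 0.0
    for x in target_x:
        pseudo_label = nearest_source_label(x, source_x, source_y)
        total += loss(predict(x), pseudo_label)
    return total / len(target_x)


# Toy usage: one-dimensional inputs, squared loss.
source_x = [[0.0], [1.0], [2.0]]
source_y = [0.0, 10.0, 20.0]
target_x = [[0.1], [1.9], [2.2]]
risk = target_risk_1nn(
    predict=lambda x: 10.0 * x[0],
    loss=lambda pred, y: (pred - y) ** 2,
    source_x=source_x,
    source_y=source_y,
    target_x=target_x,
)
```

No importance weights are estimated anywhere in this sketch, which is the point the abstract makes about avoiding weight estimation.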

arXiv:2310.14826 [pdf, other]

Sharp error bounds for imbalanced classification: how many examples in the minority class?

Authors: Anass Aghbalou, François Portier, Anne Sabourin

Abstract: When dealing with imbalanced classification data, reweighting the loss function is a standard procedure allowing to equilibrate between the true positive and true negative rates within the risk measure. Despite significant theoretical work in this area, existing results do not adequately address a main challenge within the imbalanced classification framework, which is the negligible size of one class in relation to the full sample size and the need to rescale the risk function by a probability tending to zero. To address this gap, we present two novel contributions in the setting where the rare class probability approaches zero: (1) a non asymptotic fast rate probability bound for constrained balanced empirical risk minimization, and (2) a consistent upper bound for balanced nearest neighbors estimates. Our findings provide a clearer understanding of the benefits of class-weighting in realistic settings, opening new avenues for further research in this field.

Submitted 16 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2305.06151 [pdf, other]

Speeding up Monte Carlo Integration: Control Neighbors for Optimal Convergence

Authors: Rémi Leluc, François Portier, Johan Segers, Aigerim Zhuman

Abstract: A novel linear integration rule called $\textit{control neighbors}$ is proposed in which nearest neighbor estimates act as control variates to speed up the convergence rate of the Monte Carlo procedure on metric spaces. The main result is the $\mathcal{O}(n^{-1/2} n^{-s/d})$ convergence rate -- where $n$ stands for the number of evaluations of the integrand and $d$ for the dimension of the domain -- of this estimate for Hölder functions with regularity $s \in (0,1]$, a rate which, in some sense, is optimal. Several numerical experiments validate the complexity bound and highlight the good performance of the proposed estimator.

Submitted 4 April, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: Accepted to Bernoulli (2024)

arXiv:2205.11890 [pdf, other]

A Quadrature Rule combining Control Variates and Adaptive Importance Sampling

Authors: Rémi Leluc, François Portier, Johan Segers, Aigerim Zhuman

Abstract: Driven by several successful applications such as in stochastic gradient descent or in Bayesian computation, control variates have become a major tool for Monte Carlo integration. However, standard methods do not allow the distribution of the particles to evolve during the algorithm, as is the case in sequential simulation methods. Within the standard adaptive importance sampling framework, a simple weighted least squares approach is proposed to improve the procedure with control variates. The procedure takes the form of a quadrature rule with adapted quadrature weights to reflect the information brought in by the control variates. The quadrature points and weights do not depend on the integrand, a computational advantage in case of multiple integrands. Moreover, the target density needs to be known only up to a multiplicative constant. Our main result is a non-asymptotic bound on the probabilistic error of the procedure. The bound proves that for improving the estimate's accuracy, the benefits from adaptive importance sampling and control variates can be combined. The good behavior of the method is illustrated empirically on synthetic examples and real-world data for Bayesian linear regression.

Submitted 5 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Journal ref: Advances in Neural Information Processing Systems (NeurIPS), 2022

arXiv:2110.15590 [pdf, other]

Adaptive Importance Sampling meets Mirror Descent: a Bias-variance tradeoff

Authors: Anna Korba, François Portier

Abstract: Adaptive importance sampling is a widespread Monte Carlo technique that uses a re-weighting strategy to iteratively estimate the so-called target distribution. A major drawback of adaptive importance sampling is the large variance of the weights, which is known to badly impact the accuracy of the estimates. This paper investigates a regularization strategy whose basic principle is to raise the importance weights to a certain power. This regularization parameter, which might evolve between zero and one during the algorithm, is shown (i) to balance between the bias and the variance and (ii) to be connected to the mirror descent framework. Using a kernel density estimate to build the sampling policy, the uniform convergence is established under mild conditions. Finally, several practical ways to choose the regularization parameter are discussed and the benefits of the proposed approach are illustrated empirically.

Submitted 29 October, 2021; originally announced October 2021.

Comments: 35 pages, 5 figures

MSC Class: 62L20
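
Illustrative sketch (not taken from the paper): the regularization described in the abstract above raises importance weights to a power between zero and one. The Python snippet below only shows the effect of that power on normalized weights and on the usual effective sample size; it is not the paper's algorithm, and the names `tempered_weights`, `effective_sample_size`, and `eta` are hypothetical.

```python
import math


def tempered_weights(log_weights, eta):
    """Normalized importance weights raised to the power eta (0 <= eta <= 1)."""
    scaled = [eta * lw for lw in log_weights]
    shift = max(scaled)  # subtract the maximum for numerical stability
    unnorm = [math.exp(s - shift) for s in scaled]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for weights that sum to one."""
    return 1.0 / sum(w * w for w in weights)


# Toy usage: a few unnormalized log-weights with one dominant particle.
log_w = [0.0, 1.0, 3.0, -2.0]
ess_by_eta = {
    eta: effective_sample_size(tempered_weights(log_w, eta))
    for eta in (0.0, 0.5, 1.0)
}
```

With `eta = 0` all weights are equal (lowest variance, largest bias toward the proposal), and with `eta = 1` the usual importance weights are recovered, which matches the bias-variance reading given in the abstract.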

arXiv:2110.15083 [pdf, ps, other]

Nearest neighbor empirical processes

Authors: François Portier

Abstract: In the regression framework, the empirical measure based on the responses resulting from the nearest neighbors, among the covariates, to a given point $x$ is introduced and studied as a central statistical quantity. First, the associated empirical process is shown to satisfy a uniform central limit theorem under a local bracketing entropy condition on the underlying class of functions reflecting the localizing nature of the nearest neighbor algorithm. Second, a uniform non-asymptotic bound is established under a well-known condition, often referred to as Vapnik-Chervonenkis, on the uniform entropy numbers. The covariance of the Gaussian limit obtained in the uniform central limit theorem is simply equal to the conditional covariance operator given the covariate value. This suggests the possibility of using standard formulas to estimate the variance by using only the nearest neighbors instead of the full data. This is illustrated on two problems: the estimation of the conditional cumulative distribution function and local linear regression.

Submitted 10 April, 2024; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: 34 pages

MSC Class: 62G05

arXiv:2108.01432 [pdf, other]

Tail inverse regression for dimension reduction with extreme response

Authors: Anass Aghbalou, François Portier, Anne Sabourin, Chen Zhou

Abstract: We consider the problem of supervised dimension reduction with a particular focus on extreme values of the target $Y\in\mathbb{R}$ to be explained by a covariate vector $X \in \mathbb{R}^p$. The general purpose is to define and estimate a projection on a lower dimensional subspace of the covariate space which is sufficient for predicting exceedances of the target above high thresholds. We propose an original definition of Tail Conditional Independence which matches this purpose. Inspired by Sliced Inverse Regression (SIR) methods, we develop a novel framework (TIREX, Tail Inverse Regression for EXtreme response) in order to estimate an extreme sufficient dimension reduction (SDR) space of potentially smaller dimension than that of a classical SDR space. We prove the weak convergence of tail empirical processes involved in the estimation procedure and we illustrate the relevance of the proposed approach on simulated and real world data.

Submitted 24 February, 2023; v1 submitted 30 July, 2021; originally announced August 2021.

Comments: main paper: 31 pages + supplementary material: 16 pages

MSC Class: 62G32; 62H25; 62G08; 62G30

arXiv:2107.12825 [pdf]

Individual Survival Curves with Conditional Normalizing Flows

Authors: Guillaume Ausset, Tom Ciffreo, François Portier, Stephan Clémençon, Timothée Papin

Abstract: Survival analysis, or time-to-event modelling, is a classical statistical problem that has garnered a lot of interest for its practical use in epidemiology, demographics or actuarial sciences. Recent advances on the subject from the point of view of machine learning have been concerned with precise per-individual predictions instead of population studies, driven by the rise of individualized medicine. We introduce here a conditional normalizing flow based estimate of the time-to-event density as a way to model highly flexible and individualized conditional survival distributions. We use a novel hierarchical formulation of normalizing flows to enable efficient fitting of flexible conditional distributions without overfitting and show how the normalizing flow formulation can be efficiently adapted to the censored setting. We experimentally validate the proposed approach on a synthetic dataset as well as four open medical datasets and an example of a common financial problem.

Submitted 27 July, 2021; originally announced July 2021.

Comments: IEEE DSAA '21

arXiv:2105.11818 [pdf, other]

SGD with Coordinate Sampling: Theory and Practice

Authors: Rémi Leluc, François Portier

Abstract: While classical forms of stochastic gradient descent algorithm treat the different coordinates in the same way, a framework allowing for adaptive (non uniform) coordinate sampling is developed to leverage structure in data. In a non-convex setting and including zeroth order gradient estimate, almost sure convergence as well as non-asymptotic bounds are established. Within the proposed framework, we develop an algorithm, MUSKETEER, based on a reinforcement strategy: after collecting information on the noisy gradients, it samples the most promising coordinate (all for one); then it moves along the one direction yielding an important decrease of the objective (one for all). Numerical experiments on both synthetic and real data examples confirm the effectiveness of MUSKETEER in large scale problems.

Submitted 15 October, 2022; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: Journal of Machine Learning Research 2022

arXiv:2006.15043 [pdf, other]

Nearest Neighbour Based Estimates of Gradients: Sharp Nonasymptotic Bounds and Applications

Authors: Guillaume Ausset, Stephan Clémençon, François Portier

Abstract: Motivated by a wide variety of applications, ranging from stochastic optimization to dimension reduction through variable selection, the problem of estimating gradients accurately is of crucial importance in statistics and learning theory. We consider here the classic regression setup, where a real valued square integrable r.v. $Y$ is to be predicted upon observing a (possibly high dimensional) random vector $X$ by means of a predictive function $f(X)$ as accurately as possible in the mean-squared sense and study a nearest-neighbour-based pointwise estimate of the gradient of the optimal predictive function, the regression function $m(x)=\mathbb{E}[Y\mid X=x]$. Under classic smoothness conditions combined with the assumption that the tails of $Y-m(X)$ are sub-Gaussian, we prove nonasymptotic bounds improving upon those obtained for alternative estimation methods. Beyond the novel theoretical results established, several illustrative numerical experiments have been carried out. The latter provide strong empirical evidence that the estimation method proposed works very well for various statistical problems involving gradient estimation, namely dimensionality reduction, stochastic gradient descent optimization and quantifying disentanglement.

Submitted 26 June, 2020; originally announced June 2020.

arXiv:2006.12839 [pdf, other]

Conditional independence testing via weighted partial copulas and nearest neighbors

Authors: Pascal Bianchi, Kevin Elgui, François Portier

Abstract: This paper introduces the \textit{weighted partial copula} function for testing conditional independence. The proposed test procedure results from these two ingredients: (i) the test statistic is an explicit Cramer-von Mises transformation of the \textit{weighted partial copula}, (ii) the regions of rejection are computed using a bootstrap procedure which mimics conditional independence by generating samples from the product measure of the estimated conditional marginals. Under conditional independence, the weak convergence of the \textit{weighted partial copula process} is established when the marginals are estimated using a smoothed local linear estimator. Finally, an experimental section demonstrates that the proposed test has competitive power compared to recent state-of-the-art methods such as kernel-based tests.

Submitted 12 February, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

arXiv:2006.09223 [pdf, ps, other]

Risk bounds when learning infinitely many response functions by ordinary linear regression

Authors: Vincent Plassier, François Portier, Johan Segers

Abstract: Consider the problem of learning a large number of response functions simultaneously based on the same input variables. The training data consist of a single independent random sample of the input variables drawn from a common distribution together with the associated responses. The input variables are mapped into a high-dimensional linear space, called the feature space, and the response functions are modelled as linear functionals of the mapped features, with coefficients calibrated via ordinary least squares. We provide convergence guarantees on the worst-case excess prediction risk by controlling the convergence rate of the excess risk uniformly in the response function. The dimension of the feature map is allowed to tend to infinity with the sample size. The collection of response functions, although potentially infinite, is supposed to have a finite Vapnik-Chervonenkis dimension. The bound derived can be applied when building multiple surrogate models in a reasonable computing time.

Submitted 27 November, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: 27 pages

arXiv:1910.11095 [pdf, other]

High dimensional regression for regenerative time-series: an application to road traffic modeling

Authors: Mohammed Bouchouia, François Portier

Abstract: A statistical predictive model in which a high-dimensional time-series regenerates at the end of each day is used to model road traffic. Due to the regeneration, prediction is based on a daily modeling using a vector autoregressive model that combines linearly the past observations of the day. Due to the high-dimension, the learning algorithm follows from an L1-penalization of the regression coefficients. Excess risk bounds are established under the high-dimensional framework in which the number of road sections goes to infinity with the number of observed days. Considering floating car data observed in an urban area, the approach is compared to state-of-the-art methods including neural networks. In addition to being highly competitive in terms of prediction, it enables the identification of the most determinant sections of the road network.

Submitted 26 January, 2021; v1 submitted 24 October, 2019; originally announced October 2019.

MSC Class: Primary 62J05; 62J07; secondary 62P30

arXiv:1906.01908 [pdf, other]

Empirical Risk Minimization under Random Censorship: Theory and Practice

Authors: Guillaume Ausset, Stéphan Clémençon, François Portier

Abstract: We consider the classic supervised learning problem, where a continuous non-negative random label $Y$ (i.e. a random duration) is to be predicted based upon observing a random vector $X$ valued in $\mathbb{R}^d$ with $d\geq 1$ by means of a regression rule with minimum least square error. In various applications, ranging from industrial quality control to public health through credit risk analysis for instance, training observations can be right censored, meaning that, rather than on independent copies of $(X,Y)$, statistical learning relies on a collection of $n\geq 1$ independent realizations of the triplet $(X, \; \min\{Y,\; C\},\; δ)$, where $C$ is a nonnegative r.v. with unknown distribution, modeling censorship and $δ=\mathbb{I}\{Y\leq C\}$ indicates whether the duration is right censored or not. As ignoring censorship in the risk computation may clearly lead to a severe underestimation of the target duration and jeopardize prediction, we propose to consider a plug-in estimate of the true risk based on a Kaplan-Meier estimator of the conditional survival function of the censorship $C$ given $X$, referred to as Kaplan-Meier risk, in order to perform empirical risk minimization. It is established, under mild conditions, that the learning rate of minimizers of this biased/weighted empirical risk functional is of order $O_{\mathbb{P}}(\sqrt{\log(n)/n})$ when ignoring model bias issues inherent to plug-in estimation, as can be attained in absence of censorship. Beyond theoretical results, numerical experiments are presented in order to illustrate the relevance of the approach developed.

Submitted 5 June, 2019; originally announced June 2019.

Comments: Submitted to JMLR. 18 pages + Appendix
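
Illustrative sketch (not taken from the paper): the abstract above builds its risk estimate on a Kaplan-Meier estimator. The Python snippet below implements only the classical, unconditional Kaplan-Meier product-limit estimator, as a simplification; the paper works with the conditional survival function of the censoring variable $C$ given $X$, which this sketch does not attempt. The function name `kaplan_meier` and the boolean encoding of events are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is True when times[i] is an observed event and False when the
    observation is censored at times[i].
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit update: multiply by the conditional survival at t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Toy usage: the observation at time 2 is censored.
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

To estimate the survival function of the censoring variable rather than of the duration, one would pass the flipped indicators, treating censored observations as the events.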

arXiv:1806.05830 [pdf, other]

Parametric versus nonparametric: the fitness coefficient

Authors: Gildas Mazo, François Portier

Abstract: The fitness coefficient, introduced in this paper, results from a competition between parametric and nonparametric density estimators within the likelihood of the data. As illustrated on several real datasets, the fitness coefficient generally agrees with p-values but is easier to compute and interpret. Namely, the fitness coefficient can be interpreted as the proportion of data coming from the parametric model. Moreover, the fitness coefficient can be used to build a semiparametric compromise which improves inference over the parametric and nonparametric approaches. From a theoretical perspective, the fitness coefficient is shown to converge in probability to one if the model is true and to zero if the model is false. From a practical perspective, the utility of the fitness coefficient is illustrated on real and simulated datasets.

Submitted 15 June, 2018; originally announced June 2018.

arXiv:1806.02107 [pdf, ps, other]

Rademacher complexity for Markov chains : Applications to kernel smoothing and Metropolis-Hasting

Authors: Patrice Bertail, François Portier

Abstract: Following the seminal approach by Talagrand, the concept of Rademacher complexity for independent sequences of random variables is extended to Markov chains. The proposed notion of "block Rademacher complexity" (of a class of functions) follows from renewal theory and allows one to control the expected values of suprema (over the class of functions) of empirical processes based on Harris Markov chains as well as the excess probability. For classes of Vapnik-Chervonenkis type, bounds on the "block Rademacher complexity" are established. These bounds depend essentially on the sample size and the probability tails of the regeneration times. The proposed approach is employed to obtain convergence rates for the kernel density estimator of the stationary measure and to derive concentration inequalities for the Metropolis-Hastings algorithm.

Submitted 6 July, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

Comments: 22 pages

arXiv:1806.01082 [pdf, other]

On an extension of the promotion time cure model

Authors: François Portier, Ingrid Van Keilegom, Anouar El Ghouch

Abstract: We consider the problem of estimating the distribution of time-to-event data that are subject to censoring and for which the event of interest might never occur, i.e., some subjects are cured. To model this kind of data in the presence of covariates, one of the leading semiparametric models is the promotion time cure model \citep{yakovlev1996}, which adapts the Cox model to the presence of cured subjects. Estimating the conditional distribution results in a complicated constrained optimization problem, and inference is difficult as no closed-formula for the variance is available. We propose a new model, inspired by the Cox model, that leads to a simple estimation procedure and that presents a closed formula for the variance. We derive some asymptotic properties of the estimators and we show the practical behaviour of our procedure by means of simulations. We also apply our model and estimation method to a breast cancer data set.

Submitted 4 June, 2018; originally announced June 2018.

Comments: 41 pages, 5 figures

arXiv:1806.00989 [pdf, other]

Asymptotic optimality of adaptive importance sampling

Authors: Bernard Delyon, François Portier

Abstract: Adaptive importance sampling (AIS) uses past samples to update the \textit{sampling policy} $q_t$ at each stage $t$. Each stage $t$ is formed with two steps: (i) to explore the space with $n_t$ points according to $q_t$ and (ii) to exploit the current amount of information to update the sampling policy. The very fundamental question raised in this paper concerns the behavior of empirical sums based on AIS. Without making any assumption on the allocation policy $n_t$, the theory developed involves no restriction on the split of computational resources between the explore (i) and the exploit (ii) step. It is shown that AIS is asymptotically optimal: the asymptotic behavior of AIS is the same as some "oracle" strategy that knows the targeted sampling policy from the beginning. From a practical perspective, weighted AIS is introduced, a new method that allows one to forget poor samples from early stages.

Submitted 3 October, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

Comments: 19 pages, 3 figures

arXiv:1801.01797 [pdf, ps, other]

Monte Carlo integration with a growing number of control variates

Authors: François Portier, Johan Segers

Abstract: It is well known that Monte Carlo integration with variance reduction by means of control variates can be implemented by the ordinary least squares estimator for the intercept in a multiple linear regression model. A central limit theorem is established for the integration error if the number of control variates tends to infinity. The integration error is scaled by the standard deviation of the error term in the regression model. If the linear span of the control variates is dense in a function space that contains the integrand, the integration error tends to zero at a rate which is faster than the square root of the number of Monte Carlo replicates. Depending on the situation, increasing the number of control variates may or may not be computationally more efficient than increasing the Monte Carlo sample size.

Submitted 9 October, 2019; v1 submitted 5 January, 2018; originally announced January 2018.

Comments: 22 pages. Numerical experiments in earlier version

MSC Class: 60F05; 62J05; 65C05
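
Illustrative sketch (not taken from the paper): the abstract above recalls that Monte Carlo integration with control variates can be implemented as the ordinary least squares intercept in a regression of the integrand on the control variates. The Python snippet below shows this for a single control variate only, whereas the paper studies a growing number of them. The integrand `exp` on [0, 1], the control variate `u - 0.5`, and the function name `control_variate_estimate` are all assumptions chosen for the example.

```python
import math
import random


def control_variate_estimate(f_vals, h_vals):
    """OLS intercept in the regression of f on one control variate h whose
    integral is known to be zero."""
    n = len(f_vals)
    f_bar = sum(f_vals) / n
    h_bar = sum(h_vals) / n
    cov = sum((f - f_bar) * (h - h_bar) for f, h in zip(f_vals, h_vals))
    var = sum((h - h_bar) ** 2 for h in h_vals)
    beta = cov / var  # OLS slope
    return f_bar - beta * h_bar  # OLS intercept = control-variate estimate


rng = random.Random(0)
us = [rng.random() for _ in range(10_000)]
f_vals = [math.exp(u) for u in us]  # integrand; true integral is e - 1
h_vals = [u - 0.5 for u in us]      # control variate with known integral 0
plain_mc = sum(f_vals) / len(f_vals)
with_cv = control_variate_estimate(f_vals, h_vals)
```

When the integrand lies exactly in the span of the constant and the control variate, the intercept recovers the integral without error, which is the one-variate analogue of the density condition mentioned in the abstract.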

arXiv:1509.04413 [pdf, ps, other]

Efficiency of Z-estimators indexed by the objective functions

Authors: François Portier

Abstract: We study the convergence of $Z$-estimators $\widehat θ(η)\in \mathbb R^p$ for which the objective function depends on a parameter $η$ that belongs to a Banach space $\mathcal H$. Our results include the uniform consistency over $\mathcal H$ and the weak convergence in the space of bounded $\mathbb R^p$-valued functions defined on $\mathcal H$. Furthermore when $η$ is a tuning parameter optimally selected at $η_0$, we provide conditions under which an estimated $\widehat η$ can be replaced by $η_0$ without affecting the asymptotic variance. Interestingly, these conditions are free from any rate of convergence of $\widehat η$ to $η_0$ but they require the space described by $\widehat η$ to be not too large. We highlight several applications of our results and we study in detail the case where $η$ is the weight function in weighted regression.

Submitted 15 September, 2015; originally announced September 2015.

Comments: 25 pages, 4 figures

MSC Class: 62F12; 62F35; 62G20

Showing 1–21 of 21 results for author: Portier, F