Showing 1–21 of 21 results for author: Portier, F

  1. arXiv:2402.01493  [pdf, other]

    stat.ML cs.LG

    Sliced-Wasserstein Estimation with Spherical Harmonics as Control Variates

    Authors: Rémi Leluc, Aymeric Dieuleveut, François Portier, Johan Segers, Aigerim Zhuman

    Abstract: The Sliced-Wasserstein (SW) distance between probability measures is defined as the average of the Wasserstein distances resulting for the associated one-dimensional projections. As a consequence, the SW distance can be written as an integral with respect to the uniform measure on the sphere and the Monte Carlo framework can be employed for calculating the SW distance. Spherical harmonics are poly…

    Submitted 15 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024

    MSC Class: 65C05 (Primary) 65D30; 68Txx; 68Wxx (Secondary)

  2. arXiv:2312.09969  [pdf, other]

    stat.ML cs.LG

    Nearest Neighbor Sampling for Covariate Shift Adaptation

    Authors: François Portier, Lionel Truquet, Ikko Yamane

    Abstract: Many existing covariate shift adaptation methods estimate sample weights given to loss values to mitigate the gap between the source and the target distribution. However, estimating the optimal weights typically involves computationally expensive matrix inversion and hyper-parameter tuning. In this paper, we propose a new covariate shift adaptation method which avoids estimating the weights. The b…

    Submitted 28 June, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

  3. arXiv:2310.14826  [pdf, other]

    stat.ML cs.LG

    Sharp error bounds for imbalanced classification: how many examples in the minority class?

    Authors: Anass Aghbalou, François Portier, Anne Sabourin

    Abstract: When dealing with imbalanced classification data, reweighting the loss function is a standard procedure allowing to equilibrate between the true positive and true negative rates within the risk measure. Despite significant theoretical work in this area, existing results do not adequately address a main challenge within the imbalanced classification framework, which is the negligible size of one cl…

    Submitted 16 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  4. arXiv:2305.06151  [pdf, other]

    math.NA math.ST stat.CO

    Speeding up Monte Carlo Integration: Control Neighbors for Optimal Convergence

    Authors: Rémi Leluc, François Portier, Johan Segers, Aigerim Zhuman

    Abstract: A novel linear integration rule called $\textit{control neighbors}$ is proposed in which nearest neighbor estimates act as control variates to speed up the convergence rate of the Monte Carlo procedure on metric spaces. The main result is the $\mathcal{O}(n^{-1/2} n^{-s/d})$ convergence rate -- where $n$ stands for the number of evaluations of the integrand and $d$ for the dimension of the domain…

    Submitted 4 April, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted to Bernoulli (2024)

  5. arXiv:2205.11890  [pdf, other]

    stat.ML cs.LG math.ST

    A Quadrature Rule combining Control Variates and Adaptive Importance Sampling

    Authors: Rémi Leluc, François Portier, Johan Segers, Aigerim Zhuman

    Abstract: Driven by several successful applications such as in stochastic gradient descent or in Bayesian computation, control variates have become a major tool for Monte Carlo integration. However, standard methods do not allow the distribution of the particles to evolve during the algorithm, as is the case in sequential simulation methods. Within the standard adaptive importance sampling framework, a simp…

    Submitted 5 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Journal ref: Advances in Neural Information Processing Systems (NeurIPS), 2022

  6. arXiv:2110.15590  [pdf, other]

    math.ST stat.ML

    Adaptive Importance Sampling meets Mirror Descent: a Bias-variance tradeoff

    Authors: Anna Korba, François Portier

    Abstract: Adaptive importance sampling is a widely spread Monte Carlo technique that uses a re-weighting strategy to iteratively estimate the so-called target distribution. A major drawback of adaptive importance sampling is the large variance of the weights which is known to badly impact the accuracy of the estimates. This paper investigates a regularization strategy whose basic principle is to raise the i…

    Submitted 29 October, 2021; originally announced October 2021.

    Comments: 35 pages, 5 figures

    MSC Class: 62L20

  7. arXiv:2110.15083  [pdf, ps, other]

    math.ST stat.ML

    Nearest neighbor empirical processes

    Authors: François Portier

    Abstract: In the regression framework, the empirical measure based on the responses resulting from the nearest neighbors, among the covariates, to a given point $x$ is introduced and studied as a central statistical quantity. First, the associated empirical process is shown to satisfy a uniform central limit theorem under a local bracketing entropy condition on the underlying class of functions reflecting t…

    Submitted 10 April, 2024; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: 34 pages

    MSC Class: 62G05

  8. arXiv:2108.01432  [pdf, other]

    math.ST stat.ME

    Tail inverse regression for dimension reduction with extreme response

    Authors: Anass Aghbalou, François Portier, Anne Sabourin, Chen Zhou

    Abstract: We consider the problem of supervised dimension reduction with a particular focus on extreme values of the target $Y\in\mathbb{R}$ to be explained by a covariate vector $X \in \mathbb{R}^p$. The general purpose is to define and estimate a projection on a lower dimensional subspace of the covariate space which is sufficient for predicting exceedances of the target above high thresholds. We propose…

    Submitted 24 February, 2023; v1 submitted 30 July, 2021; originally announced August 2021.

    Comments: main paper: 31 pages + supplementary material: 16 pages

    MSC Class: 62G32; 62H25; 62G08; 62G30

  9. arXiv:2107.12825  [pdf]

    cs.LG stat.ML

    Individual Survival Curves with Conditional Normalizing Flows

    Authors: Guillaume Ausset, Tom Ciffreo, Francois Portier, Stephan Clémençon, Timothée Papin

    Abstract: Survival analysis, or time-to-event modelling, is a classical statistical problem that has garnered a lot of interest for its practical use in epidemiology, demographics or actuarial sciences. Recent advances on the subject from the point of view of machine learning have been concerned with precise per-individual predictions instead of population studies, driven by the rise of individualized medic…

    Submitted 27 July, 2021; originally announced July 2021.

    Comments: IEEE DSAA '21

  10. arXiv:2105.11818  [pdf, other]

    stat.ML cs.LG

    SGD with Coordinate Sampling: Theory and Practice

    Authors: Rémi Leluc, François Portier

    Abstract: While classical forms of stochastic gradient descent algorithm treat the different coordinates in the same way, a framework allowing for adaptive (non uniform) coordinate sampling is developed to leverage structure in data. In a non-convex setting and including zeroth order gradient estimate, almost sure convergence as well as non-asymptotic bounds are established. Within the proposed framework, w…

    Submitted 15 October, 2022; v1 submitted 25 May, 2021; originally announced May 2021.

    Comments: Journal of Machine Learning Research 2022

  11. arXiv:2006.15043  [pdf, other]

    cs.LG stat.ML

    Nearest Neighbour Based Estimates of Gradients: Sharp Nonasymptotic Bounds and Applications

    Authors: Guillaume Ausset, Stephan Clémençon, François Portier

    Abstract: Motivated by a wide variety of applications, ranging from stochastic optimization to dimension reduction through variable selection, the problem of estimating gradients accurately is of crucial importance in statistics and learning theory. We consider here the classic regression setup, where a real valued square integrable r.v. $Y$ is to be predicted upon observing a (possibly high dimensional) ra…

    Submitted 26 June, 2020; originally announced June 2020.

  12. arXiv:2006.12839  [pdf, other]

    stat.ME stat.ML

    Conditional independence testing via weighted partial copulas and nearest neighbors

    Authors: Pascal Bianchi, Kevin Elgui, François Portier

    Abstract: This paper introduces the \textit{weighted partial copula} function for testing conditional independence. The proposed test procedure results from these two ingredients: (i) the test statistic is an explicit Cramer-von Mises transformation of the \textit{weighted partial copula}, (ii) the regions of rejection are computed using a bootstrap procedure which mimics conditional independence by generat…

    Submitted 12 February, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

  13. arXiv:2006.09223  [pdf, ps, other]

    stat.ML cs.LG math.ST

    Risk bounds when learning infinitely many response functions by ordinary linear regression

    Authors: Vincent Plassier, François Portier, Johan Segers

    Abstract: Consider the problem of learning a large number of response functions simultaneously based on the same input variables. The training data consist of a single independent random sample of the input variables drawn from a common distribution together with the associated responses. The input variables are mapped into a high-dimensional linear space, called the feature space, and the response function…

    Submitted 27 November, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: 27 pages

  14. arXiv:1910.11095  [pdf, other]

    math.ST stat.ML

    High dimensional regression for regenerative time-series: an application to road traffic modeling

    Authors: Mohammed Bouchouia, François Portier

    Abstract: A statistical predictive model in which a high-dimensional time-series regenerates at the end of each day is used to model road traffic. Due to the regeneration, prediction is based on a daily modeling using a vector autoregressive model that combines linearly the past observations of the day. Due to the high-dimension, the learning algorithm follows from an L1-penalization of the regression coeff…

    Submitted 26 January, 2021; v1 submitted 24 October, 2019; originally announced October 2019.

    MSC Class: Primary 62J05; 62J07; secondary 62P30

  15. arXiv:1906.01908  [pdf, other]

    cs.LG math.ST stat.ML

    Empirical Risk Minimization under Random Censorship: Theory and Practice

    Authors: Guillaume Ausset, Stéphan Clémençon, François Portier

    Abstract: We consider the classic supervised learning problem, where a continuous non-negative random label $Y$ (i.e. a random duration) is to be predicted based upon observing a random vector $X$ valued in $\mathbb{R}^d$ with $d\geq 1$ by means of a regression rule with minimum least square error. In various applications, ranging from industrial quality control to public health through credit risk analysis…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: Submitted to JMLR. 18 pages + Appendix

  16. arXiv:1806.05830  [pdf, other]

    math.ST stat.ME

    Parametric versus nonparametric: the fitness coefficient

    Authors: Gildas Mazo, François Portier

    Abstract: The fitness coefficient, introduced in this paper, results from a competition between parametric and nonparametric density estimators within the likelihood of the data. As illustrated on several real datasets, the fitness coefficient generally agrees with p-values but is easier to compute and interpret. Namely, the fitness coefficient can be interpreted as the proportion of data coming from the pa…

    Submitted 15 June, 2018; originally announced June 2018.

  17. arXiv:1806.02107  [pdf, ps, other]

    math.ST stat.CO

    Rademacher complexity for Markov chains : Applications to kernel smoothing and Metropolis-Hasting

    Authors: Patrice Bertail, François Portier

    Abstract: Following the seminal approach by Talagrand, the concept of Rademacher complexity for independent sequences of random variables is extended to Markov chains. The proposed notion of "block Rademacher complexity" (of a class of functions) follows from renewal theory and allows to control the expected values of suprema (over the class of functions) of empirical processes based on Harris Markov chains…

    Submitted 6 July, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

    Comments: 22 pages

  18. arXiv:1806.01082  [pdf, other]

    math.ST stat.ME

    On an extension of the promotion time cure model

    Authors: François Portier, Ingrid Van Keilegom, Anouar El Ghouch

    Abstract: We consider the problem of estimating the distribution of time-to-event data that are subject to censoring and for which the event of interest might never occur, i.e., some subjects are cured. To model this kind of data in the presence of covariates, one of the leading semiparametric models is the promotion time cure model \citep{yakovlev1996}, which adapts the Cox model to the presence of cured s…

    Submitted 4 June, 2018; originally announced June 2018.

    Comments: 41 pages, 5 figures

  19. arXiv:1806.00989  [pdf, other]

    math.ST stat.CO stat.ML

    Asymptotic optimality of adaptive importance sampling

    Authors: Bernard Delyon, François Portier

    Abstract: Adaptive importance sampling (AIS) uses past samples to update the \textit{sampling policy} $q_t$ at each stage $t$. Each stage $t$ is formed with two steps: (i) to explore the space with $n_t$ points according to $q_t$ and (ii) to exploit the current amount of information to update the sampling policy. The very fundamental question raised in this paper concerns the behavior of empirical sums bas…

    Submitted 3 October, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

    Comments: 19 pages, 3 figures

  20. arXiv:1801.01797  [pdf, ps, other]

    math.ST stat.AP

    Monte Carlo integration with a growing number of control variates

    Authors: François Portier, Johan Segers

    Abstract: It is well known that Monte Carlo integration with variance reduction by means of control variates can be implemented by the ordinary least squares estimator for the intercept in a multiple linear regression model. A central limit theorem is established for the integration error if the number of control variates tends to infinity. The integration error is scaled by the standard deviation of the er…

    Submitted 9 October, 2019; v1 submitted 5 January, 2018; originally announced January 2018.

    Comments: 22 pages. Numerical experiments in earlier version

    MSC Class: 60F05; 62J05; 65C05

  21. arXiv:1509.04413  [pdf, ps, other]

    math.ST stat.ME

    Efficiency of Z-estimators indexed by the objective functions

    Authors: François Portier

    Abstract: We study the convergence of $Z$-estimators $\widehat θ(η)\in \mathbb R^p$ for which the objective function depends on a parameter $η$ that belongs to a Banach space $\mathcal H$. Our results include the uniform consistency over $\mathcal H$ and the weak convergence in the space of bounded $\mathbb R^p$-valued functions defined on $\mathcal H$. Furthermore when $η$ is a tuning parameter optimally s…

    Submitted 15 September, 2015; originally announced September 2015.

    Comments: 25 pages, 4 figures

    MSC Class: 62F12; 62F35; 62G20