Search | arXiv e-print repository

Sliced-Wasserstein Estimation with Spherical Harmonics as Control Variates

Authors: Rémi Leluc, Aymeric Dieuleveut, François Portier, Johan Segers, Aigerim Zhuman

Abstract: The Sliced-Wasserstein (SW) distance between probability measures is defined as the average of the Wasserstein distances resulting for the associated one-dimensional projections. As a consequence, the SW distance can be written as an integral with respect to the uniform measure on the sphere and the Monte Carlo framework can be employed for calculating the SW distance. Spherical harmonics are poly… ▽ More The Sliced-Wasserstein (SW) distance between probability measures is defined as the average of the Wasserstein distances resulting for the associated one-dimensional projections. As a consequence, the SW distance can be written as an integral with respect to the uniform measure on the sphere and the Monte Carlo framework can be employed for calculating the SW distance. Spherical harmonics are polynomials on the sphere that form an orthonormal basis of the set of square-integrable functions on the sphere. Putting these two facts together, a new Monte Carlo method, hereby referred to as Spherical Harmonics Control Variates (SHCV), is proposed for approximating the SW distance using spherical harmonics as control variates. The resulting approach is shown to have good theoretical properties, e.g., a no-error property for Gaussian measures under a certain form of linear dependency between the variables. Moreover, an improved rate of convergence, compared to Monte Carlo, is established for general measures. The convergence analysis relies on the Lipschitz property associated to the SW integrand. Several numerical experiments demonstrate the superior performance of SHCV against state-of-the-art methods for SW distance computation. △ Less

Submitted 15 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: Accepted to ICML 2024

MSC Class: 65C05 (Primary) 65D30; 68Txx; 68Wxx (Secondary)

arXiv:2305.06151 [pdf, other]

Speeding up Monte Carlo Integration: Control Neighbors for Optimal Convergence

Authors: Rémi Leluc, François Portier, Johan Segers, Aigerim Zhuman

Abstract: A novel linear integration rule called $\textit{control neighbors}$ is proposed in which nearest neighbor estimates act as control variates to speed up the convergence rate of the Monte Carlo procedure on metric spaces. The main result is the $\mathcal{O}(n^{-1/2} n^{-s/d})$ convergence rate -- where $n$ stands for the number of evaluations of the integrand and $d$ for the dimension of the domain… ▽ More A novel linear integration rule called $\textit{control neighbors}$ is proposed in which nearest neighbor estimates act as control variates to speed up the convergence rate of the Monte Carlo procedure on metric spaces. The main result is the $\mathcal{O}(n^{-1/2} n^{-s/d})$ convergence rate -- where $n$ stands for the number of evaluations of the integrand and $d$ for the dimension of the domain -- of this estimate for Hölder functions with regularity $s \in (0,1]$, a rate which, in some sense, is optimal. Several numerical experiments validate the complexity bound and highlight the good performance of the proposed estimator. △ Less

Submitted 4 April, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: Accepted to Bernoulli (2024)

arXiv:2207.13572 [pdf, other]

Membership Inference Attacks via Adversarial Examples

Authors: Hamid Jalalzai, Elie Kadoche, Rémi Leluc, Vincent Plassier

Abstract: The raise of machine learning and deep learning led to significant improvement in several domains. This change is supported by both the dramatic rise in computation power and the collection of large datasets. Such massive datasets often include personal data which can represent a threat to privacy. Membership inference attacks are a novel direction of research which aims at recovering training dat… ▽ More The raise of machine learning and deep learning led to significant improvement in several domains. This change is supported by both the dramatic rise in computation power and the collection of large datasets. Such massive datasets often include personal data which can represent a threat to privacy. Membership inference attacks are a novel direction of research which aims at recovering training data used by a learning algorithm. In this paper, we develop a mean to measure the leakage of training data leveraging a quantity appearing as a proxy of the total variation of a trained model near its training samples. We extend our work by providing a novel defense mechanism. Our contributions are supported by empirical evidence through convincing numerical experiments. △ Less

Submitted 22 November, 2022; v1 submitted 27 July, 2022; originally announced July 2022.

Comments: Trustworthy and Socially Responsible Machine Learning (TSRML 2022) co-located with NeurIPS 2022

arXiv:2205.11890 [pdf, other]

A Quadrature Rule combining Control Variates and Adaptive Importance Sampling

Authors: Rémi Leluc, François Portier, Johan Segers, Aigerim Zhuman

Abstract: Driven by several successful applications such as in stochastic gradient descent or in Bayesian computation, control variates have become a major tool for Monte Carlo integration. However, standard methods do not allow the distribution of the particles to evolve during the algorithm, as is the case in sequential simulation methods. Within the standard adaptive importance sampling framework, a simp… ▽ More Driven by several successful applications such as in stochastic gradient descent or in Bayesian computation, control variates have become a major tool for Monte Carlo integration. However, standard methods do not allow the distribution of the particles to evolve during the algorithm, as is the case in sequential simulation methods. Within the standard adaptive importance sampling framework, a simple weighted least squares approach is proposed to improve the procedure with control variates. The procedure takes the form of a quadrature rule with adapted quadrature weights to reflect the information brought in by the control variates. The quadrature points and weights do not depend on the integrand, a computational advantage in case of multiple integrands. Moreover, the target density needs to be known only up to a multiplicative constant. Our main result is a non-asymptotic bound on the probabilistic error of the procedure. The bound proves that for improving the estimate's accuracy, the benefits from adaptive importance sampling and control variates can be combined. The good behavior of the method is illustrated empirically on synthetic examples and real-world data for Bayesian linear regression. △ Less

Submitted 5 October, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Journal ref: Advances in Neural Information Processing Systems (NeurIPS), 2022

arXiv:2105.11818 [pdf, other]

SGD with Coordinate Sampling: Theory and Practice

Authors: Rémi Leluc, François Portier

Abstract: While classical forms of stochastic gradient descent algorithm treat the different coordinates in the same way, a framework allowing for adaptive (non uniform) coordinate sampling is developed to leverage structure in data. In a non-convex setting and including zeroth order gradient estimate, almost sure convergence as well as non-asymptotic bounds are established. Within the proposed framework, w… ▽ More While classical forms of stochastic gradient descent algorithm treat the different coordinates in the same way, a framework allowing for adaptive (non uniform) coordinate sampling is developed to leverage structure in data. In a non-convex setting and including zeroth order gradient estimate, almost sure convergence as well as non-asymptotic bounds are established. Within the proposed framework, we develop an algorithm, MUSKETEER, based on a reinforcement strategy: after collecting information on the noisy gradients, it samples the most promising coordinate (all for one); then it moves along the one direction yielding an important decrease of the objective (one for all). Numerical experiments on both synthetic and real data examples confirm the effectiveness of MUSKETEER in large scale problems. △ Less

Submitted 15 October, 2022; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: Journal of Machine Learning Research 2022

arXiv:2008.07365 [pdf, other]

Feature Clustering for Support Identification in Extreme Regions

Authors: Hamid Jalalzai, Rémi Leluc

Abstract: Understanding the complex structure of multivariate extremes is a major challenge in various fields from portfolio monitoring and environmental risk management to insurance. In the framework of multivariate Extreme Value Theory, a common characterization of extremes' dependence structure is the angular measure. It is a suitable measure to work in extreme regions as it provides meaningful insights… ▽ More Understanding the complex structure of multivariate extremes is a major challenge in various fields from portfolio monitoring and environmental risk management to insurance. In the framework of multivariate Extreme Value Theory, a common characterization of extremes' dependence structure is the angular measure. It is a suitable measure to work in extreme regions as it provides meaningful insights concerning the subregions where extremes tend to concentrate their mass. The present paper develops a novel optimization-based approach to assess the dependence structure of extremes. This support identification scheme rewrites as estimating clusters of features which best capture the support of extremes. The dimension reduction technique we provide is applied to statistical learning tasks such as feature clustering and anomaly detection. Numerical experiments provide strong empirical evidence of the relevance of our approach. △ Less

Submitted 8 February, 2021; v1 submitted 13 August, 2020; originally announced August 2020.

Showing 1–6 of 6 results for author: Leluc, R