Showing 1–50 of 57 results for author: Clémençon, S

  1. arXiv:2406.06849  [pdf, other]

    stat.ML cs.LG

    Flexible Parametric Inference for Space-Time Hawkes Processes

    Authors: Emilia Siviero, Guillaume Staerman, Stephan Clémençon, Thomas Moreau

    Abstract: Many modern spatio-temporal data sets, in sociology, epidemiology or seismology, for example, exhibit self-exciting characteristics, triggering and clustering behaviors both at the same time, that a suitable Hawkes space-time process can accurately capture. This paper aims to develop a fast and flexible parametric inference technique to recover the parameters of the kernel functions involved in th…

    Submitted 17 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2403.07464  [pdf, other]

    math.ST stat.ME stat.ML

    On Ranking-based Tests of Independence

    Authors: Myrto Limnios, Stéphan Clémençon

    Abstract: In this paper we develop a novel nonparametric framework to test the independence of two random variables $\mathbf{X}$ and $\mathbf{Y}$ with unknown respective marginals $H(dx)$ and $G(dy)$ and joint distribution $F(dx dy)$, based on Receiver Operating Characteristic (ROC) analysis and bipartite ranking. The rationale behind our approach relies on the fact that the independence hypothesis…

    Submitted 12 March, 2024; originally announced March 2024.

  3. arXiv:2308.01023  [pdf, other]

    math.ST math.FA stat.ML

    Regular Variation in Hilbert Spaces and Principal Component Analysis for Functional Extremes

    Authors: Stephan Clémençon, Nathan Huet, Anne Sabourin

    Abstract: Motivated by the increasing availability of data of functional nature, we develop a general probabilistic and statistical framework for extremes of regularly varying random elements $X$ in $L^2[0,1]$. We place ourselves in a Peaks-Over-Threshold framework where a functional extreme is defined as an observation $X$ whose $L^2$-norm $\|X\|$ is comparatively large. Our goal is to propose a dimension…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: 29 pages (main paper), 5 pages (appendix)

  4. arXiv:2303.12878  [pdf, other]

    cs.LG stat.ML

    Robust Consensus in Ranking Data Analysis: Definitions, Properties and Computational Issues

    Authors: Morgane Goibert, Clément Calauzènes, Ekhine Irurozki, Stéphan Clémençon

    Abstract: As the issue of robustness in AI systems becomes vital, statistical learning techniques that are reliable even in the presence of partly contaminated data have to be developed. Preference data, in the form of (complete) rankings in the simplest situations, are no exception and the demand for appropriate concepts and tools is all the more pressing given that technologies fed by or producing this type o…

    Submitted 22 March, 2023; originally announced March 2023.

  5. arXiv:2303.03084  [pdf, other]

    stat.ML cs.LG math.ST

    On Regression in Extreme Regions

    Authors: Nathan Huet, Stephan Clémençon, Anne Sabourin

    Abstract: The statistical learning problem consists in building a predictive function $\hat{f}$ based on independent copies of $(X,Y)$ so that $Y$ is approximated by $\hat{f}(X)$ with minimum (squared) error. Motivated by various applications, special attention is paid here to the case of extreme (i.e. very large) observations $X$. Because of their rarity, the contributions of such observations to the (empi…

    Submitted 10 April, 2024; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: 16 pages (main paper), 13 pages (appendix)

  6. arXiv:2211.07245  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    Assessing Uncertainty in Similarity Scoring: Performance & Fairness in Face Recognition

    Authors: Jean-Rémy Conti, Stéphan Clémençon

    Abstract: The ROC curve is the major tool for assessing not only the performance but also the fairness properties of a similarity scoring function. In order to draw reliable conclusions based on empirical ROC analysis, accurately evaluating the uncertainty level related to statistical versions of the ROC curves of interest is absolutely necessary, especially for applications with considerable societal impac…

    Submitted 20 February, 2024; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: Accepted to ICLR 2024

  7. arXiv:2211.00603  [pdf, other]

    stat.ML cs.LG

    On Medians of (Randomized) Pairwise Means

    Authors: Pierre Laforgue, Stephan Clémençon, Patrice Bertail

    Abstract: Tournament procedures, recently introduced in Lugosi & Mendelson (2016), offer an appealing alternative, from a theoretical perspective at least, to the principle of Empirical Risk Minimization in machine learning. Statistical learning by Median-of-Means (MoM) basically consists in segmenting the training data into blocks of equal size and comparing the statistical performance of every pair of can…

    Submitted 1 November, 2022; originally announced November 2022.

  8. arXiv:2202.07365  [pdf, other]

    stat.ML cs.LG

    A Statistical Learning View of Simple Kriging

    Authors: Emilia Siviero, Emilie Chautru, Stephan Clémençon

    Abstract: In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly and guarantees of the generalization capacity of predictive rules learned from such data are left to establish.…

    Submitted 2 February, 2024; v1 submitted 15 February, 2022; originally announced February 2022.

    Comments: 41 pages

  9. arXiv:2201.08105  [pdf, other]

    cs.LG stat.ML

    Statistical Depth Functions for Ranking Distributions: Definitions, Statistical Learning and Applications

    Authors: Morgane Goibert, Stéphan Clémençon, Ekhine Irurozki, Pavlo Mozharovskyi

    Abstract: The concept of median/consensus has been widely investigated in order to provide a statistical summary of ranking data, i.e. realizations of a random permutation $Σ$ of a finite set, $\{1,\; \ldots,\; n\}$ with $n\geq 1$ say. As it sheds light onto only one aspect of $Σ$'s distribution $P$, it may neglect other informative features. It is the purpose of this paper to define analogs of quantiles, r…

    Submitted 20 January, 2022; originally announced January 2022.

  10. arXiv:2201.06616  [pdf, other]

    stat.ML cs.LG

    Improving the quality control of seismic data through active learning

    Authors: Mathieu Chambefort, Raphaël Butez, Emilie Chautru, Stephan Clémençon

    Abstract: In image denoising problems, the increasing density of available images makes an exhaustive visual inspection impossible and therefore automated methods based on machine-learning must be deployed for this purpose. This is particularly the case in seismic signal processing. Engineers/geophysicists have to deal with millions of seismic time series. Finding the sub-surface properties useful for the oi…

    Submitted 20 January, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

    Comments: 10 pages

  11. arXiv:2201.05115  [pdf, other]

    stat.ML cs.LG

    Functional Anomaly Detection: a Benchmark Study

    Authors: Guillaume Staerman, Eric Adjakossa, Pavlo Mozharovskyi, Vera Hofer, Jayant Sen Gupta, Stephan Clémençon

    Abstract: The increasing automation in many areas of industry calls for the design of efficient machine-learning solutions for the detection of abnormal events. With the ubiquitous deployment of sensors that monitor the health of complex infrastructures nearly continuously, anomaly detection can now rely on measurements sampled at a very high frequency, providing a very rich representation of the phen…

    Submitted 13 January, 2022; originally announced January 2022.

  12. arXiv:2109.09590  [pdf, other]

    math.ST stat.ML

    Learning to Rank Anomalies: Scalar Performance Criteria and Maximization of Two-Sample Rank Statistics

    Authors: Myrto Limnios, Nathan Noiry, Stéphan Clémençon

    Abstract: The ability to collect and store ever more massive databases has been accompanied by the need to process them efficiently. In many cases, most observations have the same behavior, while a probable small proportion of these observations are abnormal. Detecting the latter, defined as outliers, is one of the major challenges for machine learning applications (e.g. in fraud detection or in predictive…

    Submitted 20 September, 2021; originally announced September 2021.

  13. arXiv:2109.02357  [pdf, other]

    cs.CV cs.CY cs.LG stat.ML

    Fighting Selection Bias in Statistical Learning: Application to Visual Recognition from Biased Image Databases

    Authors: Stephan Clémençon, Pierre Laforgue, Robin Vogel

    Abstract: In practice, and especially when training deep neural networks, visual recognition rules are often learned based on various sources of information. On the other hand, the recent deployment of facial recognition systems with uneven performances on different population segments has highlighted the representativeness issues induced by a naive aggregation of the datasets. In this paper, we show how bi…

    Submitted 1 November, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

  14. arXiv:2107.12825  [pdf]

    cs.LG stat.ML

    Individual Survival Curves with Conditional Normalizing Flows

    Authors: Guillaume Ausset, Tom Ciffreo, Francois Portier, Stephan Clémençon, Timothée Papin

    Abstract: Survival analysis, or time-to-event modelling, is a classical statistical problem that has garnered a lot of interest for its practical use in epidemiology, demographics or actuarial sciences. Recent advances on the subject from the point of view of machine learning have been concerned with precise per-individual predictions instead of population studies, driven by the rise of individualized medic…

    Submitted 27 July, 2021; originally announced July 2021.

    Comments: IEEE DSAA '21

  15. arXiv:2106.11068  [pdf, other]

    stat.ML cs.LG

    Affine-Invariant Integrated Rank-Weighted Depth: Definition, Properties and Finite Sample Analysis

    Authors: Guillaume Staerman, Pavlo Mozharovskyi, Stéphan Clémençon

    Abstract: Because it determines a center-outward ordering of observations in $\mathbb{R}^d$ with $d\geq 2$, the concept of statistical depth makes it possible to define quantiles and ranks for multivariate data and to use them for various statistical tasks (e.g. inference, hypothesis testing). Whereas many depth functions have been proposed ad hoc in the literature since the seminal contribution of \cite{Tukey7…

    Submitted 4 February, 2022; v1 submitted 21 June, 2021; originally announced June 2021.

  16. arXiv:2104.03966  [pdf, other]

    math.ST stat.ML

    Concentration bounds for the empirical angular measure with statistical learning applications

    Authors: Stéphan Clémençon, Hamid Jalalzai, Stéphane Lhaut, Anne Sabourin, Johan Segers

    Abstract: The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step in learning problems involving observations far away from the center. In the common situation that the components of the vector have different distributions, t…

    Submitted 17 October, 2022; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: 24 pages (main paper), 21 pages (supplement), 2 figures

    MSC Class: Primary 62G05; 62G30; 62G32; secondary 62H30

  17. arXiv:2104.02943  [pdf, other]

    math.ST stat.ML

    Concentration Inequalities for Two-Sample Rank Processes with Application to Bipartite Ranking

    Authors: Stéphan Clémençon, Myrto Limnios, Nicolas Vayatis

    Abstract: The ROC curve is the gold standard for measuring the performance of a test/scoring statistic regarding its capacity to discriminate between two statistical populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring/ranking applications such as the AUC, th…

    Submitted 24 January, 2023; v1 submitted 7 April, 2021; originally announced April 2021.

    Journal ref: Electronic Journal of Statistics, Shaker Heights, OH: Institute of Mathematical Statistics, 2021, 15 (2), pp. 4659–4717

  18. arXiv:2103.12711  [pdf, other]

    stat.ML cs.LG

    A Pseudo-Metric between Probability Distributions based on Depth-Trimmed Regions

    Authors: Guillaume Staerman, Pavlo Mozharovskyi, Pierre Colombo, Stéphan Clémençon, Florence d'Alché-Buc

    Abstract: The design of a metric between probability distributions is a longstanding problem motivated by numerous applications in Machine Learning. Focusing on continuous probability distributions on the Euclidean space $\mathbb{R}^d$, we introduce a novel pseudo-metric between probability distributions by leveraging the extension of univariate quantiles to multivariate spaces. Data depth is a nonparametri…

    Submitted 10 October, 2022; v1 submitted 23 March, 2021; originally announced March 2021.

  19. arXiv:2006.15043  [pdf, other]

    cs.LG stat.ML

    Nearest Neighbour Based Estimates of Gradients: Sharp Nonasymptotic Bounds and Applications

    Authors: Guillaume Ausset, Stephan Clémençon, François Portier

    Abstract: Motivated by a wide variety of applications, ranging from stochastic optimization to dimension reduction through variable selection, the problem of estimating gradients accurately is of crucial importance in statistics and learning theory. We consider here the classic regression setup, where a real valued square integrable r.v. $Y$ is to be predicted upon observing a (possibly high dimensional) ra…

    Submitted 26 June, 2020; originally announced June 2020.

  20. arXiv:2006.05240  [pdf, other]

    stat.ML cs.LG

    Generalization Bounds in the Presence of Outliers: a Median-of-Means Study

    Authors: Pierre Laforgue, Guillaume Staerman, Stephan Clémençon

    Abstract: In contrast to the empirical mean, the Median-of-Means (MoM) is an estimator of the mean $θ$ of a square integrable r.v. $Z$, around which accurate nonasymptotic confidence bounds can be built, even when $Z$ does not exhibit a sub-Gaussian tail behavior. Thanks to the high confidence it achieves on heavy-tailed data, MoM has found various applications in machine learning, where it is used to desig…

    Submitted 7 February, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  21. arXiv:2002.09420  [pdf, other]

    stat.ML cs.LG

    A Multiclass Classification Approach to Label Ranking

    Authors: Stephan Clémençon, Robin Vogel

    Abstract: In multiclass classification, the goal is to learn how to predict a random label $Y$, valued in $\mathcal{Y}=\{1,\; \ldots,\; K \}$ with $K\geq 3$, based upon observing a r.v. $X$, taking its values in $\mathbb{R}^q$ with $q\geq 1$ say, by means of a classification rule $g:\mathbb{R}^q\to \mathcal{Y}$ with minimum probability of error $\mathbb{P}\{Y\neq g(X) \}$. However, in a wide variety of situ…

    Submitted 21 February, 2020; originally announced February 2020.

    Comments: 15 pages, 6 figures

  22. arXiv:2002.08159  [pdf, other]

    stat.ML cs.LG

    Learning Fair Scoring Functions: Bipartite Ranking under ROC-based Fairness Constraints

    Authors: Robin Vogel, Aurélien Bellet, Stephan Clémençon

    Abstract: Many applications of AI involve scoring individuals using a learned function of their attributes. These predictive risk scores are then used to take decisions based on whether the score exceeds a certain threshold, which may vary depending on the context. The level of delegation granted to such systems in critical applications like credit lending and medical diagnosis will heavily depend on how qu…

    Submitted 25 February, 2021; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: 35 pages, 13 figures, 6 tables

  23. arXiv:2002.05145  [pdf, other]

    stat.ML cs.LG

    Weighted Empirical Risk Minimization: Sample Selection Bias Correction based on Importance Sampling

    Authors: Robin Vogel, Mastane Achab, Stéphan Clémençon, Charles Tillier

    Abstract: We consider statistical learning problems, when the distribution $P'$ of the training observations $Z'_1,\; \ldots,\; Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution) but is still defined on the same measurable space as $P$ and dominates it. In the unrealistic case where the likelihood ratio $Φ(z)=dP/dP'(z)$ is known, one may…

    Submitted 19 February, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: 20 pages, 7 tables and figures

  24. arXiv:1910.04085  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    The Area of the Convex Hull of Sampled Curves: a Robust Functional Statistical Depth Measure

    Authors: Guillaume Staerman, Pavlo Mozharovskyi, Stephan Clémençon

    Abstract: With the ubiquity of sensors in the IoT era, statistical observations are becoming increasingly available in the form of massive (multivariate) time-series. Formulated as unsupervised anomaly detection tasks, an abundance of applications like aviation safety management, the health monitoring of complex infrastructures or fraud detection can now rely on such functional data, acquired and stored wit…

    Submitted 13 February, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

  25. arXiv:1907.07523  [pdf, other]

    stat.ME stat.AP stat.ML

    A Multivariate Extreme Value Theory Approach to Anomaly Clustering and Visualization

    Authors: Maël Chiapino, Stéphan Clémençon, Vincent Feuillard, Anne Sabourin

    Abstract: In a wide variety of situations, anomalies in the behaviour of a complex system, whose health is monitored through the observation of a random vector $X = (X_1, \ldots, X_d)$ valued in $\mathbb{R}^d$, correspond to the simultaneous occurrence of extreme values for certain subgroups $α \subset \{1, \ldots, d\}$ of variables $X_j$. Under the heavy-tail assumption, which is precisely appropriate for modeling these pheno…

    Submitted 17 July, 2019; originally announced July 2019.

  26. arXiv:1906.12304  [pdf, other]

    stat.ML cs.LG

    Statistical Learning from Biased Training Samples

    Authors: Stephan Clémençon, Pierre Laforgue

    Abstract: With the deluge of digitized information in the Big Data era, massive datasets are becoming increasingly available for learning predictive models. However, in many practical situations, the poor control of the data acquisition processes may naturally jeopardize the outputs of machine learning algorithms, and selection bias issues are now the subject of much attention in the literature. The present…

    Submitted 1 November, 2022; v1 submitted 28 June, 2019; originally announced June 2019.

  27. On Tree-based Methods for Similarity Learning

    Authors: Stéphan Clémençon, Robin Vogel

    Abstract: In many situations, the choice of an adequate similarity measure or metric on the feature space dramatically determines the performance of machine learning methods. Building such measures automatically is the specific purpose of metric/similarity learning. In Vogel et al. (2018), similarity learning is formulated as a pairwise bipartite ranking problem: ideally, the larger the probability that two…

    Submitted 21 June, 2019; originally announced June 2019.

    Comments: 17 pages, 4 figures

  28. arXiv:1906.09234  [pdf, other]

    stat.ML cs.LG

    Trade-offs in Large-Scale Distributed Tuplewise Estimation and Learning

    Authors: Robin Vogel, Aurélien Bellet, Stephan Clémençon, Ons Jelassi, Guillaume Papa

    Abstract: The development of cluster computing frameworks has allowed practitioners to scale out various statistical estimation and machine learning algorithms with minimal programming effort. This is especially true for machine learning problems whose objective function is nicely separable across individual data points, such as classification and regression. In contrast, statistical learning tasks involvin…

    Submitted 21 June, 2019; originally announced June 2019.

    Comments: 23 pages, 6 figures, ECML 2019

  29. arXiv:1906.01908  [pdf, other]

    cs.LG math.ST stat.ML

    Empirical Risk Minimization under Random Censorship: Theory and Practice

    Authors: Guillaume Ausset, Stéphan Clémençon, François Portier

    Abstract: We consider the classic supervised learning problem, where a continuous non-negative random label $Y$ (i.e. a random duration) is to be predicted based upon observing a random vector $X$ valued in $\mathbb{R}^d$ with $d\geq 1$ by means of a regression rule with minimum least square error. In various applications, ranging from industrial quality control to public health through credit risk analysis…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: Submitted to JMLR. 18 pages + Appendix

  30. arXiv:1904.04573  [pdf, other]

    stat.ML cs.LG

    Functional Isolation Forest

    Authors: Guillaume Staerman, Pavlo Mozharovskyi, Stephan Clémençon, Florence d'Alché-Buc

    Abstract: For the purpose of monitoring the behavior of complex infrastructures (e.g. aircraft, transport or energy networks), high-rate sensors are deployed to capture multivariate data, generally unlabeled, in quasi-continuous time in order to quickly detect the occurrence of anomalies that may jeopardize the smooth operation of the system of interest. The statistical analysis of such massive data of functional n…

    Submitted 9 October, 2019; v1 submitted 9 April, 2019; originally announced April 2019.

  31. arXiv:1810.06291  [pdf, other]

    stat.ML cs.LG

    Dimensionality Reduction and (Bucket) Ranking: a Mass Transportation Approach

    Authors: Mastane Achab, Anna Korba, Stephan Clémençon

    Abstract: Whereas most dimensionality reduction techniques (e.g. PCA, ICA, NMF) for multivariate data essentially rely on linear algebra to a certain extent, summarizing ranking data, viewed as realizations of a random permutation $Σ$ on a set of items indexed by $i\in \{1,\ldots,\; n\}$, is a great statistical challenge, due to the absence of vector space structure for the set of permutations…

    Submitted 30 August, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

  32. arXiv:1807.06981  [pdf, other]

    stat.ML cs.AI cs.LG

    A Probabilistic Theory of Supervised Similarity Learning for Pointwise ROC Curve Optimization

    Authors: Robin Vogel, Aurélien Bellet, Stéphan Clémençon

    Abstract: The performance of many machine learning techniques depends on the choice of an appropriate similarity or distance measure on the input space. Similarity learning (or metric learning) aims at building such a measure from training data so that observations with the same (resp. different) label are as close (resp. far) as possible. In this paper, similarity learning is investigated from the perspect…

    Submitted 18 July, 2018; originally announced July 2018.

    Comments: 8 pages main paper, 22 pages with appendices, proceedings of ICML 2018

    Journal ref: PMLR 80 (2018) 5062-5071

  33. arXiv:1805.11028  [pdf, other]

    stat.ML cs.LG

    Autoencoding any Data through Kernel Autoencoders

    Authors: Pierre Laforgue, Stephan Clémençon, Florence d'Alché-Buc

    Abstract: This paper investigates a novel algorithmic approach to data representation based on kernel methods. Assuming that the observations lie in a Hilbert space X, the introduced Kernel Autoencoder (KAE) is the composition of mappings from vector-valued Reproducing Kernel Hilbert Spaces (vv-RKHSs) that minimizes the expected reconstruction error. Beyond a first extension of the autoencoding scheme to po…

    Submitted 2 December, 2020; v1 submitted 28 May, 2018; originally announced May 2018.

  34. arXiv:1805.02908  [pdf, other]

    stat.ML cs.LG

    Profitable Bandits

    Authors: Mastane Achab, Stephan Clémençon, Aurélien Garivier

    Abstract: Originally motivated by default risk management applications, this paper investigates a novel problem, referred to as the profitable bandit problem here. At each step, an agent chooses a subset of the K possible actions. For each action chosen, she then receives the sum of a random number of rewards. Her objective is to maximize her cumulated earnings. We adapt and study three well-known strategie…

    Submitted 8 May, 2018; originally announced May 2018.

  35. arXiv:1801.05772  [pdf, other]

    stat.ML

    Ranking Data with Continuous Labels through Oriented Recursive Partitions

    Authors: Stephan Clémençon, Mastane Achab

    Abstract: We formulate a supervised learning problem, referred to as continuous ranking, where a continuous real-valued label Y is assigned to an observable r.v. X taking its values in a feature space $\mathcal{X}$ and the goal is to order all possible observations x in $\mathcal{X}$ by means of a scoring function $s:\mathcal{X}\rightarrow \mathbb{R}$ so that s(X) and Y tend to increase or decrease together…

    Submitted 17 January, 2018; originally announced January 2018.

  36. arXiv:1711.00070  [pdf, other]

    math.ST stat.ML

    Ranking Median Regression: Learning to Order through Local Consensus

    Authors: Stephan Clémençon, Anna Korba, Eric Sibony

    Abstract: This article is devoted to the problem of predicting the value taken by a random permutation $Σ$, describing the preferences of an individual over a set of numbered items $\{1,\; \ldots,\; n\}$ say, based on the observation of an input/explanatory r.v. $X$ (e.g. characteristics of the individual), when error is measured by the Kendall $τ$ distance. In the probabilistic formulation of the 'Learning…

    Submitted 18 December, 2017; v1 submitted 31 October, 2017; originally announced November 2017.

  37. arXiv:1707.08820  [pdf, other]

    stat.ML cs.LG

    Max K-armed bandit: On the ExtremeHunter algorithm and beyond

    Authors: Mastane Achab, Stephan Clémençon, Aurélien Garivier, Anne Sabourin, Claire Vernade

    Abstract: This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values. Our contribution is twofold. We first significantly refine the analysis of the ExtremeHunter algorithm carried out in Carpentier and Valko (2014), and next propose an alternative approach, showing that, remarkably, Extreme Bandits can be reduc…

    Submitted 27 July, 2017; originally announced July 2017.

  38. arXiv:1705.01305  [pdf, other]

    stat.ML

    Mass Volume Curves and Anomaly Ranking

    Authors: Stephan Clémençon, Albert Thomas

    Abstract: This paper aims at formulating the issue of ranking multivariate unlabeled observations depending on their degree of abnormality as an unsupervised statistical learning task. In the 1-d situation, this problem is usually tackled by means of tail estimation techniques: univariate observations are viewed as all the more 'abnormal' as they are located far in the tail(s) of the underlying probability…

    Submitted 3 September, 2018; v1 submitted 3 May, 2017; originally announced May 2017.

  39. arXiv:1606.02421  [pdf, other]

    stat.ML cs.AI cs.DC cs.LG eess.SY

    Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions

    Authors: Igor Colin, Aurélien Bellet, Joseph Salmon, Stéphan Clémençon

    Abstract: In decentralized networks (of sensors, connected objects, etc.), there is an important need for efficient algorithms to optimize a global cost function, for instance to learn a global model from the local data collected by each computing unit. In this paper, we address the problem of decentralized minimization of pairwise functions of the data points, where these points are distributed over the no…

    Submitted 8 June, 2016; originally announced June 2016.

  40. arXiv:1603.09584  [pdf, other]

    stat.ML

    Sparse Representation of Multivariate Extremes with Applications to Anomaly Ranking

    Authors: Nicolas Goix, Anne Sabourin, Stéphan Clémençon

    Abstract: Extremes play a special role in Anomaly Detection. Beyond inference and simulation purposes, probabilistic tools borrowed from Extreme Value Theory (EVT), such as the angular measure, can also be used to design novel statistical learning methods for Anomaly Detection/ranking. This paper proposes a new algorithm based on multivariate EVT to learn how to rank observations in a high dimensional space…

    Submitted 31 March, 2016; originally announced March 2016.

    Comments: arXiv admin note: text overlap with arXiv:1507.05899

  41. arXiv:1511.05464  [pdf, other]

    stat.ML cs.DC cs.LG eess.SY stat.CO

    Extending Gossip Algorithms to Distributed Estimation of U-Statistics

    Authors: Igor Colin, Aurélien Bellet, Joseph Salmon, Stéphan Clémençon

    Abstract: Efficient and robust algorithms for decentralized estimation in networks are essential to many distributed systems. Whereas distributed estimation of sample mean statistics has been the subject of a good deal of attention, computation of $U$-statistics, relying on more expensive averaging over pairs of observations, is a less investigated area. Yet, such data functionals are essential to describe…

    Submitted 17 November, 2015; originally announced November 2015.

    Comments: to be presented at NIPS 2015

    MSC Class: 68Uxx; 62J15; 68Q32; 62-04

  42. arXiv:1508.06091  [pdf, ps, other]

    stat.ML cs.LG

    AUC Optimisation and Collaborative Filtering

    Authors: Charanpal Dhanjal, Romaric Gaudel, Stephan Clemencon

    Abstract: In recommendation systems, one is interested in the ranking of the predicted items as opposed to other losses such as the mean squared error. Although a variety of ways to evaluate rankings exist in the literature, here we focus on the Area Under the ROC Curve (AUC) as it is widely used and has a strong theoretical underpinning. In practical recommendation, only items at the top of the ranked list ar…

    Submitted 25 August, 2015; originally announced August 2015.

  43. arXiv:1507.05899  [pdf, other]

    stat.ML

    Sparsity in Multivariate Extremes with Applications to Anomaly Detection

    Authors: Nicolas Goix, Anne Sabourin, Stéphan Clémençon

    Abstract: Capturing the dependence structure of multivariate extreme events is a major concern in many fields involving the management of risks stemming from multiple sources, e.g. portfolio monitoring, insurance, environmental risk management and anomaly detection. One convenient (non-parametric) characterization of extremal dependence in the framework of multivariate Extreme Value Theory (EVT) is the angu…

    Submitted 14 March, 2016; v1 submitted 21 July, 2015; originally announced July 2015.

  44. arXiv:1502.01684  [pdf, other]

    stat.ML math.PR

    On Anomaly Ranking and Excess-Mass Curves

    Authors: Nicolas Goix, Anne Sabourin, Stéphan Clémençon

    Abstract: Learning how to rank multivariate unlabeled observations depending on their degree of abnormality/novelty is a crucial problem in a wide range of applications. In practice, it generally consists in building a real valued "scoring" function on the feature space so as to quantify to which extent observations should be considered as abnormal. In the 1-d situation, measurements are generally considere…

    Submitted 5 February, 2015; originally announced February 2015.

  45. arXiv:1501.02629  [pdf, other]

    stat.ML cs.AI cs.LG

    Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics

    Authors: Stéphan Clémençon, Aurélien Bellet, Igor Colin

    Abstract: In a wide range of statistical learning problems such as ranking, clustering or metric learning among others, the risk is accurately estimated by $U$-statistics of degree $d\geq 1$, i.e. functionals of the training data with low variance that take the form of averages over $d$-tuples. From a computational perspective, the calculation of such statistics is highly expensive even for a moderate sampl…

    Submitted 19 April, 2016; v1 submitted 12 January, 2015; originally announced January 2015.

    Comments: To appear in Journal of Machine Learning Research. 34 pages. v2: minor correction to Theorem 4 and its proof, added 1 reference. v3: typo corrected in Proposition 3. v4: improved presentation, added experiments on model selection for clustering, fixed minor typos

    Journal ref: Journal of Machine Learning Research 17(76):1-36, 2016

  46. arXiv:1501.02218  [pdf, other]

    stat.ML

    Survey schemes for stochastic gradient descent with applications to M-estimation

    Authors: Stéphan Clémençon, Patrice Bertail, Emilie Chautru, Guillaume Papa

    Abstract: In certain situations that shall be undoubtedly more and more common in the Big Data era, the datasets available are so massive that computing statistics over the full sample is hardly feasible, if not unfeasible. A natural approach in this context consists in using survey schemes and substituting the "full data" statistics with their counterparts based on the resulting random samples, of manageab…

    Submitted 9 January, 2015; originally announced January 2015.

    Comments: 31 pages

  47. arXiv:1402.1054  [pdf, ps, other]

    stat.AP

    On Recent Advances in Supervised Ranking for Metabolite Profiling

    Authors: Charanpal Dhanjal, Stéphan Clémençon

    Abstract: This paper focuses on data arising from the field of metabolomics, a rapidly developing area concerned with the analysis of chemical fingerprints (i.e. the metabolite profile). The metabolite profile is left by specific chemical processes occurring in biological cells, tissues or organs. It is the main purpose of this article to develop and implement scoring techniques so as to rank all possible…

    Submitted 5 February, 2014; originally announced February 2014.

  48. arXiv:1401.6449  [pdf, other]

    stat.AP cs.SI

    A statistical network analysis of the HIV/AIDS epidemics in Cuba

    Authors: Stéphan Clémençon, Hector De Arazoza, Fabrice Rossi, Viet Chi Tran

    Abstract: The Cuban contact-tracing detection system set up in 1986 allowed the reconstruction and analysis of the sexual network underlying the epidemic (5,389 vertices and 4,073 edges, giant component of 2,386 nodes and 3,168 edges), shedding light onto the spread of HIV and the role of contact-tracing. Clustering based on modularity optimization provides a better visualization and understanding of the ne…

    Submitted 22 May, 2015; v1 submitted 24 January, 2014; originally announced January 2014.

  49. arXiv:1401.2451  [pdf, ps, other]

    stat.ML

    Online Matrix Completion Through Nuclear Norm Regularisation

    Authors: Charanpal Dhanjal, Romaric Gaudel, Stéphan Clémençon

    Abstract: It is the main goal of this paper to propose a novel method to perform matrix completion on-line. Motivated by a wide variety of applications, ranging from the design of recommender systems to sensor network localization through seismic data reconstruction, we consider the matrix completion problem when entries of the matrix of interest are observed gradually. Precisely, we place ourselves in the…

    Submitted 10 January, 2014; originally announced January 2014.

    Comments: Corrected a typo in the affiliation

  50. arXiv:1312.5066  [pdf, other]

    stat.ML

    Functional Bipartite Ranking: a Wavelet-Based Filtering Approach

    Authors: Stéphan Clémençon, Marine Depecker

    Abstract: It is the main goal of this article to address the bipartite ranking issue from the perspective of functional data analysis (FDA). Given a training set of independent realizations of a (possibly sampled) second-order random function with a (locally) smooth autocorrelation structure and to which a binary label is randomly assigned, the objective is to learn a scoring function s with optimal ROC cur…

    Submitted 18 December, 2013; originally announced December 2013.