Showing 1–15 of 15 results for author: Sabourin, A

  1. arXiv:2310.14826

    stat.ML cs.LG

    Sharp error bounds for imbalanced classification: how many examples in the minority class?

    Authors: Anass Aghbalou, François Portier, Anne Sabourin

    Abstract: When dealing with imbalanced classification data, reweighting the loss function is a standard procedure allowing to equilibrate between the true positive and true negative rates within the risk measure. Despite significant theoretical work in this area, existing results do not adequately address a main challenge within the imbalanced classification framework, which is the negligible size of one cl…

    Submitted 16 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  2. arXiv:2308.01023

    math.ST math.FA stat.ML

    Regular Variation in Hilbert Spaces and Principal Component Analysis for Functional Extremes

    Authors: Stephan Clémençon, Nathan Huet, Anne Sabourin

    Abstract: Motivated by the increasing availability of data of functional nature, we develop a general probabilistic and statistical framework for extremes of regularly varying random elements $X$ in $L^2[0,1]$. We place ourselves in a Peaks-Over-Threshold framework where a functional extreme is defined as an observation $X$ whose $L^2$-norm $\|X\|$ is comparatively large. Our goal is to propose a dimension…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: 29 pages (main paper), 5 pages (appendix)

  3. arXiv:2303.03084

    stat.ML cs.LG math.ST

    On Regression in Extreme Regions

    Authors: Nathan Huet, Stephan Clémençon, Anne Sabourin

    Abstract: The statistical learning problem consists in building a predictive function $\hat{f}$ based on independent copies of $(X,Y)$ so that $Y$ is approximated by $\hat{f}(X)$ with minimum (squared) error. Motivated by various applications, special attention is paid here to the case of extreme (i.e. very large) observations $X$. Because of their rarity, the contributions of such observations to the (empi…

    Submitted 10 April, 2024; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: 16 pages (main paper), 13 pages (appendix)

  4. arXiv:2108.01432

    math.ST stat.ME

    Tail inverse regression for dimension reduction with extreme response

    Authors: Anass Aghbalou, François Portier, Anne Sabourin, Chen Zhou

    Abstract: We consider the problem of supervised dimension reduction with a particular focus on extreme values of the target $Y\in\mathbb{R}$ to be explained by a covariate vector $X \in \mathbb{R}^p$. The general purpose is to define and estimate a projection on a lower dimensional subspace of the covariate space which is sufficient for predicting exceedances of the target above high thresholds. We propose…

    Submitted 24 February, 2023; v1 submitted 30 July, 2021; originally announced August 2021.

    Comments: main paper: 31 pages + supplementary material: 16 pages

    MSC Class: 62G32; 62H25; 62G08; 62G30

  5. arXiv:2104.03966

    math.ST stat.ML

    Concentration bounds for the empirical angular measure with statistical learning applications

    Authors: Stéphan Clémençon, Hamid Jalalzai, Stéphane Lhaut, Anne Sabourin, Johan Segers

    Abstract: The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins. Its statistical recovery is an important step in learning problems involving observations far away from the center. In the common situation that the components of the vector have different distributions, t…

    Submitted 17 October, 2022; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: 24 pages (main paper), 21 pages (supplement), 2 figures

    MSC Class: Primary 62G05; 62G30; 62G32; secondary 62H30

  6. arXiv:2003.11593

    stat.ML cs.CL cs.LG

    Heavy-tailed Representations, Text Polarity Classification & Data Augmentation

    Authors: Hamid Jalalzai, Pierre Colombo, Chloé Clavel, Eric Gaussier, Giovanna Varni, Emmanuel Vignon, Anne Sabourin

    Abstract: The dominant approaches to text representation in natural language rely on learning embeddings on massive corpora which have convenient properties such as compositionality and distance preservation. In this paper, we develop a novel method to learn a heavy-tailed embedding with desirable regularity properties regarding the distributional tails, which allows to analyze the points far away from the…

    Submitted 25 March, 2021; v1 submitted 25 March, 2020; originally announced March 2020.

    Journal ref: Advances in Neural Information Processing Systems (NeurIPS), Dec 2020

  7. arXiv:1907.07523

    stat.ME stat.AP stat.ML

    A Multivariate Extreme Value Theory Approach to Anomaly Clustering and Visualization

    Authors: Maël Chiapino, Stéphan Clémençon, Vincent Feuillard, Anne Sabourin

    Abstract: In a wide variety of situations, anomalies in the behaviour of a complex system, whose health is monitored through the observation of a random vector $X = (X_1, \ldots, X_d)$ valued in $\mathbb{R}^d$, correspond to the simultaneous occurrence of extreme values for certain subgroups $\alpha \subset \{1, \ldots, d\}$ of variables $X_j$. Under the heavy-tail assumption, which is precisely appropriate for modeling these pheno…

    Submitted 17 July, 2019; originally announced July 2019.

  8. arXiv:1906.11043

    math.ST stat.ME stat.ML

    Principal Component Analysis for Multivariate Extremes

    Authors: Holger Drees, Anne Sabourin

    Abstract: The first order behavior of multivariate heavy-tailed random vectors above large radial thresholds is ruled by a limit measure in a regular variation framework. For a high dimensional vector, a reasonable assumption is that the support of this measure is concentrated on a lower dimensional subspace, meaning that certain linear combinations of the components are much likelier to be large than other…

    Submitted 26 June, 2019; originally announced June 2019.

  9. arXiv:1802.09977

    stat.ME

    Identifying groups of variables with the potential of being large simultaneously

    Authors: Maël Chiapino, Anne Sabourin, Johan Segers

    Abstract: Identifying groups of variables that may be large simultaneously amounts to finding out which joint tail dependence coefficients of a multivariate distribution are positive. The asymptotic distribution of a vector of nonparametric, rank-based estimators of these coefficients justifies a stopping criterion in an algorithm that searches the collection of all possible groups of variables in a systema…

    Submitted 27 February, 2018; originally announced February 2018.

    Comments: 23 pages, 2 tables

    MSC Class: 62G32

  10. arXiv:1707.08820

    stat.ML cs.LG

    Max K-armed bandit: On the ExtremeHunter algorithm and beyond

    Authors: Mastane Achab, Stephan Clémençon, Aurélien Garivier, Anne Sabourin, Claire Vernade

    Abstract: This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values. Our contribution is twofold. We first significantly refine the analysis of the ExtremeHunter algorithm carried out in Carpentier and Valko (2014), and next propose an alternative approach, showing that, remarkably, Extreme Bandits can be reduc…

    Submitted 27 July, 2017; originally announced July 2017.

  11. arXiv:1603.09584

    stat.ML

    Sparse Representation of Multivariate Extremes with Applications to Anomaly Ranking

    Authors: Nicolas Goix, Anne Sabourin, Stéphan Clémençon

    Abstract: Extremes play a special role in Anomaly Detection. Beyond inference and simulation purposes, probabilistic tools borrowed from Extreme Value Theory (EVT), such as the angular measure, can also be used to design novel statistical learning methods for Anomaly Detection/ranking. This paper proposes a new algorithm based on multivariate EVT to learn how to rank observations in a high dimensional space…

    Submitted 31 March, 2016; originally announced March 2016.

    Comments: arXiv admin note: text overlap with arXiv:1507.05899

  12. arXiv:1507.05899

    stat.ML

    Sparsity in Multivariate Extremes with Applications to Anomaly Detection

    Authors: Nicolas Goix, Anne Sabourin, Stéphan Clémençon

    Abstract: Capturing the dependence structure of multivariate extreme events is a major concern in many fields involving the management of risks stemming from multiple sources, e.g. portfolio monitoring, insurance, environmental risk management and anomaly detection. One convenient (non-parametric) characterization of extremal dependence in the framework of multivariate Extreme Value Theory (EVT) is the angu…

    Submitted 14 March, 2016; v1 submitted 21 July, 2015; originally announced July 2015.

  13. arXiv:1502.01684

    stat.ML math.PR

    On Anomaly Ranking and Excess-Mass Curves

    Authors: Nicolas Goix, Anne Sabourin, Stéphan Clémençon

    Abstract: Learning how to rank multivariate unlabeled observations depending on their degree of abnormality/novelty is a crucial problem in a wide range of applications. In practice, it generally consists in building a real valued "scoring" function on the feature space so as to quantify to which extent observations should be considered as abnormal. In the 1-d situation, measurements are generally considere…

    Submitted 5 February, 2015; originally announced February 2015.

  14. arXiv:1412.0838

    stat.ME stat.AP

    Semi-parametric modeling of excesses above high multivariate thresholds with censored data

    Authors: Anne Sabourin

    Abstract: How to include censored data in a statistical analysis is a recurrent issue in statistics. In multivariate extremes, the dependence structure of large observations can be characterized in terms of a non parametric angular measure, while marginal excesses above asymptotically large thresholds have a parametric distribution. In this work, a flexible semi-parametric Dirichlet mixture model for angu…

    Submitted 2 December, 2014; originally announced December 2014.

  15. arXiv:1411.7782

    stat.AP stat.ME

    Combining regional estimation and historical floods: a multivariate semi-parametric peaks-over-threshold model with censored data

    Authors: Anne Sabourin, Benjamin Renard

    Abstract: The estimation of extreme flood quantiles is challenging due to the relative scarcity of extreme data compared to typical target return periods. Several approaches have been developed over the years to face this challenge, including regional estimation and the use of historical flood data. This paper investigates the combination of both approaches using a multivariate peaks-over-threshold model, t…

    Submitted 28 November, 2014; originally announced November 2014.

    MSC Class: 86A05 (primary); 62P12 (secondary)