-
Restyling Unsupervised Concept Based Interpretable Networks with Generative Models
Authors:
Jayneel Parekh,
Quentin Bouniot,
Pavlo Mozharovskyi,
Alasdair Newson,
Florence d'Alché-Buc
Abstract:
Developing inherently interpretable models for prediction has gained prominence in recent years. A subclass of these models, wherein the interpretable network relies on learning high-level concepts, is valued because of the closeness of concept representations to human communication. However, the visualization and understanding of the learnt unsupervised dictionary of concepts encounters major limitations, especially for large-scale images. We propose here a novel method that relies on mapping the concept features to the latent space of a pretrained generative model. The use of a generative model enables high quality visualization, and naturally lays out an intuitive and interactive procedure for better interpretation of the learnt concepts. Furthermore, leveraging pretrained generative models has the additional advantage of making the training of the system more efficient. We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts. The experiments are conducted on multiple image recognition benchmarks for large-scale images. Project page available at https://jayneelparekh.github.io/VisCoIN_project_page/
Submitted 1 July, 2024;
originally announced July 2024.
-
Conformal uncertainty quantification using kernel depth measures in separable Hilbert spaces
Authors:
Marcos Matabuena,
Rahul Ghosal,
Pavlo Mozharovskyi,
Oscar Hernan Madrid Padilla,
Jukka-Pekka Onnela
Abstract:
Depth measures have gained popularity in the statistical literature for defining level sets in complex data structures like multivariate data, functional data, and graphs. Despite their versatility, integrating depth measures into regression modeling for establishing prediction regions remains underexplored. To address this gap, we propose a novel method utilizing a model-free uncertainty quantification algorithm based on conditional depth measures and conditional kernel mean embeddings. This enables the creation of tailored prediction and tolerance regions in regression models handling complex statistical responses and predictors in separable Hilbert spaces. Our focus in this paper is exclusively on examples where the response is a functional data object. To enhance practicality, we introduce a conformal prediction algorithm, providing non-asymptotic guarantees in the derived prediction region. Additionally, we establish both conditional and unconditional consistency results and fast convergence rates in some special homoscedastic cases. We evaluate the model's finite-sample performance in extensive simulation studies with different function objects such as probability distributions and functional data. Finally, we apply the approach in a digital health application related to physical activity, aiming to offer personalized recommendations in the U.S. population based on individuals' characteristics.
Submitted 22 May, 2024;
originally announced May 2024.
-
Anomaly component analysis
Authors:
Romain Valla,
Pavlo Mozharovskyi,
Florence d'Alché-Buc
Abstract:
At the crossroads of machine learning and data analysis, anomaly detection aims at identifying observations that exhibit abnormal behaviour. Be it measurement errors, disease development, severe weather, production quality defects or failed equipment, financial frauds or crisis events, their timely identification and isolation constitute an important task in almost any area of industry and science. While a substantial body of literature is devoted to the detection of anomalies, little attention is paid to their explanation. This is mostly due to the intrinsically unsupervised nature of the task and the non-robustness of exploratory methods like principal component analysis (PCA).
We introduce a new statistical tool dedicated to the exploratory analysis of abnormal observations using data depth as a score. Anomaly component analysis (ACA for short) is a method that searches for a low-dimensional data representation that best visualises and explains anomalies. This low-dimensional representation not only allows groups of anomalies to be distinguished better than state-of-the-art methods, but also provides an explanation for anomalies that is linear in the variables and thus easily interpretable. In a comparative simulation and real-data study, ACA also proves advantageous for anomaly analysis with respect to methods present in the literature.
Submitted 26 December, 2023;
originally announced December 2023.
-
Fast kernel half-space depth for data with non-convex supports
Authors:
Arturo Castellanos,
Pavlo Mozharovskyi,
Florence d'Alché-Buc,
Hicham Janati
Abstract:
Data depth is a statistical function that generalizes order and quantiles to the multivariate setting and beyond, with applications spanning descriptive and visual statistics, anomaly detection, testing, etc. The celebrated halfspace depth exploits data geometry via an optimization program to deliver properties of invariances, robustness, and non-parametricity. Nevertheless, it implicitly assumes convex data supports and requires exponential computational cost. To tackle multimodality of the distribution, we extend the halfspace depth in a Reproducing Kernel Hilbert Space (RKHS). We show that the obtained depth is intuitive and establish its consistency with provable concentration bounds that allow for homogeneity testing. The proposed depth can be computed using manifold gradient, making it faster than halfspace depth by several orders of magnitude. The performance of our depth is demonstrated through numerical simulations as well as applications such as anomaly detection on real data and homogeneity testing.
Submitted 21 December, 2023;
originally announced December 2023.
-
Towards On-device Learning on the Edge: Ways to Select Neurons to Update under a Budget Constraint
Authors:
Aël Quélennec,
Enzo Tartaglione,
Pavlo Mozharovskyi,
Van-Tam Nguyen
Abstract:
In the realm of efficient on-device learning under extreme memory and computation constraints, a significant gap in successful approaches persists. Although considerable effort has been devoted to efficient inference, the main obstacle to efficient learning is the prohibitive cost of backpropagation. The resources required to compute gradients and update network parameters often exceed the limits of tightly constrained memory budgets. This paper challenges conventional wisdom and proposes a series of experiments that reveal the existence of superior sub-networks. Furthermore, we hint at the potential for substantial gains through a dynamic neuron selection strategy when fine-tuning a target task. Our efforts extend to the adaptation of a recent dynamic neuron selection strategy pioneered by Bragagnolo et al. (NEq), revealing its effectiveness in the most stringent scenarios. Our experiments demonstrate, in the average case, the superiority of a NEq-inspired approach over a random selection. This observation prompts a compelling avenue for further exploration in the area, highlighting the opportunity to design a new class of algorithms designed to facilitate parameter update selection. Our findings usher in a new era of possibilities in the field of on-device learning under extreme constraints and encourage the pursuit of innovative strategies for efficient, resource-friendly model fine-tuning.
Submitted 8 December, 2023;
originally announced December 2023.
-
Tailoring Mixup to Data for Calibration
Authors:
Quentin Bouniot,
Pavlo Mozharovskyi,
Florence d'Alché-Buc
Abstract:
Among all data augmentation techniques proposed so far, linear interpolation of training samples, also called Mixup, has been found to be effective for a wide range of applications. Along with improved performance, Mixup is also a good technique for improving calibration and predictive uncertainty. However, mixing data carelessly can lead to manifold intrusion, i.e., conflicts between the synthetic labels assigned and the true label distributions, which can deteriorate calibration. In this work, we argue that the likelihood of manifold intrusion increases with the distance between data to mix. To this end, we propose to dynamically change the underlying distributions of interpolation coefficients depending on the similarity between samples to mix, and define a flexible framework to do so without losing in diversity. We provide extensive experiments for classification and regression tasks, showing that our proposed method improves performance and calibration of models, while being much more efficient. The code for our work is available at https://github.com/qbouniot/sim_kernel_mixup.
Submitted 11 June, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
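For readers unfamiliar with the baseline named in the abstract above, the following is a minimal sketch of plain Mixup, i.e. linear interpolation of two training samples and their labels. It is not the similarity-dependent method of the paper. The function name `mixup_pair`, the `alpha` parameter, and the choice of a symmetric Beta(alpha, alpha) distribution for the interpolation coefficient are illustrative assumptions.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Mix two (input, label) pairs by linear interpolation.

    Hypothetical helper for illustration only: the coefficient `lam` is
    drawn from Beta(alpha, alpha), which is an assumption of this sketch.
    Inputs and labels are plain lists of floats (labels e.g. one-hot).
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # interpolation coefficient in [0, 1]
    # The same coefficient is applied to the inputs and to the labels.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

With one-hot labels, the mixed label remains a probability vector. A similarity-aware variant in the spirit of the abstract would make the distribution of `lam` depend on how close `x1` and `x2` are; that part is deliberately left out here.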
-
Tackling Interpretability in Audio Classification Networks with Non-negative Matrix Factorization
Authors:
Jayneel Parekh,
Sanjeel Parekh,
Pavlo Mozharovskyi,
Gaël Richard,
Florence d'Alché-Buc
Abstract:
This paper tackles two major problem settings for interpretability of audio processing networks, post-hoc and by-design interpretation. For post-hoc interpretation, we aim to interpret decisions of a network in terms of high-level audio objects that are also listenable for the end-user. This is extended to present an inherently interpretable model with high performance. To this end, we propose a novel interpreter design that incorporates non-negative matrix factorization (NMF). In particular, an interpreter is trained to generate a regularized intermediate embedding from hidden layers of a target network, learnt as time-activations of a pre-learnt NMF dictionary. Our methodology allows us to generate intuitive audio-based interpretations that explicitly enhance parts of the input signal most relevant for a network's decision. We demonstrate our method's applicability on a variety of classification tasks, including multi-label data for real-world audio and music.
Submitted 11 May, 2023;
originally announced May 2023.
-
Optimized preprocessing and Tiny ML for Attention State Classification
Authors:
Yinghao Wang,
Rémi Nahon,
Enzo Tartaglione,
Pavlo Mozharovskyi,
Van-Tam Nguyen
Abstract:
In this paper, we present a new approach to mental state classification from EEG signals by combining signal processing techniques and machine learning (ML) algorithms. We evaluate the performance of the proposed method on a dataset of EEG recordings collected during a cognitive load task and compare it to other state-of-the-art methods. The results show that the proposed method achieves high accuracy in classifying mental states and outperforms state-of-the-art methods in terms of classification accuracy and computational efficiency.
Submitted 20 March, 2023;
originally announced March 2023.
-
Anomaly detection using data depth: multivariate case
Authors:
Pavlo Mozharovskyi
Abstract:
Anomaly detection is a branch of machine learning and data analysis which aims at identifying observations that exhibit abnormal behaviour. Be it measurement errors, disease development, severe weather, production quality defects or failed equipment, financial frauds or crisis events, their timely identification, isolation and explanation constitute an important task in almost any branch of industry and science. By providing a robust ordering, data depth, a statistical function that measures the belongingness of any point of the space to a data set, becomes a particularly useful tool for the detection of anomalies. Already known for its theoretical properties, data depth has undergone substantial computational developments in the last decade and particularly in recent years, which has made it applicable to contemporary-sized problems of data analysis and machine learning.
In this article, data depth is studied as an efficient anomaly detection tool, assigning abnormality labels to observations with lower depth values, in a multivariate setting. Practical questions of necessity and reasonability of invariances and shape of the depth function, its robustness and computational complexity, choice of the threshold are discussed. Illustrations include use-cases that underline advantageous behaviour of data depth in various settings.
Submitted 6 October, 2022;
originally announced October 2022.
-
Statistical process monitoring of artificial neural networks
Authors:
Anna Malinovskaya,
Pavlo Mozharovskyi,
Philipp Otto
Abstract:
The rapid advancement of models based on artificial intelligence demands innovative monitoring techniques which can operate in real time with low computational costs. In machine learning, especially if we consider artificial neural networks (ANNs), the models are often trained in a supervised manner. Consequently, the learned relationship between the input and the output must remain valid during the model's deployment. If this stationarity assumption holds, we can conclude that the ANN provides accurate predictions. Otherwise, the retraining or rebuilding of the model is required. We propose considering the latent feature representation of the data (called "embedding") generated by the ANN to determine the time when the data stream starts being nonstationary. In particular, we monitor embeddings by applying multivariate control charts based on the data depth calculation and normalized ranks. The performance of the introduced method is compared with benchmark approaches for various ANN architectures and different underlying data formats.
Submitted 27 July, 2023; v1 submitted 15 September, 2022;
originally announced September 2022.
-
On exact computation of Tukey depth central regions
Authors:
Vít Fojtík,
Petra Laketa,
Pavlo Mozharovskyi,
Stanislav Nagy
Abstract:
The Tukey (or halfspace) depth extends nonparametric methods toward multivariate data. The multivariate analogues of the quantiles are the central regions of the Tukey depth, defined as sets of points in the $d$-dimensional space whose Tukey depth exceeds given thresholds $k$. We address the problem of fast and exact computation of those central regions. First, we analyse an efficient Algorithm A from Liu et al. (2019), and prove that it yields exact results in dimension $d=2$, or for a low threshold $k$ in arbitrary dimension. We provide examples where Algorithm A fails to recover the exact Tukey depth region for $d>2$, and propose a modification that is guaranteed to be exact. We express the problem of computing the exact central region in its dual formulation, and use that viewpoint to demonstrate that further substantial improvements to our algorithm are unlikely. An efficient C++ implementation of our exact algorithm is freely available in the R package TukeyRegion.
Submitted 31 October, 2022; v1 submitted 9 August, 2022;
originally announced August 2022.
-
Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF
Authors:
Jayneel Parekh,
Sanjeel Parekh,
Pavlo Mozharovskyi,
Florence d'Alché-Buc,
Gaël Richard
Abstract:
This paper tackles post-hoc interpretability for audio processing networks. Our goal is to interpret decisions of a network in terms of high-level audio objects that are also listenable for the end-user. To this end, we propose a novel interpreter design that incorporates non-negative matrix factorization (NMF). In particular, a carefully regularized interpreter module is trained to take hidden layer representations of the targeted network as input and produce time activations of pre-learnt NMF components as intermediate outputs. Our methodology allows us to generate intuitive audio-based interpretations that explicitly enhance parts of the input signal most relevant for a network's decision. We demonstrate our method's applicability on popular benchmarks, including a real-world multi-label classification task.
Submitted 24 October, 2022; v1 submitted 23 February, 2022;
originally announced February 2022.
-
Statistical Depth Functions for Ranking Distributions: Definitions, Statistical Learning and Applications
Authors:
Morgane Goibert,
Stéphan Clémençon,
Ekhine Irurozki,
Pavlo Mozharovskyi
Abstract:
The concept of median/consensus has been widely investigated in order to provide a statistical summary of ranking data, i.e. realizations of a random permutation $\Sigma$ of a finite set, $\{1,\; \ldots,\; n\}$ with $n\geq 1$ say. As it sheds light onto only one aspect of $\Sigma$'s distribution $P$, it may neglect other informative features. It is the purpose of this paper to define analogs of quantiles, ranks and statistical procedures based on such quantities for the analysis of ranking data by means of a metric-based notion of depth function on the symmetric group. Overcoming the absence of vector space structure on $\mathfrak{S}_n$, the latter defines a center-outward ordering of the permutations in the support of $P$ and extends the classic metric-based formulation of consensus ranking (medians corresponding then to the deepest permutations). The axiomatic properties that ranking depths should ideally possess are listed, while computational and generalization issues are studied at length. Beyond the theoretical analysis carried out, the relevance of the novel concepts and methods introduced for a wide variety of statistical tasks is also supported by numerous numerical experiments.
Submitted 20 January, 2022;
originally announced January 2022.
-
Functional Anomaly Detection: a Benchmark Study
Authors:
Guillaume Staerman,
Eric Adjakossa,
Pavlo Mozharovskyi,
Vera Hofer,
Jayant Sen Gupta,
Stephan Clémençon
Abstract:
The increasing automation in many areas of industry calls for the design of efficient machine-learning solutions for the detection of abnormal events. With the ubiquitous deployment of sensors monitoring nearly continuously the health of complex infrastructures, anomaly detection can now rely on measurements sampled at a very high frequency, providing a very rich representation of the phenomenon under surveillance. In order to fully exploit the information thus collected, the observations can no longer be treated as multivariate data and a functional analysis approach is required. It is the purpose of this paper to investigate the performance of recent techniques for anomaly detection in the functional setup on real datasets. After an overview of the state-of-the-art and a visual-descriptive study, a variety of anomaly detection methods are compared. While taxonomies of abnormalities (e.g. shape, location) in the functional setup are documented in the literature, assigning a specific type to the identified anomalies appears to be a challenging task. Thus, strengths and weaknesses of the existing approaches are benchmarked in view of these highlighted types in a simulation study. Anomaly detection methods are next evaluated on two datasets, related respectively to the monitoring of helicopters in flight and to the spectrometry of construction materials. The benchmark analysis is concluded by recommendation guidance for practitioners.
Submitted 13 January, 2022;
originally announced January 2022.
-
Affine-Invariant Integrated Rank-Weighted Depth: Definition, Properties and Finite Sample Analysis
Authors:
Guillaume Staerman,
Pavlo Mozharovskyi,
Stéphan Clémençon
Abstract:
Because it determines a center-outward ordering of observations in $\mathbb{R}^d$ with $d\geq 2$, the concept of statistical depth makes it possible to define quantiles and ranks for multivariate data and use them for various statistical tasks (e.g. inference, hypothesis testing). Whereas many depth functions have been proposed ad hoc in the literature since the seminal contribution of \cite{Tukey75}, not all of them possess the properties desirable to emulate the notion of quantile function for univariate probability distributions. In this paper, we propose an extension of the integrated rank-weighted statistical depth (IRW depth in abbreviated form) originally introduced in \cite{IRW}, modified in order to satisfy the property of affine-invariance, thus fulfilling all four key axioms listed in the nomenclature elaborated by \cite{ZuoS00a}. The variant we propose, referred to as the Affine-Invariant IRW depth (AI-IRW in short), involves the covariance/precision matrices of the (supposedly square integrable) $d$-dimensional random vector $X$ under study, in order to take into account the directions along which $X$ is most variable to assign a depth value to any point $x\in \mathbb{R}^d$. The accuracy of the sampling version of the AI-IRW depth is investigated from a nonasymptotic perspective. Namely, a concentration result for the statistical counterpart of the AI-IRW depth is proved. Beyond the theoretical analysis carried out, applications to anomaly detection are considered and numerical results are displayed, providing strong empirical evidence of the relevance of the depth function we propose here.
Submitted 4 February, 2022; v1 submitted 21 June, 2021;
originally announced June 2021.
-
A Pseudo-Metric between Probability Distributions based on Depth-Trimmed Regions
Authors:
Guillaume Staerman,
Pavlo Mozharovskyi,
Pierre Colombo,
Stéphan Clémençon,
Florence d'Alché-Buc
Abstract:
The design of a metric between probability distributions is a longstanding problem motivated by numerous applications in Machine Learning. Focusing on continuous probability distributions on the Euclidean space $\mathbb{R}^d$, we introduce a novel pseudo-metric between probability distributions by leveraging the extension of univariate quantiles to multivariate spaces. Data depth is a nonparametric statistical tool that measures the centrality of any element $x\in\mathbb{R}^d$ with respect to (w.r.t.) a probability distribution or a data set. It is a natural median-oriented extension of the cumulative distribution function (cdf) to the multivariate case. Thus, its upper-level sets -- the depth-trimmed regions -- give rise to a definition of multivariate quantiles. The new pseudo-metric relies on the average of the Hausdorff distance between the depth-based quantile regions w.r.t. each distribution. Its good behavior w.r.t. major transformation groups, as well as its ability to factor out translations, are depicted. Robustness, an appealing feature of this pseudo-metric, is studied through the finite sample breakdown point. Moreover, we propose an efficient approximation method with linear time complexity w.r.t. the size of the data set and its dimension. The quality of this approximation as well as the performance of the proposed approach are illustrated in numerical experiments.
Submitted 10 October, 2022; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Distributionally robust halfspace depth
Authors:
Jevgenijs Ivanovs,
Pavlo Mozharovskyi
Abstract:
Tukey's halfspace depth can be seen as a stochastic program and as such it is not guarded against the optimizer's curse, so that a limited training sample may easily result in poor out-of-sample performance. We propose a generalized halfspace depth concept relying on the recent advances in distributionally robust optimization, where every halfspace is examined using the respective worst-case distribution in the Wasserstein ball of radius $\delta\geq 0$ centered at the empirical law. This new depth can be seen as a smoothed and regularized classical halfspace depth which is retrieved as $\delta\downarrow 0$. It inherits most of the main properties of the latter and, additionally, enjoys various new attractive features such as continuity and strict positivity beyond the convex hull of the support. We provide numerical illustrations of the new depth and its advantages, and develop some fundamental theory. In particular, we study the upper level sets and the median region including their breakdown properties.
Submitted 10 May, 2024; v1 submitted 3 January, 2021;
originally announced January 2021.
-
A Framework to Learn with Interpretation
Authors:
Jayneel Parekh,
Pavlo Mozharovskyi,
Florence d'Alché-Buc
Abstract:
To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability about the predictive model in terms of human-understandable high-level attribute functions, with minimal loss of accuracy. This is achieved by a dedicated architecture and well-chosen regularization penalties. We seek a small-size dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers and whose outputs feed a linear classifier. We impose strong conciseness on the activation of attributes with an entropy-based criterion while enforcing fidelity to both inputs and outputs of the predictive model. A detailed pipeline to visualize the learnt features is also developed. Moreover, besides generating interpretable models by design, our approach can be specialized to provide post-hoc interpretations for a pre-trained neural network. We validate our approach against several state-of-the-art methods on multiple datasets and show its efficacy on both kinds of tasks.
Submitted 23 February, 2022; v1 submitted 19 October, 2020;
originally announced October 2020.
-
Approximate computation of projection depths
Authors:
Rainer Dyckerhoff,
Pavlo Mozharovskyi,
Stanislav Nagy
Abstract:
Data depth is a concept in multivariate statistics that measures the centrality of a point in a given data cloud in $\mathbb{R}^d$. If the depth of a point can be represented as the minimum of the depths with respect to all one-dimensional projections of the data, then the depth satisfies the so-called projection property. Such depths form an important class that includes many of the depths that have been proposed in the literature. For depths that satisfy the projection property, an approximate algorithm can easily be constructed, since taking the minimum of the depths with respect to only a finite number of one-dimensional projections yields an upper bound for the depth with respect to the multivariate data. Such an algorithm is particularly useful if no exact algorithm exists or if the exact algorithm has a high computational complexity, as is the case with the halfspace depth or the projection depth. To compute these depths in high dimensions, the use of an approximate algorithm with better complexity is surely preferable. Instead of focusing on a single method, we provide a comprehensive and fair comparison of several methods, both already described in the literature and original.
Submitted 15 July, 2020;
originally announced July 2020.
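The projection-based approximation described in this abstract can be illustrated for the halfspace depth. The following is a minimal sketch, not one of the methods compared in the paper: it assumes purely random directions, and the function name `approx_halfspace_depth` and its parameters are hypothetical. Because the minimum is taken over finitely many directions, the returned value is an upper bound on the exact depth.

```python
import math
import random


def approx_halfspace_depth(point, data, n_directions=500, seed=0):
    """Upper bound on the halfspace (Tukey) depth of `point` w.r.t. `data`.

    Illustrative sketch: the data are projected onto random unit directions,
    the univariate halfspace depth is computed on each projection, and the
    minimum over all directions is returned.
    """
    rng = random.Random(seed)
    dim = len(point)
    n = len(data)
    best = 1.0
    for _ in range(n_directions):
        # A normalized Gaussian vector is uniformly distributed on the sphere.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        z = sum(c * p for c, p in zip(u, point))
        proj = [sum(c * x for c, x in zip(u, row)) for row in data]
        # Univariate halfspace depth: smaller of the two one-sided fractions.
        below = sum(1 for v in proj if v <= z) / n
        above = sum(1 for v in proj if v >= z) / n
        best = min(best, below, above)
    return best
```

Increasing `n_directions` can only lower the returned value, moving it toward the exact depth at the cost of more projections.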
-
When OT meets MoM: Robust estimation of Wasserstein Distance
Authors:
Guillaume Staerman,
Pierre Laforgue,
Pavlo Mozharovskyi,
Florence d'Alché-Buc
Abstract:
Stemming from Optimal Transport, the Wasserstein distance has gained importance in Machine Learning due to its appealing geometrical properties and the increasing availability of efficient approximations. In this work, we consider the problem of estimating the Wasserstein distance between two probability distributions when observations are polluted by outliers. To that end, we investigate how to leverage Median-of-Means (MoM) estimators to robustify the estimation of the Wasserstein distance. Exploiting the dual Kantorovich formulation of the Wasserstein distance, we introduce and discuss novel MoM-based robust estimators whose consistency is studied under a data contamination model and for which convergence rates are provided. These MoM estimators make it possible to render Wasserstein Generative Adversarial Networks (WGAN) robust to outliers, as witnessed by an empirical study on two benchmarks, CIFAR10 and Fashion MNIST. Finally, we discuss how to combine MoM with the entropy-regularized approximation of the Wasserstein distance and propose a simple MoM-based re-weighting scheme that could be used in conjunction with the Sinkhorn algorithm.
Submitted 18 February, 2022; v1 submitted 18 June, 2020;
originally announced June 2020.
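For readers unfamiliar with the MoM principle mentioned in this abstract, the basic idea can be sketched for the simplest case, estimating a mean. This is only an illustration of the building block, not the paper's Wasserstein estimator; the function name `median_of_means` and its parameters are hypothetical.

```python
import random
import statistics


def median_of_means(values, n_blocks, seed=0):
    """Median-of-means estimate of the mean of `values`.

    The sample is shuffled, split into `n_blocks` nearly equal blocks, the
    mean of each block is computed, and the median of these block means is
    returned. A few outliers can corrupt only a few blocks, so the median
    stays close to the bulk of the data.
    """
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    # Block k takes every n_blocks-th element starting at position k.
    blocks = [shuffled[k::n_blocks] for k in range(n_blocks)]
    return statistics.median(statistics.mean(b) for b in blocks)
```

With `n_blocks=1` the estimator reduces to the ordinary sample mean; more blocks give more robustness to outliers but noisier block means.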
-
Choosing among notions of multivariate depth statistics
Authors:
Karl Mosler,
Pavlo Mozharovskyi
Abstract:
Classical multivariate statistics measures the outlyingness of a point by its Mahalanobis distance from the mean, which is based on the mean and the covariance matrix of the data. A multivariate depth function is a function which, given a point and a distribution in d-space, measures centrality by a number between 0 and 1, while satisfying certain postulates regarding invariance, monotonicity, convexity and continuity. Accordingly, numerous notions of multivariate depth have been proposed in the literature, some of which are also robust against extremely outlying data. The departure from classical Mahalanobis distance does not come without cost. There is a trade-off between invariance, robustness and computational feasibility. In the last few years, efficient exact algorithms as well as approximate ones have been constructed and made available in R-packages. Consequently, in practical applications the choice of a depth statistic is no longer restricted to one or two notions due to computational limits; rather, several notions are often feasible, among which the researcher has to decide. The article debates theoretical and practical aspects of this choice, including invariance and uniqueness, robustness and computational feasibility. Complexity and speed of exact algorithms are compared. The accuracy of approximate approaches like the random Tukey depth is discussed as well as the application to large and high-dimensional data. Extensions to local and functional depths and connections to regression depth are briefly addressed.
Submitted 5 May, 2021; v1 submitted 4 April, 2020;
originally announced April 2020.
-
Flexible and Context-Specific AI Explainability: A Multidisciplinary Approach
Authors:
Valérie Beaudouin,
Isabelle Bloch,
David Bounie,
Stéphan Clémençon,
Florence d'Alché-Buc,
James Eagan,
Winston Maxwell,
Pavlo Mozharovskyi,
Jayneel Parekh
Abstract:
The recent enthusiasm for artificial intelligence (AI) is due principally to advances in deep learning. Deep learning methods are remarkably accurate, but also opaque, which limits their potential use in safety-critical applications. To achieve trust and accountability, designers and operators of machine learning algorithms must be able to explain the inner workings, the results and the causes of failures of algorithms to users, regulators, and citizens. The originality of this paper is to combine technical, legal and economic aspects of explainability to develop a framework for defining the "right" level of explainability in a given context. We propose three logical steps: First, define the main contextual factors, such as who the audience of the explanation is, the operational context, the level of harm that the system could cause, and the legal/regulatory framework. This step will help characterize the operational and legal needs for explanation, and the corresponding social benefits. Second, examine the technical tools available, including post hoc approaches (input perturbation, saliency maps...) and hybrid AI approaches. Third, as a function of the first two steps, choose the right levels of global and local explanation outputs, taking into account the costs involved. We identify seven kinds of costs and emphasize that explanations are socially useful only when total social benefits exceed costs.
Submitted 13 March, 2020;
originally announced March 2020.
-
Uniform convergence rates for the approximated halfspace and projection depth
Authors:
Stanislav Nagy,
Rainer Dyckerhoff,
Pavlo Mozharovskyi
Abstract:
The computational complexity of some depths that satisfy the projection property, such as the halfspace depth or the projection depth, is known to be high, especially for data of higher dimensionality. In such scenarios, the exact depth is frequently approximated using a randomized approach: the data are projected onto a finite number of directions uniformly distributed on the unit sphere, and the minimal depth of these univariate projections is used to approximate the true depth. We provide a theoretical background for this approximation procedure. Several uniform consistency results are established, and the corresponding uniform convergence rates are provided. For elliptically symmetric distributions and the halfspace depth it is shown that the obtained uniform convergence rates are sharp. In particular, guidelines for the choice of the number of random projections in order to achieve a given precision of the depths are stated.
Submitted 14 October, 2019;
originally announced October 2019.
-
The Area of the Convex Hull of Sampled Curves: a Robust Functional Statistical Depth Measure
Authors:
Guillaume Staerman,
Pavlo Mozharovskyi,
Stephan Clémençon
Abstract:
With the ubiquity of sensors in the IoT era, statistical observations are becoming increasingly available in the form of massive (multivariate) time-series. Formulated as unsupervised anomaly detection tasks, an abundance of applications like aviation safety management, the health monitoring of complex infrastructures or fraud detection can now rely on such functional data, acquired and stored with an ever finer granularity. The concept of statistical depth, which reflects centrality of an arbitrary observation w.r.t. a statistical population, may play a crucial role in this regard, with anomalies corresponding to observations with 'small' depth. Supported by sound theoretical and computational developments in recent decades, it has proven to be extremely useful, in particular in functional spaces. However, most approaches documented in the literature consist in evaluating independently the centrality of each point forming the time series and consequently exhibit a certain insensitivity to possible shape changes. In this paper, we propose a novel notion of functional depth based on the area of the convex hull of sampled curves, capturing gradual departures from centrality, even beyond the envelope of the data, in a natural fashion. We discuss practical relevance of commonly imposed axioms on functional depths and investigate which of them are satisfied by the notion of depth we promote here. Estimation and computational issues are also addressed and various numerical experiments provide empirical evidence of the relevance of the approach proposed.
Submitted 13 February, 2020; v1 submitted 9 October, 2019;
originally announced October 2019.
-
Functional Isolation Forest
Authors:
Guillaume Staerman,
Pavlo Mozharovskyi,
Stephan Clémençon,
Florence d'Alché-Buc
Abstract:
For the purpose of monitoring the behavior of complex infrastructures (e.g. aircraft, transport or energy networks), high-rate sensors are deployed to capture multivariate data, generally unlabeled, in quasi continuous-time to quickly detect the occurrence of anomalies that may jeopardize the smooth operation of the system of interest. The statistical analysis of such massive data of functional nature raises many challenging methodological questions. The primary goal of this paper is to extend the popular Isolation Forest (IF) approach to Anomaly Detection, originally dedicated to finite dimensional observations, to functional data. The major difficulty lies in the wide variety of topological structures that may equip a space of functions and the great variety of patterns that may characterize abnormal curves. We address the issue of (randomly) splitting the functional space in a flexible manner in order to isolate progressively any trajectory from the others, a key ingredient to the efficiency of the algorithm. Beyond a detailed description of the algorithm, computational complexity and stability issues are investigated at length. From the scoring function measuring the degree of abnormality of an observation provided by the proposed variant of the IF algorithm, a Functional Statistical Depth function is defined and discussed as well as a multivariate functional extension. Numerical experiments provide strong empirical evidence of the accuracy of the extension proposed.
Submitted 9 October, 2019; v1 submitted 9 April, 2019;
originally announced April 2019.
-
Depth for curve data and applications
Authors:
Pierre Lafaye de Micheaux,
Pavlo Mozharovskyi,
Myriam Vimond
Abstract:
John W. Tukey (1975) defined statistical data depth as a function that determines centrality of an arbitrary point with respect to a data cloud or to a probability measure. Over the last decades, this seminal idea of data depth has evolved into a powerful tool proving to be useful in various fields of science. Recently, extending the notion of data depth to the functional setting has attracted a lot of attention among theoretical and applied statisticians. We go further and suggest a notion of data depth suitable for data represented as curves, or trajectories, which is independent of the parametrization. We show that our curve depth satisfies theoretical requirements of general depth functions that are meaningful for trajectories. We apply our methodology to diffusion tensor brain images and also to pattern recognition of handwritten digits and letters. Supplementary Materials are available online.
Submitted 21 February, 2020; v1 submitted 1 January, 2019;
originally announced January 2019.
-
Nonparametric imputation by data depth
Authors:
Pavlo Mozharovskyi,
Julie Josse,
Francois Husson
Abstract:
We present a single imputation method for missing values which borrows the idea of data depth---a measure of centrality defined for an arbitrary point of a space with respect to a probability distribution or data cloud. This consists in iterative maximization of the depth of each observation with missing values, and can be employed with any properly defined statistical depth function. For each single iteration, imputation reverts to optimization of quadratic, linear, or quasiconcave functions that are solved analytically by linear programming or the Nelder-Mead method. As it accounts for the underlying data topology, the procedure is distribution free, allows imputation close to the data geometry, can make predictions in situations where local imputation (k-nearest neighbors, random forest) cannot, and has attractive robustness and asymptotic properties under elliptical symmetry. It is shown that a special case---when using the Mahalanobis depth---has a direct connection to well-known methods for the multivariate normal model, such as iterated regression and regularized PCA. The methodology is extended to multiple imputation for data stemming from an elliptically symmetric distribution. Simulation and real data studies show good results compared with existing popular alternatives. The method has been implemented as an R-package. Supplementary materials for the article are available online.
Submitted 6 August, 2018; v1 submitted 12 January, 2017;
originally announced January 2017.
-
Depth and depth-based classification with R-package ddalpha
Authors:
Oleksii Pokotylo,
Pavlo Mozharovskyi,
Rainer Dyckerhoff
Abstract:
Following the seminal idea of Tukey, data depth is a function that measures how close an arbitrary point of the space is located to an implicitly defined center of a data cloud. Having undergone theoretical and computational developments, it is now employed in numerous applications with classification being the most popular one. The R-package ddalpha is software designed to combine the experience of the practitioner with recent achievements in the area of data depth and depth-based classification.
ddalpha provides an implementation for exact and approximate computation of most reasonable and widely applied notions of data depth. These can be further used in the depth-based multivariate and functional classifiers implemented in the package, where the $DDα$-procedure is the main focus. The package is expandable with user-defined custom depth methods and separators. The implemented functions for depth visualization and the built-in benchmark procedures may also serve to provide insights into the geometry of the data and the quality of pattern recognition.
Submitted 14 August, 2016;
originally announced August 2016.
-
Tukey depth: linear programming and applications
Authors:
Pavlo Mozharovskyi
Abstract:
Determining the representativeness of a point within a data cloud has recently become a desirable task in multivariate analysis. The concept of a statistical depth function, which reflects centrality of an arbitrary point, appears to be useful and has been studied intensively during the last decades. Here the issue of exact computation of the classical Tukey data depth is addressed. The paper suggests an algorithm that exploits the connection between the Tukey depth and linear separability and is based on iterative application of linear programming. The algorithm further develops the idea of the cone segmentation of the Euclidean space and allows for efficient implementation due to the special search structure. The presentation is complemented by relationships to similar concepts and examples of application.
Submitted 29 February, 2016;
originally announced March 2016.
-
Fast computation of Tukey trimmed regions and median in dimension $p>2$
Authors:
Xiaohui Liu,
Karl Mosler,
Pavlo Mozharovskyi
Abstract:
Given data in $\mathbb{R}^{p}$, a Tukey $κ$-trimmed region is the set of all points that have at least Tukey depth $κ$ w.r.t. the data. As they are visual, affine equivariant and robust, Tukey regions are useful tools in nonparametric multivariate analysis. While these regions are easily defined and interpreted, their practical use in applications has been impeded so far by the lack of efficient computational procedures in dimension $p > 2$. We construct two novel algorithms to compute a Tukey $κ$-trimmed region, a naïve one and a more sophisticated one that is much faster than known algorithms. Further, a strict bound on the number of facets of a Tukey region is derived. In a large simulation study the novel fast algorithm is compared with the naïve one, which is slower and by construction exact, yielding in every case the same correct results. Finally, the approach is extended to an algorithm that calculates the innermost Tukey region and its barycenter, the Tukey median.
Submitted 8 November, 2018; v1 submitted 16 December, 2014;
originally announced December 2014.
-
Exact computation of the halfspace depth
Authors:
Rainer Dyckerhoff,
Pavlo Mozharovskyi
Abstract:
For computing the exact value of the halfspace depth of a point w.r.t. a data cloud of $n$ points in arbitrary dimension, a theoretical framework is suggested. Based on this framework a whole class of algorithms can be derived. In all of these algorithms the depth is calculated as the minimum over a finite number of depth values w.r.t. proper projections of the data cloud. Three variants of this class are studied in more detail. All of these algorithms are capable of dealing with data that are not in general position and even with data that contain ties. As is shown by simulations, all proposed algorithms prove to be very efficient.
Submitted 12 January, 2016; v1 submitted 25 November, 2014;
originally announced November 2014.
-
Classifying real-world data with the $DDα$-procedure
Authors:
Pavlo Mozharovskyi,
Karl Mosler,
Tatjana Lange
Abstract:
The $DDα$-classifier, a fast, nonparametric and very robust procedure, is described and applied to fifty classification problems regarding a broad spectrum of real-world data. The procedure first transforms the data from their original property space into a depth space, which is a low-dimensional unit cube, and then separates them by a projective invariant procedure, called $α$-procedure. To each data point the transformation assigns its depth values with respect to the given classes. Several alternative depth notions (spatial depth, Mahalanobis depth, projection depth, and Tukey depth, the latter two being approximated by univariate projections) are used in the procedure, and compared regarding their average error rates. With the Tukey depth, which fits the distributions' shape best and is most robust, 'outsiders', that is, data points having zero depth in all classes, need an additional treatment for classification. Evidence is also given about the dimension of the extended feature space needed for linear separation. The $DDα$-procedure is available as an R-package.
Submitted 28 October, 2015; v1 submitted 19 July, 2014;
originally announced July 2014.
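The first step described in this abstract, mapping each point to its vector of depth values with respect to the classes, can be sketched as follows. This is a simplified illustration under stated assumptions: it is restricted to two-dimensional data, it uses the Mahalanobis depth with moment estimates (one of the depth notions listed above), and the $α$-procedure is replaced by a plain maximum-depth rule, which is not the separator used in the paper. All function names are hypothetical.

```python
def mahalanobis_depth_2d(point, sample):
    """Mahalanobis depth 1 / (1 + d^2) of a 2-D point w.r.t. a 2-D sample."""
    n = len(sample)
    mx = sum(x for x, _ in sample) / n
    my = sum(y for _, y in sample) / n
    sxx = sum((x - mx) ** 2 for x, _ in sample) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in sample) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in sample) / (n - 1)
    det = sxx * syy - sxy * sxy  # assumed nonzero (non-degenerate sample)
    dx, dy = point[0] - mx, point[1] - my
    # Squared Mahalanobis distance via the closed-form 2x2 matrix inverse.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + d2)


def depth_representation(point, classes):
    """Coordinates of `point` in the depth space [0, 1]^q, one per class."""
    return [mahalanobis_depth_2d(point, sample) for sample in classes]


def max_depth_label(point, classes):
    """Index of the class in which `point` is deepest (simplified rule)."""
    depths = depth_representation(point, classes)
    return max(range(len(depths)), key=depths.__getitem__)
```

The Mahalanobis depth is strictly positive everywhere, so this sketch never produces outsiders; with the Tukey depth, points of zero depth in all classes would need the additional treatment mentioned in the abstract.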
-
Fast DD-classification of functional data
Authors:
Karl Mosler,
Pavlo Mozharovskyi
Abstract:
A fast nonparametric procedure for classifying functional data is introduced. It consists of a two-step transformation of the original data plus a classifier operating on a low-dimensional hypercube. The functional data are first mapped into a finite-dimensional location-slope space and then transformed by a multivariate depth function into the $DD$-plot, which is a subset of the unit hypercube. This transformation yields a new notion of depth for functional data. Three alternative depth functions are employed for this, as well as two rules for the final classification on $[0,1]^q$. The resulting classifier has to be cross-validated over a small range of parameters only, which is restricted by a Vapnik-Chervonenkis bound. The entire methodology does not involve smoothing techniques, is completely nonparametric and achieves Bayes optimality under standard distributional settings. It is robust, efficiently computable, and has been implemented in an R environment. Applicability of the new approach is demonstrated by simulations as well as a benchmark study.
Submitted 28 January, 2016; v1 submitted 5 March, 2014;
originally announced March 2014.
-
Fast nonparametric classification based on data depth
Authors:
Tatjana Lange,
Karl Mosler,
Pavlo Mozharovskyi
Abstract:
A new procedure, called the DDa-procedure, is developed to solve the problem of classifying d-dimensional objects into q >= 2 classes. The procedure is completely nonparametric; it uses q-dimensional depth plots and a very efficient algorithm for discrimination analysis in the depth space [0,1]^q. Specifically, the depth is the zonoid depth, and the algorithm is the alpha-procedure. In the case of more than two classes, several binary classifications are performed and a majority rule is applied. Special treatments are discussed for 'outsiders', that is, data having a zero depth vector. The DDa-classifier is applied to simulated as well as real data, and the results are compared with those of similar procedures that have been recently proposed. In most cases the new procedure has comparable error rates, but is much faster than other classification approaches, including the SVM.
Submitted 17 December, 2012; v1 submitted 20 July, 2012;
originally announced July 2012.