Showing 1–34 of 34 results for author: Mozharovskyi, P

  1. arXiv:2407.01331  [pdf, other]

    cs.CV cs.AI cs.LG

    Restyling Unsupervised Concept Based Interpretable Networks with Generative Models

    Authors: Jayneel Parekh, Quentin Bouniot, Pavlo Mozharovskyi, Alasdair Newson, Florence d'Alché-Buc

    Abstract: Developing inherently interpretable models for prediction has gained prominence in recent years. A subclass of these models, wherein the interpretable network relies on learning high-level concepts, is valued because of the closeness of concept representations to human communication. However, the visualization and understanding of the learnt unsupervised dictionary of concepts encounters major limita…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Project page available at https://jayneelparekh.github.io/VisCoIN_project_page/

  2. arXiv:2405.13970  [pdf, other]

    stat.ME

    Conformal uncertainty quantification using kernel depth measures in separable Hilbert spaces

    Authors: Marcos Matabuena, Rahul Ghosal, Pavlo Mozharovskyi, Oscar Hernan Madrid Padilla, Jukka-Pekka Onnela

    Abstract: Depth measures have gained popularity in the statistical literature for defining level sets in complex data structures like multivariate data, functional data, and graphs. Despite their versatility, integrating depth measures into regression modeling for establishing prediction regions remains underexplored. To address this gap, we propose a novel method utilizing a model-free uncertainty quantifi…

    Submitted 22 May, 2024; originally announced May 2024.

  3. arXiv:2312.16139  [pdf, other]

    stat.ME cs.LG stat.ML

    Anomaly component analysis

    Authors: Romain Valla, Pavlo Mozharovskyi, Florence d'Alché-Buc

    Abstract: At the crossroads of machine learning and data analysis, anomaly detection aims at identifying observations that exhibit abnormal behaviour. Be it measurement errors, disease development, severe weather, production quality default(s) (items) or failed equipment, financial frauds or crisis events, their on-time identification and isolation constitute an important task in almost any area of industry a…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: 41 pages, 25 figures, 13 tables

  4. arXiv:2312.14136  [pdf, other]

    stat.ML cs.LG

    Fast kernel half-space depth for data with non-convex supports

    Authors: Arturo Castellanos, Pavlo Mozharovskyi, Florence d'Alché-Buc, Hicham Janati

    Abstract: Data depth is a statistical function that generalizes order and quantiles to the multivariate setting and beyond, with applications spanning descriptive and visual statistics, anomaly detection, testing, etc. The celebrated halfspace depth exploits data geometry via an optimization program to deliver properties of invariances, robustness, and non-parametricity. Nevertheless, it implicitly ass…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 30 pages

  5. arXiv:2312.05282  [pdf, other]

    cs.LG cs.CV

    Towards On-device Learning on the Edge: Ways to Select Neurons to Update under a Budget Constraint

    Authors: Aël Quélennec, Enzo Tartaglione, Pavlo Mozharovskyi, Van-Tam Nguyen

    Abstract: In the realm of efficient on-device learning under extreme memory and computation constraints, a significant gap in successful approaches persists. Although considerable effort has been devoted to efficient inference, the main obstacle to efficient learning is the prohibitive cost of backpropagation. The resources required to compute gradients and update network parameters often exceed the limits…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: 8 pages, 4 figures, 2 tables, WACV2024 - SCIoT workshop

  6. arXiv:2311.01434  [pdf, other]

    cs.LG cs.AI stat.ML

    Tailoring Mixup to Data for Calibration

    Authors: Quentin Bouniot, Pavlo Mozharovskyi, Florence d'Alché-Buc

    Abstract: Among all data augmentation techniques proposed so far, linear interpolation of training samples, also called Mixup, has been found to be effective for a large panel of applications. Along with improved performance, Mixup is also a good technique for improving calibration and predictive uncertainty. However, mixing data carelessly can lead to manifold intrusion, i.e., conflicts between the synthetic la…

    Submitted 11 June, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

  7. arXiv:2305.07132  [pdf, other]

    cs.SD cs.LG eess.AS

    Tackling Interpretability in Audio Classification Networks with Non-negative Matrix Factorization

    Authors: Jayneel Parekh, Sanjeel Parekh, Pavlo Mozharovskyi, Gaël Richard, Florence d'Alché-Buc

    Abstract: This paper tackles two major problem settings for interpretability of audio processing networks, post-hoc and by-design interpretation. For post-hoc interpretation, we aim to interpret decisions of a network in terms of high-level audio objects that are also listenable for the end-user. This is extended to present an inherently interpretable model with high performance. To this end, we propose a n…

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: Under submission at IEEE/ACM TASLP. arXiv admin note: text overlap with arXiv:2202.11479

  8. Optimized preprocessing and Tiny ML for Attention State Classification

    Authors: Yinghao Wang, Rémi Nahon, Enzo Tartaglione, Pavlo Mozharovskyi, Van-Tam Nguyen

    Abstract: In this paper, we present a new approach to mental state classification from EEG signals by combining signal processing techniques and machine learning (ML) algorithms. We evaluate the performance of the proposed method on a dataset of EEG recordings collected during a cognitive load task and compare it to other state-of-the-art methods. The results show that the proposed method achieves high acc…

    Submitted 20 March, 2023; originally announced March 2023.

  9. arXiv:2210.02851  [pdf, other]

    stat.ML cs.LG stat.AP

    Anomaly detection using data depth: multivariate case

    Authors: Pavlo Mozharovskyi

    Abstract: Anomaly detection is a branch of machine learning and data analysis which aims at identifying observations that exhibit abnormal behaviour. Be it measurement errors, disease development, severe weather, production quality default(s) (items) or failed equipment, financial frauds or crisis events, their on-time identification, isolation and explanation constitute an important task in almost any bran…

    Submitted 6 October, 2022; originally announced October 2022.

  10. arXiv:2209.07436  [pdf, other]

    stat.ME cs.LG stat.AP stat.ML

    Statistical process monitoring of artificial neural networks

    Authors: Anna Malinovskaya, Pavlo Mozharovskyi, Philipp Otto

    Abstract: The rapid advancement of models based on artificial intelligence demands innovative monitoring techniques which can operate in real time with low computational costs. In machine learning, especially if we consider artificial neural networks (ANNs), the models are often trained in a supervised manner. Consequently, the learned relationship between the input and the output must remain valid during t…

    Submitted 27 July, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

    Journal ref: Technometrics, 2023

  11. arXiv:2208.04587  [pdf, ps, other]

    stat.CO

    On exact computation of Tukey depth central regions

    Authors: Vít Fojtík, Petra Laketa, Pavlo Mozharovskyi, Stanislav Nagy

    Abstract: The Tukey (or halfspace) depth extends nonparametric methods toward multivariate data. The multivariate analogues of the quantiles are the central regions of the Tukey depth, defined as sets of points in the $d$-dimensional space whose Tukey depth exceeds given thresholds $k$. We address the problem of fast and exact computation of those central regions. First, we analyse an efficient Algorithm A…

    Submitted 31 October, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

    MSC Class: 62-08; 62H12; 62G05

  12. arXiv:2202.11479  [pdf, other]

    cs.SD cs.LG eess.AS

    Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF

    Authors: Jayneel Parekh, Sanjeel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc, Gaël Richard

    Abstract: This paper tackles post-hoc interpretability for audio processing networks. Our goal is to interpret decisions of a network in terms of high-level audio objects that are also listenable for the end-user. To this end, we propose a novel interpreter design that incorporates non-negative matrix factorization (NMF). In particular, a carefully regularized interpreter module is trained to take hidden la…

    Submitted 24 October, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

    Comments: Accepted at NeurIPS 2022

  13. arXiv:2201.08105  [pdf, other]

    cs.LG stat.ML

    Statistical Depth Functions for Ranking Distributions: Definitions, Statistical Learning and Applications

    Authors: Morgane Goibert, Stéphan Clémençon, Ekhine Irurozki, Pavlo Mozharovskyi

    Abstract: The concept of median/consensus has been widely investigated in order to provide a statistical summary of ranking data, i.e. realizations of a random permutation $Σ$ of a finite set, $\{1,\; \ldots,\; n\}$ with $n\geq 1$ say. As it sheds light onto only one aspect of $Σ$'s distribution $P$, it may neglect other informative features. It is the purpose of this paper to define analogs of quantiles, r…

    Submitted 20 January, 2022; originally announced January 2022.

  14. arXiv:2201.05115  [pdf, other]

    stat.ML cs.LG

    Functional Anomaly Detection: a Benchmark Study

    Authors: Guillaume Staerman, Eric Adjakossa, Pavlo Mozharovskyi, Vera Hofer, Jayant Sen Gupta, Stephan Clémençon

    Abstract: The increasing automation in many areas of industry expressly demands the design of efficient machine-learning solutions for the detection of abnormal events. With the ubiquitous deployment of sensors that almost continuously monitor the health of complex infrastructures, anomaly detection can now rely on measurements sampled at a very high frequency, providing a very rich representation of the phen…

    Submitted 13 January, 2022; originally announced January 2022.

  15. arXiv:2106.11068  [pdf, other]

    stat.ML cs.LG

    Affine-Invariant Integrated Rank-Weighted Depth: Definition, Properties and Finite Sample Analysis

    Authors: Guillaume Staerman, Pavlo Mozharovskyi, Stéphan Clémençon

    Abstract: Because it determines a center-outward ordering of observations in $\mathbb{R}^d$ with $d\geq 2$, the concept of statistical depth makes it possible to define quantiles and ranks for multivariate data and to use them for various statistical tasks (e.g. inference, hypothesis testing). Whereas many depth functions have been proposed ad hoc in the literature since the seminal contribution of \cite{Tukey7…

    Submitted 4 February, 2022; v1 submitted 21 June, 2021; originally announced June 2021.

  16. arXiv:2103.12711  [pdf, other]

    stat.ML cs.LG

    A Pseudo-Metric between Probability Distributions based on Depth-Trimmed Regions

    Authors: Guillaume Staerman, Pavlo Mozharovskyi, Pierre Colombo, Stéphan Clémençon, Florence d'Alché-Buc

    Abstract: The design of a metric between probability distributions is a longstanding problem motivated by numerous applications in Machine Learning. Focusing on continuous probability distributions on the Euclidean space $\mathbb{R}^d$, we introduce a novel pseudo-metric between probability distributions by leveraging the extension of univariate quantiles to multivariate spaces. Data depth is a nonparametri…

    Submitted 10 October, 2022; v1 submitted 23 March, 2021; originally announced March 2021.

  17. arXiv:2101.00726  [pdf, other]

    math.OC stat.ME

    Distributionally robust halfspace depth

    Authors: Jevgenijs Ivanovs, Pavlo Mozharovskyi

    Abstract: Tukey's halfspace depth can be seen as a stochastic program and as such it is not guarded against the optimizer's curse, so that a limited training sample may easily result in poor out-of-sample performance. We propose a generalized halfspace depth concept relying on the recent advances in distributionally robust optimization, where every halfspace is examined using the respective worst-case distrib…

    Submitted 10 May, 2024; v1 submitted 3 January, 2021; originally announced January 2021.

  18. arXiv:2010.09345  [pdf, other]

    cs.LG stat.ML

    A Framework to Learn with Interpretation

    Authors: Jayneel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc

    Abstract: To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability about the predictive model in terms of human-understandable high level attribute functions, with minimal loss of accuracy. This is achieved by a dedicated architecture and well chosen r…

    Submitted 23 February, 2022; v1 submitted 19 October, 2020; originally announced October 2020.

  19. arXiv:2007.08016  [pdf, other]

    stat.CO

    Approximate computation of projection depths

    Authors: Rainer Dyckerhoff, Pavlo Mozharovskyi, Stanislav Nagy

    Abstract: Data depth is a concept in multivariate statistics that measures the centrality of a point in a given data cloud in $\mathbb{R}^d$. If the depth of a point can be represented as the minimum of the depths with respect to all one-dimensional projections of the data, then the depth satisfies the so-called projection property. Such depths form an important class that includes many of the depths that have bee…

    Submitted 15 July, 2020; originally announced July 2020.

    MSC Class: 62G05; 62H12; 90C26

  20. arXiv:2006.10325  [pdf, other]

    stat.ML cs.LG

    When OT meets MoM: Robust estimation of Wasserstein Distance

    Authors: Guillaume Staerman, Pierre Laforgue, Pavlo Mozharovskyi, Florence d'Alché-Buc

    Abstract: Originating from Optimal Transport, the Wasserstein distance has gained importance in Machine Learning due to its appealing geometrical properties and the increasing availability of efficient approximations. In this work, we consider the problem of estimating the Wasserstein distance between two probability distributions when observations are polluted by outliers. To that end, we investigate how to lev…

    Submitted 18 February, 2022; v1 submitted 18 June, 2020; originally announced June 2020.

    Journal ref: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021

  21. arXiv:2004.01927  [pdf, other]

    stat.ME

    Choosing among notions of multivariate depth statistics

    Authors: Karl Mosler, Pavlo Mozharovskyi

    Abstract: Classical multivariate statistics measures the outlyingness of a point by its Mahalanobis distance from the mean, which is based on the mean and the covariance matrix of the data. A multivariate depth function is a function which, given a point and a distribution in d-space, measures centrality by a number between 0 and 1, while satisfying certain postulates regarding invariance, monotonicity, con…

    Submitted 5 May, 2021; v1 submitted 4 April, 2020; originally announced April 2020.

    MSC Class: Primary 62H05; 62H30; secondary 62-07

  22. arXiv:2003.07703  [pdf, other]

    cs.CY

    Flexible and Context-Specific AI Explainability: A Multidisciplinary Approach

    Authors: Valérie Beaudouin, Isabelle Bloch, David Bounie, Stéphan Clémençon, Florence d'Alché-Buc, James Eagan, Winston Maxwell, Pavlo Mozharovskyi, Jayneel Parekh

    Abstract: The recent enthusiasm for artificial intelligence (AI) is due principally to advances in deep learning. Deep learning methods are remarkably accurate, but also opaque, which limits their potential use in safety-critical applications. To achieve trust and accountability, designers and operators of machine learning algorithms must be able to explain the inner workings, the results and the causes of…

    Submitted 13 March, 2020; originally announced March 2020.

  23. arXiv:1910.05956  [pdf, ps, other]

    math.ST stat.CO

    Uniform convergence rates for the approximated halfspace and projection depth

    Authors: Stanislav Nagy, Rainer Dyckerhoff, Pavlo Mozharovskyi

    Abstract: The computational complexity of some depths that satisfy the projection property, such as the halfspace depth or the projection depth, is known to be high, especially for data of higher dimensionality. In such scenarios, the exact depth is frequently approximated using a randomized approach: The data are projected into a finite number of directions uniformly distributed on the unit sphere, and the…

    Submitted 14 October, 2019; originally announced October 2019.

    MSC Class: 62G20; 62H12

    Journal ref: Electron. J. Statist. 14 (2) 3939 - 3975, 2020
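
    The randomized approximation mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes NumPy is available; the function name `approx_halfspace_depth`, its parameters, and the default of 1000 directions are hypothetical choices made for illustration and are not taken from the paper. The sketch takes the minimum of univariate depths over random directions, following the projection property, and is not necessarily the authors' exact procedure.

```python
import numpy as np


def approx_halfspace_depth(point, data, n_directions=1000, seed=0):
    """Approximate the halfspace (Tukey) depth of `point` w.r.t. the rows of `data`.

    Illustrative sketch only: names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    point = np.asarray(point, dtype=float)
    d = data.shape[1]

    # Normalised Gaussian vectors are uniformly distributed on the unit sphere.
    directions = rng.standard_normal((n_directions, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Project the sample and the query point onto every direction.
    proj_data = data @ directions.T    # shape (n, n_directions)
    proj_point = directions @ point    # shape (n_directions,)

    # Univariate halfspace depth along each direction: the smaller of the
    # fractions of sample points on either side (ties count for both sides).
    below = (proj_data <= proj_point).mean(axis=0)
    above = (proj_data >= proj_point).mean(axis=0)
    univariate_depths = np.minimum(below, above)

    # Taking the minimum over finitely many directions can only
    # overestimate the exact depth, never underestimate it.
    return float(univariate_depths.min())


if __name__ == "__main__":
    cloud = [[1, 0], [-1, 0], [0, 1], [0, -1]]
    print(approx_halfspace_depth([0, 0], cloud))  # central point
    print(approx_halfspace_depth([5, 5], cloud))  # point far outside the cloud
```

    Increasing `n_directions` tightens the approximation at a proportional computational cost.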

  24. arXiv:1910.04085  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    The Area of the Convex Hull of Sampled Curves: a Robust Functional Statistical Depth Measure

    Authors: Guillaume Staerman, Pavlo Mozharovskyi, Stephan Clémençon

    Abstract: With the ubiquity of sensors in the IoT era, statistical observations are becoming increasingly available in the form of massive (multivariate) time-series. Formulated as unsupervised anomaly detection tasks, an abundance of applications like aviation safety management, the health monitoring of complex infrastructures or fraud detection can now rely on such functional data, acquired and stored wit…

    Submitted 13 February, 2020; v1 submitted 9 October, 2019; originally announced October 2019.

  25. arXiv:1904.04573  [pdf, other]

    stat.ML cs.LG

    Functional Isolation Forest

    Authors: Guillaume Staerman, Pavlo Mozharovskyi, Stephan Clémençon, Florence d'Alché-Buc

    Abstract: For the purpose of monitoring the behavior of complex infrastructures (e.g. aircraft, transport or energy networks), high-rate sensors are deployed to capture multivariate data, generally unlabeled, in quasi continuous-time to quickly detect the occurrence of anomalies that may jeopardize the smooth operation of the system of interest. The statistical analysis of such massive data of functional n…

    Submitted 9 October, 2019; v1 submitted 9 April, 2019; originally announced April 2019.

  26. arXiv:1901.00180  [pdf, other]

    stat.ME

    Depth for curve data and applications

    Authors: Pierre Lafaye de Micheaux, Pavlo Mozharovskyi, Myriam Vimond

    Abstract: John W. Tukey (1975) defined statistical data depth as a function that determines centrality of an arbitrary point with respect to a data cloud or to a probability measure. During the last decades, this seminal idea of data depth evolved into a powerful tool proving to be useful in various fields of science. Recently, extending the notion of data depth to the functional setting attracted a lot of…

    Submitted 21 February, 2020; v1 submitted 1 January, 2019; originally announced January 2019.

  27. arXiv:1701.03513  [pdf, other]

    stat.ME

    Nonparametric imputation by data depth

    Authors: Pavlo Mozharovskyi, Julie Josse, Francois Husson

    Abstract: We present a single imputation method for missing values which borrows the idea of data depth, a measure of centrality defined for an arbitrary point of a space with respect to a probability distribution or data cloud. This consists in iterative maximization of the depth of each observation with missing values, and can be employed with any properly defined statistical depth function. For each singl…

    Submitted 6 August, 2018; v1 submitted 12 January, 2017; originally announced January 2017.

  28. arXiv:1608.04109  [pdf, other]

    stat.CO stat.ML

    Depth and depth-based classification with R-package ddalpha

    Authors: Oleksii Pokotylo, Pavlo Mozharovskyi, Rainer Dyckerhoff

    Abstract: Following the seminal idea of Tukey, data depth is a function that measures how close an arbitrary point of the space is located to an implicitly defined center of a data cloud. Having undergone theoretical and computational developments, it is now employed in numerous applications with classification being the most popular one. The R-package ddalpha is software designed to fuse experience of th…

    Submitted 14 August, 2016; originally announced August 2016.

  29. arXiv:1603.00069  [pdf, other]

    stat.CO

    Tukey depth: linear programming and applications

    Authors: Pavlo Mozharovskyi

    Abstract: Determining the representativeness of a point within a data cloud has recently become a desirable task in multivariate analysis. The concept of statistical depth function, which reflects centrality of an arbitrary point, appears to be useful and has been studied intensively during the last decades. Here the issue of exact computation of the classical Tukey data depth is addressed. The paper sugges…

    Submitted 29 February, 2016; originally announced March 2016.

  30. arXiv:1412.5122  [pdf, other]

    stat.CO

    Fast computation of Tukey trimmed regions and median in dimension $p>2$

    Authors: Xiaohui Liu, Karl Mosler, Pavlo Mozharovskyi

    Abstract: Given data in $\mathbb{R}^{p}$, a Tukey $κ$-trimmed region is the set of all points that have at least Tukey depth $κ$ w.r.t. the data. As they are visual, affine equivariant and robust, Tukey regions are useful tools in nonparametric multivariate analysis. While these regions are easily defined and interpreted, their practical use in applications has been impeded so far by the lack of efficient c…

    Submitted 8 November, 2018; v1 submitted 16 December, 2014; originally announced December 2014.

    MSC Class: 62F10; 62F35

  31. arXiv:1411.6927  [pdf, other]

    stat.CO stat.ME

    Exact computation of the halfspace depth

    Authors: Rainer Dyckerhoff, Pavlo Mozharovskyi

    Abstract: For computing the exact value of the halfspace depth of a point w.r.t. a data cloud of $n$ points in arbitrary dimension, a theoretical framework is suggested. Based on this framework a whole class of algorithms can be derived. In all of these algorithms the depth is calculated as the minimum over a finite number of depth values w.r.t. proper projections of the data cloud. Three variants of this c…

    Submitted 12 January, 2016; v1 submitted 25 November, 2014; originally announced November 2014.

  32. arXiv:1407.5185  [pdf, other]

    stat.AP stat.ME

    Classifying real-world data with the $DDα$-procedure

    Authors: Pavlo Mozharovskyi, Karl Mosler, Tatjana Lange

    Abstract: The $DDα$-classifier, a nonparametric fast and very robust procedure, is described and applied to fifty classification problems regarding a broad spectrum of real-world data. The procedure first transforms the data from their original property space into a depth space, which is a low-dimensional unit cube, and then separates them by a projective invariant procedure, called $α$-procedure. To each d…

    Submitted 28 October, 2015; v1 submitted 19 July, 2014; originally announced July 2014.

    Journal ref: Advances in Data Analysis and Classification 9 (2015), 287 - 314

  33. arXiv:1403.1158  [pdf, other]

    stat.ME

    Fast DD-classification of functional data

    Authors: Karl Mosler, Pavlo Mozharovskyi

    Abstract: A fast nonparametric procedure for classifying functional data is introduced. It consists of a two-step transformation of the original data plus a classifier operating on a low-dimensional hypercube. The functional data are first mapped into a finite-dimensional location-slope space and then transformed by a multivariate depth function into the $DD$-plot, which is a subset of the unit hypercube. T…

    Submitted 28 January, 2016; v1 submitted 5 March, 2014; originally announced March 2014.

  34. arXiv:1207.4992  [pdf, ps, other]

    stat.ML cs.LG

    Fast nonparametric classification based on data depth

    Authors: Tatjana Lange, Karl Mosler, Pavlo Mozharovskyi

    Abstract: A new procedure, called DDa-procedure, is developed to solve the problem of classifying d-dimensional objects into q >= 2 classes. The procedure is completely nonparametric; it uses q-dimensional depth plots and a very efficient algorithm for discrimination analysis in the depth space [0,1]^q. Specifically, the depth is the zonoid depth, and the algorithm is the alpha-procedure. In case of more th…

    Submitted 17 December, 2012; v1 submitted 20 July, 2012; originally announced July 2012.

    MSC Class: 62H30

    Journal ref: Statistical Papers 55 (2014), 49-69