Partial Wasserstein and Maximum Mean Discrepancy distances for bridging the gap between outlier detection and drift detection
Authors:
Thomas Viehmann
Abstract:
With the rise of machine-learning and deep-learning-based applications in practice, monitoring, i.e., verifying that these applications operate within specification, has become an important practical problem. An important aspect of this monitoring is to check whether the inputs (or intermediate values) have strayed from the distribution they were validated for, as this can void the performance assurances obtained during testing.
There are two common approaches to this. The perhaps more classical one is outlier detection, or novelty detection, where for a single input we ask whether it is an outlier, i.e., exceedingly unlikely to have originated from a reference distribution. The second, perhaps more recent, approach considers a larger number of inputs and compares their distribution to a reference distribution (e.g., one sampled during testing). This approach goes under the name of drift detection.
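The two approaches can be contrasted in a few lines. The sketch below is illustrative only, and all function names are hypothetical. It uses the distance to the k-th nearest reference point as a per-input outlier score. For drift detection, it uses the unbiased estimate of the squared Maximum Mean Discrepancy (MMD) with a Gaussian kernel between a batch and the reference, calibrated by a permutation test. The kernel, the bandwidth `sigma`, `k`, and the number of permutations are assumptions, not choices taken from the paper.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-|a_i - b_j|^2 / (2 sigma^2))."""
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma**2))


def knn_outlier_score(z, ref, k=1):
    """Outlier detection: distance from one input z to its k-th nearest reference point."""
    dists = np.sqrt(np.sum((ref - z) ** 2, axis=1))
    return np.sort(dists)[k - 1]


def mmd2_unbiased(x, y, sigma=1.0):
    """Unbiased estimate of the squared MMD between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    # drop the diagonal (i == j) terms so the within-sample averages are unbiased
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2 * kxy.mean()


def mmd_permutation_pvalue(x, y, sigma=1.0, n_perm=200, seed=0):
    """Drift detection: permutation-test p-value for 'x and y share one distribution'."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(x, y, sigma)
    pooled = np.concatenate([x, y])
    m = len(x)
    exceed = 0
    for _ in range(n_perm):
        # reshuffle the pooled sample and re-split it into groups of the original sizes
        idx = rng.permutation(len(pooled))
        if mmd2_unbiased(pooled[idx[:m]], pooled[idx[m:]], sigma) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Here `knn_outlier_score` judges one input at a time. By contrast, `mmd_permutation_pvalue(reference_sample, batch)` needs a whole batch but can detect shifts that no single input reveals.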
In this work, we bridge the gap between outlier detection and drift detection by comparing a given number of inputs to an automatically chosen part of the reference distribution.
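The bridging idea can be illustrated with one simple form of partial optimal transport. This particular form is an assumption made here and is not necessarily the formulation used in the paper. Each of the m batch points is matched to a distinct reference point so that the total squared distance is minimal, and the optimizer chooses which m of the N reference points take part. For m = 1, the cost is the squared nearest-neighbour distance, a classic outlier score. For m = N with equal weights, it is the squared 2-Wasserstein distance between the two empirical distributions, a drift statistic. The sketch relies on SciPy's `linear_sum_assignment`, which accepts rectangular cost matrices. The function name `partial_matching_cost` is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def partial_matching_cost(batch, ref):
    """Match each of the m batch points to a distinct point among the N >= m
    reference points, minimizing the total squared Euclidean distance.

    Returns the mean squared distance of the optimal matching and the indices
    of the reference points that were selected.
    """
    # cost[i, j] = squared distance between batch point i and reference point j
    cost = np.sum((batch[:, None, :] - ref[None, :, :]) ** 2, axis=2)
    rows, cols = linear_sum_assignment(cost)  # a rectangular (m x N) matrix is allowed
    return cost[rows, cols].mean(), cols
```

The returned indices `cols` show which part of the reference the batch was actually compared with. Varying the batch size m moves the procedure between scoring single inputs and comparing whole distributions.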
Submitted 28 June, 2021; v1 submitted 9 June, 2021;
originally announced June 2021.
Numerically more stable computation of the p-values for the two-sample Kolmogorov-Smirnov test
Authors:
Thomas Viehmann
Abstract:
The two-sample Kolmogorov-Smirnov test is a widely used statistical test for detecting whether two samples are likely to come from the same distribution. Implementations typically rely on an article by Hodges from 1957. Advances in computation speed make it feasible to compute exact p-values for a much larger range of problem sizes, but these computations run into numerical stability problems caused by floating-point operations. We provide a simple transformation of the defining recurrence for the two-sided two-sample KS test that avoids this.
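As a rough illustration of why transforming the recurrence helps, the sketch below computes the exact two-sided p-value with the standard lattice-path recurrence. It counts the monotone paths from (0, 0) to (m, n) that stay strictly inside the band |i/m − j/n| < D. Instead of storing these counts, which grow like binomial coefficients and quickly overflow floating point, it stores each count divided by C(i + j, i). The recurrence then becomes P(i, j) = i/(i + j) · P(i − 1, j) + j/(i + j) · P(i, j − 1), and every intermediate value stays in [0, 1]. This normalization is an assumption chosen for illustration; it is a common remedy, but it may differ from the transformation proposed in the paper. The function names are hypothetical, and the samples are assumed to contain no ties.

```python
def ks_statistic_scaled(x, y):
    """Return h = m * n * D, the two-sided KS statistic D scaled to an integer.

    Assumes the pooled sample contains no ties.
    """
    m, n = len(x), len(y)
    merged = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    i = j = h = 0
    for _, label in merged:
        if label == 0:
            i += 1
        else:
            j += 1
        # |i/m - j/n| * m * n == |i*n - j*m|, kept in exact integers
        h = max(h, abs(i * n - j * m))
    return h


def ks_2samp_exact_pvalue(m, n, h):
    """Exact P(D >= h / (m n)) under the null hypothesis.

    p[j] holds the fraction of monotone lattice paths from (0, 0) to (i, j)
    that stay strictly inside the band |i*n - j*m| < h, i.e. the path count
    divided by C(i + j, i), so every stored value lies in [0, 1].
    """
    p = [0.0] * (n + 1)
    for i in range(m + 1):
        for j in range(n + 1):
            if abs(i * n - j * m) >= h:
                p[j] = 0.0  # grid point lies on or outside the band
            elif i == 0 and j == 0:
                p[j] = 1.0
            else:
                prev_row = p[j] if i > 0 else 0.0      # value at (i - 1, j)
                prev_col = p[j - 1] if j > 0 else 0.0  # value at (i, j - 1)
                # normalized form of count(i, j) = count(i - 1, j) + count(i, j - 1)
                p[j] = (i * prev_row + j * prev_col) / (i + j)
    return 1.0 - p[n]
```

For example, the samples [1, 2] and [3, 4] are completely separated, so h = 4 (D = 1). The p-value is 2/6 = 1/3, because only the two extreme orderings out of C(4, 2) = 6 reach D = 1. Since the sketch returns 1 − P(inside), very small p-values lose relative precision. Propagating the probability of leaving the band instead would avoid that.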
Submitted 23 September, 2021; v1 submitted 16 February, 2021;
originally announced February 2021.
Implementation of batched Sinkhorn iterations for entropy-regularized Wasserstein loss
Authors:
Thomas Viehmann
Abstract:
In this report, we review the calculation of entropy-regularised Wasserstein loss introduced by Cuturi and document a practical implementation in PyTorch. Code is available at https://github.com/t-vi/pytorch-tvmisc/blob/master/wasserstein-distance/Pytorch_Wasserstein.ipynb
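The report's own implementation is the linked PyTorch notebook. As an independent sketch, written in NumPy rather than PyTorch so that it runs without extra dependencies, batched Sinkhorn iterations can be expressed in the log domain, a numerically stabilized variant of Cuturi's matrix-scaling iterations. For each batch element, the iterations alternate two updates. The first is f ← ε log μ − ε logsumexp_j((g_j − C_ij)/ε). The second is g ← ε log ν − ε logsumexp_i((f_i − C_ij)/ε). The plan is then P_ij = exp((f_i + g_j − C_ij)/ε). The function name, the fixed iteration count, and the default ε are assumptions, and the marginals must be strictly positive.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    amax = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(amax, axis=axis) + np.log(np.sum(np.exp(a - amax), axis=axis))


def sinkhorn_batched(mu, nu, C, eps=0.1, n_iter=200):
    """Entropy-regularized OT for a batch of problems (log-domain Sinkhorn).

    mu: (B, n) and nu: (B, m) strictly positive marginals, each row summing to 1.
    C:  (B, n, m) cost matrices.
    Returns the transport cost <P, C> per batch element, shape (B,), and P.
    """
    log_mu, log_nu = np.log(mu), np.log(nu)
    f = np.zeros_like(mu)  # dual potential attached to mu
    g = np.zeros_like(nu)  # dual potential attached to nu
    for _ in range(n_iter):
        # rescale so that the row sums of P equal mu ...
        f = eps * log_mu - eps * _logsumexp((g[:, None, :] - C) / eps, axis=2)
        # ... then so that the column sums of P equal nu
        g = eps * log_nu - eps * _logsumexp((f[:, :, None] - C) / eps, axis=1)
    P = np.exp((f[:, :, None] + g[:, None, :] - C) / eps)
    return np.sum(P * C, axis=(1, 2)), P
```

Batching here simply means carrying a leading batch dimension B through every array operation, so B independent problems are solved at once. The same structure should carry over directly to PyTorch tensors.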
Submitted 5 July, 2019; v1 submitted 1 July, 2019;
originally announced July 2019.