Search | arXiv e-print repository

arXiv:2106.12893 [pdf, other]

Partial Wasserstein and Maximum Mean Discrepancy distances for bridging the gap between outlier detection and drift detection

Authors: Thomas Viehmann

Abstract: With the rise of machine learning and deep learning based applications in practice, monitoring, i.e. verifying that these operate within specification, has become an important practical problem. An important aspect of this monitoring is to check whether the inputs (or intermediates) have strayed from the distribution they were validated for, which can void the performance assurances obtained during testing. There are two common approaches for this. The, perhaps, more classical one is outlier detection or novelty detection, where, for a single input we ask whether it is an outlier, i.e. exceedingly unlikely to have originated from a reference distribution. The second, perhaps more recent approach, is to consider a larger number of inputs and compare its distribution to a reference distribution (e.g. sampled during testing). This is done under the label drift detection. In this work, we bridge the gap between outlier detection and drift detection through comparing a given number of inputs to an automatically chosen part of the reference distribution.

Submitted 28 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

arXiv:2102.08037 [pdf, other]

Numerically more stable computation of the p-values for the two-sample Kolmogorov-Smirnov test

Authors: Thomas Viehmann

Abstract: The two-sample Kolmogorov-Smirnov test is a widely used statistical test for detecting whether two samples are likely to come from the same distribution. Implementations typically recur on an article of Hodges from 1957. The advances in computation speed make it feasible to compute exact p-values for a much larger range of problem sizes, but these run into numerical stability problems from floating point operations. We provide a simple transformation of the defining recurrence for the two-sided two-sample KS test that avoids this.

Submitted 23 September, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Report number: 2021-1

arXiv:1907.01729 [pdf, ps, other]

Implementation of batched Sinkhorn iterations for entropy-regularized Wasserstein loss

Authors: Thomas Viehmann

Abstract: In this report, we review the calculation of entropy-regularised Wasserstein loss introduced by Cuturi and document a practical implementation in PyTorch. Code is available at https://github.com/t-vi/pytorch-tvmisc/blob/master/wasserstein-distance/Pytorch_Wasserstein.ipynb

Submitted 5 July, 2019; v1 submitted 1 July, 2019; originally announced July 2019.

Showing 1–3 of 3 results for author: Viehmann, T