Search | arXiv e-print repository

Nyström Kernel Stein Discrepancy

Authors: Florian Kalinke, Zoltan Szabo, Bharath K. Sriperumbudur

Abstract: Kernel methods underpin many of the most successful approaches in data science and statistics, and they allow representing probability measures as elements of a reproducing kernel Hilbert space without loss of information. Recently, the kernel Stein discrepancy (KSD), which combines Stein's method with kernel techniques, gained considerable attention. Through the Stein operator, KSD allows the con… ▽ More Kernel methods underpin many of the most successful approaches in data science and statistics, and they allow representing probability measures as elements of a reproducing kernel Hilbert space without loss of information. Recently, the kernel Stein discrepancy (KSD), which combines Stein's method with kernel techniques, gained considerable attention. Through the Stein operator, KSD allows the construction of powerful goodness-of-fit tests where it is sufficient to know the target distribution up to a multiplicative constant. However, the typical U- and V-statistic-based KSD estimators suffer from a quadratic runtime complexity, which hinders their application in large-scale settings. In this work, we propose a Nyström-based KSD acceleration -- with runtime $\mathcal O\!\left(mn+m^3\right)$ for $n$ samples and $m\ll n$ Nyström points -- , show its $\sqrt{n}$-consistency under the null with a classical sub-Gaussian assumption, and demonstrate its applicability for goodness-of-fit testing on a suite of benchmarks. △ Less

Submitted 12 June, 2024; originally announced June 2024.

MSC Class: 46E22 (Primary) 62G10 (Secondary) ACM Class: G.3; I.2.6

arXiv:2403.07735 [pdf, ps, other]

The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels

Authors: Florian Kalinke, Zoltan Szabo

Abstract: Kernel techniques are among the most influential approaches in data science and statistics. Under mild conditions, the reproducing kernel Hilbert space associated to a kernel is capable of encoding the independence of $M\ge 2$ random variables. Probably the most widespread independence measure relying on kernels is the so-called Hilbert-Schmidt independence criterion (HSIC; also referred to as dis… ▽ More Kernel techniques are among the most influential approaches in data science and statistics. Under mild conditions, the reproducing kernel Hilbert space associated to a kernel is capable of encoding the independence of $M\ge 2$ random variables. Probably the most widespread independence measure relying on kernels is the so-called Hilbert-Schmidt independence criterion (HSIC; also referred to as distance covariance in the statistics literature). Despite various existing HSIC estimators designed since its introduction close to two decades ago, the fundamental question of the rate at which HSIC can be estimated is still open. In this work, we prove that the minimax optimal rate of HSIC estimation on $\mathbb R^d$ for Borel measures containing the Gaussians with continuous bounded translation-invariant characteristic kernels is $\mathcal O\!\left(n^{-1/2}\right)$. Specifically, our result implies the optimality in the minimax sense of many of the most-frequently used estimators (including the U-statistic, the V-statistic, and the Nyström-based one) on $\mathbb R^d$. △ Less

Submitted 12 March, 2024; originally announced March 2024.

MSC Class: 62C20; 46E22; 47B32; 94A15; 62G10 ACM Class: G.3; H.1.1; I.2.6

arXiv:2402.00592 [pdf, other]

Partial-Label Learning with a Reject Option

Authors: Tobias Fuchs, Florian Kalinke, Klemens Böhm

Abstract: In real-world applications, one often encounters ambiguously labeled data, where different annotators assign conflicting class labels. Partial-label learning allows training classifiers in this weakly supervised setting, where state-of-the-art methods already show good predictive performance. However, even the best algorithms give incorrect predictions, which can have severe consequences when they… ▽ More In real-world applications, one often encounters ambiguously labeled data, where different annotators assign conflicting class labels. Partial-label learning allows training classifiers in this weakly supervised setting, where state-of-the-art methods already show good predictive performance. However, even the best algorithms give incorrect predictions, which can have severe consequences when they impact actions or decisions. We propose a novel risk-consistent partial-label learning algorithm with a reject option, that is, the algorithm can reject unsure predictions. Extensive experiments on artificial and real-world datasets show that our method provides the best trade-off between the number and accuracy of non-rejected predictions when compared to our competitors, which use confidence thresholds for rejecting unsure predictions instead. When evaluated without the reject option, our nearest neighbor-based approach also achieves competitive prediction performance. △ Less

Submitted 5 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2302.09930 [pdf, other]

Nyström $M$-Hilbert-Schmidt Independence Criterion

Authors: Florian Kalinke, Zoltán Szabó

Abstract: Kernel techniques are among the most popular and powerful approaches of data science. Among the key features that make kernels ubiquitous are (i) the number of domains they have been designed for, (ii) the Hilbert structure of the function class associated to kernels facilitating their statistical analysis, and (iii) their ability to represent probability distributions without loss of information.… ▽ More Kernel techniques are among the most popular and powerful approaches of data science. Among the key features that make kernels ubiquitous are (i) the number of domains they have been designed for, (ii) the Hilbert structure of the function class associated to kernels facilitating their statistical analysis, and (iii) their ability to represent probability distributions without loss of information. These properties give rise to the immense success of Hilbert-Schmidt independence criterion (HSIC) which is able to capture joint independence of random variables under mild conditions, and permits closed-form estimators with quadratic computational complexity (w.r.t. the sample size). In order to alleviate the quadratic computational bottleneck in large-scale applications, multiple HSIC approximations have been proposed, however these estimators are restricted to $M=2$ random variables, do not extend naturally to the $M\ge 2$ case, and lack theoretical guarantees. In this work, we propose an alternative Nyström-based HSIC estimator which handles the $M\ge 2$ case, prove its consistency, and demonstrate its applicability in multiple contexts, including synthetic examples, dependency testing of media annotations, and causal discovery. △ Less

Submitted 31 July, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

MSC Class: 46E22; 94A17 ACM Class: I.2.6; H.1.1

Journal ref: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:1005-1015, 2023

Showing 1–4 of 4 results for author: Kalinke, F