Active Sequential Two-Sample Testing

Li, Weizhi; Kadambi, Prad; Saidi, Pouria; Ramamurthy, Karthikeyan Natesan; Dasarathy, Gautam; Berisha, Visar

Computer Science > Machine Learning

arXiv:2301.12616v4 (cs)

[Submitted on 30 Jan 2023 (v1), last revised 28 Jun 2024 (this version, v4)]

Abstract: A two-sample hypothesis test is a statistical procedure used to determine whether the distributions generating two samples are identical. We consider the two-sample testing problem in a new scenario where the sample measurements (or sample features) are inexpensive to access, but their group memberships (or labels) are costly. To address the problem, we devise the first active sequential two-sample testing framework, which queries labels not only sequentially but also actively. Our test statistic is a likelihood ratio in which one likelihood is found by maximization over all class priors and the other is provided by a probabilistic classification model. The classification model is adaptively updated and used to predict where the (unlabeled) features have a high dependency on labels; labeling these "high-dependency" features increases the power of the proposed testing framework. Theoretically, we prove that our framework produces an anytime-valid p-value. In addition, we characterize the proposed framework's gain in testing power by analyzing the mutual information between the feature and label variables in asymptotic and finite-sample scenarios. In practice, we introduce an instantiation of our framework and evaluate it in several experiments; the experiments on synthetic, MNIST, and application-specific datasets demonstrate that the testing power of the instantiated active sequential test increases significantly while the Type I error remains under control.
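
The likelihood-ratio statistic described above can be illustrated with a minimal sketch. It assumes binary labels, a classifier whose prediction for step t uses only the labels queried before step t, and a null hypothesis under which each new label is Bernoulli with some unknown class prior, independent of the features. Under those assumptions the ratio W_t (classifier likelihood over the likelihood maximized across class priors) never exceeds the ratio against the true prior, which is a nonnegative martingale with mean 1, so Ville's inequality gives P(sup_t W_t >= 1/alpha) <= alpha and the running minimum of min(1, 1/W_t) is an anytime-valid p-value. This is not the authors' code: the names anytime_pvalues, max_prior_loglik and the toy sign_bin_predictor are hypothetical, and the active query step is left out (the label stream is fixed in advance).

```python
import math
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface: given the labelled history and a new feature x,
# return an estimate of P(y = 1 | x) computed from the history only.
Predictor = Callable[[List[Tuple[float, int]], float], float]


def max_prior_loglik(labels: List[int]) -> float:
    """Log-likelihood of binary labels maximised over the class prior.

    The maximiser is the empirical frequency n1 / n; a zero count
    contributes 0 (convention 0 * log 0 = 0)."""
    n, n1 = len(labels), sum(labels)
    n0 = n - n1
    ll = 0.0
    if n1:
        ll += n1 * math.log(n1 / n)
    if n0:
        ll += n0 * math.log(n0 / n)
    return ll


def anytime_pvalues(stream: Iterable[Tuple[float, int]],
                    predictor: Predictor,
                    eps: float = 1e-3) -> List[float]:
    """Running p-values min(1, 1 / max_{s<=t} W_s) for the sketched statistic."""
    history: List[Tuple[float, int]] = []
    log_num = 0.0    # log of the classifier's sequential predictive likelihood
    p_running = 1.0  # running minimum keeps the p-value valid at every step
    out: List[float] = []
    for x, y in stream:
        # Predict before seeing y; clipping keeps the log finite.
        p1 = min(max(predictor(history, x), eps), 1.0 - eps)
        log_num += math.log(p1 if y == 1 else 1.0 - p1)
        history.append((x, y))
        log_w = log_num - max_prior_loglik([lab for _, lab in history])
        if log_w > 0:  # only W_t > 1 can push the p-value below 1
            p_running = min(p_running, math.exp(-log_w))
        out.append(p_running)
    return out


def sign_bin_predictor(history: List[Tuple[float, int]], x: float) -> float:
    """Hypothetical toy classifier: Laplace-smoothed frequency of y = 1
    among past points whose feature has the same sign as x."""
    same = [lab for xi, lab in history if (xi > 0) == (x > 0)]
    return (sum(same) + 1) / (len(same) + 2)


xs = [1.0 if t % 2 == 0 else -1.0 for t in range(40)]
dependent = [(x, int(x > 0)) for x in xs]  # label follows the sign of x
independent = [(x, int((t // 2) % 2 == 0)) for t, x in enumerate(xs)]  # 1,1,0,0,... ignores x
p_dep = anytime_pvalues(dependent, sign_bin_predictor)
p_ind = anytime_pvalues(independent, sign_bin_predictor)
print(f"dependent labels:   final p-value {p_dep[-1]:.2e}")
print(f"independent labels: final p-value {p_ind[-1]:.2f}")
```

When the labels track the sign of x, the p-value falls far below 0.05 within 40 labels; when the label pattern ignores x, the ratio stays bounded and the p-value stays high. Because the p-value is valid at every step, sampling could stop as soon as it drops below the chosen alpha.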

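The active-query step ranks unlabeled features by how strongly their labels are predicted to depend on them. The sketch below is one possible way to instantiate that idea, and its scoring rule is an assumption rather than the paper's stated criterion: each candidate x is scored by the Bernoulli KL divergence between the classifier's predictive distribution P(Y | x) and the current class-prior estimate, and the highest-scoring feature is queried. Averaging the same KL term over the pool gives a plug-in estimate of the mutual information I(X; Y) = E_X[KL(P(Y | X) || P(Y))], the quantity used above to characterize the gain in testing power. The logistic model prob_y1 and all function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)) in nats, clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def select_query(pool: Sequence[float],
                 prob_y1: Callable[[float], float],
                 prior: float) -> int:
    """Index of the unlabeled feature whose predicted label distribution
    departs most (in KL) from the estimated class prior."""
    scores = [bernoulli_kl(prob_y1(x), prior) for x in pool]
    return max(range(len(pool)), key=scores.__getitem__)


def plugin_mutual_information(pool: Sequence[float],
                              prob_y1: Callable[[float], float]) -> float:
    """Plug-in I(X; Y) over the empirical distribution of the pool,
    using the pool-averaged prediction as the marginal P(Y = 1)."""
    probs = [prob_y1(x) for x in pool]
    marginal = sum(probs) / len(probs)
    return sum(bernoulli_kl(p, marginal) for p in probs) / len(probs)


def prob_y1(x: float) -> float:
    """Hypothetical fitted classifier: logistic model for P(Y = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-3.0 * x))


pool = [-2.0, -0.1, 0.0, 0.5, 1.5]
chosen = select_query(pool, prob_y1, prior=0.5)
mi = plugin_mutual_information(pool, prob_y1)
print(f"query index {chosen} (x = {pool[chosen]}), plug-in I(X;Y) = {mi:.3f} nats")
```

Features whose predicted label probability sits close to the prior score near zero and are deferred, so the labeling budget goes first to the "high-dependency" region; if the classifier finds no dependence at all, every score and the plug-in mutual information are zero.
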
Subjects:	Machine Learning (cs.LG); Methodology (stat.ME)
Cite as:	arXiv:2301.12616 [cs.LG]
	(or arXiv:2301.12616v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2301.12616

Submission history

From: Weizhi Li
[v1] Mon, 30 Jan 2023 02:23:49 UTC (797 KB)
[v2] Tue, 31 Jan 2023 01:49:26 UTC (796 KB)
[v3] Thu, 2 Feb 2023 02:00:53 UTC (798 KB)
[v4] Fri, 28 Jun 2024 03:57:21 UTC (509 KB)
