Testing for Outliers with Conformal p-values

Bates, Stephen; Candès, Emmanuel; Lei, Lihua; Romano, Yaniv; Sesia, Matteo

doi:10.1214/22-AOS2244

arXiv:2104.08279 (stat)

[Submitted on 16 Apr 2021 (v1), last revised 25 May 2022 (this version, v3)]

Abstract: This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.
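The marginal construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not implement their conditionally calibrated p-values. Assumptions: a larger score means "more outlying"; the score function (here `abs`, a hypothetical stand-in for a trained one-class detector) is fixed in advance, so the calibration points play the role of held-out reference data; and the multiple-testing step is the Benjamini-Hochberg (BH) step-up procedure, a standard choice for false discovery rate control under positive dependence. The helper names `conformal_p_values`, `benjamini_hochberg`, and `demo` are illustrative only.

```python
import random
from typing import List, Sequence, Tuple


def conformal_p_values(cal_scores: Sequence[float],
                       test_scores: Sequence[float]) -> List[float]:
    # Marginal conformal p-value for each test score, assuming that a
    # LARGER score means "more outlying":
    #   p(x) = (1 + #{calibration scores >= s(x)}) / (n + 1)
    n = len(cal_scores)
    return [(1 + sum(1 for s in cal_scores if s >= t)) / (n + 1)
            for t in test_scores]


def benjamini_hochberg(p_values: Sequence[float], alpha: float) -> List[int]:
    # Step-up procedure: find the largest rank k with p_(k) <= k * alpha / m
    # and reject the hypotheses with the k smallest p-values.
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])


def demo(seed: int = 0, alpha: float = 0.1) -> Tuple[List[float], List[int]]:
    # Hypothetical simulation: inliers ~ N(0, 1), outliers ~ N(4, 1).
    rng = random.Random(seed)
    score = abs  # hypothetical score: distance from the inlier mean 0
    calibration = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    inliers = [rng.gauss(0.0, 1.0) for _ in range(90)]
    outliers = [rng.gauss(4.0, 1.0) for _ in range(10)]
    test = inliers + outliers  # indices 90..99 are the true outliers
    p = conformal_p_values([score(x) for x in calibration],
                           [score(x) for x in test])
    return p, benjamini_hochberg(p, alpha)
```

Calling `demo()` returns the 100 test p-values and the indices flagged by BH at level 0.1. With this setup the flagged indices should mostly fall in 90..99, but the exact set depends on the random draw. Because every test point is compared against the same calibration set, these p-values are dependent, which is why the positive-dependence result matters for applying BH.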

Comments:	Revision May 24, 2022: added "asymptotic" and "Monte Carlo" conditional calibration methods; added power analyses; updated numerical experiments to include new methods
Subjects:	Methodology (stat.ME); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as:	arXiv:2104.08279 [stat.ME]
	(or arXiv:2104.08279v3 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2104.08279
Journal reference:	Ann. Statist. 51(1): 149-178 (February 2023)
Related DOI:	https://doi.org/10.1214/22-AOS2244

Submission history

From: Matteo Sesia
[v1] Fri, 16 Apr 2021 17:59:21 UTC (4,337 KB)
[v2] Mon, 19 Apr 2021 16:31:16 UTC (4,339 KB)
[v3] Wed, 25 May 2022 02:35:07 UTC (5,063 KB)