Search | arXiv e-print repository

arXiv:2005.12227 [pdf, other]

doi 10.1007/978-3-030-36938-5_39

Keyed Non-Parametric Hypothesis Tests

Authors: Yao Cheng, Cheng-Kang Chu, Hsiao-Ying Lin, Marius Lombard-Platet, David Naccache

Abstract: The recent popularity of machine learning calls for a deeper understanding of AI security. Amongst the numerous AI threats published so far, poisoning attacks currently attract considerable attention. In a poisoning attack the opponent partially tampers the dataset used for learning to mislead the classifier during the testing phase. This paper proposes a new protection strategy against poisonin… ▽ More The recent popularity of machine learning calls for a deeper understanding of AI security. Amongst the numerous AI threats published so far, poisoning attacks currently attract considerable attention. In a poisoning attack the opponent partially tampers the dataset used for learning to mislead the classifier during the testing phase. This paper proposes a new protection strategy against poisoning attacks. The technique relies on a new primitive called keyed non-parametric hypothesis tests allowing to evaluate under adversarial conditions the training input's conformance with a previously learned distribution $\mathfrak{D}$. To do so we use a secret key $κ$ unknown to the opponent. Keyed non-parametric hypothesis tests differs from classical tests in that the secrecy of $κ$ prevents the opponent from misleading the keyed test into concluding that a (significantly) tampered dataset belongs to $\mathfrak{D}$. △ Less

Submitted 25 May, 2020; originally announced May 2020.

Comments: Paper published in NSS 2019

arXiv:2005.04740 [pdf, ps, other]

Approaching Optimal Duplicate Detection in a Sliding Window

Authors: Rémi Géraud-Stewart, Marius Lombard-Platet, David Naccache

Abstract: Duplicate detection is the problem of identifying whether a given item has previously appeared in a (possibly infinite) stream of data, when only a limited amount of memory is available. Unfortunately the infinite stream setting is ill-posed, and error rates of duplicate detection filters turn out to be heavily constrained: consequently they appear to provide no advantage, asymptotically, over a… ▽ More Duplicate detection is the problem of identifying whether a given item has previously appeared in a (possibly infinite) stream of data, when only a limited amount of memory is available. Unfortunately the infinite stream setting is ill-posed, and error rates of duplicate detection filters turn out to be heavily constrained: consequently they appear to provide no advantage, asymptotically, over a biased coin toss [8]. In this paper we formalize the sliding window setting introduced by [13,16], and show that a perfect (zero error) solution can be used up to a maximal window size $w_\text{max}$. Above this threshold we show that some existing duplicate detection filters (designed for the $\textit{non-windowed}$ setting) perform better that those targeting the windowed problem. Finally, we introduce a "queuing construction" that improves on the performance of some duplicate detection filters in the windowed setting. We also analyse the security of our filters in an adversarial setting. △ Less

Submitted 10 May, 2020; originally announced May 2020.

arXiv:1901.04358 [pdf, ps, other]

doi 10.1145/3297280.3297335

Quotient Hash Tables - Efficiently Detecting Duplicates in Streaming Data

Authors: Rémi Géraud, Marius Lombard-Platet, David Naccache

Abstract: This article presents the Quotient Hash Table (QHT) a new data structure for duplicate detection in unbounded streams. QHTs stem from a corrected analysis of streaming quotient filters (SQFs), resulting in a 33\% reduction in memory usage for equal performance. We provide a new and thorough analysis of both algorithms, with results of interest to other existing constructions. We also introduce a… ▽ More This article presents the Quotient Hash Table (QHT) a new data structure for duplicate detection in unbounded streams. QHTs stem from a corrected analysis of streaming quotient filters (SQFs), resulting in a 33\% reduction in memory usage for equal performance. We provide a new and thorough analysis of both algorithms, with results of interest to other existing constructions. We also introduce an optimised version of our new data structure dubbed Queued QHT with Duplicates (QQHTD). Finally we discuss the effect of adversarial inputs for hash-based duplicate filters similar to QHT. △ Less

Submitted 14 January, 2019; originally announced January 2019.

Comments: Shorter version was accepted at SIGAPP SAC '19

Showing 1–3 of 3 results for author: Lombard-Platet, M