
Fast Interactive Search with a Scale-Free Comparison Oracle

Authors: Daniyar Chumbalov, Lars Klein, Lucas Maystre, Matthias Grossglauser

Abstract: A comparison-based search algorithm lets a user find a target item $t$ in a database by answering queries of the form, "Which of items $i$ and $j$ is closer to $t$?" Instead of formulating an explicit query (such as one or several keywords), the user navigates towards the target via a sequence of such (typically noisy) queries. We propose a scale-free probabilistic oracle model called $γ$-CKL for such similarity triplets $(i,j;t)$, which generalizes the CKL triplet model proposed in the literature. The generalization affords independent control over the discriminating power of the oracle and the dimension of the feature space containing the items. We develop a search algorithm with provably exponential rate of convergence under the $γ$-CKL oracle, thanks to a backtracking strategy that deals with the unavoidable errors in updating the belief region around the target. We evaluate the performance of the algorithm both over the posited oracle and over several real-world triplet datasets. We also report on a comprehensive user study, where human subjects navigate a database of face portraits.

Submitted 2 June, 2023; originally announced June 2023.

arXiv:1905.05049

Scalable and Efficient Comparison-based Search without Features

Authors: Daniyar Chumbalov, Lucas Maystre, Matthias Grossglauser

Abstract: We consider the problem of finding a target object $t$ using pairwise comparisons, by asking an oracle questions of the form "Which object from the pair $(i,j)$ is more similar to $t$?". Objects live in a space of latent features, from which the oracle generates noisy answers. First, we consider the non-blind setting where these features are accessible. We propose a new Bayesian comparison-based search algorithm with noisy answers; it has low computational complexity yet is efficient in the number of queries. We provide theoretical guarantees, deriving the form of the optimal query and proving almost sure convergence to the target $t$. Second, we consider the blind setting, where the object features are hidden from the search algorithm. In this setting, we combine our search method and a new distributional triplet embedding algorithm into one scalable learning framework called Learn2Search. We show that the query complexity of our approach on two real-world datasets is on par with the non-blind setting, which is not achievable using any of the current state-of-the-art embedding methods. Finally, we demonstrate the efficacy of our framework by conducting an experiment with users searching for movie actors.

Submitted 3 September, 2020; v1 submitted 13 May, 2019; originally announced May 2019.
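
To make the non-blind setting in the abstract above concrete, here is a minimal sketch of a Bayesian belief update from noisy pairwise comparisons. It is an illustration under stated assumptions, not the authors' algorithm: the oracle is assumed to err with a constant probability `eps`, distances are assumed Euclidean, queries are taken from a caller-supplied list rather than chosen optimally, and all function names are hypothetical.

```python
import math
import random


def closer(points, i, j, c):
    """Index (i or j) of the item nearer to candidate target c (ties go to i)."""
    return i if math.dist(points[i], points[c]) <= math.dist(points[j], points[c]) else j


def noisy_answer(points, i, j, t, eps, rng):
    """Assumed oracle: reports the truly closer item, flipped with probability eps."""
    truth = closer(points, i, j, t)
    other = j if truth == i else i
    return other if rng.random() < eps else truth


def update(posterior, points, i, j, answer, eps):
    """Bayes rule: reweight each candidate by the likelihood of the observed answer."""
    weights = [
        p * ((1 - eps) if closer(points, i, j, c) == answer else eps)
        for c, p in enumerate(posterior)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def search(points, t, pairs, eps=0.1, seed=0):
    """Run the given queries against the oracle and return the most probable item."""
    rng = random.Random(seed)
    posterior = [1.0 / len(points)] * len(points)
    for i, j in pairs:
        answer = noisy_answer(points, i, j, t, eps, rng)
        posterior = update(posterior, points, i, j, answer, eps)
    return max(range(len(points)), key=posterior.__getitem__)
```

With a noiseless oracle (`eps=0.0`) and every pair of distinct points queried once, only the true target stays consistent with all answers, so `search` returns it. With `eps > 0` the result depends on the sampled answers and on which pairs are asked.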

arXiv:1511.02899

On the Combinatorial Version of the Slepian-Wolf Problem

Authors: Daniyar Chumbalov, Andrei Romashchenko

Abstract: We study the following combinatorial version of the Slepian-Wolf coding scheme. Two isolated Senders are given binary strings $X$ and $Y$ respectively; the length of each string is equal to $n$, and the Hamming distance between the strings is at most $αn$. The Senders compress their strings and communicate the results to the Receiver. Then the Receiver must reconstruct both strings $X$ and $Y$. The aim is to minimise the lengths of the transmitted messages. For an asymmetric variant of this problem (where one of the Senders transmits the input string to the Receiver without compression) with deterministic encoding, a nontrivial lower bound was found by A. Orlitsky and K. Viswanathan. In our paper we prove a new lower bound for the schemes with syndrome coding, where at least one of the Senders uses linear encoding of the input string. For the combinatorial Slepian-Wolf problem with randomized encoding the theoretical optimum of communication complexity was recently found by the first author, though effective protocols with optimal lengths of messages remained unknown. We close this gap and present a polynomial time randomized protocol that achieves the optimal communication complexity.

Submitted 11 June, 2018; v1 submitted 9 November, 2015; originally announced November 2015.

Comments: 20 pages, 14 figures. Accepted to IEEE Transactions on Information Theory (June 2018)
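
The asymmetric variant with syndrome coding mentioned in the abstract above can be illustrated with a toy example. The sketch below is not the paper's protocol or bound: it assumes $n = 7$, Hamming distance at most 1 between $X$ and $Y$, and the standard Hamming(7,4) parity-check matrix whose $k$-th column is the binary representation of $k$. One Sender transmits $X$ uncompressed, the other transmits only the 3-bit syndrome of $Y$, and the Receiver recovers $Y$. Function names are hypothetical.

```python
def syndrome(bits):
    """Syndrome of a 7-bit word under the Hamming(7,4) parity-check matrix.

    Column k of the matrix is the binary representation of k, so the
    syndrome (as an integer in 0..7) is the XOR of the 1-based positions
    that hold a 1.
    """
    s = 0
    for k, b in enumerate(bits, start=1):
        if b:
            s ^= k
    return s


def recover_y(x, syndrome_y):
    """Receiver side: rebuild Y from the full string X and the syndrome of Y.

    By linearity, syndrome(X) XOR syndrome(Y) equals syndrome(X XOR Y).
    When X and Y differ in at most one position, that value is either 0
    (the strings are equal) or the 1-based index of the differing bit.
    """
    diff = syndrome(x) ^ syndrome_y
    y = list(x)
    if diff:
        y[diff - 1] ^= 1
    return y
```

In this toy setting the second Sender transmits 3 bits instead of 7. The scheme is only correct under the stated distance assumption; if $X$ and $Y$ differ in two or more positions, `recover_y` returns a wrong string.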

Showing 1–3 of 3 results for author: Chumbalov, D