-
Bridging Dense and Sparse Maximum Inner Product Search
Authors:
Sebastian Bruch,
Franco Maria Nardini,
Amir Ingber,
Edo Liberty
Abstract:
Maximum inner product search (MIPS) over dense and sparse vectors has progressed independently in a bifurcated literature for decades; the latter is better known as top-$k$ retrieval in Information Retrieval. This duality exists because sparse and dense vectors serve different end goals. That is despite the fact that they are manifestations of the same mathematical problem. In this work, we ask if algorithms for dense vectors could be applied effectively to sparse vectors, particularly those that violate the assumptions underlying top-$k$ retrieval methods. We study IVF-based retrieval where vectors are partitioned into clusters and only a fraction of clusters are searched during retrieval. We conduct a comprehensive analysis of dimensionality reduction for sparse vectors, and examine standard and spherical KMeans for partitioning. Our experiments demonstrate that IVF serves as an efficient solution for sparse MIPS. As byproducts, we identify two research opportunities and demonstrate their potential. First, we cast the IVF paradigm as a dynamic pruning technique and turn that insight into a novel organization of the inverted index for approximate MIPS for general sparse vectors. Second, we offer a unified regime for MIPS over vectors that have dense and sparse subspaces, and show its robustness to query distributions.
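The IVF scheme summarized in this abstract (partition the vectors into clusters, then search only a fraction of the clusters) can be illustrated with a small sketch. This is not the authors' implementation: it assumes sparse vectors stored as `{dimension: value}` dicts, uses a few rounds of standard KMeans with naive evenly spaced seeding, and skips the dimensionality-reduction step the abstract mentions. The names `build_ivf`, `search`, and `nprobe` are hypothetical.

```python
import heapq

def dot(u, v):
    """Inner product of two sparse vectors stored as {dimension: value} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(val * v.get(dim, 0.0) for dim, val in u.items())

def sq_dist(u, v):
    """Squared Euclidean distance, expanded in terms of inner products."""
    return dot(u, u) - 2.0 * dot(u, v) + dot(v, v)

def mean(vectors):
    """Coordinate-wise mean of a non-empty list of sparse vectors."""
    acc = {}
    for vec in vectors:
        for dim, val in vec.items():
            acc[dim] = acc.get(dim, 0.0) + val
    return {dim: val / len(vectors) for dim, val in acc.items()}

def build_ivf(docs, num_clusters, iterations=5):
    """Partition docs with a few rounds of standard KMeans (assumed setup)."""
    stride = len(docs) // num_clusters
    centroids = [dict(docs[i * stride]) for i in range(num_clusters)]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for doc_id, vec in enumerate(docs):
            nearest = min(range(len(centroids)),
                          key=lambda c: sq_dist(vec, centroids[c]))
            clusters[nearest].append(doc_id)
        # An empty cluster keeps its previous centroid.
        centroids = [mean([docs[i] for i in members]) if members else centroids[c]
                     for c, members in enumerate(clusters)]
    return centroids, clusters

def search(query, docs, centroids, clusters, k, nprobe):
    """Score only documents in the nprobe clusters whose centroids best match the query."""
    probed = heapq.nlargest(nprobe, range(len(centroids)),
                            key=lambda c: dot(query, centroids[c]))
    candidates = [doc_id for c in probed for doc_id in clusters[c]]
    return heapq.nlargest(k, candidates, key=lambda i: dot(query, docs[i]))

docs = [{0: 1.0}, {0: 0.9, 1: 0.1}, {5: 1.0}, {5: 0.8, 6: 0.2}]
centroids, clusters = build_ivf(docs, num_clusters=2)
print(search({0: 1.0}, docs, centroids, clusters, k=1, nprobe=1))
```

Setting `nprobe` equal to the number of clusters recovers exhaustive search; smaller values trade accuracy for fewer inner products.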
Submitted 16 September, 2023;
originally announced September 2023.
-
An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors
Authors:
Sebastian Bruch,
Franco Maria Nardini,
Amir Ingber,
Edo Liberty
Abstract:
Maximum Inner Product Search or top-k retrieval on sparse vectors is well understood in information retrieval, with a number of mature algorithms that solve it exactly. However, all existing algorithms are tailored to text and frequency-based similarity measures. To achieve optimal memory footprint and query latency, they rely on the near stationarity of documents and on laws governing natural languages. We consider, instead, a setup in which collections are streaming -- necessitating dynamic indexing -- and where indexing and retrieval must work with arbitrarily distributed real-valued vectors. As we show, existing algorithms are no longer competitive in this setup, even against naive solutions. We investigate this gap and present a novel approximate solution, called Sinnamon, that can efficiently retrieve the top-k results for sparse real-valued vectors drawn from arbitrary distributions. Notably, Sinnamon offers levers to trade off memory consumption, latency, and accuracy, making the algorithm suitable for constrained applications and systems. We give theoretical results on the error introduced by the approximate nature of the algorithm, and present an empirical evaluation of its performance on two hardware platforms and synthetic and real-valued datasets. We conclude by laying out concrete directions for future research on this general top-k retrieval problem over sparse vectors.
Submitted 25 January, 2023;
originally announced January 2023.
-
An Analysis of Fusion Functions for Hybrid Retrieval
Authors:
Sebastian Bruch,
Siyu Gai,
Amir Ingber
Abstract:
We study hybrid search in text retrieval where lexical and semantic search are fused together with the intuition that the two are complementary in how they model relevance. In particular, we examine fusion by a convex combination (CC) of lexical and semantic scores, as well as the Reciprocal Rank Fusion (RRF) method, and identify their advantages and potential pitfalls. Contrary to existing studies, we find RRF to be sensitive to its parameters; that the learning of a CC fusion is generally agnostic to the choice of score normalization; that CC outperforms RRF in in-domain and out-of-domain settings; and finally, that CC is sample efficient, requiring only a small set of training examples to tune its only parameter to a target domain.
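The two fusion functions named in this abstract can be sketched in a few lines. The details below are assumptions for illustration rather than the paper's exact setup: scores are min-max normalized before the convex combination, `alpha` weights the semantic score, documents missing from one list contribute zero, and the RRF constant `eta` defaults to 60, a commonly used value.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span else 0.0 for d, s in scores.items()}

def convex_combination(lexical, semantic, alpha):
    """Fuse as alpha * semantic + (1 - alpha) * lexical on normalized scores."""
    lex, sem = min_max(lexical), min_max(semantic)
    docs = set(lex) | set(sem)
    return {d: alpha * sem.get(d, 0.0) + (1 - alpha) * lex.get(d, 0.0) for d in docs}

def reciprocal_rank_fusion(lexical, semantic, eta=60):
    """Fuse as the sum of 1 / (eta + rank) over both rankings, ranks starting at 1."""
    fused = {}
    for scores in (lexical, semantic):
        ranking = sorted(scores, key=scores.get, reverse=True)
        for rank, d in enumerate(ranking, start=1):
            fused[d] = fused.get(d, 0.0) + 1.0 / (eta + rank)
    return fused

lexical = {"a": 12.0, "b": 9.0, "c": 3.0}   # e.g. BM25-style scores
semantic = {"a": 0.2, "b": 0.9, "c": 0.8}   # e.g. embedding similarities
cc = convex_combination(lexical, semantic, alpha=0.5)
rrf = reciprocal_rank_fusion(lexical, semantic)
print(sorted(cc, key=cc.get, reverse=True))
print(sorted(rrf, key=rrf.get, reverse=True))
```

The convex combination uses the score magnitudes, whereas RRF discards them and keeps only rank positions, which is one way to see why the two can behave differently.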
Submitted 4 May, 2023; v1 submitted 21 October, 2022;
originally announced October 2022.
-
SDR: Efficient Neural Re-ranking using Succinct Document Representation
Authors:
Nachshon Cohen,
Amit Portnoy,
Besnik Fetahu,
Amir Ingber
Abstract:
BERT-based ranking models have achieved superior performance on various information retrieval tasks. However, the large number of parameters and complex self-attention operation come at a significant latency overhead. To remedy this, recent works propose late-interaction architectures, which allow pre-computation of intermediate document representations, thus reducing the runtime latency. Nonetheless, having solved the immediate latency issue, these methods now introduce storage costs and network fetching latency, which limits their adoption in real-life production systems.
In this work, we propose the Succinct Document Representation (SDR) scheme that computes highly compressed intermediate document representations, mitigating the storage/network issue. Our approach first reduces the dimension of token representations by encoding them using a novel autoencoder architecture that uses the document's textual content in both the encoding and decoding phases. After this token encoding step, we further reduce the size of entire document representations using a modern quantization technique.
Extensive evaluations on passage re-ranking on the MSMARCO dataset show that compared to existing approaches using compressed document representations, our method is highly efficient, achieving 4x-11.6x better compression rates for the same ranking quality.
Submitted 3 October, 2021;
originally announced October 2021.
-
Strong Successive Refinability and Rate-Distortion-Complexity Tradeoff
Authors:
Albert No,
Amir Ingber,
Tsachy Weissman
Abstract:
We investigate the second order asymptotics (source dispersion) of the successive refinement problem. Similarly to the classical definition of a successively refinable source, we say that a source is strongly successively refinable if successive refinement coding can achieve the second order optimum rate (including the dispersion terms) at both decoders. We establish a sufficient condition for strong successive refinability. We show that any discrete source under Hamming distortion and the Gaussian source under quadratic distortion are strongly successively refinable.
We also demonstrate how successive refinement ideas can be used in point-to-point lossy compression problems in order to reduce complexity. We give two examples, the binary-Hamming and Gaussian-quadratic cases, in which a layered code construction results in a low complexity scheme that attains optimal performance. For example, when the number of layers grows with the block length $n$, we show how to design an $O(n^{\log(n)})$ algorithm that asymptotically achieves the rate-distortion bound.
Submitted 15 March, 2016; v1 submitted 10 June, 2015;
originally announced June 2015.
-
Compression for Quadratic Similarity Queries: Finite Blocklength and Practical Schemes
Authors:
Fabian Steiner,
Steffen Dempfle,
Amir Ingber,
Tsachy Weissman
Abstract:
We study the problem of compression for the purpose of similarity identification, where similarity is measured by the mean square Euclidean distance between vectors. While the asymptotical fundamental limits of the problem - the minimal compression rate and the error exponent - were found in a previous work, in this paper we focus on the nonasymptotic domain and on practical, implementable schemes. We first present a finite blocklength achievability bound based on shape-gain quantization: The gain (amplitude) of the vector is compressed via scalar quantization and the shape (the projection on the unit sphere) is quantized using a spherical code. The results are numerically evaluated and they converge to the asymptotic values as predicted by the error exponent. We then give a nonasymptotic lower bound on the performance of any compression scheme, and compare to the upper (achievability) bound. For a practical implementation of such a scheme, we use wrapped spherical codes, studied by Hamkins and Zeger, and use the Leech lattice as an example for an underlying lattice. As a side result, we obtain a bound on the covering angle of any wrapped spherical code, as a function of the covering radius of the underlying lattice.
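The shape-gain idea in this abstract (scalar-quantize the amplitude, quantize the direction with a spherical code) can be illustrated with a toy sketch. Everything concrete here is an assumption: a uniform scalar quantizer with step `step` stands in for the gain quantizer, and a four-point codebook on the unit circle stands in for the wrapped spherical code built on the Leech lattice. The `decode` function is included only to make the quantization visible; the paper's goal is answering similarity queries, not reconstruction.

```python
import math

def quantize_gain(gain, step):
    """Uniform scalar quantizer: index of the nearest multiple of step."""
    return round(gain / step)

def quantize_shape(shape, codebook):
    """Index of the unit-norm codeword with the largest inner product with shape."""
    return max(range(len(codebook)),
               key=lambda i: sum(a * b for a, b in zip(shape, codebook[i])))

def encode(x, codebook, step):
    """Split x into gain (norm) and shape (direction) and quantize each separately."""
    gain = math.sqrt(sum(a * a for a in x))
    if gain == 0.0:
        return 0, 0  # the zero vector has no direction; any shape index works
    shape = [a / gain for a in x]
    return quantize_gain(gain, step), quantize_shape(shape, codebook)

def decode(gain_index, shape_index, codebook, step):
    """Rebuild an approximation as (quantized gain) * (quantized shape)."""
    return [gain_index * step * c for c in codebook[shape_index]]

# Toy spherical code: four points on the unit circle (illustrative only).
codebook = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
indices = encode([0.2, 2.9], codebook, step=0.5)
print(indices, decode(*indices, codebook, step=0.5))
```

Separating gain from shape lets each part use a quantizer suited to its geometry: a one-dimensional grid for the amplitude and a code on the sphere for the direction.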
Submitted 10 May, 2014; v1 submitted 21 April, 2014;
originally announced April 2014.
-
The Minimal Compression Rate for Similarity Identification
Authors:
Amir Ingber,
Tsachy Weissman
Abstract:
Traditionally, data compression deals with the problem of concisely representing a data source, e.g. a sequence of letters, for the purpose of eventual reproduction (either exact or approximate). In this work we are interested in the case where the goal is to answer similarity queries about the compressed sequence, i.e. to identify whether or not the original sequence is similar to a given query sequence. We study the fundamental tradeoff between the compression rate and the reliability of the queries performed on compressed data. For i.i.d. sequences, we characterize the minimal compression rate that allows query answers that are reliable, in the sense of having a vanishing false-positive probability when false negatives are not allowed. The result is partially based on a previous work by Ahlswede et al., and the inherently typical subset lemma plays a key role in the converse proof. We then characterize the compression rate achievable by schemes that use lossy source codes as a building block, and show that such schemes are, in general, suboptimal. Finally, we tackle the problem of evaluating the minimal compression rate, by converting the problem to a sequence of convex programs that can be solved efficiently.
Submitted 7 December, 2013;
originally announced December 2013.
-
Compression for Quadratic Similarity Queries
Authors:
Amir Ingber,
Thomas Courtade,
Tsachy Weissman
Abstract:
The problem of performing similarity queries on compressed data is considered. We focus on the quadratic similarity measure, and study the fundamental tradeoff between compression rate, sequence length, and reliability of queries performed on compressed data. For a Gaussian source, we show that queries can be answered reliably if and only if the compression rate exceeds a given threshold - the identification rate - which we explicitly characterize. Moreover, when compression is performed at a rate greater than the identification rate, responses to queries on the compressed data can be made exponentially reliable. We give a complete characterization of this exponent, which is analogous to the error and excess-distortion exponents in channel and source coding, respectively.
For a general source we prove that, as with classical compression, the Gaussian source requires the largest compression rate among sources with a given variance. Moreover, a robust scheme is described that attains this maximal rate for any source distribution.
Submitted 24 July, 2013;
originally announced July 2013.
-
The Dispersion of Joint Source-Channel Coding
Authors:
Da Wang,
Amir Ingber,
Yuval Kochman
Abstract:
In this work we investigate the behavior of the distortion threshold that can be guaranteed in joint source-channel coding, to within a prescribed excess-distortion probability. We show that the gap between this threshold and the optimal average distortion is governed by a constant that we call the joint source-channel dispersion. This constant can be easily computed, since it is the sum of the source and channel dispersions, previously derived. The resulting performance is shown to be better than that of any separation-based scheme. For the proof, we use unequal error protection channel coding, thus we also evaluate the dispersion of that setting.
Submitted 7 December, 2011; v1 submitted 28 September, 2011;
originally announced September 2011.
-
Finite Dimensional Infinite Constellations
Authors:
Amir Ingber,
Ram Zamir,
Meir Feder
Abstract:
In the setting of a Gaussian channel without power constraints, proposed by Poltyrev, the codewords are points in an n-dimensional Euclidean space (an infinite constellation) and the tradeoff between their density and the error probability is considered. The capacity in this setting is the highest achievable normalized log density (NLD) with vanishing error probability. This capacity as well as error exponent bounds for this setting are known. In this work we consider the optimal performance achievable in the fixed blocklength (dimension) regime. We provide two new achievability bounds, and extend the validity of the sphere bound to finite dimensional infinite constellations. We also provide asymptotic analysis of the bounds: When the NLD is fixed, we provide asymptotic expansions for the bounds that are significantly tighter than the previously known error exponent results. When the error probability is fixed, we show that as n grows, the gap to capacity is inversely proportional (up to the first order) to the square root of n, where the proportion constant is given by the inverse Q-function of the allowed error probability, times the square root of 1/2. In analogy to a similar result in channel coding, the dispersion of infinite constellations is 1/2 nat^2 per channel use. All our achievability results use lattices and therefore hold for the maximal error probability as well. Connections to the error exponent of the power constrained Gaussian channel and to the volume-to-noise ratio as a figure of merit are discussed. In addition, we demonstrate the tightness of the results numerically and compare to state-of-the-art coding schemes.
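Written as a formula, the fixed-error-probability statement in this abstract reads roughly as follows. The symbols are illustrative shorthand and not necessarily the paper's notation: $\delta^*$ is the capacity (the highest achievable NLD), $\delta_\varepsilon(n)$ is the best NLD achievable in dimension $n$ with error probability $\varepsilon$, and $Q^{-1}$ is the inverse Q-function.

$$\delta^* - \delta_\varepsilon(n) \approx \sqrt{\frac{1}{2n}}\; Q^{-1}(\varepsilon)$$

Comparing this with the channel-coding form $\sqrt{V/n}\, Q^{-1}(\varepsilon)$ identifies the dispersion as $V = 1/2$ nat$^2$ per channel use, matching the value stated in the abstract.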
Submitted 5 September, 2011; v1 submitted 1 March, 2011;
originally announced March 2011.
-
The Dispersion of Lossy Source Coding
Authors:
Amir Ingber,
Yuval Kochman
Abstract:
In this work we investigate the behavior of the minimal rate needed in order to guarantee a given probability that the distortion exceeds a prescribed threshold, at some fixed finite quantization block length. We show that the excess coding rate above the rate-distortion function is inversely proportional (to the first order) to the square root of the block length. We give an explicit expression for the proportion constant, which is given by the inverse Q-function of the allowed excess distortion probability, times the square root of a constant, termed the excess distortion dispersion. This result is the dual of a corresponding channel coding result, where the dispersion above is the dual of the channel dispersion. The work treats discrete memoryless sources, as well as the quadratic-Gaussian case.
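In symbols, the first-order result described in this abstract can be written as below. The notation is assumed for illustration: $R(n, D, \varepsilon)$ is the minimal rate at block length $n$ such that the distortion exceeds $D$ with probability at most $\varepsilon$, $R(D)$ is the rate-distortion function, $V(D)$ is the excess distortion dispersion, and $Q^{-1}$ is the inverse Q-function.

$$R(n, D, \varepsilon) \approx R(D) + \sqrt{\frac{V(D)}{n}}\; Q^{-1}(\varepsilon)$$

Under this approximation, quadrupling the block length halves the excess rate $R(n, D, \varepsilon) - R(D)$.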
Submitted 13 February, 2011;
originally announced February 2011.
-
Parallel Bit Interleaved Coded Modulation
Authors:
Amir Ingber,
Meir Feder
Abstract:
A new variant of bit interleaved coded modulation (BICM) is proposed. In the new scheme, called Parallel BICM, L identical binary codes are used in parallel using a mapper, a newly proposed finite-length interleaver and a binary dither signal. As opposed to previous approaches, the scheme does not rely on any assumptions of an ideal, infinite-length interleaver. Over a memoryless channel, the new scheme is proven to be equivalent to a binary memoryless channel. Therefore the scheme enables one to easily design coded modulation schemes using a simple binary code that was designed for that binary channel. The overall performance of the coded modulation scheme is analytically evaluated based on the performance of the binary code over the binary channel. The new scheme is analyzed from an information theoretic viewpoint, where the capacity, error exponent and channel dispersion are considered. The capacity of the scheme is identical to the BICM capacity. The error exponent of the scheme is numerically compared to a recently proposed mismatched-decoding exponent analysis of BICM.
Submitted 17 August, 2010; v1 submitted 8 July, 2010;
originally announced July 2010.