
Showing 1–21 of 21 results for author: Venturini, R

Searching in archive cs.
  1. arXiv:2404.18812  [pdf, other]

    cs.IR

    Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations

    Authors: Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini

    Abstract: Learned sparse representations form an attractive class of contextual embeddings for text retrieval. That is so because they are effective models of relevance and are interpretable by design. Despite their apparent compatibility with inverted indexes, however, retrieval over sparse embeddings remains challenging. That is due to the distributional differences between learned embeddings and term fre…

    Submitted 29 April, 2024; originally announced April 2024.
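    As background for the abstract's point about inverted indexes, the sketch below shows exact top-k retrieval over learned sparse vectors with a plain inverted index: each posting stores a (document, weight) pair and the score is the query-document inner product. It assumes vectors are dicts from term id to weight; the function names are hypothetical, and the paper's approximate retrieval method is not reproduced here.

```python
# Illustrative sketch; names are hypothetical, not taken from the paper.
# Exact (not approximate) inner-product retrieval over sparse vectors.
from collections import defaultdict
import heapq


def build_index(docs):
    """Map each term id to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def search(index, query, k):
    """Term-at-a-time scoring: accumulate q_t * d_t over each query term's postings."""
    scores = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, ()):
            scores[doc_id] += q_weight * d_weight
    # Return the k highest (score, doc_id) pairs, best first.
    return heapq.nlargest(k, ((s, d) for d, s in scores.items()))
```

    Every posting of every query term is visited, which is the cost that approximate methods try to cut.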

  2. arXiv:2404.02805  [pdf, other]

    cs.IR

    Efficient Multi-Vector Dense Retrieval Using Bit Vectors

    Authors: Franco Maria Nardini, Cosimo Rulli, Rossano Venturini

    Abstract: Dense retrieval techniques employ pre-trained large language models to build a high-dimensional representation of queries and passages. These representations compute the relevance of a passage w.r.t. a query using efficient similarity measures. In this line, multi-vector representations show improved effectiveness at the expense of a one-order-of-magnitude increase in memory footprint and query…

    Submitted 3 April, 2024; originally announced April 2024.
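    Multi-vector representations store one vector per token, so scoring compares many vectors per passage. A common way to score them (assumed here, in the style of late-interaction models) is "MaxSim": for each query vector take the best dot product against any passage vector, then sum. This is a plain-Python sketch with hypothetical names; the paper's bit-vector technique is not shown.

```python
# Illustrative MaxSim (late-interaction) scoring; names are hypothetical.
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Sum over query vectors of the maximum dot product with any passage vector."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)
```

    Storing every passage vector is what makes the memory footprint grow roughly with the number of tokens per passage.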

  3. arXiv:2306.08960  [pdf, other]

    cs.CV cs.LG

    Neural Network Compression using Binarization and Few Full-Precision Weights

    Authors: Franco Maria Nardini, Cosimo Rulli, Salvatore Trani, Rossano Venturini

    Abstract: Quantization and pruning are two effective Deep Neural Network model compression methods. In this paper, we propose Automatic Prune Binarization (APB), a novel compression technique combining quantization with pruning. APB enhances the representational capability of binary networks using a few full-precision weights. Our technique jointly maximizes the accuracy of the network while minimizing its…

    Submitted 15 September, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 15 pages, 6 figures, 3 tables

    ACM Class: I.2.6
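    To make "binary networks with a few full-precision weights" concrete, the sketch below keeps the k largest-magnitude weights of a layer unchanged and binarizes the rest to ±alpha, with alpha the mean absolute value of the binarized weights. The magnitude-based selection and the scaling rule are assumptions for illustration; APB itself chooses which weights stay full precision automatically, and this is not its algorithm.

```python
# Simplified illustration, NOT the APB algorithm: magnitude-based selection
# of full-precision weights is an assumption made for this sketch.
from typing import List


def binarize_keep_topk(weights: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude weights as-is; map the others to
    sign(w) * alpha, where alpha is their mean absolute value."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    rest = [abs(weights[i]) for i in range(len(weights)) if i not in keep]
    alpha = sum(rest) / len(rest) if rest else 0.0
    return [w if i in keep else (alpha if w >= 0 else -alpha)
            for i, w in enumerate(weights)]
```

    Only one scalar plus one bit per binarized weight needs storing, and the few kept weights are stored at full precision.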

  4. arXiv:2302.09239  [pdf, ps, other]

    cs.DS

    Faster Wavelet Tree Queries

    Authors: Matteo Ceregini, Florian Kurpicz, Rossano Venturini

    Abstract: Given a text, rank and select queries return the number of occurrences of a character up to a position (rank) or the position of a character with a given rank (select). These queries have applications in, e.g., compression, computational geometry, and most notably pattern matching in the form of the backward search -- the backbone of many compressed full-text indices. Currently, in practice, for t…

    Submitted 8 November, 2023; v1 submitted 18 February, 2023; originally announced February 2023.
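    The definitions of rank and select can be pinned down with a naive linear-time sketch. The conventions are assumptions: positions are 0-based, rank(c, i) counts occurrences in text[0:i] (excluding position i), and select uses a 1-based occurrence number. A wavelet tree answers both queries in O(log σ) time instead of scanning the text.

```python
# Naive reference definitions; conventions (0-based positions, exclusive rank,
# 1-based select) are assumptions, and the function names are illustrative.
from typing import Sequence


def rank(text: Sequence[str], c: str, i: int) -> int:
    """Number of occurrences of c in text[0:i]."""
    return sum(1 for ch in text[:i] if ch == c)


def select(text: Sequence[str], c: str, j: int) -> int:
    """Position of the j-th occurrence (1-based) of c, or -1 if there is none."""
    seen = 0
    for pos, ch in enumerate(text):
        if ch == c:
            seen += 1
            if seen == j:
                return pos
    return -1
```

    Under these conventions the two queries are inverses: rank(text, c, select(text, c, j)) equals j - 1.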

  5. arXiv:2202.10728  [pdf, other]

    cs.LG cs.AI cs.IR cs.PF

    Distilled Neural Networks for Efficient Learning to Rank

    Authors: F. M. Nardini, C. Rulli, S. Trani, R. Venturini

    Abstract: Recent studies in Learning to Rank have shown the possibility to effectively distill a neural network from an ensemble of regression trees. This result leads neural networks to become a natural competitor of tree-based ensembles on the ranking task. Nevertheless, ensembles of regression trees outperform neural models both in terms of efficiency and effectiveness, particularly when scoring on CPU.…

    Submitted 22 February, 2022; originally announced February 2022.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  6. An Optimal Algorithm for Finding Champions in Tournament Graphs

    Authors: Lorenzo Beretta, Franco Maria Nardini, Roberto Trani, Rossano Venturini

    Abstract: A tournament graph is a complete directed graph, which can be used to model a round-robin tournament between $n$ players. In this paper, we address the problem of finding a champion of the tournament, also known as Copeland winner, which is a player that wins the highest number of matches. In detail, we aim to investigate algorithms that find the champion by playing a low number of matches. Solvin…

    Submitted 18 April, 2023; v1 submitted 26 November, 2021; originally announced November 2021.
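    The trivial baseline for finding a Copeland winner plays all n(n-1)/2 matches and counts wins; the paper's goal is to play far fewer. The sketch assumes a hypothetical `beats(i, j)` callback that plays one match and returns True when player i wins; ties in the win count are broken toward the lowest index.

```python
# Baseline that plays every match; `beats` is a hypothetical match oracle.
from typing import Callable, List


def copeland_champion(n: int, beats: Callable[[int, int], bool]) -> int:
    """Return a player with the most wins after playing all n*(n-1)/2 matches."""
    wins: List[int] = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if beats(i, j):  # one match between i and j
                wins[i] += 1
            else:
                wins[j] += 1
    return max(range(n), key=lambda p: wins[p])  # first maximum wins ties
```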

  7. arXiv:2011.07143  [pdf, ps, other]

    cs.DS

    Adaptive Learning of Compressible Strings

    Authors: Gabriele Fici, Nicola Prezza, Rossano Venturini

    Abstract: Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm needs to ask the oracle $σn/4 - O(n)$ queries in order to be able to reconstruct the hidden string, where $σ$ is the size of the alphabet of $S$ and $n$ its length,…

    Submitted 19 October, 2021; v1 submitted 13 November, 2020; originally announced November 2020.

    Comments: Accepted for publication in Theoretical Computer Science
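    A simple non-adaptive strategy shows why O(σn) substring queries always suffice, matching the order of the σn/4 - O(n) lower bound quoted above. First extend a candidate to the right until no character works; such a string can occur only as a suffix of S. Then extend it to the left until no character works, at which point it equals S. This uses at most σ(n + 2) queries. It is a textbook-style sketch with hypothetical names, not the paper's adaptive method for compressible strings.

```python
# Illustrative reconstruction from a substring oracle; names are hypothetical.
from typing import Callable, Iterable


def reconstruct(alphabet: Iterable[str], is_substring: Callable[[str], bool]) -> str:
    """Recover hidden S using only queries 'is s a substring of S?'."""
    sigma = list(alphabet)
    s = ""
    # Phase 1: grow to the right; when stuck, every occurrence of s ends
    # at the end of S, so s is a suffix of S.
    extended = True
    while extended:
        extended = False
        for c in sigma:
            if is_substring(s + c):
                s, extended = s + c, True
                break
    # Phase 2: grow the suffix to the left; c + s can only occur as a suffix.
    extended = True
    while extended:
        extended = False
        for c in sigma:
            if is_substring(c + s):
                s, extended = c + s, True
                break
    return s
```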

  8. Practical Trade-Offs for the Prefix-Sum Problem

    Authors: Giulio Ermanno Pibiri, Rossano Venturini

    Abstract: Given an integer array A, the prefix-sum problem is to answer sum(i) queries that return the sum of the elements in A[0..i], knowing that the integers in A can be changed. It is a classic problem in data structure design with a wide range of applications in computing from coding to databases. In this work, we propose and compare several practical solutions to this problem, showing that new tra…

    Submitted 6 October, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: Accepted by "Software: Practice and Experience", 2020

    Journal ref: Softw. Pract. Exp. 51(5): 921-949 (2021)
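    One classic solution to the dynamic prefix-sum problem is the Fenwick tree (binary indexed tree), which supports both an update of A[i] and a sum(i) query in O(log n) time. The sketch below is that textbook structure, offered as a reference point; it does not claim to be one of the specific variants evaluated in the paper, and the method names are hypothetical.

```python
# Textbook Fenwick tree; method names are illustrative.
from typing import List


class FenwickTree:
    def __init__(self, values: List[int]) -> None:
        self.n = len(values)
        self.tree = [0] * (self.n + 1)  # 1-based internal array
        for i, v in enumerate(values):
            self.update(i, v)

    def update(self, i: int, delta: int) -> None:
        """A[i] += delta."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # move to the next node covering position i

    def prefix_sum(self, i: int) -> int:
        """Return sum(i) = A[0] + ... + A[i]."""
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # drop the lowest set bit
        return total
```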

  9. Efficient and Effective Query Auto-Completion

    Authors: Simon Gog, Giulio Ermanno Pibiri, Rossano Venturini

    Abstract: Query Auto-Completion (QAC) is a ubiquitous feature of modern textual search systems, suggesting possible ways of completing the query being typed by the user. Efficiency is crucial for the system to respond in real time when operating in the million-scale search space. Prior work has extensively advocated the use of a trie data structure for fast prefix-search operations in compact s…

    Submitted 10 June, 2020; v1 submitted 13 May, 2020; originally announced May 2020.

    Comments: Published in SIGIR 2020

    Journal ref: SIGIR 2020: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. July 2020. Pages 2271-2280
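    Prefix search, the core QAC operation, can also be answered with a sorted array and binary search instead of a trie: all strings sharing a prefix form one contiguous run in sorted order. The sketch assumes each candidate completion carries a precomputed popularity score and returns the k best completions of a prefix; the class and method names are hypothetical, and the paper's actual data structures are not reproduced.

```python
# Sorted-array prefix search for top-k completion; names are hypothetical
# and scores are assumed to be precomputed popularity counts.
import bisect
import heapq
from typing import List, Tuple


class PrefixCompleter:
    def __init__(self, log: List[Tuple[str, int]]) -> None:
        self.entries = sorted(log)  # sorted by query string
        self.keys = [q for q, _ in self.entries]

    def complete(self, prefix: str, k: int) -> List[str]:
        # Matches start at the first key >= prefix and are contiguous.
        lo = bisect.bisect_left(self.keys, prefix)
        hi = lo
        while hi < len(self.keys) and self.keys[hi].startswith(prefix):
            hi += 1
        best = heapq.nlargest(k, self.entries[lo:hi], key=lambda e: e[1])
        return [q for q, _ in best]
```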

  10. arXiv:2003.11835  [pdf, ps, other]

    cs.DS

    Succinct Dynamic Ordered Sets with Random Access

    Authors: Giulio Ermanno Pibiri, Rossano Venturini

    Abstract: The representation of a dynamic ordered set of $n$ integer keys drawn from a universe of size $m$ is a fundamental data structuring problem. Many solutions to this problem achieve optimal time but take polynomial space; therefore, preserving time optimality in the compressed space regime is the problem we address in this work. For a polynomial universe $m = n^{Θ(1)}$, we give a solution that…

    Submitted 26 March, 2020; originally announced March 2020.

  11. Techniques for Inverted Index Compression

    Authors: Giulio Ermanno Pibiri, Rossano Venturini

    Abstract: The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by such engines and stringent performance requirements imposed by the heavy load of queries, the inverted index stores billions of integers that must be searched efficiently. In this scenario,…

    Submitted 3 August, 2020; v1 submitted 28 August, 2019; originally announced August 2019.

    Comments: Accepted by ACM Computing Surveys (CSUR), 2020

    Journal ref: ACM Computing Surveys. Volume 53. Issue 6. November 2021. Article No.:125 pp 1-36
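    Because inverted lists are sorted integer sequences, a standard first step before compression is to replace each document id by its gap (d-gap) from the previous one, which turns large ids into small integers. The sketch builds a toy inverted index from whitespace-tokenized documents and shows the gap transform and its inverse; the function names and tokenization are assumptions for illustration.

```python
# Toy inverted index plus d-gap transform; names and tokenization are illustrative.
from collections import defaultdict
from itertools import accumulate
from typing import Dict, List


def build_inverted_index(docs: List[str]) -> Dict[str, List[int]]:
    """Map each term to the sorted list of ids of documents containing it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(text.split()):
            index[term].append(doc_id)  # ids arrive in increasing order
    return dict(index)


def to_gaps(postings: List[int]) -> List[int]:
    """d-gaps: first id as-is, then differences between consecutive ids."""
    return [d - prev for prev, d in zip([0] + postings[:-1], postings)]


def from_gaps(gaps: List[int]) -> List[int]:
    """Prefix sums restore the original ids."""
    return list(accumulate(gaps))
```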

  12. Compressed Indexes for Fast Search of Semantic Data

    Authors: Raffaele Perego, Giulio Ermanno Pibiri, Rossano Venturini

    Abstract: The sheer increase in volume of RDF data demands efficient solutions for the triple indexing problem, that is devising a compressed data structure to compactly represent RDF triples by guaranteeing, at the same time, fast pattern matching operations. This problem lies at the heart of delivering good practical performance for the resolution of complex SPARQL queries on large RDF datasets. In this w…

    Submitted 27 February, 2020; v1 submitted 16 April, 2019; originally announced April 2019.

    Comments: Published in IEEE Transactions on Knowledge and Data Engineering (TKDE), 14 January 2020

    Journal ref: IEEE Trans. Knowl. Data Eng. 33(9): 3187-3198 (2021)

  13. arXiv:1806.09447  [pdf, other]

    cs.IR cs.DB

    Handling Massive N-Gram Datasets Efficiently

    Authors: Giulio Ermanno Pibiri, Rossano Venturini

    Abstract: This paper deals with the two fundamental problems concerning the handling of large n-gram language models: indexing, that is compressing the n-gram strings and associated satellite data without compromising their retrieval speed; and estimation, that is computing the probability distribution of the strings from a large textual source. Regarding the problem of indexing, we describe compressed, exa…

    Submitted 27 February, 2020; v1 submitted 25 June, 2018; originally announced June 2018.

    Comments: Published in ACM Transactions on Information Systems (TOIS), February 2019, Article No: 25

    Journal ref: ACM Trans. Inf. Syst. 37(2): 25:1-25:41 (2019)
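    The estimation problem starts from n-gram counts. The sketch computes unsmoothed maximum-likelihood estimates P(w_n | w_1..w_{n-1}) = count(w_1..w_n) / count(w_1..w_{n-1}) on a token list held in memory. Practical language models usually add smoothing (for example Kneser-Ney), which this sketch omits; the names are hypothetical and the paper's large-scale estimation procedure is not shown.

```python
# Unsmoothed MLE n-gram probabilities; names are illustrative.
from collections import Counter
from typing import Dict, List, Tuple


def ngram_counts(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def mle_probabilities(tokens: List[str], n: int) -> Dict[Tuple[str, ...], float]:
    """Context counts only include contexts that are followed by a word,
    so the probabilities for each context sum to 1."""
    grams = ngram_counts(tokens, n)
    contexts: Counter = Counter()
    for gram, c in grams.items():
        contexts[gram[:-1]] += c
    return {gram: c / contexts[gram[:-1]] for gram, c in grams.items()}
```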

  14. On Optimally Partitioning Variable-Byte Codes

    Authors: Giulio Ermanno Pibiri, Rossano Venturini

    Abstract: The ubiquitous Variable-Byte encoding is one of the fastest compressed representations for integer sequences. However, its compression ratio is usually not competitive with other more sophisticated encoders, especially when the integers to be compressed are small, which is the typical case for inverted indexes. This paper shows that the compression ratio of Variable-Byte can be improved by 2x by adop…

    Submitted 27 February, 2020; v1 submitted 29 April, 2018; originally announced April 2018.

    Comments: Published in IEEE Transactions on Knowledge and Data Engineering (TKDE), 15 April 2019

    Journal ref: IEEE Trans. Knowl. Data Eng. 32(9): 1812-1823 (2020)
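    Variable-Byte stores 7 data bits per byte and uses the remaining bit as a flag, so even the integer 1 costs a full byte; this is why its ratio suffers on the small integers typical of inverted indexes. The sketch below is plain (unpartitioned) VByte. It assumes the convention that the high bit marks the last byte of each integer, which is one of several conventions in use; the paper's optimal partitioning is not shown.

```python
# Plain Variable-Byte coding; flag convention (high bit = last byte) is an assumption.
from typing import List


def vbyte_encode(values: List[int]) -> bytes:
    out = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("VByte encodes non-negative integers only")
        while v >= 128:
            out.append(v & 0x7F)  # low 7 bits, continuation byte
            v >>= 7
        out.append(v | 0x80)      # final byte carries the stop flag
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    values, v, shift = [], 0, 0
    for byte in data:
        v |= (byte & 0x7F) << shift
        if byte & 0x80:           # last byte of the current integer
            values.append(v)
            v, shift = 0, 0
        else:
            shift += 7
    return values
```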

  15. arXiv:1610.02865  [pdf, other]

    cs.DS

    An Encoding for Order-Preserving Matching

    Authors: Travis Gagie, Giovanni Manzini, Rossano Venturini

    Abstract: Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of order-preserving pattern matching. This problem was introduced only a few years ago but has already attracted significant attention because of its applications in…

    Submitted 17 February, 2017; v1 submitted 10 October, 2016; originally announced October 2016.
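    In order-preserving matching, a pattern matches a text window when both have the same relative order of elements, regardless of the actual values. The naive matcher below compares the rank sequence of each window with that of the pattern. It is only a definition-level sketch with hypothetical names: unlike an encoding, it keeps the entire text.

```python
# Naive order-preserving matcher; names are illustrative.
from typing import List, Sequence


def order_signature(xs: Sequence[int]) -> List[int]:
    """Dense rank of each element (equal values get equal ranks)."""
    ranks = {v: r for r, v in enumerate(sorted(set(xs)))}
    return [ranks[x] for x in xs]


def op_match(text: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Start positions i where text[i:i+m] is order-isomorphic to pattern."""
    m = len(pattern)
    target = order_signature(pattern)
    return [i for i in range(len(text) - m + 1)
            if order_signature(text[i:i + m]) == target]
```

    For example, the pattern [1, 3, 2] (low, high, middle) matches the windows [10, 22, 15] and [15, 30, 20] in [10, 22, 15, 30, 20, 18, 27].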

  16. arXiv:1312.0526  [pdf, other]

    cs.DS

    Cache-Oblivious Peeling of Random Hypergraphs

    Authors: Djamal Belazzougui, Paolo Boldi, Giuseppe Ottaviano, Rossano Venturini, Sebastiano Vigna

    Abstract: The computation of a peeling order in a randomly generated hypergraph is the most time-consuming step in a number of constructions, such as perfect hashing schemes, random $r$-SAT solvers, error-correcting codes, and approximate set encodings. While there exists a straightforward linear time algorithm, its poor I/O performance makes it impractical for hypergraphs whose size exceeds the available i…

    Submitted 2 December, 2013; originally announced December 2013.
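    The straightforward linear-time peeling algorithm mentioned in the abstract can be sketched in memory: repeatedly find a vertex of degree 1, remove its only remaining edge, and record that edge in the peeling order. Its random accesses to vertex degrees are what cause poor I/O behaviour on large hypergraphs. This sketch assumes no edge repeats a vertex, uses hypothetical names, and is not the paper's cache-oblivious algorithm.

```python
# In-memory peeling of a hypergraph; assumes no edge contains a vertex twice.
from collections import defaultdict, deque
from typing import List, Optional, Sequence


def peel(num_vertices: int, edges: Sequence[Sequence[int]]) -> Optional[List[int]]:
    """Return edge ids in peeling order, or None if a non-empty 2-core remains."""
    incident = defaultdict(list)
    degree = [0] * num_vertices  # number of non-removed incident edges
    for e, verts in enumerate(edges):
        for v in verts:
            incident[v].append(e)
            degree[v] += 1
    removed = [False] * len(edges)
    queue = deque(v for v in range(num_vertices) if degree[v] == 1)
    order: List[int] = []
    while queue:
        v = queue.popleft()
        if degree[v] != 1:
            continue  # its last edge was already peeled via another vertex
        e = next(e for e in incident[v] if not removed[e])
        removed[e] = True
        order.append(e)
        for u in edges[e]:
            degree[u] -= 1
            if degree[u] == 1:
                queue.append(u)
    return order if len(order) == len(edges) else None
```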

  17. arXiv:1307.3872  [pdf, other]

    cs.IT cs.DS

    Bicriteria data compression

    Authors: Andrea Farruggia, Paolo Ferragina, Antonio Frangioni, Rossano Venturini

    Abstract: The advent of massive datasets (and the consequent design of high-performing distributed storage systems) has reignited the interest of the scientific and engineering community in the design of lossless data compressors that achieve effective compression ratios and very efficient decompression speed. Lempel-Ziv's LZ77 algorithm is the de facto choice in this scenario because of its decompres…

    Submitted 15 July, 2013; originally announced July 2013.
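    LZ77 replaces each repeated substring with a back-reference to an earlier occurrence, and decompression just copies characters, which is why it decodes so fast. The sketch is a greedy, quadratic-time parser emitting the classic (distance, length, next character) triples, plus the decoder. The triple format and window size are assumptions; the paper's bicriteria optimization of the parse is not shown.

```python
# Greedy LZ77 with (distance, length, literal) triples; format and window
# size are assumptions for illustration.
from typing import List, Tuple

Token = Tuple[int, int, str]


def lz77_parse(s: str, window: int = 4096) -> List[Token]:
    i, out = 0, []
    while i < len(s):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Leave one character for the literal; matches may overlap i.
            while i + length < len(s) - 1 and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        out.append((best_dist, best_len, s[i + best_len]))
        i += best_len + 1
    return out


def lz77_decode(tokens: List[Token]) -> str:
    buf: List[str] = []
    for dist, length, literal in tokens:
        start = len(buf) - dist
        for k in range(length):
            buf.append(buf[start + k])  # char-by-char copy handles overlaps
        buf.append(literal)
    return "".join(buf)
```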

  18. arXiv:0906.4692  [pdf, ps, other]

    cs.DS cs.IT

    On optimally partitioning a text to improve its compression

    Authors: Paolo Ferragina, Igor Nitto, Rossano Venturini

    Abstract: In this paper we investigate the problem of partitioning an input string T in such a way that compressing its parts individually via a base compressor C yields a compressed output that is shorter than applying C to the entire T at once. This problem was introduced in the context of table compression, and then further elaborated and extended to strings and trees. Unfortunately, the literature off…

    Submitted 25 June, 2009; originally announced June 2009.
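    The problem can be stated as a shortest-path or dynamic-programming recurrence: best[i] = min over j < i of best[j] + |C(T[j:i])|. The brute-force sketch below evaluates it exactly with O(n²) compressor calls, so it only suits tiny inputs. Using zlib at level 9 as the base compressor C is an assumption, the names are hypothetical, and the paper's more efficient approach is not reproduced.

```python
# Brute-force optimal partitioning by DP; zlib as the base compressor C
# is an assumption made for this sketch.
import zlib
from typing import List, Tuple


def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))


def optimal_partition(text: bytes) -> Tuple[int, List[bytes]]:
    n = len(text)
    best = [0.0] + [float("inf")] * n  # best[i]: min cost of text[:i]
    cut = [0] * (n + 1)                # cut[i]: start of the last part
    for i in range(1, n + 1):
        for j in range(i):
            cost = best[j] + compressed_size(text[j:i])
            if cost < best[i]:
                best[i], cut[i] = cost, j
    parts, i = [], n
    while i > 0:                       # walk back through the cut points
        parts.append(text[cut[i]:i])
        i = cut[i]
    return int(best[n]), parts[::-1]
```

    Since the whole string is one of the candidate partitions, the result is never worse than compressing T at once.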

  19. arXiv:0802.0835  [pdf, ps, other]

    cs.DS cs.IT

    Bit-Optimal Lempel-Ziv compression

    Authors: Paolo Ferragina, Igor Nitto, Rossano Venturini

    Abstract: One of the most famous and most investigated lossless data-compression schemes is the one introduced by Lempel and Ziv about 40 years ago. This compression scheme is known as "dictionary-based compression" and consists of squeezing an input string by replacing some of its substrings with (shorter) codewords which are actually pointers to a dictionary of phrases built as the string is processed. Surpri…

    Submitted 6 February, 2008; originally announced February 2008.

  20. arXiv:0712.3360  [pdf, ps, other]

    cs.DS

    Compressed Text Indexes: From Theory to Practice!

    Authors: Paolo Ferragina, Rodrigo Gonzalez, Gonzalo Navarro, Rossano Venturini

    Abstract: A compressed full-text self-index represents a text in a compressed form and still answers queries efficiently. This technology represents a breakthrough over the text indexing techniques of the previous decade, whose indexes required several times the size of the text. Although it is relatively new, this technology has matured up to a point where theoretical research is giving way to practical…

    Submitted 20 December, 2007; originally announced December 2007.

    ACM Class: F.2.2; H.2.1; H.3.2; H.3.3
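    A common building block of compressed self-indexes is backward search over the Burrows-Wheeler transform (BWT), which counts pattern occurrences using only rank queries on the transformed text. The sketch builds the BWT from a naively sorted suffix array and computes rank by scanning, so it is neither compressed nor fast; real indexes answer rank with succinct structures such as wavelet trees. It assumes '$' does not occur in the text, and the names are hypothetical.

```python
# FM-index-style counting via backward search; naive construction and rank.
# Assumes the sentinel '$' does not occur in the text.
from typing import Dict


def bwt(text: str) -> str:
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix array
    return "".join(s[i - 1] for i in sa)             # s[-1] is '$'


def count_occurrences(text: str, pattern: str) -> int:
    L = bwt(text)
    C: Dict[str, int] = {}  # C[c] = number of characters in L smaller than c
    total = 0
    for c in sorted(set(L)):
        C[c] = total
        total += L.count(c)

    def occ(c: str, i: int) -> int:
        return L[:i].count(c)  # rank of c in L[0:i], by scanning

    lo, hi = 0, len(L)  # suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo, hi = C[c] + occ(c, lo), C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```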

  21. arXiv:0708.3734  [pdf, ps, other]

    cs.DC

    Searching for a dangerous host: randomized vs. deterministic

    Authors: Igor Nitto, Rossano Venturini

    Abstract: A Black Hole is a harmful host in a network that destroys incoming agents without leaving any trace of such event. The problem of locating the black hole in a network through a team of agents coordinated by a common protocol is usually referred to in the literature as the Black Hole Search problem (or BHS for brevity) and it is a consolidated research topic in the area of distributed algorithms. The ai…

    Submitted 28 August, 2007; originally announced August 2007.