Showing 1–7 of 7 results for author: Mallia, A

  1. arXiv:2405.17093  [pdf, other]

    cs.IR

    DeeperImpact: Optimizing Sparse Learned Index Structures

    Authors: Soyuj Basnet, Jerry Gou, Antonio Mallia, Torsten Suel

    Abstract: A lot of recent work has focused on sparse learned indexes that use deep neural architectures to significantly improve retrieval quality while keeping the efficiency benefits of the inverted index. While such sparse learned structures achieve effectiveness far beyond those of traditional inverted index-based rankers, there is still a gap in effectiveness to the best dense retrievers, or even to sp…

    Submitted 6 July, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2405.01117  [pdf, other]

    cs.IR

    Faster Learned Sparse Retrieval with Block-Max Pruning

    Authors: Antonio Mallia, Torsten Suel, Nicola Tonellotto

    Abstract: Learned sparse retrieval systems aim to combine the effectiveness of contextualized language models with the scalability of conventional data structures such as inverted indexes. Nevertheless, the indexes generated by these systems exhibit significant deviations from the ones that use traditional retrieval models, leading to a discrepancy in the performance of existing query optimizations that wer…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: SIGIR 2024 (short paper track)

  3. arXiv:2401.06703  [pdf, other]

    cs.IR

    Improved Learned Sparse Retrieval with Corpus-Specific Vocabularies

    Authors: Puxuan Yu, Antonio Mallia, Matthias Petri

    Abstract: We explore leveraging corpus-specific vocabularies that improve both efficiency and effectiveness of learned sparse retrieval systems. We find that pre-training the underlying BERT model on the target corpus, specifically targeting different vocabulary sizes incorporated into the document expansion process, improves retrieval quality by up to 12% while in some scenarios decreasing latency by up to…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: ECIR 2024 Full Paper

  4. arXiv:2204.11314  [pdf, other]

    cs.IR

    Faster Learned Sparse Retrieval with Guided Traversal

    Authors: Antonio Mallia, Joel Mackenzie, Torsten Suel, Nicola Tonellotto

    Abstract: Neural information retrieval architectures based on transformers such as BERT are able to significantly improve system effectiveness over traditional sparse models such as BM25. Though highly effective, these neural approaches are very expensive to run, making them difficult to deploy under strict latency constraints. To address this limitation, recent studies have proposed new families of learned…

    Submitted 24 April, 2022; originally announced April 2022.

    Comments: Accepted at SIGIR 2022

  5. arXiv:2104.12016  [pdf, other]

    cs.IR

    Learning Passage Impacts for Inverted Indexes

    Authors: Antonio Mallia, Omar Khattab, Nicola Tonellotto, Torsten Suel

    Abstract: Neural information retrieval systems typically use a cascading pipeline, in which a first-stage model retrieves a candidate set of documents and one or more subsequent stages re-rank this set using contextualized language models such as BERT. In this paper, we propose DeepImpact, a new document term-weighting scheme suitable for efficient retrieval using a standard inverted index. Compared to exis…

    Submitted 24 April, 2021; originally announced April 2021.

  6. arXiv:2003.08276  [pdf, other]

    cs.IR

    Supporting Interoperability Between Open-Source Search Engines with the Common Index File Format

    Authors: Jimmy Lin, Joel Mackenzie, Chris Kamphuis, Craig Macdonald, Antonio Mallia, Michał Siedlaczek, Andrew Trotman, Arjen de Vries

    Abstract: There exists a natural tension between encouraging a diverse ecosystem of open-source search engines and supporting fair, replicable comparisons across those systems. To balance these two goals, we examine two approaches to providing interoperability between the inverted indexes of several systems. The first takes advantage of internal abstractions around index structures and building wrappers tha…

    Submitted 18 March, 2020; originally announced March 2020.

  7. arXiv:1808.02831  [pdf, other]

    cs.CL

    Debunking Fake News One Feature at a Time

    Authors: Melanie Tosik, Antonio Mallia, Kedar Gangopadhyay

    Abstract: Identifying the stance of a news article body with respect to a certain headline is the first step to automated fake news detection. In this paper, we introduce a 2-stage ensemble model to solve the stance detection task. By using only hand-crafted features as input to a gradient boosting classifier, we are able to achieve a score of 9161.5 out of 11651.25 (78.63%) on the official Fake News Challe…

    Submitted 8 August, 2018; originally announced August 2018.