Showing 1–30 of 30 results for author: Hofstätter, S

  1. arXiv:2404.18796  [pdf, other]

    cs.CL cs.AI

    Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

    Authors: Pat Verga, Sebastian Hofstätter, Sophia Althammer, Yixuan Su, Aleksandra Piktus, Arkady Arkhangorodsky, Minjie Xu, Naomi White, Patrick Lewis

    Abstract: As Large Language Models (LLMs) have become more advanced, they have outpaced our abilities to accurately evaluate their quality. Not only is finding data to adequately probe particular model properties difficult, but evaluating the correctness of a model's freeform generation alone is a challenge. To address this, many evaluations now rely on using LLMs themselves as judges to score the quality o…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  2. arXiv:2312.02969  [pdf, other]

    cs.CL cs.IR

    Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models

    Authors: Xinyu Zhang, Sebastian Hofstätter, Patrick Lewis, Raphael Tang, Jimmy Lin

    Abstract: Listwise rerankers based on large language models (LLM) are the zero-shot state-of-the-art. However, current works in this direction all depend on the GPT models, making it a single point of failure in scientific reproducibility. Moreover, it raises the concern that the current research findings only hold for GPT models but not LLM in general. In this work, we lift this pre-condition and build for…

    Submitted 5 December, 2023; originally announced December 2023.

  3. arXiv:2309.06131  [pdf, other]

    cs.IR cs.CL

    Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection

    Authors: Sophia Althammer, Guido Zuccon, Sebastian Hofstätter, Suzan Verberne, Allan Hanbury

    Abstract: Search methods based on Pretrained Language Models (PLM) have demonstrated great effectiveness gains compared to statistical and early neural ranking models. However, fine-tuning PLM-based rankers requires a great amount of annotated training data. Annotating data involves a large manual effort and thus is expensive, especially in domain specific tasks. In this paper we investigate fine-tuning PLM…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: Accepted at SIGIR-AP 2023

  4. arXiv:2305.15048  [pdf, other]

    cs.CL cs.IR

    Ranger: A Toolkit for Effect-Size Based Multi-Task Evaluation

    Authors: Mete Sertkan, Sophia Althammer, Sebastian Hofstätter

    Abstract: In this paper, we introduce Ranger - a toolkit to facilitate the easy use of effect-size-based meta-analysis for multi-task evaluation in NLP and IR. We observed that our communities often face the challenge of aggregating results over incomparable metrics and scenarios, which makes conclusions and take-away messages less reliable. With Ranger, we aim to address this issue by providing a task-agno…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023 (System Demonstrations)

  5. arXiv:2209.14290  [pdf, other]

    cs.CL cs.IR

    FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation

    Authors: Sebastian Hofstätter, Jiecao Chen, Karthik Raman, Hamed Zamani

    Abstract: Retrieval-augmented generation models offer many benefits over standalone language models: besides a textual answer to a given query they provide provenance items retrieved from an updateable knowledge base. However, they are also more complex systems and need to handle long inputs. In this work, we introduce FiD-Light to strongly increase the efficiency of the state-of-the-art retrieval-augmented…

    Submitted 28 September, 2022; originally announced September 2022.

  6. TripJudge: A Relevance Judgement Test Collection for TripClick Health Retrieval

    Authors: Sophia Althammer, Sebastian Hofstätter, Suzan Verberne, Allan Hanbury

    Abstract: Robust test collections are crucial for Information Retrieval research. Recently there is a growing interest in evaluating retrieval systems for domain-specific retrieval tasks, however these tasks often lack a reliable test collection with human-annotated relevance assessments following the Cranfield paradigm. In the medical domain, the TripClick collection was recently proposed, which contains c…

    Submitted 14 August, 2022; originally announced August 2022.

    Comments: To be published at CIKM 2022 as resource paper

  7. arXiv:2207.03030  [pdf, other]

    cs.CL cs.IR

    Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

    Authors: Sebastian Hofstätter, Jiecao Chen, Karthik Raman, Hamed Zamani

    Abstract: This paper studies multi-task training of retrieval-augmented generation models for knowledge-intensive tasks. We propose to clean the training set by utilizing a distinct property of knowledge-intensive generation: The connection of query-answer pairs to items in the knowledge base. We filter training examples via a threshold of confidence on the relevance labels, whether a pair is answerable by…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted at the ICML 2022 Workshop on Knowledge Retrieval and Language Models (KRLM)

  8. arXiv:2206.12993  [pdf, other]

    cs.IR cs.CL

    Are We There Yet? A Decision Framework for Replacing Term Based Retrieval with Dense Retrieval Systems

    Authors: Sebastian Hofstätter, Nick Craswell, Bhaskar Mitra, Hamed Zamani, Allan Hanbury

    Abstract: Recently, several dense retrieval (DR) models have demonstrated competitive performance to term-based retrieval that are ubiquitous in search systems. In contrast to term-based matching, DR projects queries and documents into a dense vector space and retrieves results via (approximate) nearest neighbor search. Deploying a new system, such as DR, inevitably involves tradeoffs in aspects of its perf…

    Submitted 26 June, 2022; originally announced June 2022.

  9. arXiv:2203.13088  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction

    Authors: Sebastian Hofstätter, Omar Khattab, Sophia Althammer, Mete Sertkan, Allan Hanbury

    Abstract: Recent progress in neural information retrieval has demonstrated large gains in effectiveness, while often sacrificing the efficiency and interpretability of the neural model compared to classical approaches. This paper proposes ColBERTer, a neural retrieval model using contextualized late interaction (ColBERT) with enhanced reduction. Along the effectiveness Pareto frontier, ColBERTer's reduction…

    Submitted 24 March, 2022; originally announced March 2022.

  10. arXiv:2201.01614  [pdf, other]

    cs.IR

    PARM: A Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval

    Authors: Sophia Althammer, Sebastian Hofstätter, Mete Sertkan, Suzan Verberne, Allan Hanbury

    Abstract: Dense passage retrieval (DPR) models show great effectiveness gains in first stage retrieval for the web domain. However in the web domain we are in a setting with large amounts of training data and a query-to-passage or a query-to-document retrieval task. We investigate in this paper dense document-to-document retrieval with limited labelled target data for training, in particular legal case retr…

    Submitted 14 August, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

    Comments: Accepted at ECIR 2022

  11. arXiv:2201.00365  [pdf, ps, other]

    cs.IR cs.CL

    Establishing Strong Baselines for TripClick Health Retrieval

    Authors: Sebastian Hofstätter, Sophia Althammer, Mete Sertkan, Allan Hanbury

    Abstract: We present strong Transformer-based re-ranking and dense retrieval baselines for the recently released TripClick health ad-hoc retrieval collection. We improve the - originally too noisy - training data with a simple negative sampling policy. We achieve large gains over BM25 in the re-ranking task of TripClick, which were not achieved with the original baselines. Furthermore, we study the impact o…

    Submitted 2 January, 2022; originally announced January 2022.

    Comments: Accepted at ECIR 2022

  12. arXiv:2110.05601  [pdf]

    cs.HC cs.IR

    A Time-Optimized Content Creation Workflow for Remote Teaching

    Authors: Sebastian Hofstätter, Sophia Althammer, Mete Sertkan, Allan Hanbury

    Abstract: We describe our workflow to create an engaging remote learning experience for a university course, while minimizing the post-production time of the educators. We make use of ubiquitous and commonly free services and platforms, so that our workflow is inclusive for all educators and provides polished experiences for students. Our learning materials provide for each lecture: 1) a recorded video, upl…

    Submitted 13 October, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted at SIGCSE-TS 2022

  13. arXiv:2106.05768  [pdf, other]

    cs.CL cs.IR

    Linguistically Informed Masking for Representation Learning in the Patent Domain

    Authors: Sophia Althammer, Mark Buckley, Sebastian Hofstätter, Allan Hanbury

    Abstract: Domain-specific contextualized language models have demonstrated substantial effectiveness gains for domain-specific downstream tasks, like similarity matching, entity recognition or information retrieval. However successfully applying such models in highly specific language domains requires domain adaptation of the pre-trained models. In this paper we propose the empirically motivated Linguistica…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: Published at SIGIR 2021 PatentSemTech workshop

  14. arXiv:2105.09816  [pdf, other]

    cs.IR cs.CL

    Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking

    Authors: Sebastian Hofstätter, Bhaskar Mitra, Hamed Zamani, Nick Craswell, Allan Hanbury

    Abstract: An emerging recipe for achieving state-of-the-art effectiveness in neural document re-ranking involves utilizing large pre-trained language models - e.g., BERT - to evaluate all individual passages in the document and then aggregating the outputs by pooling or additional Transformer layers. A major drawback of this approach is high query latency due to the cost of evaluating every passage in the d…

    Submitted 20 May, 2021; originally announced May 2021.

    Comments: Accepted at SIGIR 2021 (Full Paper Track)

  15. arXiv:2104.09393  [pdf, other]

    cs.IR cs.AI cs.LG

    Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence

    Authors: Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani, Nick Craswell

    Abstract: The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark -- and can be considered to be an efficient (but slightly less effective) alternative to other Transformer-based architectures that employ (i) large-scale pretraining (high training cost), (ii) joint encoding of query and document (high inference cost), and (iii) larger number of Tra…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2007.10434

  16. arXiv:2104.06967  [pdf, other]

    cs.IR cs.CL

    Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling

    Authors: Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, Allan Hanbury

    Abstract: A vital step towards the widespread adoption of neural retrieval models is their resource efficiency throughout the training, indexing and query workflows. The neural IR community made great advancements in training effective dual-encoder dense retrieval (DR) models recently. A dense text retrieval model uses a single vector representation per query and passage to score a match, which enables low-…

    Submitted 26 May, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: Accepted at SIGIR 2021 (Full Paper track)

  17. arXiv:2101.06980  [pdf, other]

    cs.IR cs.CL

    Mitigating the Position Bias of Transformer Models in Passage Re-Ranking

    Authors: Sebastian Hofstätter, Aldo Lipani, Sophia Althammer, Markus Zlabinger, Allan Hanbury

    Abstract: Supervised machine learning models and their evaluation strongly depends on the quality of the underlying dataset. When we search for a relevant piece of information it may appear anywhere in a given passage. However, we observe a bias in the position of the correct answer in the text in two popular Question Answering datasets used for passage re-ranking. The excessive favoring of earlier position…

    Submitted 18 January, 2021; originally announced January 2021.

    Comments: Accepted at ECIR 2021 (Full paper track)

  18. arXiv:2012.11405  [pdf, other]

    cs.IR

    Cross-domain Retrieval in the Legal and Patent Domains: a Reproducibility Study

    Authors: Sophia Althammer, Sebastian Hofstätter, Allan Hanbury

    Abstract: Domain specific search has always been a challenging information retrieval task due to several challenges such as the domain specific language, the unique task setting, as well as the lack of accessible queries and corresponding relevance judgements. In the last years, pretrained language models, such as BERT, revolutionized web and news search. Naturally, the community aims to adapt these advance…

    Submitted 19 January, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: Accepted at ECIR 2021 (Reproducibility paper track)

  19. arXiv:2011.07368  [pdf, other]

    cs.IR cs.AI cs.LG

    Conformer-Kernel with Query Term Independence at TREC 2020 Deep Learning Track

    Authors: Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani, Nick Craswell

    Abstract: We benchmark Conformer-Kernel models under the strict blind evaluation setting of the TREC 2020 Deep Learning track. In particular, we study the impact of incorporating: (i) Explicit term matching to complement matching based on learned representations (i.e., the "Duet principle"), (ii) query term independence (i.e., the "QTI assumption") to scale the model to the full retrieval setting, and (iii)…

    Submitted 11 February, 2021; v1 submitted 14 November, 2020; originally announced November 2020.

  20. arXiv:2010.02666  [pdf, other]

    cs.IR

    Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation

    Authors: Sebastian Hofstätter, Sophia Althammer, Michael Schröder, Mete Sertkan, Allan Hanbury

    Abstract: Retrieval and ranking models are the backbone of many applications such as web search, open domain QA, or text-based recommender systems. The latency of neural ranking models at query time is largely dependent on the architecture and deliberate choices by their designers to trade-off effectiveness for higher efficiency. This focus on low query latency of a rising number of efficient ranking archit…

    Submitted 22 January, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: Updated paper with dense retrieval results and query-level analysis

  21. arXiv:2008.05363  [pdf, other]

    cs.IR cs.CL

    Fine-Grained Relevance Annotations for Multi-Task Document Ranking and Question Answering

    Authors: Sebastian Hofstätter, Markus Zlabinger, Mete Sertkan, Michael Schröder, Allan Hanbury

    Abstract: There are many existing retrieval and question answering datasets. However, most of them either focus on ranked list evaluation or single-candidate question answering. This divide makes it challenging to properly evaluate approaches concerned with ranking documents and providing snippets or answers for a given query. In this work, we present FiRA: a novel dataset of Fine-Grained Relevance Annotati…

    Submitted 12 August, 2020; originally announced August 2020.

    Comments: Accepted at CIKM 2020 (Resource Track)

  22. arXiv:2007.10434  [pdf, other]

    cs.IR cs.CL cs.LG

    Conformer-Kernel with Query Term Independence for Document Retrieval

    Authors: Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani, Nick Craswell

    Abstract: The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark---and can be considered to be an efficient (but slightly less effective) alternative to BERT-based ranking models. In this work, we extend the TK architecture to the full retrieval setting by incorporating the query term independence assumption. Furthermore, to reduce the memory comp…

    Submitted 20 July, 2020; originally announced July 2020.

  23. arXiv:2005.08367  [pdf, other]

    cs.IR

    DEXA: Supporting Non-Expert Annotators with Dynamic Examples from Experts

    Authors: Markus Zlabinger, Marta Sabou, Sebastian Hofstätter, Mete Sertkan, Allan Hanbury

    Abstract: The success of crowdsourcing based annotation of text corpora depends on ensuring that crowdworkers are sufficiently well-trained to perform the annotation task accurately. To that end, a frequent approach to train annotators is to provide instructions and a few example cases that demonstrate how the task should be performed (referred to as the CONTROL approach). These globally defined "task-level…

    Submitted 17 May, 2020; originally announced May 2020.

    Comments: 4 pages, 1 figure, 3 tables, accepted to SIGIR2020

  24. arXiv:2005.04908  [pdf, other]

    cs.IR

    Local Self-Attention over Long Text for Efficient Document Retrieval

    Authors: Sebastian Hofstätter, Hamed Zamani, Bhaskar Mitra, Nick Craswell, Allan Hanbury

    Abstract: Neural networks, particularly Transformer-based architectures, have achieved significant performance improvements on several retrieval benchmarks. When the items being retrieved are documents, the time and memory cost of employing Transformers over a full sequence of document terms can be prohibitive. A popular strategy involves considering only the first n terms of the document. This can, however…

    Submitted 11 May, 2020; originally announced May 2020.

    Comments: Accepted at SIGIR 2020 (short paper)

  25. arXiv:2002.01854  [pdf, other]

    cs.IR

    Interpretable & Time-Budget-Constrained Contextualization for Re-Ranking

    Authors: Sebastian Hofstätter, Markus Zlabinger, Allan Hanbury

    Abstract: Search engines operate under a strict time constraint as a fast response is paramount to user satisfaction. Thus, neural re-ranking models have a limited time-budget to re-rank documents. Given the same amount of time, a faster re-ranking model can incorporate more documents than a less efficient one, leading to a higher effectiveness. To utilize this property, we propose TK (Transformer-Kernel):…

    Submitted 4 February, 2020; originally announced February 2020.

    Comments: Accepted at ECAI 2020 (full paper). arXiv admin note: text overlap with arXiv:1912.01385

  26. arXiv:2001.05357  [pdf, ps, other]

    cs.IR

    DSR: A Collection for the Evaluation of Graded Disease-Symptom Relations

    Authors: Markus Zlabinger, Sebastian Hofstätter, Navid Rekabsaz, Allan Hanbury

    Abstract: The effective extraction of ranked disease-symptom relationships is a critical component in various medical tasks, including computer-assisted medical diagnosis or the discovery of unexpected associations between diseases. While existing disease-symptom relationship extraction methods are used as the foundation in the various medical tasks, no collection is available to systematically evaluate the…

    Submitted 15 January, 2020; originally announced January 2020.

    Comments: 7 pages; 3 tables; accepted as short-paper to the 42nd European Conference on Information Retrieval (ECIR), Lisbon 2020

  27. arXiv:1912.04713  [pdf, other]

    cs.IR

    Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-Ranking Results

    Authors: Sebastian Hofstätter, Markus Zlabinger, Allan Hanbury

    Abstract: In this paper we look beyond metrics-based evaluation of Information Retrieval systems, to explore the reasons behind ranking results. We present the content-focused Neural-IR-Explorer, which empowers users to browse through retrieval results and inspect the inner workings and fine-grained results of neural re-ranking models. The explorer includes a categorized overview of the available queries, a…

    Submitted 10 December, 2019; originally announced December 2019.

    Comments: Accepted at ECIR 2020 (demo paper)

  28. arXiv:1912.01385  [pdf, other]

    cs.IR cs.CL

    TU Wien @ TREC Deep Learning '19 -- Simple Contextualization for Re-ranking

    Authors: Sebastian Hofstätter, Markus Zlabinger, Allan Hanbury

    Abstract: The usage of neural network models puts multiple objectives in conflict with each other: Ideally we would like to create a neural model that is effective, efficient, and interpretable at the same time. However, in most instances we have to choose which property is most important to us. We used the opportunity of the TREC 2019 Deep Learning track to evaluate the effectiveness of a balanced neural r…

    Submitted 3 December, 2019; originally announced December 2019.

    Comments: Presented at TREC 2019

  29. arXiv:1907.04614  [pdf, other]

    cs.IR

    Let's measure run time! Extending the IR replicability infrastructure to include performance aspects

    Authors: Sebastian Hofstätter, Allan Hanbury

    Abstract: Establishing a docker-based replicability infrastructure offers the community a great opportunity: measuring the run time of information retrieval systems. The time required to present query results to a user is paramount to the users satisfaction. Recent advances in neural IR re-ranking models put the issue of query latency at the forefront. They bring a complex trade-off between performance and…

    Submitted 10 July, 2019; originally announced July 2019.

    Comments: Position paper @ SIGIR 2019 Open-Source IR Replicability Challenge (OSIRRC)

  30. arXiv:1904.12683  [pdf, other]

    cs.IR

    On the Effect of Low-Frequency Terms on Neural-IR Models

    Authors: Sebastian Hofstätter, Navid Rekabsaz, Carsten Eickhoff, Allan Hanbury

    Abstract: Low-frequency terms are a recurring challenge for information retrieval models, especially neural IR frameworks struggle with adequately capturing infrequently observed words. While these terms are often removed from neural models - mainly as a concession to efficiency demands - they traditionally play an important role in the performance of IR models. In this paper, we analyze the effects of low-…

    Submitted 30 April, 2019; v1 submitted 29 April, 2019; originally announced April 2019.

    Comments: Accepted at SIGIR'19