-
A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking
Authors:
Ferdinand Schlatt,
Maik Fröbe,
Harrisen Scells,
Shengyao Zhuang,
Bevan Koopman,
Guido Zuccon,
Benno Stein,
Martin Potthast,
Matthias Hagen
Abstract:
Cross-encoders distilled from large language models (LLMs) are often more effective re-rankers than cross-encoders fine-tuned on manually labeled data. However, the distilled models usually do not reach their teacher LLM's effectiveness. To investigate whether best practices for fine-tuning cross-encoders on manually labeled data (e.g., hard-negative sampling, deep sampling, and listwise loss functions) can help to improve LLM ranker distillation, we construct and release a new distillation dataset: Rank-DistiLLM. In our experiments, cross-encoders trained on Rank-DistiLLM reach the effectiveness of LLMs while being orders of magnitude more efficient. Our code and data are available at https://github.com/webis-de/msmarco-llm-distillation.
Submitted 16 June, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders
Authors:
Ferdinand Schlatt,
Maik Fröbe,
Harrisen Scells,
Shengyao Zhuang,
Bevan Koopman,
Guido Zuccon,
Benno Stein,
Martin Potthast,
Matthias Hagen
Abstract:
Existing cross-encoder re-rankers can be categorized as pointwise, pairwise, or listwise models. Pair- and listwise models allow passage interactions, which usually makes them more effective than pointwise models but also less efficient and less robust to input order permutations. To enable efficient permutation-invariant passage interactions during re-ranking, we propose a new cross-encoder architecture with inter-passage attention: the Set-Encoder. In Cranfield-style experiments on TREC Deep Learning and TIREx, the Set-Encoder is as effective as state-of-the-art listwise models while improving efficiency and robustness to input permutations. Interestingly, a pointwise model is similarly effective, but when additionally requiring the models to consider novelty, the Set-Encoder is more effective than its pointwise counterpart and retains its advantageous properties compared to other listwise models. Our code and models are publicly available at https://github.com/webis-de/set-encoder.
Submitted 16 June, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Investigating the Effects of Sparse Attention on Cross-Encoders
Authors:
Ferdinand Schlatt,
Maik Fröbe,
Matthias Hagen
Abstract:
Cross-encoders are effective passage and document re-rankers but less efficient than other neural or classic retrieval models. A few previous studies have applied windowed self-attention to make cross-encoders more efficient. However, these studies did not investigate the potential and limits of different attention patterns or window sizes. We close this gap and systematically analyze how token interactions can be reduced without harming the re-ranking effectiveness. Experimenting with asymmetric attention and different window sizes, we find that the query tokens do not need to attend to the passage or document tokens for effective re-ranking and that very small window sizes suffice. In our experiments, even windows of 4 tokens still yield effectiveness on par with previous cross-encoders while reducing the memory requirements by at least 22% / 59% and being 1% / 43% faster at inference time for passages / documents.
Submitted 20 March, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.