Showing 1–15 of 15 results for author: Scells, H

Searching in archive cs.
  1. arXiv:2405.07920

    cs.IR

    A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking

    Authors: Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast, Matthias Hagen

    Abstract: Cross-encoders distilled from large language models (LLMs) are often more effective re-rankers than cross-encoders fine-tuned on manually labeled data. However, the distilled models usually do not reach their teacher LLM's effectiveness. To investigate whether best practices for fine-tuning cross-encoders on manually labeled data (e.g., hard-negative sampling, deep sampling, and listwise loss func…

    Submitted 16 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  2. arXiv:2404.06912

    cs.IR

    Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders

    Authors: Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast, Matthias Hagen

    Abstract: Existing cross-encoder re-rankers can be categorized as pointwise, pairwise, or listwise models. Pair- and listwise models allow passage interactions, which usually makes them more effective than pointwise models but also less efficient and less robust to input order permutations. To enable efficient permutation-invariant passage interactions during re-ranking, we propose a new cross-encoder archi…

    Submitted 16 June, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  3. arXiv:2401.06320

    cs.IR cs.CL

    Zero-shot Generative Large Language Models for Systematic Review Screening Automation

    Authors: Shuai Wang, Harrisen Scells, Shengyao Zhuang, Martin Potthast, Bevan Koopman, Guido Zuccon

    Abstract: Systematic reviews are crucial for evidence-based medicine as they comprehensively analyse published research findings on specific questions. Conducting such reviews is often resource- and time-intensive, especially in the screening phase, where abstracts of publications are assessed for inclusion in a review. This study investigates the effectiveness of using zero-shot large language models (LLMs…

    Submitted 31 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted to ECIR 2024 as a full paper (findings)

  4. Evaluating Generative Ad Hoc Information Retrieval

    Authors: Lukas Gienapp, Harrisen Scells, Niklas Deckers, Janek Bevendorff, Shuai Wang, Johannes Kiesel, Shahbaz Syed, Maik Fröbe, Guido Zuccon, Benno Stein, Matthias Hagen, Martin Potthast

    Abstract: Recent advances in large language models have enabled the development of viable generative retrieval systems. Instead of a traditional document ranking, generative retrieval systems often directly return a grounded generated text as a response to a query. Quantifying the utility of the textual responses is essential for appropriately evaluating such generative ad hoc retrieval. Yet, the establishe…

    Submitted 22 May, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 14 pages, 6 figures, 1 table. Published at SIGIR'24 perspective paper track

  5. arXiv:2309.05238

    cs.IR cs.AI

    Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation

    Authors: Shuai Wang, Harrisen Scells, Martin Potthast, Bevan Koopman, Guido Zuccon

    Abstract: Screening prioritisation in medical systematic reviews aims to rank the set of documents retrieved by complex Boolean queries. Prioritising the most important documents ensures that subsequent review steps can be carried out more efficiently and effectively. The current state of the art uses the final title of the review as a query to rank the documents using BERT-based neural rankers. However, th…

    Submitted 23 November, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Preprint of the paper accepted at SIGIR-AP 2023, updated from the ACM-published version. The working title was wrong in the ACM-published version due to a bug in data preprocessing; however, this does not affect the final conclusions or observations of the paper.

  6. arXiv:2306.16668

    cs.IR

    Beyond CO2 Emissions: The Overlooked Impact of Water Consumption of Information Retrieval Models

    Authors: Guido Zuccon, Harrisen Scells, Shengyao Zhuang

    Abstract: As in other fields of artificial intelligence, the information retrieval community has grown interested in investigating the power consumption associated with neural models, particularly models of search. This interest has become particularly relevant as the energy consumption of information retrieval models has risen with new neural models based on large language models, leading to an associated…

    Submitted 28 June, 2023; originally announced June 2023.

  7. The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives

    Authors: Jan Heinrich Reimer, Sebastian Schmidt, Maik Fröbe, Lukas Gienapp, Harrisen Scells, Benno Stein, Matthias Hagen, Martin Potthast

    Abstract: The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years. Its first version includes 356 million queries, 166 million search result pages, and 1.7 billion search results across 550 search providers. Although many query logs have been studied in the literature, the search providers that own them generally do not publish the…

    Submitted 31 July, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

    Comments: SIGIR 2023 resource paper, 13 pages

  8. arXiv:2302.03495

    cs.IR cs.AI

    Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?

    Authors: Shuai Wang, Harrisen Scells, Bevan Koopman, Guido Zuccon

    Abstract: Systematic reviews are comprehensive reviews of the literature for a highly focused research question. These reviews are often treated as the highest form of evidence in evidence-based medicine, and are the key strategy to answer research questions in the medical field. To create a high-quality systematic review, complex Boolean queries are often constructed to retrieve studies for the review topi…

    Submitted 9 February, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

  9. arXiv:2212.09017

    cs.IR cs.AI cs.LG

    Neural Rankers for Effective Screening Prioritisation in Medical Systematic Review Literature Search

    Authors: Shuai Wang, Harrisen Scells, Bevan Koopman, Guido Zuccon

    Abstract: Medical systematic reviews typically require assessing all the documents retrieved by a search. The reason is two-fold: the task aims for "total recall"; and documents retrieved using Boolean search are an unordered set, and thus it is unclear how an assessor could examine only a subset. Screening prioritisation is the process of ranking the (unordered) set of retrieved documents, allowing asses…

    Submitted 18 December, 2022; originally announced December 2022.

  10. arXiv:2211.15833

    cs.CL cs.AI

    Guiding Neural Entity Alignment with Compatibility

    Authors: Bing Liu, Harrisen Scells, Wen Hua, Guido Zuccon, Genghong Zhao, Xia Zhang

    Abstract: Entity Alignment (EA) aims to find equivalent entities between two Knowledge Graphs (KGs). While numerous neural EA models have been devised, they are mainly learned using labelled data only. In this work, we argue that different entities within one KG should have compatible counterparts in the other KG due to the potential dependencies among the entities. Making compatible predictions thus should…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022

  11. arXiv:2209.08687

    cs.IR cs.AI cs.LG

    Automated MeSH Term Suggestion for Effective Query Formulation in Systematic Reviews Literature Search

    Authors: Shuai Wang, Harrisen Scells, Bevan Koopman, Guido Zuccon

    Abstract: High-quality medical systematic reviews require comprehensive literature searches to ensure the recommendations and outcomes are sufficiently reliable. Indeed, searching for relevant medical literature is a key phase in constructing systematic reviews and often involves domain (medical researchers) and search (information specialists) experts in developing the search queries. Queries in this conte…

    Submitted 18 September, 2022; originally announced September 2022.

    Comments: This paper is currently under peer review for the Technology-Assisted Review Systems special issue of the journal Intelligent Systems with Applications. arXiv admin note: text overlap with arXiv:2112.00277

  12. From Little Things Big Things Grow: A Collection with Seed Studies for Medical Systematic Review Literature Search

    Authors: Shuai Wang, Harrisen Scells, Justin Clark, Bevan Koopman, Guido Zuccon

    Abstract: Medical systematic review query formulation is a highly complex task done by trained information specialists. Complexity comes from the reliance on lengthy Boolean queries, which express a detailed research question. To aid query formulation, information specialists use a set of exemplar documents, called 'seed studies', prior to query formulation. Seed studies help verify the effectiveness of a q…

    Submitted 24 April, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: Accepted to appear in the SIGIR 2022 proceedings

  13. arXiv:2112.04090

    cs.IR

    Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study

    Authors: Shuai Wang, Harrisen Scells, Ahmed Mourad, Guido Zuccon

    Abstract: Screening or assessing studies is critical to the quality and outcomes of a systematic review. Typically, a Boolean query retrieves the set of studies to screen. As the set of studies retrieved is unordered, screening all retrieved studies is usually required for high-quality systematic reviews. Screening prioritisation, or in other words, ranking the set of studies, enables downstream activities…

    Submitted 7 December, 2021; originally announced December 2021.

    Comments: To be published in the 44th European Conference on Information Retrieval

  14. MeSH Term Suggestion for Systematic Review Literature Search

    Authors: Shuai Wang, Hang Li, Harrisen Scells, Daniel Locke, Guido Zuccon

    Abstract: High-quality medical systematic reviews require comprehensive literature searches to ensure the recommendations and outcomes are sufficiently reliable. Indeed, searching for relevant medical literature is a key phase in constructing systematic reviews and often involves domain (medical researchers) and search (information specialists) experts in developing the search queries. Queries in this conte…

    Submitted 2 December, 2021; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: To be published in Australasian Document Computing Symposium 2021, Melbourne, Australia

  15. arXiv:2110.06474

    cs.CL cs.AI

    ActiveEA: Active Learning for Neural Entity Alignment

    Authors: Bing Liu, Harrisen Scells, Guido Zuccon, Wen Hua, Genghong Zhao

    Abstract: Entity Alignment (EA) aims to match equivalent entities across different Knowledge Graphs (KGs) and is an essential step of KG fusion. Current mainstream methods -- neural EA models -- rely on training with seed alignment, i.e., a set of pre-aligned entity pairs which are very costly to annotate. In this paper, we devise a novel Active Learning (AL) framework for neural EA, aiming to create highly…

    Submitted 12 October, 2021; originally announced October 2021.