Showing 1–8 of 8 results for author: Lu, X H

  1. arXiv:2407.03618

    cs.IR cs.CL

    BM25S: Orders of magnitude faster lexical search via eager sparse scoring

    Authors: Xing Han Lù

    Abstract: We introduce BM25S, an efficient Python-based implementation of BM25 that only depends on Numpy and Scipy. BM25S achieves up to a 500x speedup compared to the most popular Python-based framework by eagerly computing BM25 scores during indexing and storing them into sparse matrices. It also achieves considerable speedups compared to highly optimized Java-based implementations, which are used by pop…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Technical Report

  2. arXiv:2402.05930

    cs.CL cs.CV cs.LG

    WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

    Authors: Xing Han Lù, Zdeněk Kasner, Siva Reddy

    Abstract: We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WEBLINX - a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns…

    Submitted 8 February, 2024; originally announced February 2024.

  3. arXiv:2307.16877

    cs.CL cs.AI

    Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering

    Authors: Vaibhav Adlakha, Parishad BehnamGhader, Xing Han Lu, Nicholas Meade, Siva Reddy

    Abstract: Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for information-seeking tasks such as question answering (QA). By simply prepending retrieved documents in its input along with an instruction, these models can be adapted to various information domains and tasks without additional fine-tuning. While the model responses tend to be natural and flue…

    Submitted 17 April, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: accepted at TACL

  4. arXiv:2305.05379

    cs.SE cs.LG

    TASTY: A Transformer based Approach to Space and Time complexity

    Authors: Kaushik Moudgalya, Ankit Ramakrishnan, Vamsikrishna Chemudupati, Xing Han Lu

    Abstract: Code based Language Models (LMs) have shown very promising results in the field of software engineering with applications such as code refinement, code completion and generation. However, the task of time and space complexity classification from code has not been extensively explored due to a lack of datasets, with prior endeavors being limited to Java. In this project, we aim to address these gap…

    Submitted 24 May, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

  5. arXiv:2304.01412

    cs.CL

    The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents

    Authors: Xing Han Lu, Siva Reddy, Harm de Vries

    Abstract: We introduce the StatCan Dialogue Dataset consisting of 19,379 conversation turns between agents working at Statistics Canada and online users looking for published data tables. The conversations stem from genuine intents, are held in English or French, and lead to agents retrieving one of over 5000 complex data tables. Based on this dataset, we propose two tasks: (1) automatic retrieval of releva…

    Submitted 4 April, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted at EACL 2023

    Journal ref: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. (2023) 2799-2829

  6. Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment

    Authors: Zichao Li, Prakhar Sharma, Xing Han Lu, Jackie C. K. Cheung, Siva Reddy

    Abstract: Most research on question answering focuses on the pre-deployment stage; i.e., building an accurate model for deployment. In this paper, we ask the question: Can we improve QA systems further post-deployment based on user interactions? We focus on two kinds of improvements: 1) improving the QA system's performance itself, and 2) providing the model with the ability to explain the correctnes…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: ACL 2022 Findings

    Journal ref: Findings of the Association for Computational Linguistics: ACL (2022) 926-937

  7. MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining

    Authors: Zhi Wen, Xing Han Lu, Siva Reddy

    Abstract: One of the biggest challenges that prohibit the use of many current NLP methods in clinical settings is the availability of public datasets. In this work, we present MeDAL, a large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain. We pre-trained several models of common architectures on this dataset and emp…

    Submitted 27 December, 2020; originally announced December 2020.

    Comments: EMNLP 2020 Clinical NLP

    Journal ref: In Proceedings of the 3rd Clinical Natural Language Processing Workshop, pp. 130-135. 2020

  8. arXiv:1401.2038

    physics.soc-ph cs.MA

    Crowd Research at School: Crossing Flows

    Authors: Johanna Bamberger, Anna-Lena Geßler, Peter Heitzelmann, Sara Korn, Rene Kahlmeyer, Xue Hao Lu, Qi Hao Sang, Zhi Jie Wang, Guan Zong Yuan, Michael Gauß, Tobias Kretz

    Abstract: It has become widely known that when two flows of pedestrians cross, stripes emerge spontaneously by which the pedestrians of the two walking directions manage to pass each other in an orderly manner. In this work, we report about the results of an experiment on crossing flows which has been carried out at a German school. These results include that previously reported high flow volumes on the cros…

    Submitted 9 January, 2014; originally announced January 2014.

    Comments: contribution to proceedings of Traffic and Granular Flow 2013 held in Jülich, Germany