Showing 1–6 of 6 results for author: Borchert, P

  1. arXiv:2406.17385  [pdf, other]

    cs.CL

    Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance

    Authors: Manon Reusens, Philipp Borchert, Jochen De Weerdt, Bart Baesens

    Abstract: Large Language Models (LLMs) excel at providing information acquired during pretraining on large-scale corpora and following instructions through user prompts. This study investigates whether the quality of LLM responses varies depending on the demographic profile of users. Considering English as the global lingua franca, along with the diversity of its dialects among speakers of different native…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.12739  [pdf, other]

    cs.CL

    Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages

    Authors: Fabian David Schmidt, Philipp Borchert, Ivan Vulić, Goran Glavaš

    Abstract: LLMs have become a go-to solution not just for text generation, but also for natural language understanding (NLU) tasks. Acquiring extensive knowledge through language modeling on web-scale corpora, they excel on English NLU, yet struggle to extend their NLU capabilities to underrepresented languages. In contrast, machine translation models (MT) produce excellent multilingual representations, resu…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2403.16543  [pdf, other]

    cs.CL cs.AI

    Efficient Information Extraction in Few-Shot Relation Classification through Contrastive Representation Learning

    Authors: Philipp Borchert, Jochen De Weerdt, Marie-Francine Moens

    Abstract: Differentiating relationships between entity pairs with limited labeled instances poses a significant challenge in few-shot relation classification. Representations of textual data extract rich information spanning the domain, entities, and relations. In this paper, we introduce a novel approach to enhance information extraction combining multiple sentence representations and contrastive learning.…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: NAACL 2024

  4. arXiv:2310.12024  [pdf, other]

    cs.CL

    CORE: A Few-Shot Company Relation Classification Dataset for Robust Domain Adaptation

    Authors: Philipp Borchert, Jochen De Weerdt, Kristof Coussement, Arno De Caigny, Marie-Francine Moens

    Abstract: We introduce CORE, a dataset for few-shot relation classification (RC) focused on company relations and business entities. CORE includes 4,708 instances of 12 relation types with corresponding textual evidence extracted from company Wikipedia pages. Company names and business entities pose a challenge for few-shot RC models due to the rich and diverse information associated with them. For example,…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 main conference

  5. arXiv:2310.10310  [pdf, other]

    cs.CL

    Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques

    Authors: Manon Reusens, Philipp Borchert, Margot Mieskes, Jochen De Weerdt, Bart Baesens

    Abstract: This paper investigates the transferability of debiasing techniques across different languages within multilingual models. We examine the applicability of these techniques in English, French, German, and Dutch. Using multilingual BERT (mBERT), we demonstrate that cross-lingual transfer of debiasing techniques is not only feasible but also yields promising results. Surprisingly, our findings reveal…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 main conference

  6. arXiv:2310.06675  [pdf, other]

    cs.CL

    SEER : A Knapsack approach to Exemplar Selection for In-Context HybridQA

    Authors: Jonathan Tonglet, Manon Reusens, Philipp Borchert, Bart Baesens

    Abstract: Question answering over hybrid contexts is a complex task, which requires the combination of information extracted from unstructured texts and structured tables in various ways. Recently, In-Context Learning demonstrated significant performance advances for reasoning tasks. In this paradigm, a large language model performs predictions based on a small set of supporting exemplars. The performance o…

    Submitted 20 October, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Camera-ready revision for EMNLP 2023 main conference. Code available at https://github.com/jtonglet/SEER