Showing 1–5 of 5 results for author: Scholman, M

  1. arXiv:2404.18264  [pdf, other]

    cs.CL cs.AI

    Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin

    Authors: Pin-Jie Lin, Merel Scholman, Muhammed Saeed, Vera Demberg

    Abstract: Nigerian Pidgin is an English-derived contact language and is traditionally an oral language, spoken by approximately 100 million people. No orthographic standard has yet been adopted, and thus the few available Pidgin datasets that exist are characterised by noise in the form of orthographic variations. This contributes to under-performance of models in critical NLP tasks. The current work is the…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Accepted to LREC-COLING 2024 Main Conference

  2. arXiv:2402.04918  [pdf, other]

    cs.CL cs.AI

    Prompting Implicit Discourse Relation Annotation

    Authors: Frances Yung, Mansoor Ahmad, Merel Scholman, Vera Demberg

    Abstract: Pre-trained large language models, such as ChatGPT, achieve outstanding performance in various reasoning tasks without supervised training and were found to have outperformed crowdsourcing workers. Nonetheless, ChatGPT's performance in the task of implicit discourse relation classification, prompted by a standard multiple-choice question, is still far from satisfactory and considerably inferior to…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: To appear at the Linguistic Annotation Workshop 2024

  3. arXiv:2307.00382  [pdf, other]

    cs.CL

    Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin

    Authors: Pin-Jie Lin, Muhammed Saeed, Ernie Chang, Merel Scholman

    Abstract: Developing effective spoken language processing systems for low-resource languages poses several challenges due to the lack of parallel data and limited resources for fine-tuning models. In this work, we target on improving upon both text classification and translation of Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus and further propose a framework of cross-lin…

    Submitted 1 July, 2023; originally announced July 2023.

    Comments: To appear in INTERSPEECH 2023

  4. arXiv:2304.00815  [pdf, other]

    cs.CL

    Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design

    Authors: Valentina Pyatkin, Frances Yung, Merel C. J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg

    Abstract: Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias: task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of laymen annotators. For this purp…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted to TACL, pre-MIT Press publication version

  5. arXiv:1704.08893  [pdf, other]

    cs.CL

    How compatible are our discourse annotations? Insights from mapping RST-DT and PDTB annotations

    Authors: Vera Demberg, Fatemeh Torabi Asr, Merel Scholman

    Abstract: Discourse-annotated corpora are an important resource for the community, but they are often annotated according to different frameworks. This makes comparison of the annotations difficult, thereby also preventing researchers from searching the corpora in a unified way, or using all annotated data jointly to train computational systems. Several theoretical proposals have recently been made for mapp…

    Submitted 15 March, 2018; v1 submitted 28 April, 2017; originally announced April 2017.