Showing 1–15 of 15 results for author: van Noord, R

  1. arXiv:2404.08354  [pdf, other]

    cs.CL

    Gaining More Insight into Neural Semantic Parsing with Challenging Benchmarks

    Authors: Xiao Zhang, Chunliu Wang, Rik van Noord, Johan Bos

    Abstract: The Parallel Meaning Bank (PMB) serves as a corpus for semantic processing with a focus on semantic parsing and text generation. Currently, we witness an excellent performance of neural parsers and generators on the PMB. This might suggest that such semantic processing tasks have by and large been solved. We argue that this is not the case and that performance scores from the past on the PMB are i…

    Submitted 7 May, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  2. arXiv:2404.05428  [pdf, other]

    cs.CL

    Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining

    Authors: Nikola Ljubešić, Vít Suchomel, Peter Rupnik, Taja Kuzman, Rik van Noord

    Abstract: The world of language models is going through turbulent times, better and ever larger models are coming out at an unprecedented speed. However, we argue that, especially for the scientific community, encoder models of up to 1 billion parameters are still very much needed, their primary usage being in enriching large collections of data with metadata necessary for downstream research. We investigat…

    Submitted 8 April, 2024; originally announced April 2024.

  3. arXiv:2403.08693  [pdf, other]

    cs.CL

    Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

    Authors: Rik van Noord, Taja Kuzman, Peter Rupnik, Nikola Ljubešić, Miquel Esplà-Gomis, Gema Ramírez-Sánchez, Antonio Toral

    Abstract: Large, curated, web-crawled corpora play a vital role in training language models (LMs). They form the lion's share of the training data in virtually all recent LMs, such as the well-known GPT, LLaMA and XLM-RoBERTa models. However, despite this importance, relatively little attention has been given to the quality of these corpora. In this paper, we compare four of the currently most relevant larg…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024 (long)

  4. arXiv:2310.02053  [pdf, other]

    cs.CL

    Controlling Topic-Focus Articulation in Meaning-to-Text Generation using Graph Neural Networks

    Authors: Chunliu Wang, Rik van Noord, Johan Bos

    Abstract: A bare meaning representation can be expressed in various ways using natural language, depending on how the information is structured on the surface level. We are interested in finding ways to control topic-focus articulation when generating text from meaning. We focus on distinguishing active and passive voice for sentences with transitive verbs. The idea is to add pragmatic information such as t…

    Submitted 3 October, 2023; originally announced October 2023.

  5. arXiv:2305.19757  [pdf, other]

    cs.CL

    Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios

    Authors: Malina Chichirau, Rik van Noord, Antonio Toral

    Abstract: We tackle the task of automatically discriminating between human and machine translations. As opposed to most previous work, we perform experiments in a multilingual setting, considering multiple languages and multilingual pretrained language models. We show that a classifier trained on parallel data with a single source language (in our case German-English) can still perform well on English trans…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: Accepted at EAMT2023

  6. arXiv:2012.14854  [pdf, other]

    cs.CL

    The Parallel Meaning Bank: A Framework for Semantically Annotating Multiple Languages

    Authors: Lasha Abzianidze, Rik van Noord, Chunliu Wang, Johan Bos

    Abstract: This paper gives a general description of the ideas behind the Parallel Meaning Bank, a framework with the aim to provide an easy way to annotate compositional semantics for texts written in languages other than English. The annotation procedure is semi-automatic, and comprises seven layers of linguistic information: segmentation, symbolisation, semantic tagging, word sense disambiguation, syntact…

    Submitted 29 December, 2020; originally announced December 2020.

    Comments: 13 pages, 5 figures, 1 table

    MSC Class: 68T50 ACM Class: I.2.7

  7. arXiv:2011.04308  [pdf, other]

    cs.CL

    Character-level Representations Improve DRS-based Semantic Parsing Even in the Age of BERT

    Authors: Rik van Noord, Antonio Toral, Johan Bos

    Abstract: We combine character-level and contextual language model representations to improve performance on Discourse Representation Structure parsing. Character representations can easily be added in a sequence-to-sequence model in either one encoder or as a fully separate encoder, with improvements that are robust to different language models, languages and data sets. For English, these improvements are…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: EMNLP 2020 (long)

  8. arXiv:2005.13399  [pdf, other]

    cs.CL cs.AI cs.LG

    The First Shared Task on Discourse Representation Structure Parsing

    Authors: Lasha Abzianidze, Rik van Noord, Hessel Haagsma, Johan Bos

    Abstract: The paper presents the IWCS 2019 shared task on semantic parsing where the goal is to produce Discourse Representation Structures (DRSs) for English sentences. DRSs originate from Discourse Representation Theory and represent scoped meaning representations that capture the semantics of negation, modals, quantification, and presupposition triggers. Additionally, concepts and event-participants in D…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: International Conference on Computational Semantics (IWCS)

    ACM Class: I.2.7

    Journal ref: Proceedings of the IWCS Shared Task on Semantic Parsing, IWCS, SIGSEM, 2019, Association for Computational Linguistics

  9. arXiv:1905.09866  [pdf, other]

    cs.CL

    Fair is Better than Sensational: Man is to Doctor as Woman is to Doctor

    Authors: Malvina Nissim, Rik van Noord, Rob van der Goot

    Abstract: Analogies such as "man is to king as woman is to X" are often used to illustrate the amazing power of word embeddings. Concurrently, they have also been used to expose how strongly human biases are encoded in vector spaces built on natural language, like "man is to computer programmer as woman is to homemaker". Recent work has shown that analogies are in fact not such a diagnostic for bias, and ot…

    Submitted 9 November, 2019; v1 submitted 23 May, 2019; originally announced May 2019.

  10. arXiv:1810.12579  [pdf, other]

    cs.CL

    Exploring Neural Methods for Parsing Discourse Representation Structures

    Authors: Rik van Noord, Lasha Abzianidze, Antonio Toral, Johan Bos

    Abstract: Neural methods have had several recent successes in semantic parsing, though they have yet to face the challenge of producing meaning representations based on formal semantics. We present a sequence-to-sequence neural semantic parser that is able to produce Discourse Representation Structures (DRSs) for English sentences with high accuracy, outperforming traditional DRS parsers. To facilitate the…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: To appear in TACL 2018

  11. arXiv:1805.10824  [pdf, ps, other]

    cs.CL

    UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish

    Authors: Marloes Kuijper, Mike van Lenthe, Rik van Noord

    Abstract: The present study describes our submission to SemEval 2018 Task 1: Affect in Tweets. Our Spanish-only approach aimed to demonstrate that it is beneficial to automatically generate additional training data by (i) translating training data from other languages and (ii) applying a semi-supervised learning method. We find strong support for both approaches, with those models outperforming our regular…

    Submitted 28 May, 2018; originally announced May 2018.

    Comments: Accepted at SemEval 2018

  12. arXiv:1802.08599  [pdf, other]

    cs.CL

    Evaluating Scoped Meaning Representations

    Authors: Rik van Noord, Lasha Abzianidze, Hessel Haagsma, Johan Bos

    Abstract: Semantic parsing offers many opportunities to improve natural language understanding. We present a semantically annotated parallel corpus for English, German, Italian, and Dutch where sentences are aligned with scoped meaning representations in order to capture the semantics of negation, modals, quantification, and presupposition triggers. The semantic formalism is based on Discourse Representatio…

    Submitted 10 April, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: Camera-ready for LREC 2018

  13. arXiv:1705.09980  [pdf, ps, other]

    cs.CL

    Neural Semantic Parsing by Character-based Translation: Experiments with Abstract Meaning Representations

    Authors: Rik van Noord, Johan Bos

    Abstract: We evaluate the character-level translation method for neural semantic parsing on a large corpus of sentences annotated with Abstract Meaning Representations (AMRs). Using a sequence-to-sequence model, and some trivial preprocessing and postprocessing of AMRs, we obtain a baseline accuracy of 53.1 (F-score on AMR-triples). We examine five different approaches to improve this baseline result: (i) r…

    Submitted 9 October, 2017; v1 submitted 28 May, 2017; originally announced May 2017.

    Comments: Camera ready for CLIN 2017 journal

  14. arXiv:1704.02156  [pdf, other]

    cs.CL

    The Meaning Factory at SemEval-2017 Task 9: Producing AMRs with Neural Semantic Parsing

    Authors: Rik van Noord, Johan Bos

    Abstract: We evaluate a semantic parser based on a character-based sequence-to-sequence model in the context of the SemEval-2017 shared task on semantic parsing for AMRs. With data augmentation, super characters, and POS-tagging we gain major improvements in performance compared to a baseline character-level model. Although we improve on previous character-based neural semantic parsing models, the overall a…

    Submitted 19 April, 2017; v1 submitted 7 April, 2017; originally announced April 2017.

    Comments: To appear in Proceedings of SemEval, 2017 (camera-ready)

  15. arXiv:1702.03964  [pdf, other]

    cs.CL

    The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations Annotated with Compositional Meaning Representations

    Authors: Lasha Abzianidze, Johannes Bjerva, Kilian Evang, Hessel Haagsma, Rik van Noord, Pierre Ludmann, Duc-Duy Nguyen, Johan Bos

    Abstract: The Parallel Meaning Bank is a corpus of translations annotated with shared, formal meaning representations comprising over 11 million words divided over four languages (English, German, Italian, and Dutch). Our approach is based on cross-lingual projection: automatically produced (and manually corrected) semantic annotations for English sentences are mapped onto their word-aligned translations, a…

    Submitted 13 February, 2017; originally announced February 2017.

    Comments: To appear at EACL 2017