Showing 1–47 of 47 results for author: Monz, C

  1. arXiv:2407.02208

    cs.CL cs.AI

    How to Learn in a Noisy World? Self-Correcting the Real-World Data Noise on Machine Translation

    Authors: Yan Meng, Di Wu, Christof Monz

    Abstract: The massive amounts of web-mined parallel data contain large amounts of noise. Semantic misalignment, as the primary source of the noise, poses a challenge for training machine translation systems. In this paper, we first study the impact of real-world hard-to-detect misalignment noise by proposing a process to simulate the realistic misalignment controlled by semantic similarity. After quantitati…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2406.19999

    cs.CL

    The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models

    Authors: Xinyi Chen, Baohao Liao, Jirui Qi, Panagiotis Eustratiadis, Christof Monz, Arianna Bisazza, Maarten de Rijke

    Abstract: Following multiple instructions is a crucial ability for large language models (LLMs). Evaluating this ability comes with significant challenges: (i) limited coherence between multiple instructions, (ii) positional bias where the order of instructions affects model performance, and (iii) a lack of objectively verifiable tasks. To address these issues, we introduce a benchmark designed to evaluate…

    Submitted 28 June, 2024; originally announced June 2024.

  3. arXiv:2406.14267

    cs.CL cs.AI

    On the Evaluation Practices in Multilingual NLP: Can Machine Translation Offer an Alternative to Human Translations?

    Authors: Rochelle Choenni, Sara Rajaee, Christof Monz, Ekaterina Shutova

    Abstract: While multilingual language models (MLMs) have been trained on 100+ languages, they are typically only evaluated across a handful of them due to a lack of available test data in most languages. This is particularly problematic when assessing MLMs' potential for low-resource and unseen languages. In this paper, we present an analysis of existing evaluation frameworks in multilingual NLP, discuss th…

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2405.20089

    cs.CL

    The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities

    Authors: David Stap, Eva Hasler, Bill Byrne, Christof Monz, Ke Tran

    Abstract: Fine-tuning large language models (LLMs) for machine translation has shown improvements in overall translation quality. However, it is unclear what impact fine-tuning has on desirable LLM behaviors that are not present in neural machine translation models, such as steerability, inherent document-level translation abilities, and the ability to produce less literal translations. We perform an…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024 (long, main)

  5. arXiv:2404.11201

    cs.CL

    Neuron Specialization: Leveraging intrinsic task modularity for multilingual machine translation

    Authors: Shaomu Tan, Di Wu, Christof Monz

    Abstract: Training a unified multilingual model promotes knowledge transfer but inevitably introduces negative interference. Language-specific modeling methods show promise in reducing interference. However, they often rely on heuristics to distribute capacity and struggle to foster cross-lingual transfer via isolated modules. In this paper, we explore intrinsic task modularity within multilingual networks…

    Submitted 17 April, 2024; originally announced April 2024.

  6. arXiv:2402.12102

    cs.CL cs.AI

    Is It a Free Lunch for Removing Outliers during Pretraining?

    Authors: Baohao Liao, Christof Monz

    Abstract: With the growing size of large language models, the role of quantization becomes increasingly significant. However, outliers present in weights or activations notably influence the performance of quantized models. Recently, \citet{qtransformer} introduced a novel softmax function aimed at pretraining models in an outlier-free manner, thereby enhancing their suitability for quantization. Interestin…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 5 pages, 3 figures, 1 table

  7. arXiv:2402.05147

    cs.LG cs.CL

    ApiQ: Finetuning of 2-Bit Quantized Large Language Model

    Authors: Baohao Liao, Christian Herold, Shahram Khadivi, Christof Monz

    Abstract: Memory-efficient finetuning of large language models (LLMs) has recently attracted huge attention with the increasing size of LLMs, primarily due to the constraints posed by GPU memory limitations and the effectiveness of these methods compared to full finetuning. Despite the advancements, current strategies for memory-efficient finetuning, such as QLoRA, exhibit inconsistent performance across di…

    Submitted 21 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: more benchmarks and new method, block-wise ApiQ. code: https://github.com/BaohaoLiao/ApiQ

  8. arXiv:2402.02099

    cs.CL cs.AI cs.LG

    Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in Multilingual Language Models

    Authors: Sara Rajaee, Christof Monz

    Abstract: Recent advances in training multilingual language models on large datasets seem to have shown promising results in knowledge transfer across languages and achieve high performance on downstream tasks. However, we question to what extent the current evaluation benchmarks and setups accurately measure zero-shot cross-lingual knowledge transfer. In this work, we challenge the assumption that high zer…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: Accepted to EACL 2024

  9. arXiv:2402.01772

    cs.CL cs.AI cs.LG

    Disentangling the Roles of Target-Side Transfer and Regularization in Multilingual Machine Translation

    Authors: Yan Meng, Christof Monz

    Abstract: Multilingual Machine Translation (MMT) benefits from knowledge transfer across different language pairs. However, improvements in one-to-many translation compared to many-to-one translation are only marginal and sometimes even negligible. This performance discrepancy raises the question of to what extent positive transfer plays a role on the target-side for one-to-many MT. In this paper, we conduc…

    Submitted 1 February, 2024; originally announced February 2024.

  10. arXiv:2401.12413

    cs.CL cs.LG

    How Far Can 100 Samples Go? Unlocking Overall Zero-Shot Multilingual Translation via Tiny Multi-Parallel Data

    Authors: Di Wu, Shaomu Tan, Yan Meng, David Stap, Christof Monz

    Abstract: Zero-shot translation aims to translate between language pairs not seen during training in Multilingual Machine Translation (MMT) and is largely considered an open problem. A common, albeit resource-consuming, solution is to add as many related translation directions as possible to the training corpus. In this paper, we show that for an English-centric model, surprisingly large zero-shot improveme…

    Submitted 26 February, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: 15 pages, 5 figures

  11. arXiv:2310.14644

    cs.CL

    Multilingual k-Nearest-Neighbor Machine Translation

    Authors: David Stap, Christof Monz

    Abstract: k-nearest-neighbor machine translation has demonstrated remarkable improvements in machine translation quality by creating a datastore of cached examples. However, these improvements have been limited to high-resource language pairs, with large datastores, and remain a challenge for low-resource languages. In this paper, we address this issue by combining representations from multiple languages in…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP

  12. arXiv:2310.13469

    cs.CL cs.AI

    Ask Language Model to Clean Your Noisy Translation Data

    Authors: Quinten Bolding, Baohao Liao, Brandon James Denis, Jun Luo, Christof Monz

    Abstract: Transformer models have demonstrated remarkable performance in neural machine translation (NMT). However, their vulnerability to noisy input poses a significant challenge in practical implementation, where generating clean output from noisy input is crucial. The MTNT dataset is widely used as a benchmark for evaluating the robustness of NMT models against noisy input. Nevertheless, its utility is…

    Submitted 24 October, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023, Findings

  13. arXiv:2310.10385

    cs.CL cs.LG

    Towards a Better Understanding of Variations in Zero-Shot Neural Machine Translation Performance

    Authors: Shaomu Tan, Christof Monz

    Abstract: Multilingual Neural Machine Translation (MNMT) facilitates knowledge sharing but often suffers from poor zero-shot (ZS) translation qualities. While prior work has explored the causes of overall low ZS performance, our work introduces a fresh perspective: the presence of high variations in ZS performance. This suggests that MNMT does not uniformly exhibit poor ZS capability; instead, certain trans…

    Submitted 31 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: This paper is accepted by the EMNLP 2023 Main Conference

  14. arXiv:2310.09946

    cs.CL cs.LG

    UvA-MT's Participation in the WMT23 General Translation Shared Task

    Authors: Di Wu, Shaomu Tan, David Stap, Ali Araabi, Christof Monz

    Abstract: This paper describes UvA-MT's submission to the WMT 2023 shared task on general machine translation. We participate in the constrained track in two directions: English <-> Hebrew. In this competition, we show that by using one model to handle bidirectional tasks, as a minimal setting of Multilingual Machine Translation (MMT), it is possible to achieve comparable results with that of traditiona…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by the WMT2023 Conference

  15. arXiv:2307.12835

    cs.CL

    Joint Dropout: Improving Generalizability in Low-Resource Neural Machine Translation through Phrase Pair Variables

    Authors: Ali Araabi, Vlad Niculae, Christof Monz

    Abstract: Despite the tremendous success of Neural Machine Translation (NMT), its performance on low-resource language pairs still remains subpar, partly due to the limited ability to handle previously unseen inputs, i.e., generalization. In this paper, we propose a method called Joint Dropout, that addresses the challenge of low-resource neural machine translation by substituting phrases with variables, re…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: Accepted at MT Summit 2023

    MSC Class: 68T50 ACM Class: I.2.7

  16. arXiv:2306.00477

    cs.CL cs.AI cs.LG

    Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning

    Authors: Baohao Liao, Shaomu Tan, Christof Monz

    Abstract: Parameter-efficient fine-tuning (PEFT) of pre-trained language models (PLMs) has emerged as a highly successful approach, with training only a small number of parameters without sacrificing performance and becoming the de-facto learning paradigm with the increasing size of PLMs. However, existing PEFT methods are not memory-efficient, because they still require caching most of the intermediate act…

    Submitted 19 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: NeurIPS camera-ready version. Code at https://github.com/BaohaoLiao/mefts

  17. arXiv:2305.16742

    cs.CL cs.AI cs.LG

    Parameter-Efficient Fine-Tuning without Introducing New Latency

    Authors: Baohao Liao, Yan Meng, Christof Monz

    Abstract: Parameter-efficient fine-tuning (PEFT) of pre-trained language models has recently demonstrated remarkable achievements, effectively matching the performance of full fine-tuning while utilizing significantly fewer trainable parameters, and consequently addressing the storage and communication constraints. Nonetheless, various PEFT methods are limited by their inherent characteristics. In the case…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL2023 camera-ready version

  18. arXiv:2305.14189

    cs.CL cs.LG

    Beyond Shared Vocabulary: Increasing Representational Word Similarities across Languages for Multilingual Machine Translation

    Authors: Di Wu, Christof Monz

    Abstract: Using a vocabulary that is shared across languages is common practice in Multilingual Neural Machine Translation (MNMT). In addition to its simple design, shared tokens play an important role in positive knowledge transfer, assuming that shared tokens refer to similar meanings across languages. However, when word overlap is small, especially due to different writing systems, transfer is inhibited.…

    Submitted 20 January, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 15 pages, 3 figures

  19. arXiv:2305.11550

    cs.CL

    Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens

    Authors: David Stap, Vlad Niculae, Christof Monz

    Abstract: We argue that translation quality alone is not a sufficient metric for measuring knowledge transfer in multilingual neural machine translation. To support this claim, we introduce Representational Transfer Potential (RTP), which measures representational similarities between languages. We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly co…

    Submitted 4 December, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP 2023 Findings

  20. arXiv:2211.04898

    cs.CL cs.AI

    Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token

    Authors: Baohao Liao, David Thulke, Sanjika Hewavitharana, Hermann Ney, Christof Monz

    Abstract: The pre-training of masked language models (MLMs) consumes massive computation to achieve good results on downstream NLP tasks, resulting in a large carbon footprint. In the vanilla MLM, the virtual tokens, [MASK]s, act as placeholders and gather the contextualized information from unmasked tokens to restore the corrupted information. It raises the question of whether we can append [MASK]s at a la…

    Submitted 15 November, 2022; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: Code available at: https://github.com/BaohaoLiao/3ml

  21. arXiv:2208.05225

    cs.CL

    How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation?

    Authors: Ali Araabi, Christof Monz, Vlad Niculae

    Abstract: Neural Machine Translation (NMT) is an open vocabulary problem. As a result, dealing with the words not occurring during training (a.k.a. out-of-vocabulary (OOV) words) has long been a fundamental challenge for NMT systems. The predominant method to tackle this problem is Byte Pair Encoding (BPE) which splits words, including OOV words, into sub-word segments. BPE has achieved impressive results…

    Submitted 17 August, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

    Comments: 14 pages, 6 figures, 1 table, To be published in AMTA 2022 conference

    MSC Class: 68T50 ACM Class: I.2.7

  22. arXiv:2011.02266

    cs.CL cs.LG

    Optimizing Transformer for Low-Resource Neural Machine Translation

    Authors: Ali Araabi, Christof Monz

    Abstract: Language pairs with limited amounts of parallel data, also known as low-resource languages, remain a challenge for neural machine translation. While the Transformer model has achieved significant improvements for many language pairs and has become the de facto mainstream architecture, its capability under low-resource conditions has not been fully investigated yet. Our experiments on different sub…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: To be published in COLING 2020

    ACM Class: I.2.7

  23. arXiv:2005.12398

    cs.CL

    The Unreasonable Volatility of Neural Machine Translation Models

    Authors: Marzieh Fadaee, Christof Monz

    Abstract: Recent works have shown that Neural Machine Translation (NMT) models achieve impressive performance; however, questions about understanding the behavior of these models remain unanswered. We investigate the unexpected volatility of NMT models where the input is semantically and syntactically correct. We discover that with trivial modifications of source sentences, we can identify cases where \text…

    Submitted 25 May, 2020; originally announced May 2020.

    Comments: Accepted to Neural Generation and Translation Workshop (WNGT) at ACL 2020

  24. arXiv:2004.14162

    cs.IR cs.AI

    Conversations with Search Engines: SERP-based Conversational Response Generation

    Authors: Pengjie Ren, Zhumin Chen, Zhaochun Ren, Evangelos Kanoulas, Christof Monz, Maarten de Rijke

    Abstract: In this paper, we address the problem of answering complex information needs by holding conversations with search engines, in the sense that users can express their queries in natural language, and directly receive the information they need from a short system response in a conversational manner. Recently, there have been some attempts towards a similar goal, e.g., studies on Conversational Agen…

    Submitted 18 May, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: published in TOIS 2021

  25. arXiv:2003.11963

    cs.CL

    TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation

    Authors: Shaojie Jiang, Thomas Wolf, Christof Monz, Maarten de Rijke

    Abstract: Natural Language Generation (NLG) models are prone to generating repetitive utterances. In this work, we study the repetition problem for encoder-decoder models, using both recurrent neural network (RNN) and transformer architectures. To this end, we consider the chit-chat task, where the problem is more prominent than in other tasks that need encoder-decoder architectures. We first study the infl…

    Submitted 9 April, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

    Comments: 9 pages, 4 figures, 1 table

  26. arXiv:1911.08151

    cs.CL cs.AI cs.IR

    Retrospective and Prospective Mixture-of-Generators for Task-oriented Dialogue Response Generation

    Authors: Jiahuan Pei, Pengjie Ren, Christof Monz, Maarten de Rijke

    Abstract: Dialogue response generation (DRG) is a critical component of task-oriented dialogue systems (TDSs). Its purpose is to generate proper natural language responses given some context, e.g., historical utterances, system states, etc. State-of-the-art work focuses on how to better tackle DRG in an end-to-end way. Typically, such studies assume that each token is drawn from a single distribution over t…

    Submitted 19 February, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

    Comments: The paper is accepted by 24th European Conference on Artificial Intelligence

  27. arXiv:1910.02655

    cs.CL

    BERT for Evidence Retrieval and Claim Verification

    Authors: Amir Soleimani, Christof Monz, Marcel Worring

    Abstract: Motivated by the promising performance of pre-trained language models, we investigate BERT in an evidence retrieval and claim verification pipeline for the FEVER fact extraction and verification challenge. To this end, we propose to use two BERT models, one for retrieving potential evidence sentences supporting or rejecting claims, and another for verifying claims based on the predicted evidence s…

    Submitted 7 October, 2019; originally announced October 2019.

  28. arXiv:1908.09528

    cs.CL cs.AI

    Thinking Globally, Acting Locally: Distantly Supervised Global-to-Local Knowledge Selection for Background Based Conversation

    Authors: Pengjie Ren, Zhumin Chen, Christof Monz, Jun Ma, Maarten de Rijke

    Abstract: Background Based Conversations (BBCs) have been introduced to help conversational systems avoid generating overly generic responses. In a BBC, the conversation is grounded in a knowledge source. A key challenge in BBCs is Knowledge Selection (KS): given a conversational context, try to find the appropriate background knowledge (a text fragment containing related facts or comments, etc.) based on w…

    Submitted 21 November, 2019; v1 submitted 26 August, 2019; originally announced August 2019.

    Comments: accepted by AAAI 2020

  29. arXiv:1908.06449

    cs.CL cs.AI cs.LG

    RefNet: A Reference-aware Network for Background Based Conversation

    Authors: Chuan Meng, Pengjie Ren, Zhumin Chen, Christof Monz, Jun Ma, Maarten de Rijke

    Abstract: Existing conversational systems tend to generate generic responses. Recently, Background Based Conversations (BBCs) have been introduced to address this issue. Here, the generated responses are grounded in some background information. Although the proposed methods for BBCs are able to generate more informative responses, they either cannot generate natural responses or have difficulty in locating the right…

    Submitted 23 November, 2019; v1 submitted 18 August, 2019; originally announced August 2019.

    Comments: Accepted to AAAI 2020 (Oral)

  30. arXiv:1907.03885

    cs.CL cs.LG cs.NE

    An Intrinsic Nearest Neighbor Analysis of Neural Machine Translation Architectures

    Authors: Hamidreza Ghader, Christof Monz

    Abstract: Earlier approaches indirectly studied the information captured by the hidden states of recurrent and non-recurrent neural machine translation models by feeding them into different classifiers. In this paper, we look at the encoder hidden states of both transformer and recurrent machine translation models from the nearest neighbors perspective. We investigate to what extent the nearest neighbors sh…

    Submitted 8 July, 2019; originally announced July 2019.

    Comments: To be presented at Machine Translation Summit 2019 (MTSUMMIT XVII), Dublin, Ireland

  31. arXiv:1902.09191

    cs.IR cs.CL cs.LG

    Improving Neural Response Diversity with Frequency-Aware Cross-Entropy Loss

    Authors: Shaojie Jiang, Pengjie Ren, Christof Monz, Maarten de Rijke

    Abstract: Sequence-to-Sequence (Seq2Seq) models have achieved encouraging performance on the dialogue response generation task. However, existing Seq2Seq-based response generation methods suffer from a low-diversity problem: they frequently generate generic responses, which make the conversation less interesting. In this paper, we address the low-diversity problem by investigating its connection with model…

    Submitted 25 February, 2019; originally announced February 2019.

    Comments: Will appear at The Web Conference 2019

  32. arXiv:1808.09006

    cs.CL

    Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation

    Authors: Marzieh Fadaee, Christof Monz

    Abstract: Neural Machine Translation has achieved state-of-the-art performance for several language pairs using a combination of parallel and synthetic data. Synthetic data is often generated by back-translating sentences randomly sampled from monolingual data using a reverse translation model. While back-translation has been shown to be very effective in many cases, it is not entirely clear why. In this wo…

    Submitted 21 September, 2018; v1 submitted 27 August, 2018; originally announced August 2018.

    Comments: 11 pages, 2 figures. Accepted at EMNLP 2018

  33. arXiv:1803.03585

    cs.CL

    The Importance of Being Recurrent for Modeling Hierarchical Structure

    Authors: Ke Tran, Arianna Bisazza, Christof Monz

    Abstract: Recent work has shown that recurrent neural networks (RNNs) can implicitly capture and exploit hierarchical information when trained to solve common natural language processing tasks such as language modeling (Linzen et al., 2016) and neural machine translation (Shi et al., 2016). In contrast, the ability to model structured data with non-recurrent neural networks has received little attention des…

    Submitted 28 August, 2018; v1 submitted 9 March, 2018; originally announced March 2018.

    Comments: EMNLP 2018

  34. arXiv:1802.04681

    cs.CL

    Examining the Tip of the Iceberg: A Data Set for Idiom Translation

    Authors: Marzieh Fadaee, Arianna Bisazza, Christof Monz

    Abstract: Neural Machine Translation (NMT) has been widely used in recent years with significant improvements for many language pairs. Although state-of-the-art NMT systems are generating progressively better translations, idiom translation remains one of the open challenges in this field. Idioms, a category of multiword expressions, are an interesting language phenomenon where the overall meaning of the ex…

    Submitted 13 February, 2018; originally announced February 2018.

    Comments: Accepted at LREC 2018

  35. arXiv:1710.03348

    cs.CL

    What does Attention in Neural Machine Translation Pay Attention to?

    Authors: Hamidreza Ghader, Christof Monz

    Abstract: Attention in neural machine translation provides the possibility to encode relevant parts of the source sentence at each translation step. As a result, attention is considered to be an alignment model as well. However, there is no work that specifically studies attention and provides analysis of what is being learned by attention models. Thus, the question remains how attention is simil…

    Submitted 9 October, 2017; originally announced October 2017.

    Comments: To appear in IJCNLP 2017

  36. arXiv:1708.00712

    cs.CL

    Dynamic Data Selection for Neural Machine Translation

    Authors: Marlies van der Wees, Arianna Bisazza, Christof Monz

    Abstract: Intelligent selection of training data has proven a successful technique to simultaneously increase training efficiency and translation performance for phrase-based machine translation (PBMT). With the recent increase in popularity of neural machine translation (NMT), we explore in this paper to what extent and how NMT can also benefit from data selection. While state-of-the-art data selection (Ax…

    Submitted 2 August, 2017; originally announced August 2017.

    Comments: Accepted at EMNLP2017

  37. Learning Topic-Sensitive Word Representations

    Authors: Marzieh Fadaee, Arianna Bisazza, Christof Monz

    Abstract: Distributed word representations are widely used for modeling words in NLP tasks. Most of the existing models generate one representation per word and do not consider different meanings of a word. We present two approaches to learn multiple topic-sensitive representations per word by using Hierarchical Dirichlet Process. We observe that by modeling topics and integrating topic distributions for ea…

    Submitted 1 May, 2017; originally announced May 2017.

    Comments: 5 pages, 1 figure, Accepted at ACL 2017

  38. Data Augmentation for Low-Resource Neural Machine Translation

    Authors: Marzieh Fadaee, Arianna Bisazza, Christof Monz

    Abstract: The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthe…

    Submitted 1 May, 2017; originally announced May 2017.

    Comments: 5 pages, 2 figures, Accepted at ACL 2017

  39. arXiv:1610.03708

    cs.CV cs.CL

    Generating captions without looking beyond objects

    Authors: Hendrik Heuer, Christof Monz, Arnold W. M. Smeulders

    Abstract: This paper explores new evaluation perspectives for image captioning and introduces a noun translation task that achieves comparative image caption generation performance by translating from a set of nouns to captions. This implies that in image captioning, all word categories other than nouns can be evoked by a powerful language model without sacrificing performance on n-gram precision. The paper…

    Submitted 18 October, 2016; v1 submitted 12 October, 2016; originally announced October 2016.

    Comments: This paper was presented at the ECCV2016 2nd Workshop on Storytelling with Images and Videos (VisStory)

  40. arXiv:1601.01272

    cs.CL

    Recurrent Memory Networks for Language Modeling

    Authors: Ke Tran, Arianna Bisazza, Christof Monz

    Abstract: Recurrent Neural Networks (RNN) have obtained excellent results in many natural language processing (NLP) tasks. However, understanding and interpreting the source of this success remains a challenge. In this paper, we propose Recurrent Memory Network (RMN), a novel RNN architecture, that not only amplifies the power of RNN but also facilitates our understanding of its internal functioning and allo…

    Submitted 22 April, 2016; v1 submitted 6 January, 2016; originally announced January 2016.

    Comments: 8 pages, 6 figures. Accepted at NAACL 2016

  41. arXiv:cs/0009019

    cs.AI cs.CL

    Computing Presuppositions by Contextual Reasoning

    Authors: Christof Monz

    Abstract: This paper describes how automated deduction methods for natural language processing can be applied more efficiently by encoding context in a more elaborate way. Our work is based on formal approaches to context, and we provide a tableau calculus for contextual reasoning. This is explained by considering an example from the problem area of presupposition projection.

    Submitted 21 September, 2000; originally announced September 2000.

    Comments: 5 pages

    ACM Class: F.4.1; I.2.7

    Journal ref: In: P. Brezillon, R. Turner, J-C. Pomerol and E. Turner (Eds.) Proceedings of the AAAI-99 Workshop on Reasoning in Context for AI Applications, AAAI Press, 1999, pp. 75-79

  42. arXiv:cs/0009018

    cs.CL cs.AI

    A Resolution Calculus for Dynamic Semantics

    Authors: Christof Monz, Maarten de Rijke

    Abstract: This paper applies resolution theorem proving to natural language semantics. The aim is to circumvent the computational complexity triggered by natural language ambiguities like pronoun binding, by interleaving pronoun binding with resolution deduction. Therefore, disambiguation is only applied to expressions that actually occur during derivations.

    Submitted 21 September, 2000; originally announced September 2000.

    Comments: 15 pages

    ACM Class: F.4.1; I.2.7

    Journal ref: In: J. Dix, L. Farinas del Cerro, and U. Furbach (eds.) Logics in Artificial Intelligence (JELIA'98). Lecture Notes in Artificial Intelligence 1489, Springer, 1998, pp. 184-198

  43. arXiv:cs/0009017

    cs.CL cs.AI

    A Tableau Calculus for Pronoun Resolution

    Authors: Christof Monz, Maarten de Rijke

    Abstract: We present a tableau calculus for reasoning in fragments of natural language. We focus on the problem of pronoun resolution and the way in which it complicates automated theorem proving for natural language processing. A method for explicitly manipulating contextual information during deduction is proposed, where pronouns are resolved against this context during deduction. As a result, pronoun r…

    Submitted 21 September, 2000; originally announced September 2000.

    Comments: 16 pages

    ACM Class: F.4.1; I.2.7

    Journal ref: In: N.V. Murray (ed.) Automated Reasoning with Analytic Tableaux and Related Methods. Lecture Notes in Artificial Intelligence 1617, Springer, 1999, pages 247-262

  44. arXiv:cs/0009016

    cs.CL cs.AI

    Contextual Inference in Computational Semantics

    Authors: Christof Monz

    Abstract: In this paper, an application of automated theorem proving techniques to computational semantics is considered. In order to compute the presuppositions of a natural language discourse, several inference tasks arise. Instead of treating these inferences independently of each other, we show how integrating techniques from formal approaches to context into deduction can help to compute presuppositi…

    Submitted 20 September, 2000; originally announced September 2000.

    ACM Class: F.4.1; I.2.7

    Journal ref: In: P. Bouquet, P. Brezillon, L. Serafini, M. Benerecetti, F. Castellani (Eds.) 2nd International and Interdisciplinary Conference on Modeling and Using Context (CONTEXT'99). Lecture Notes in Artificial Intelligence 1688, Springer, 1999, pages 242-255

  45. arXiv:cs/0009015

    cs.CL

    A Tableaux Calculus for Ambiguous Quantification

    Authors: Christof Monz, Maarten de Rijke

    Abstract: Coping with ambiguity has recently received a lot of attention in natural language processing. Most work focuses on the semantic representation of ambiguous expressions. In this paper we complement this work in two ways. First, we provide an entailment relation for a language with ambiguous expressions. Second, we give a sound and complete tableaux calculus for reasoning with statements involvin…

    Submitted 20 September, 2000; originally announced September 2000.

    Comments: In: H. de Swart (editor). Automated Reasoning with Analytic Tableaux and Related Methods, Tableaux'98 LNAI 1397, Springer, 1998, pp. 232-246

    ACM Class: F.4.1 I.2.7

  46. arXiv:cs/0009014

    cs.CL cs.DL

    Combining Linguistic and Spatial Information for Document Analysis

    Authors: Marco Aiello, Christof Monz, Leon Todoran

    Abstract: We present a framework to analyze color documents of complex layout. In addition, no assumption is made on the layout. Our framework combines in a content-driven bottom-up approach two different sources of information: textual and spatial. To analyze the text, shallow natural language processing tools, such as taggers and partial parsers, are used. To infer relations of the logical layout we res…

    Submitted 20 September, 2000; originally announced September 2000.

    Comments: Appeared in: J. Mariani and D. Harman (Eds.) Proceedings of RIAO'2000 Content-Based Multimedia Information Access, CID, 2000. pp. 266-275

    ACM Class: H.3.5; H.3.6; H.3.7; I.2.7; I.7

  47. arXiv:cs/0009012

    cs.CL cs.AI cs.MA

    Modeling Ambiguity in a Multi-Agent System

    Authors: Christof Monz

    Abstract: This paper investigates the formal pragmatics of ambiguous expressions by modeling ambiguity in a multi-agent system. Such a framework allows us to give a more refined notion of the kind of information that is conveyed by ambiguous expressions. We analyze how ambiguity affects the knowledge of the dialog participants and, especially, what they know about each other after an ambiguous sentence ha…

    Submitted 19 September, 2000; originally announced September 2000.

    Comments: 7 pages

    ACM Class: F.4.1; I.2.7