
Showing 1–50 of 54 results for author: Duh, K

Searching in archive cs.
  1. arXiv:2404.18797 [pdf, other]

    cs.IR

    Efficiency-Effectiveness Tradeoff of Probabilistic Structured Queries for Cross-Language Information Retrieval

    Authors: Eugene Yang, Suraj Nair, Dawn Lawrie, James Mayfield, Douglas W. Oard, Kevin Duh

    Abstract: Probabilistic Structured Queries (PSQ) is a cross-language information retrieval (CLIR) method that uses translation probabilities statistically derived from aligned corpora. PSQ is a strong baseline for efficient CLIR using sparse indexing. It is, therefore, useful as the first stage in a cascaded neural CLIR system whose second stage is more effective but too inefficient to be used on its own to…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 11 pages, 5 figures

  2. arXiv:2404.09383 [pdf, other]

    cs.CL

    Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields

    Authors: Ryan Cotterell, Kevin Duh

    Abstract: Low-resource named entity recognition is still an open problem in NLP. Most state-of-the-art systems require tens of thousands of annotated sentences in order to obtain high performance. However, for most of the world's languages, it is unfeasible to obtain such annotation. In this paper, we present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: IJCNLP 2017

  3. arXiv:2403.04510 [pdf, other]

    cs.CL cs.AI

    Where does In-context Translation Happen in Large Language Models

    Authors: Suzanna Sia, David Mueller, Kevin Duh

    Abstract: Self-supervised large language models have demonstrated the ability to perform Machine Translation (MT) via in-context learning, but little is known about where the model performs the task with respect to prompt instructions and demonstration examples. In this work, we attempt to characterize the region where large language models transition from in-context learners to translation models. Through…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 19 pages. Under Review

  4. arXiv:2311.15507 [pdf, other]

    cs.CL cs.AI

    Improving Word Sense Disambiguation in Neural Machine Translation with Salient Document Context

    Authors: Elijah Rippeth, Marine Carpuat, Kevin Duh, Matt Post

    Abstract: Lexical ambiguity is a challenging and pervasive problem in machine translation (MT). We introduce a simple and scalable approach to resolve translation ambiguity by incorporating a small amount of extra-sentential context in neural MT. Our approach requires no sense annotation and no change to standard model architectures. Since actual document context is not available for the vast majority of…

    Submitted 26 November, 2023; originally announced November 2023.

  5. arXiv:2311.08324 [pdf, other]

    cs.CL cs.AI

    Anti-LM Decoding for Zero-shot In-context Machine Translation

    Authors: Suzanna Sia, Alexandra DeLucia, Kevin Duh

    Abstract: Zero-shot In-context learning is the phenomenon where models can perform the task simply given the instructions. However, pre-trained large language models are known to be poorly calibrated for this task. One of the most effective approaches to handling this bias is to adopt a contrastive decoding objective, which accounts for the prior probability of generating the next token by conditioning on s…

    Submitted 2 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL Findings 2024

  6. arXiv:2306.11252 [pdf, other]

    cs.CL cs.LG

    HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation

    Authors: Cihan Xiao, Henry Li Xinyuan, Jinyi Yang, Dongji Gao, Matthew Wiesner, Kevin Duh, Sanjeev Khudanpur

    Abstract: We introduce HK-LegiCoST, a new three-way parallel corpus of Cantonese-English translations, containing 600+ hours of Cantonese audio, its standard traditional Chinese transcript, and English translation, segmented and aligned at the sentence level. We describe the notable challenges in corpus preparation: segmentation, alignment of long audio recordings, and sentence-level alignment with non-verb…

    Submitted 19 June, 2023; originally announced June 2023.

  7. arXiv:2306.07198 [pdf, other]

    cs.CL

    A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation

    Authors: Jeremy Gwinnup, Kevin Duh

    Abstract: Large language models such as BERT and the GPT series started a paradigm shift that calls for building general-purpose models via pre-training on large datasets, followed by fine-tuning on task-specific datasets. There is now a plethora of large pre-trained models for Natural Language Processing and Computer Vision. Recently, we have seen rapid developments in the joint Vision-Language space as we…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 10 pages

  8. arXiv:2305.14230 [pdf, other]

    cs.CL

    Exploring Representational Disparities Between Multilingual and Bilingual Translation Models

    Authors: Neha Verma, Kenton Murray, Kevin Duh

    Abstract: Multilingual machine translation has proven immensely useful for both parameter efficiency and overall performance across many language pairs via complete multilingual parameter sharing. However, some language pairs in multilingual models can see worse performance than in bilingual models, especially in the one-to-many translation setting. Motivated by their empirical differences, we examine the g…

    Submitted 26 March, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: LREC-COLING 2024

  9. arXiv:2305.03573 [pdf, other]

    cs.CL cs.AI

    In-context Learning as Maintaining Coherency: A Study of On-the-fly Machine Translation Using Large Language Models

    Authors: Suzanna Sia, Kevin Duh

    Abstract: The phenomenon of in-context learning has typically been thought of as "learning from examples". In this work which focuses on Machine Translation, we present a perspective of in-context learning as the desired generation task maintaining coherency with its context, i.e., the prompt examples. We first investigate randomly sampled prompts across 4 domains, and find that translation performance impro…

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: 9 pages

  10. arXiv:2210.14378 [pdf, other]

    cs.CL cs.LG

    Bilingual Lexicon Induction for Low-Resource Languages using Graph Matching via Optimal Transport

    Authors: Kelly Marchisio, Ali Saad-Eldin, Kevin Duh, Carey Priebe, Philipp Koehn

    Abstract: Bilingual lexicons form a critical component of various natural language processing applications, including unsupervised and semisupervised machine translation and crosslingual information retrieval. We improve bilingual lexicon induction performance across 40 language pairs with a graph-matching method based on optimal transport. The method is especially strong with low amounts of supervision.

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022 Camera-Ready

  11. arXiv:2210.05098 [pdf, other]

    cs.CL cs.LG

    IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces

    Authors: Kelly Marchisio, Neha Verma, Kevin Duh, Philipp Koehn

    Abstract: The ability to extract high-quality translation dictionaries from monolingual word embedding spaces depends critically on the geometric similarity of the spaces -- their degree of "isomorphism." We address the root-cause of faulty cross-lingual mapping: that word embedding training resulted in the underlying spaces being non-isomorphic. We incorporate global measures of isomorphism directly into t…

    Submitted 4 July, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Updated EMNLP2022 Camera Ready (citation correction, removed references to dimensionality reduction [was not used here].)

  12. arXiv:2201.08471 [pdf, other]

    cs.IR cs.CL

    Transfer Learning Approaches for Building Cross-Language Dense Retrieval Models

    Authors: Suraj Nair, Eugene Yang, Dawn Lawrie, Kevin Duh, Paul McNamee, Kenton Murray, James Mayfield, Douglas W. Oard

    Abstract: The advent of transformer-based models such as BERT has led to the rise of neural ranking models. These models have improved the effectiveness of retrieval systems well beyond that of lexical term matching models such as BM25. While monolingual retrieval tasks have benefited from large-scale training collections such as MS MARCO and advances in neural architectures, cross-language retrieval tasks…

    Submitted 20 January, 2022; originally announced January 2022.

    Comments: Accepted at ECIR 2022 (Full paper)

  13. arXiv:2109.12640 [pdf, other]

    cs.CL

    An Analysis of Euclidean vs. Graph-Based Framing for Bilingual Lexicon Induction from Word Embedding Spaces

    Authors: Kelly Marchisio, Youngser Park, Ali Saad-Eldin, Anton Alyakin, Kevin Duh, Carey Priebe, Philipp Koehn

    Abstract: Much recent work in bilingual lexicon induction (BLI) views word embeddings as vectors in Euclidean space. As such, BLI is typically solved by finding a linear transformation that maps embeddings to a common space. Alternatively, word embeddings may be understood as nodes in a weighted graph. This framing allows us to examine a node's graph neighborhood without assuming a linear transform, and exp…

    Submitted 26 September, 2021; originally announced September 2021.

    Comments: EMNLP Findings 2021 Camera-Ready

  14. arXiv:2109.04411 [pdf, other]

    eess.AS cs.CL cs.SD

    Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring

    Authors: Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

    Abstract: This article describes an efficient end-to-end speech translation (E2E-ST) framework based on non-autoregressive (NAR) models. End-to-end speech translation models have several advantages over traditional cascade systems such as inference latency reduction. However, conventional AR decoding methods are not fast enough because each token is generated incrementally. NAR models, however, can accelera…

    Submitted 9 September, 2021; originally announced September 2021.

  15. arXiv:2107.00636 [pdf, other]

    eess.AS cs.CL cs.SD

    ESPnet-ST IWSLT 2021 Offline Speech Translation System

    Authors: Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, Shinji Watanabe

    Abstract: This paper describes the ESPnet-ST group's IWSLT 2021 submission in the offline speech translation track. This year we made various efforts on training data, architecture, and audio segmentation. On the data side, we investigated sequence-level knowledge distillation (SeqKD) for end-to-end (E2E) speech translation. Specifically, we used multi-referenced SeqKD from multiple teachers trained on diff…

    Submitted 6 July, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

    Comments: IWSLT 2021

  16. arXiv:2105.04475 [pdf, other]

    cs.CL cs.AI

    Self-Guided Curriculum Learning for Neural Machine Translation

    Authors: Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda

    Abstract: In the field of machine learning, the well-trained model is assumed to be able to recover the training labels, i.e. the synthetic labels predicted by the model should be as close to the ground-truth labels as possible. Inspired by this, we propose a self-guided curriculum strategy to encourage the learning of neural machine translation (NMT) models to follow the above recovery criterion, where we…

    Submitted 27 August, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: IWSLT 2021

  17. arXiv:2101.10877 [pdf, other]

    eess.AS cs.SD

    Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec

    Authors: Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, Shinji Watanabe

    Abstract: "Transcription bottlenecks", created by a shortage of effective human transcribers, are one of the main challenges to endangered language (EL) documentation. Automatic speech recognition (ASR) has been suggested as a tool to overcome such bottlenecks. Following this suggestion, we investigated the effectiveness for EL documentation of end-to-end ASR, which unlike Hidden Markov Model ASR systems, es…

    Submitted 5 March, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

    Comments: Accepted by EACL2021

  18. arXiv:2010.13047 [pdf, other]

    cs.CL cs.SD eess.AS

    Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder

    Authors: Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

    Abstract: Fast inference speed is an important goal towards real-world deployment of speech translation (ST) systems. End-to-end (E2E) models based on the encoder-decoder architecture are more suitable for this goal than traditional cascaded systems, but their effectiveness regarding decoding speed has not been explored so far. Inspired by recent progress in non-autoregressive (NAR) methods in text-based tr…

    Submitted 18 February, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

    Comments: Accepted at IEEE ICASSP 2021

  19. arXiv:2008.07772 [pdf, other]

    cs.CL

    Very Deep Transformers for Neural Machine Translation

    Authors: Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao

    Abstract: We explore the application of very deep Transformer models for Neural Machine Translation (NMT). Using a simple yet effective initialization technique that stabilizes training, we show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers. These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU, and achieve…

    Submitted 14 October, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

    Comments: 6 pages, 3 figures and 4 tables. V2 includes the back-translation results

  20. arXiv:2005.03932 [pdf, other]

    cs.IR cs.LG

    Modeling Document Interactions for Learning to Rank with Regularized Self-Attention

    Authors: Shuo Sun, Kevin Duh

    Abstract: Learning to rank is an important task that has been successfully deployed in many real-world information retrieval systems. Most existing methods compute relevance judgments of documents independently, without holistically considering the entire set of competing documents. In this paper, we explore modeling document interactions with self-attention based neural networks. Although self-attention n…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: 5 pages, 5 figures

  21. arXiv:2004.10234 [pdf, ps, other]

    cs.CL cs.SD eess.AS

    ESPnet-ST: All-in-One Speech Translation Toolkit

    Authors: Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Enrique Yalta Soplin, Tomoki Hayashi, Shinji Watanabe

    Abstract: We present ESPnet-ST, which is designed for the quick development of speech-to-speech translation systems in a single framework. ESPnet-ST is a new project inside end-to-end speech processing toolkit, ESPnet, which integrates or newly implements automatic speech recognition, machine translation, and text-to-speech functions for speech translation. We provide all-in-one recipes including data pre-p…

    Submitted 30 September, 2020; v1 submitted 21 April, 2020; originally announced April 2020.

    Comments: Accepted at ACL 2020 System Demonstration (update Table1, fix typo)

  22. arXiv:2004.05516 [pdf, other]

    cs.CL

    When Does Unsupervised Machine Translation Work?

    Authors: Kelly Marchisio, Kevin Duh, Philipp Koehn

    Abstract: Despite the reported success of unsupervised machine translation (MT), the field has yet to examine the conditions under which these methods succeed, and where they fail. We conduct an extensive empirical evaluation of unsupervised MT using dissimilar language pairs, dissimilar domains, diverse datasets, and authentic low-resource languages. We find that performance rapidly deteriorates when sourc…

    Submitted 18 November, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

    Comments: WMT20 Camera Ready

  23. arXiv:2003.02877 [pdf, other]

    cs.CL

    Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation

    Authors: Mitchell A. Gordon, Kevin Duh

    Abstract: We explore best practices for training small, memory efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting. While both domain adaptation and knowledge distillation are widely-used, their interaction remains little understood. Our large-scale empirical results in machine translation (on three language pairs with three domains each) suggest…

    Submitted 23 June, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

    Comments: Accepted to WNGT 2020 Workshop at ACL 2020 Conference. Code is at http://github.com/mitchellgordon95/kd-aug

  24. arXiv:2002.09646 [pdf, other]

    cs.CL

    Machine Translation System Selection from Bandit Feedback

    Authors: Jason Naradowsky, Xuan Zhang, Kevin Duh

    Abstract: Adapting machine translation systems in the real world is a difficult problem. In contrast to offline training, users cannot provide the type of fine-grained feedback (such as correct translations) typically used for improving the system. Moreover, different users have different translation needs, and even a single user's needs may change over time. In this work we take a different approach, tre…

    Submitted 2 September, 2020; v1 submitted 22 February, 2020; originally announced February 2020.

    Comments: Accepted to AMTA 2020

  25. arXiv:2002.08307 [pdf, other]

    cs.CL

    Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning

    Authors: Mitchell A. Gordon, Kevin Duh, Nicholas Andrews

    Abstract: Pre-trained universal feature extractors, such as BERT for natural language processing and VGG for computer vision, have become effective methods for improving deep learning models without requiring more labeled data. While effective, feature extractors like BERT may be prohibitively large for some deployment scenarios. We explore weight pruning for BERT and ask: how does compression during pre-tr…

    Submitted 14 May, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: Accepted to Rep4NLP 2020 Workshop at ACL 2020 Conference

  26. arXiv:1912.03334 [pdf, other]

    cs.CL

    Explaining Sequence-Level Knowledge Distillation as Data-Augmentation for Neural Machine Translation

    Authors: Mitchell A. Gordon, Kevin Duh

    Abstract: Sequence-level knowledge distillation (SLKD) is a model compression technique that leverages large, accurate teacher models to train smaller, under-parameterized student models. Why does pre-processing MT data with SLKD help us train smaller models? We test the common hypothesis that SLKD addresses a capacity deficiency in students by "simplifying" noisy data points and find it unlikely in our cas…

    Submitted 6 December, 2019; originally announced December 2019.

  27. arXiv:1910.00254 [pdf, ps, other]

    cs.CL eess.AS

    Multilingual End-to-End Speech Translation

    Authors: Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

    Abstract: In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture. While multilingual models have been shown to be useful for automatic speech recognition (ASR) and machine translation (MT), this is the fi…

    Submitted 31 October, 2019; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted to ASRU 2019

  28. arXiv:1909.02607 [pdf, other]

    cs.CL

    Broad-Coverage Semantic Parsing as Transduction

    Authors: Sheng Zhang, Xutai Ma, Kevin Duh, Benjamin Van Durme

    Abstract: We unify different broad-coverage semantic parsing tasks under a transduction paradigm, and propose an attention-based neural framework that incrementally builds a meaning representation via a sequence of semantic relations. By leveraging multiple attention mechanisms, the transducer can be effectively trained without relying on a pre-trained aligner. Experiments conducted on three separate broad-…

    Submitted 4 November, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: Accepted at EMNLP 2019

  29. arXiv:1905.10453 [pdf, ps, other]

    cs.CL

    A Call for Prudent Choice of Subword Merge Operations in Neural Machine Translation

    Authors: Shuoyang Ding, Adithya Renduchintala, Kevin Duh

    Abstract: Most neural machine translation systems are built upon subword units extracted by methods such as Byte-Pair Encoding (BPE) or wordpiece. However, the choice of number of merge operations is generally made by following existing recipes. In this paper, we conduct a systematic exploration on different numbers of BPE merge operations to understand how it interacts with the model architecture, the stra…

    Submitted 24 June, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

    Comments: Accepted to MT Summit 2019

  30. arXiv:1905.08704 [pdf, other]

    cs.CL

    AMR Parsing as Sequence-to-Graph Transduction

    Authors: Sheng Zhang, Xutai Ma, Kevin Duh, Benjamin Van Durme

    Abstract: We propose an attention-based model that treats AMR parsing as sequence-to-graph transduction. Unlike most AMR parsers that rely on pre-trained aligners, external semantic resources, or data augmentation, our proposed parser is aligner-free, and it can be effectively trained with limited amounts of labeled AMR data. Our experimental results outperform all previously reported SMATCH scores, on both…

    Submitted 23 June, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

    Comments: Accepted at ACL 2019

  31. arXiv:1905.05816 [pdf, other]

    cs.CL

    Curriculum Learning for Domain Adaptation in Neural Machine Translation

    Authors: Xuan Zhang, Pamela Shapiro, Gaurav Kumar, Paul McNamee, Marine Carpuat, Kevin Duh

    Abstract: We introduce a curriculum learning approach to adapt generic neural machine translation models to a specific domain. Samples are grouped by their similarities to the domain of interest and each group is fed to the training algorithm with a particular schedule. This approach is simple to implement on top of any neural framework or architecture, and consistently outperforms both unadapted and adapte…

    Submitted 14 May, 2019; originally announced May 2019.

  32. arXiv:1904.07982 [pdf, other]

    cs.IR cs.CL

    Query Expansion for Cross-Language Question Re-Ranking

    Authors: Muhammad Mahbubur Rahman, Sorami Hisamoto, Kevin Duh

    Abstract: Community question-answering (CQA) platforms have become very popular forums for asking and answering questions daily. While these forums are rich repositories of community knowledge, they present challenges for finding relevant answers and similar questions, due to the open-ended nature of informal discussions. Further, if the platform allows questions and answers in multiple languages, we are fa…

    Submitted 16 April, 2019; originally announced April 2019.

    Comments: 5 pages, 3 figures

  33. arXiv:1904.05506 [pdf, other]

    cs.LG cs.CL stat.ML

    Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?

    Authors: Sorami Hisamoto, Matt Post, Kevin Duh

    Abstract: Data privacy is an important issue for "machine learning as a service" providers. We focus on the problem of membership inference attacks: given a data sample and black-box access to a model's API, determine whether the sample existed in the model's training data. Our contribution is an investigation of this problem in the context of sequence-to-sequence models, which are important in applications…

    Submitted 16 March, 2020; v1 submitted 10 April, 2019; originally announced April 2019.

    Journal ref: Transactions of the Association for Computational Linguistics (TACL), Volume 8, 2020, pp. 49-63

  34. arXiv:1811.00739 [pdf, other]

    cs.CL cs.LG

    An Empirical Exploration of Curriculum Learning for Neural Machine Translation

    Authors: Xuan Zhang, Gaurav Kumar, Huda Khayrallah, Kenton Murray, Jeremy Gwinnup, Marianna J Martindale, Paul McNamee, Kevin Duh, Marine Carpuat

    Abstract: Machine translation systems based on deep neural networks are expensive to train. Curriculum learning aims to address this issue by choosing the order in which samples are presented during training to help train better models faster. We adopt a probabilistic view of curriculum learning, which lets us flexibly evaluate the impact of curricula design, and perform an extensive exploration on a German…

    Submitted 2 November, 2018; originally announced November 2018.

  35. arXiv:1810.12885 [pdf, other]

    cs.CL

    ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension

    Authors: Sheng Zhang, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Kevin Duh, Benjamin Van Durme

    Abstract: We present a large-scale dataset, ReCoRD, for machine reading comprehension requiring commonsense reasoning. Experiments on this dataset demonstrate that the performance of state-of-the-art MRC systems falls far behind human performance. ReCoRD represents a challenge for future research to bridge the gap between human and machine commonsense reading comprehension. ReCoRD is available at http://nlp.…

    Submitted 30 October, 2018; originally announced October 2018.

    Comments: 14 pages

  36. arXiv:1809.09194 [pdf, other]

    cs.CL

    Stochastic Answer Networks for SQuAD 2.0

    Authors: Xiaodong Liu, Wei Li, Yuwei Fang, Aerin Kim, Kevin Duh, Jianfeng Gao

    Abstract: This paper presents an extension of the Stochastic Answer Network (SAN), one of the state-of-the-art machine reading comprehension models, to be able to judge whether a question is unanswerable or not. The extended SAN contains two components: a span detector and a binary classifier for judging whether the question is unanswerable, and both components are jointly optimized. Experiments show that S…

    Submitted 24 September, 2018; originally announced September 2018.

    Comments: 6 pages, 2 figures and 2 tables

  37. Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation

    Authors: Brian Thompson, Huda Khayrallah, Antonios Anastasopoulos, Arya D. McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, Philipp Koehn

    Abstract: To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation. We find that freezing any single component during continued training has minimal impact on performance, and that performance is surpri…

    Submitted 15 January, 2019; v1 submitted 13 September, 2018; originally announced September 2018.

    Comments: Presented at WMT 2018. Please cite using the bib entry from here: http://www.statmt.org/wmt18/bib/WMT013.bib

    Journal ref: Proceedings of the Third Conference on Machine Translation: Research Papers (2018) 124-132

  38. arXiv:1809.02223 [pdf, other]

    cs.CL

    Character-Aware Decoder for Translation into Morphologically Rich Languages

    Authors: Adithya Renduchintala, Pamela Shapiro, Kevin Duh, Philipp Koehn

    Abstract: Neural machine translation (NMT) systems operate primarily on words (or sub-words), ignoring lower-level patterns of morphology. We present a character-aware decoder designed to capture such patterns when translating into morphologically rich languages. We achieve character-awareness by augmenting both the softmax and embedding layers of an attention-based encoder-decoder model with convolutional…

    Submitted 18 June, 2019; v1 submitted 6 September, 2018; originally announced September 2018.

    Comments: 9 pages (12 including Appendix), 5 figures, Accepted at MT Summit 2019

  39. arXiv:1809.01301 [pdf, other]

    cs.CL

    BPE and CharCNNs for Translation of Morphology: A Cross-Lingual Comparison and Analysis

    Authors: Pamela Shapiro, Kevin Duh

    Abstract: Neural Machine Translation (NMT) in low-resource settings and of morphologically rich languages is made difficult in part by data sparsity of vocabulary words. Several methods have been used to help reduce this sparsity, notably Byte-Pair Encoding (BPE) and a character-based CNN layer (charCNN). However, the charCNN has largely been neglected, possibly because it has only been compared to BPE rath…

    Submitted 8 September, 2018; v1 submitted 4 September, 2018; originally announced September 2018.

  40. arXiv:1806.01515 [pdf, other]

    cs.CL

    How Do Source-side Monolingual Word Embeddings Impact Neural Machine Translation?

    Authors: Shuoyang Ding, Kevin Duh

    Abstract: Using pre-trained word embeddings as input layer is a common practice in many natural language processing (NLP) tasks, but it is largely neglected for neural machine translation (NMT). In this paper, we conducted a systematic analysis on the effect of using pre-trained source-side monolingual word embedding in NMT. We compared several strategies, such as fixing or updating the embeddings during NM…

    Submitted 14 June, 2018; v1 submitted 5 June, 2018; originally announced June 2018.

    Comments: 10 pages, 4 figures

  41. arXiv:1805.08271 [pdf, other]

    cs.CL

    Halo: Learning Semantics-Aware Representations for Cross-Lingual Information Extraction

    Authors: Hongyuan Mei, Sheng Zhang, Kevin Duh, Benjamin Van Durme

    Abstract: Cross-lingual information extraction (CLIE) is an important and challenging task, especially in low resource scenarios. To tackle this challenge, we propose a training method, called Halo, which enforces the local region of each hidden state of a neural model to only generate target tokens with the same semantic structure tag. This simple but powerful technique enables a neural model to learn sema…

    Submitted 21 May, 2018; originally announced May 2018.

    Comments: *SEM 2018 camera-ready

  42. arXiv:1804.08037 [pdf, other]

    cs.CL

    Cross-lingual Semantic Parsing

    Authors: Sheng Zhang, Kevin Duh, Benjamin Van Durme

    Abstract: We introduce the task of cross-lingual semantic parsing: mapping content provided in a source language into a meaning representation based on a target language. We present: (1) a meaning representation designed to allow systems to target varying levels of structural complexity (shallow to deep analysis), (2) an evaluation metric to measure the similarity between system output and reference meaning…

    Submitted 21 April, 2018; originally announced April 2018.

  43. arXiv:1804.08000 [pdf, other]

    cs.CL

    Fine-grained Entity Typing through Increased Discourse Context and Adaptive Classification Thresholds

    Authors: Sheng Zhang, Kevin Duh, Benjamin Van Durme

    Abstract: Fine-grained entity typing is the task of assigning fine-grained semantic types to entity mentions. We propose a neural architecture which learns a distributional semantic representation that leverages a greater amount of semantic context -- both document and sentence level information -- than prior work. We find that additional context improves performance, with further improvements gained by uti…

    Submitted 21 April, 2018; originally announced April 2018.

    Comments: Accepted to StarSem 2018
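    The title's "adaptive classification thresholds" can be read as choosing a separate decision threshold for each entity type instead of one global cutoff. The abstract is cut off before saying how, so the sketch below assumes one common approach: for each type, scan candidate thresholds on held-out data and keep the one with the highest F1. The function name, its inputs, and the candidate grid are hypothetical, not the authors' code.

```python
from typing import Sequence


def tune_threshold(
    scores: Sequence[float], labels: Sequence[bool], candidates: Sequence[float]
) -> float:
    """Return the candidate threshold with the best F1 for one entity type (hypothetical helper)."""
    if not candidates:
        raise ValueError("need at least one candidate threshold")
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        predicted = [s >= t for s in scores]  # assign the type when its score clears t
        tp = sum(p and y for p, y in zip(predicted, labels))
        fp = sum(p and not y for p, y in zip(predicted, labels))
        fn = sum((not p) and y for p, y in zip(predicted, labels))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:  # ties keep the earlier candidate
            best_t, best_f1 = t, f1
    return best_t
```

    Running it once per type on a development set yields a map from type to cutoff, so a rare type can end up with a lower threshold than a frequent one.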

  44. arXiv:1804.07888  [pdf, other]

    cs.CL

    Stochastic Answer Networks for Natural Language Inference

    Authors: Xiaodong Liu, Kevin Duh, Jianfeng Gao

    Abstract: We propose a stochastic answer network (SAN) to explore multi-step inference strategies in Natural Language Inference. Rather than directly predicting the results given the inputs, the model maintains a state and iteratively refines its predictions. Our experiments show that SAN achieves the state-of-the-art results on three benchmarks: Stanford Natural Language Inference (SNLI) dataset, MultiGenr…

    Submitted 30 March, 2019; v1 submitted 21 April, 2018; originally announced April 2018.

    Comments: 6 pages, 1 figure

  45. arXiv:1712.03556  [pdf, other]

    cs.CL

    Stochastic Answer Networks for Machine Reading Comprehension

    Authors: Xiaodong Liu, Yelong Shen, Kevin Duh, Jianfeng Gao

    Abstract: We propose a simple yet robust stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension. Compared to previous work such as ReasoNet which used reinforcement learning to determine the number of steps, the unique feature is the use of a kind of stochastic prediction dropout on the answer module (final layer) of the neural network during the training. We sh…

    Submitted 15 May, 2018; v1 submitted 10 December, 2017; originally announced December 2017.

    Comments: 11 pages, 5 figures, Accepted to ACL 2018
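    The "stochastic prediction dropout on the answer module" can be sketched as follows, under stated assumptions: each reasoning step is taken to emit a probability distribution over answer candidates, whole steps are dropped at random during training, and the surviving distributions are averaged; at inference every step is averaged. The function name and the drop_rate parameter are hypothetical; this is a plain-Python illustration, not the authors' implementation.

```python
import random
from typing import List, Optional, Sequence


def average_step_predictions(
    step_probs: Sequence[Sequence[float]],
    drop_rate: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Average per-step answer distributions, dropping whole steps at random.

    Hypothetical helper: drop_rate > 0 models training-time prediction
    dropout; drop_rate == 0 averages every step, as at inference.
    """
    if not step_probs:
        raise ValueError("need at least one reasoning step")
    rng = rng or random.Random()
    # Each step survives independently with probability 1 - drop_rate.
    kept = [p for p in step_probs if rng.random() >= drop_rate]
    if not kept:
        # Never discard every step: fall back to one randomly chosen step.
        kept = [rng.choice(list(step_probs))]
    width = len(kept[0])
    return [sum(p[i] for p in kept) / len(kept) for i in range(width)]
```

    Because any average of distributions is itself a distribution, the output stays normalized whichever steps are dropped.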

  46. arXiv:1711.03230  [pdf, other]

    cs.CL

    An Empirical Analysis of Multiple-Turn Reasoning Strategies in Reading Comprehension Tasks

    Authors: Yelong Shen, Xiaodong Liu, Kevin Duh, Jianfeng Gao

    Abstract: Reading comprehension (RC) is a challenging task that requires synthesis of information across sentences and multiple turns of reasoning. Using a state-of-the-art RC model, we empirically investigate the performance of single-turn and multiple-turn reasoning on the SQuAD and MS MARCO datasets. The RC model is an end-to-end neural network with iterative attention, and uses reinforcement learning to…

    Submitted 8 November, 2017; originally announced November 2017.

  47. arXiv:1704.07463  [pdf, other]

    cs.CL

    Streaming Word Embeddings with the Space-Saving Algorithm

    Authors: Chandler May, Kevin Duh, Benjamin Van Durme, Ashwin Lall

    Abstract: We develop a streaming (one-pass, bounded-memory) word embedding algorithm based on the canonical skip-gram with negative sampling algorithm implemented in word2vec. We compare our streaming algorithm to word2vec empirically by measuring the cosine similarity between word pairs under each algorithm and by applying each algorithm in the downstream task of hashtag prediction on a two-month interval…

    Submitted 24 April, 2017; originally announced April 2017.

    Comments: 16 pages

    ACM Class: I.2.7; I.2.6
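    The Space-Saving algorithm named in the title is a standard bounded-memory summary of frequent items. A minimal sketch of its counter update, assuming words arrive one at a time and at most `capacity` words are tracked: a tracked word's counter is incremented; otherwise the word with the smallest count is evicted and the newcomer inherits that count plus one. How the paper couples this vocabulary to skip-gram training is not shown; the class and method names are hypothetical.

```python
from typing import Dict, Hashable, Optional


class SpaceSaving:
    """Space-Saving summary: approximate counts using at most `capacity` counters."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}  # estimated count (never an underestimate)
        self.errors: Dict[Hashable, int] = {}  # maximum overestimate per tracked item

    def update(self, item: Hashable) -> Optional[Hashable]:
        """Record one occurrence of `item`; return the evicted item, if any."""
        if item in self.counts:
            self.counts[item] += 1
            return None
        if len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
            return None
        # Evict the item with the smallest count; the newcomer inherits that
        # count plus one, so its estimate can exceed its true count by `floor`.
        # A linear scan keeps the sketch short; a bucketed structure can make
        # this step constant time.
        victim = min(self.counts, key=self.counts.__getitem__)
        floor = self.counts.pop(victim)
        del self.errors[victim]
        self.counts[item] = floor + 1
        self.errors[item] = floor
        return victim
```

    On the stream "a a b c a" with two counters, "c" evicts "b" and is reported with count 2 and error 1, which brackets its true count of 1.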

  48. arXiv:1701.03980  [pdf, other]

    stat.ML cs.CL cs.MS

    DyNet: The Dynamic Neural Network Toolkit

    Authors: Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

    Abstract: We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its deriva…

    Submitted 14 January, 2017; originally announced January 2017.

    Comments: 33 pages
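    The static/dynamic distinction in this abstract can be illustrated without any toolkit. The sketch below is a hypothetical pure-Python scalar autodiff, not DyNet's API: each arithmetic operation records its inputs as it runs, so the graph for an example is whatever the host-language code (here, a loop over a variable-length input) actually executed, and a fresh graph is built for every example. The names Node, backward, and score are illustrative only.

```python
from typing import List, Set, Tuple


class Node:
    """A scalar recorded on the computation graph at the moment it is computed."""

    def __init__(self, value: float, parents: Tuple[Tuple["Node", float], ...] = ()) -> None:
        self.value = value
        self.parents = parents  # pairs of (input node, local partial derivative)
        self.grad = 0.0

    def __add__(self, other: "Node") -> "Node":
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other: "Node") -> "Node":
        return Node(self.value * other.value, ((self, other.value), (other, self.value)))


def backward(output: Node) -> None:
    """Reverse-mode differentiation over the graph built for this one example."""
    order: List[Node] = []
    seen: Set[int] = set()

    def visit(node: Node) -> None:  # post-order DFS gives a topological order
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # consumers are processed before their inputs
        for parent, local in node.parents:
            parent.grad += local * node.grad


def score(w: Node, xs: List[float]) -> Node:
    """Graph size depends on len(xs): the structure is declared as the code runs."""
    total = Node(0.0)
    for x in xs:
        total = total + w * Node(x)
    return total
```

    Calling score on inputs of different lengths produces graphs of different sizes with no separate graph-definition step; under static declaration the graph would be defined once and examples fed into it.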

  49. arXiv:1611.00601  [pdf, other]

    cs.CL

    Ordinal Common-sense Inference

    Authors: Sheng Zhang, Rachel Rudinger, Kevin Duh, Benjamin Van Durme

    Abstract: Humans have the capacity to draw common-sense inferences from natural language: various things that are likely but not certain to hold based on established discourse, and are rarely stated explicitly. We propose an evaluation of automated common-sense inference based on an extension of recognizing textual entailment: predicting ordinal human responses on the subjective likelihood of an inference h…

    Submitted 2 June, 2017; v1 submitted 2 November, 2016; originally announced November 2016.

  50. arXiv:1608.02214  [pdf, other]

    cs.CL

    Robsut Wrod Reocginiton via semi-Character Recurrent Neural Network

    Authors: Keisuke Sakaguchi, Kevin Duh, Matt Post, Benjamin Van Durme

    Abstract: Language processing mechanism by humans is generally more robust than computers. The Cmabrigde Uinervtisy (Cambridge University) effect from the psycholinguistics literature has demonstrated such a robust word processing mechanism, where jumbled words (e.g. Cmabrigde / Cambridge) are recognized with little cost. On the other hand, computational models for word recognition (e.g. spelling checkers)…

    Submitted 7 February, 2017; v1 submitted 7 August, 2016; originally announced August 2016.
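    Assuming the "semi-character" representation in the title concatenates a one-hot vector for a word's first letter, a bag-of-characters count vector for its internal letters, and a one-hot vector for its last letter, the robustness to jumbling described in the abstract follows directly: shuffling internal letters leaves the vector unchanged. This is a sketch under that assumption, not the authors' code; the function name and the a-z alphabet are hypothetical choices.

```python
import string
from typing import List

ALPHABET = string.ascii_lowercase  # assumption: only letters a-z are encoded


def semi_character_vector(word: str) -> List[int]:
    """Concatenate [first-letter one-hot; internal bag of letters; last-letter one-hot]."""
    letters = [c for c in word.lower() if c in ALPHABET]  # other symbols are ignored
    size = len(ALPHABET)
    begin, middle, end = [0] * size, [0] * size, [0] * size
    if letters:
        begin[ALPHABET.index(letters[0])] = 1
        end[ALPHABET.index(letters[-1])] = 1  # a one-letter word sets both ends
        for c in letters[1:-1]:
            middle[ALPHABET.index(c)] += 1  # order of internal letters is discarded
    return begin + middle + end
```

    "Cmabrigde" and "Cambridge" map to the same vector. So do "form" and "from", which shows what the encoding gives up; presumably the recurrent layer's context has to resolve such collisions.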