
Showing 1–50 of 72 results for author: Gimpel, K

Searching in archive cs.
  1. arXiv:2402.13433  [pdf, other]

    cs.CL cs.DS

    Structured Tree Alignment for Evaluation of (Speech) Constituency Parsing

    Authors: Freda Shi, Kevin Gimpel, Karen Livescu

    Abstract: We present the structured average intersection-over-union ratio (STRUCT-IOU), a similarity metric between constituency parse trees motivated by the problem of evaluating speech parsers. STRUCT-IOU enables comparison between a constituency parse tree (over automatically recognized spoken word boundaries) with the ground-truth parse (over written words). To compute the metric, we project the ground-…

    Submitted 19 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: ACL 2024 camera-ready

  2. arXiv:2311.09517  [pdf, other]

    cs.CL

    GEE! Grammar Error Explanation with Large Language Models

    Authors: Yixiao Song, Kalpesh Krishna, Rajesh Bhatt, Kevin Gimpel, Mohit Iyyer

    Abstract: Grammatical error correction tools are effective at correcting grammatical errors in users' input sentences but do not provide users with natural language explanations about their errors. Such explanations are essential for helping users learn the language by gaining a deeper understanding of its grammatical rules (DeKeyser, 2003; Ellis et al., 2006). To address this gap, we propose the t…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Preprint, 24 pages, code and data available in https://github.com/Yixiao-Song/GEE-with-LLMs

  3. arXiv:2311.08817  [pdf, other]

    cs.CL cs.AI cs.LG

    MAP's not dead yet: Uncovering true language model modes by conditioning away degeneracy

    Authors: Davis Yoshida, Kartik Goyal, Kevin Gimpel

    Abstract: It has been widely observed that exact or approximate MAP (mode-seeking) decoding from natural language generation (NLG) models consistently leads to degenerate outputs (Stahlberg and Byrne, 2019, Holtzman et al., 2019). This has generally been attributed to either a fundamental inadequacy of modes in models or weaknesses in language modeling. Contrastingly in this work, we emphasize that degenera…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 49 pages, 3 figures

  4. arXiv:2310.07654  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Audio-Visual Neural Syntax Acquisition

    Authors: Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

    Abstract: We study phrase structure induction from visually-grounded speech. The core idea is to first segment the speech waveform into sequences of word segments, and subsequently induce phrase structure using the inferred segment-level continuous representations. We present the Audio-Visual Neural Syntax Learner (AV-NSL) that learns phrase structure by listening to audio and looking at images, without eve…

    Submitted 11 October, 2023; originally announced October 2023.

  5. arXiv:2305.02239  [pdf, other]

    cs.CL cs.AI

    The Benefits of Label-Description Training for Zero-Shot Text Classification

    Authors: Lingyu Gao, Debanjan Ghosh, Kevin Gimpel

    Abstract: Pretrained language models have improved zero-shot text classification by allowing the transfer of semantic knowledge from the training data in order to classify among specific label sets in downstream tasks. We propose a simple way to further improve zero-shot accuracies with minimal effort. We curate small finetuning datasets intended to describe the labels for a task. Unlike typical finetuning…

    Submitted 23 October, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted at the EMNLP 2023 main conference (long paper)

  6. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  7. arXiv:2205.08056  [pdf, other]

    cs.CL cs.AI cs.LG

    "What makes a question inquisitive?" A Study on Type-Controlled Inquisitive Question Generation

    Authors: Lingyu Gao, Debanjan Ghosh, Kevin Gimpel

    Abstract: We propose a type-controlled framework for inquisitive question generation. We annotate an inquisitive question dataset with question types, train question type classifiers, and finetune models for type-controlled question generation. Empirical results demonstrate that we can generate a variety of questions that adhere to specific types while drawing from the source texts. We also investigate stra…

    Submitted 19 May, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

    Comments: Accepted at the 11th Joint Conference on Lexical and Computational Semantics (*SEM) Conference, NAACL 2022

  8. arXiv:2112.08653  [pdf, other]

    cs.CL

    Reconsidering the Past: Optimizing Hidden States in Language Models

    Authors: Davis Yoshida, Kevin Gimpel

    Abstract: We present Hidden-State Optimization (HSO), a gradient-based method for improving the performance of transformer language models at inference time. Similar to dynamic evaluation (Krause et al., 2018), HSO computes the gradient of the log-probability the language model assigns to an evaluation text, but uses it to update the cached hidden states rather than the model parameters. We test HSO with pr…

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: Findings of EMNLP version

    Journal ref: Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4099-4105

  9. arXiv:2110.08538  [pdf, other]

    cs.CL

    Substructure Distribution Projection for Zero-Shot Cross-Lingual Dependency Parsing

    Authors: Haoyue Shi, Kevin Gimpel, Karen Livescu

    Abstract: We present substructure distribution projection (SubDP), a technique that projects a distribution over structures in one domain to another, by projecting substructure distributions separately. Models for the target domains can be then trained, using the projected distributions as soft silver labels. We evaluate SubDP on zero-shot cross-lingual dependency parsing, taking dependency arcs as substruc…

    Submitted 16 October, 2021; originally announced October 2021.

  10. arXiv:2109.09667  [pdf, other]

    cs.CL

    On Generalization in Coreference Resolution

    Authors: Shubham Toshniwal, Patrick Xia, Sam Wiseman, Karen Livescu, Kevin Gimpel

    Abstract: While coreference resolution is defined independently of dataset domain, most models for performing coreference resolution do not transfer well to unseen domains. We consolidate a set of 8 coreference resolution datasets targeting different domains to evaluate the off-the-shelf performance of models. We then mix three datasets for training; even though their domain, annotation guidelines, and meta…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: CRAC 2021

  11. arXiv:2109.08833  [pdf, other]

    cs.CL

    TVStoryGen: A Dataset for Generating Stories with Character Descriptions

    Authors: Mingda Chen, Kevin Gimpel

    Abstract: We introduce TVStoryGen, a story generation dataset that requires generating detailed TV show episode recaps from a brief summary and a set of documents describing the characters involved. Unlike other story generation datasets, TVStoryGen contains stories that are authored by professional screen-writers and that feature complex interactions among multiple characters. Generating stories in TVStory…

    Submitted 9 October, 2022; v1 submitted 18 September, 2021; originally announced September 2021.

  12. arXiv:2104.15114  [pdf, other]

    cs.CL

    Paraphrastic Representations at Scale

    Authors: John Wieting, Kevin Gimpel, Graham Neubig, Taylor Berg-Kirkpatrick

    Abstract: We present a system that allows users to train their own state-of-the-art paraphrastic sentence representations in a variety of languages. We also release trained models for English, Arabic, German, French, Spanish, Russian, Turkish, and Chinese. We train these models on large amounts of data, achieving significantly improved performance from the original papers proposing the methods on a suite of…

    Submitted 4 June, 2023; v1 submitted 30 April, 2021; originally announced April 2021.

    Comments: Published as a demo paper at EMNLP 2022

  13. arXiv:2104.07091  [pdf, other]

    cs.CL

    SummScreen: A Dataset for Abstractive Screenplay Summarization

    Authors: Mingda Chen, Zewei Chu, Sam Wiseman, Kevin Gimpel

    Abstract: We introduce SummScreen, a summarization dataset comprised of pairs of TV series transcripts and human written recaps. The dataset provides a challenging testbed for abstractive summarization for several reasons. Plot details are often expressed indirectly in character dialogues and may be scattered across the entirety of the transcript. These details must be found and integrated to form the succi…

    Submitted 6 June, 2022; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: ACL 2022

  14. arXiv:2102.13249  [pdf, other]

    cs.CL cs.AI

    Chess as a Testbed for Language Model State Tracking

    Authors: Shubham Toshniwal, Sam Wiseman, Karen Livescu, Kevin Gimpel

    Abstract: Transformer language models have made tremendous strides in natural language understanding tasks. However, the complexity of natural language makes it challenging to ascertain how accurately these models are tracking the world state underlying the text. Motivated by this issue, we consider the task of language modeling for the game of chess. Unlike natural language, chess notations describe a simp…

    Submitted 13 May, 2022; v1 submitted 25 February, 2021; originally announced February 2021.

    Comments: AAAI 2022 extended version with supplementary material

  15. arXiv:2101.00411  [pdf, other]

    cs.CL

    Substructure Substitution: Structured Data Augmentation for NLP

    Authors: Haoyue Shi, Karen Livescu, Kevin Gimpel

    Abstract: We study a family of data augmentation methods, substructure substitution (SUB2), for natural language processing (NLP) tasks. SUB2 generates new examples by substituting substructures (e.g., subtrees or subsequences) with ones with the same label, which can be applied to many structured NLP tasks such as part-of-speech tagging and parsing. For more general tasks (e.g., text classification) which…

    Submitted 2 January, 2021; originally announced January 2021.

  16. arXiv:2012.14919  [pdf, other]

    cs.CL

    WikiTableT: A Large-Scale Data-to-Text Dataset for Generating Wikipedia Article Sections

    Authors: Mingda Chen, Sam Wiseman, Kevin Gimpel

    Abstract: Datasets for data-to-text generation typically focus either on multi-domain, single-sentence generation or on single-domain, long-form generation. In this work, we cast generating Wikipedia sections as a data-to-text generation task and create a large-scale dataset, WikiTableT, that pairs Wikipedia sections with their corresponding tabular data and various metadata. WikiTableT contains millions of…

    Submitted 1 June, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: Findings of ACL 2021, camera-ready version

  17. arXiv:2012.04194  [pdf, other]

    cs.CL

    Unsupervised Label Refinement Improves Dataless Text Classification

    Authors: Zewei Chu, Karl Stratos, Kevin Gimpel

    Abstract: Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description. While promising, it crucially relies on accurate descriptions of the label set for each downstream task. This reliance causes dataless classifiers to be highly sensitive to the choice of label descriptions and hinders the broader appli…

    Submitted 7 December, 2020; originally announced December 2020.

  18. arXiv:2010.12784  [pdf, other]

    cs.CL

    Deep Clustering of Text Representations for Supervision-free Probing of Syntax

    Authors: Vikram Gupta, Haoyue Shi, Kevin Gimpel, Mrinmaya Sachan

    Abstract: We explore deep clustering of text representations for unsupervised model interpretation and induction of syntax. As these representations are high-dimensional, out-of-the-box methods like KMeans do not work well. Thus, our approach jointly transforms the representations into a lower-dimensional cluster-friendly space and clusters them. We consider two notions of syntax: Part of speech Induction (…

    Submitted 1 December, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

  19. arXiv:2010.05856  [pdf, ps, other]

    cs.CL

    Exemplar-Controllable Paraphrasing and Translation using Bitext

    Authors: Mingda Chen, Sam Wiseman, Kevin Gimpel

    Abstract: Most prior work on exemplar-based syntactically controlled paraphrase generation relies on automatically-constructed large-scale paraphrase datasets, which are costly to create. We sidestep this prerequisite by adapting models from prior work to be able to learn solely from bilingual text (bitext). Despite only using bitext for training, and in near zero-shot conditions, our single proposed model…

    Submitted 17 September, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

  20. arXiv:2010.03760  [pdf, other]

    cs.CL cs.LG

    Discriminatively-Tuned Generative Classifiers for Robust Natural Language Inference

    Authors: Xiaoan Ding, Tianyu Liu, Baobao Chang, Zhifang Sui, Kevin Gimpel

    Abstract: While discriminative neural network classifiers are generally preferred, recent work has shown advantages of generative classifiers in terms of data efficiency and robustness. In this paper, we focus on natural language inference (NLI). We propose GenNLI, a generative classifier for NLI tasks, and empirically characterize its performance by comparing it to five baselines, including discriminative m…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: 14 pages, EMNLP 2020, the first two authors contributed equally

  21. arXiv:2010.02807  [pdf, other]

    cs.CL cs.LG

    Learning to Ignore: Long Document Coreference with Bounded Memory Neural Networks

    Authors: Shubham Toshniwal, Sam Wiseman, Allyson Ettinger, Karen Livescu, Kevin Gimpel

    Abstract: Long document coreference resolution remains a challenging task due to the large memory and runtime requirements of current models. Recent work doing incremental coreference resolution using just the global representation of entities shows practical benefits but requires keeping all entities in memory, which can be impractical for long documents. We argue that keeping all entities in memory is unn…

    Submitted 16 November, 2020; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: Post EMNLP 2020 camera ready updates

  22. arXiv:2010.02789  [pdf, other]

    cs.CL

    An Exploration of Arbitrary-Order Sequence Labeling via Energy-Based Inference Networks

    Authors: Lifu Tu, Tianyu Liu, Kevin Gimpel

    Abstract: Many tasks in natural language processing involve predicting structured outputs, e.g., sequence labeling, semantic role labeling, parsing, and machine translation. Researchers are increasingly applying deep representation learning to these problems, but the structured component of these approaches is usually quite simplistic. In this work, we propose several high-order energy terms to capture comp…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020. The first two authors contributed equally

  23. arXiv:2010.02423  [pdf, other]

    cs.CL

    On the Role of Supervision in Unsupervised Constituency Parsing

    Authors: Haoyue Shi, Karen Livescu, Kevin Gimpel

    Abstract: We analyze several recent unsupervised constituency parsing models, which are tuned with respect to the parsing $F_1$ score on the Wall Street Journal (WSJ) development set (1,700 sentences). We introduce strong baselines for them, by training an existing supervised parsing model (Kitaev and Klein, 2018) on the same labeled examples they access. When training on the 1,700 examples, or even when us…

    Submitted 6 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020. Project page: https://ttic.uchicago.edu/~freda/project/rsucp/

  24. arXiv:2010.01239  [pdf, other]

    cs.CL

    Mining Knowledge for Natural Language Inference from Wikipedia Categories

    Authors: Mingda Chen, Zewei Chu, Karl Stratos, Kevin Gimpel

    Abstract: Accurate lexical entailment (LE) and natural language inference (NLI) often require large quantities of costly annotations. To alleviate the need for labeled data, we introduce WikiNLI: a resource for improving model performance on NLI and LE tasks. It contains 428,899 pairs of phrases constructed from naturally annotated category hierarchies in Wikipedia. We show that we can improve strong baseli…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: Findings of EMNLP 2020

  25. arXiv:2009.14335  [pdf, other]

    cs.CL

    NatCat: Weakly Supervised Text Classification with Naturally Annotated Resources

    Authors: Zewei Chu, Karl Stratos, Kevin Gimpel

    Abstract: We describe NatCat, a large-scale resource for text classification constructed from three data sources: Wikipedia, Stack Exchange, and Reddit. NatCat consists of document-category pairs derived from manual curation that occurs naturally within online communities. To demonstrate its usefulness, we build general purpose text classifiers by training on NatCat and evaluate them on a suite of 11 text c…

    Submitted 19 September, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

    Comments: AKBC 2021

  26. arXiv:2008.07027  [pdf, other]

    cs.CL

    Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size

    Authors: Davis Yoshida, Allyson Ettinger, Kevin Gimpel

    Abstract: Fine-tuning a pretrained transformer for a downstream task has become a standard method in NLP in the last few years. While the results from these models are impressive, applying them can be extremely computationally expensive, as is pretraining new models with the latest architectures. We present a novel method for applying pretrained transformer language models which lowers their memory requirem…

    Submitted 16 August, 2020; originally announced August 2020.

    Comments: 12 pages, 5 figures

  27. arXiv:2006.03866  [pdf, other]

    cs.CL

    A Cross-Task Analysis of Text Span Representations

    Authors: Shubham Toshniwal, Haoyue Shi, Bowen Shi, Lingyu Gao, Karen Livescu, Kevin Gimpel

    Abstract: Many natural language processing (NLP) tasks involve reasoning with textual spans, including question answering, entity recognition, and coreference resolution. While extensive research has focused on functional architectures for representing words and sentences, there is less work on representing arbitrary spans of text within sentences. In this paper, we conduct a comprehensive empirical evaluat…

    Submitted 6 June, 2020; originally announced June 2020.

    Comments: RepL4NLP 2020

  28. arXiv:2005.08105  [pdf, ps, other]

    cs.CL

    Learning Probabilistic Sentence Representations from Paraphrases

    Authors: Mingda Chen, Kevin Gimpel

    Abstract: Probabilistic word embeddings have shown effectiveness in capturing notions of generality and entailment, but there is very little work on doing the analogous type of investigation for sentences. In this paper we define probabilistic models that produce distributions for sentences. Our best-performing model treats each word as a linear transformation operator applied to a multivariate Gaussian dis…

    Submitted 16 May, 2020; originally announced May 2020.

    Comments: Repl4NLP at ACL 2020, short paper

  29. arXiv:2005.02990  [pdf, other]

    cs.CL cs.LG

    PeTra: A Sparsely Supervised Memory Model for People Tracking

    Authors: Shubham Toshniwal, Allyson Ettinger, Kevin Gimpel, Karen Livescu

    Abstract: We propose PeTra, a memory-augmented neural network designed to track entities in its memory slots. PeTra is trained using sparse annotation from the GAP pronoun resolution dataset and outperforms a prior memory model on the task while using a simpler architecture. We empirically compare key modeling choices, finding that we can simplify several aspects of the design of the memory module while ret…

    Submitted 6 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  30. arXiv:2005.00850  [pdf, other]

    cs.CL cs.LG

    ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation

    Authors: Lifu Tu, Richard Yuanzhe Pang, Sam Wiseman, Kevin Gimpel

    Abstract: We propose to train a non-autoregressive machine translation model to minimize the energy defined by a pretrained autoregressive model. In particular, we view our non-autoregressive translation system as an inference network (Tu and Gimpel, 2018) trained to minimize the autoregressive teacher energy. This contrasts with the popular approach of training a non-autoregressive model on a distilled cor…

    Submitted 12 May, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: ACL 2020 camera-ready version

  31. arXiv:1911.09247  [pdf, ps, other]

    cs.CL

    How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions

    Authors: Zewei Chu, Mingda Chen, Jing Chen, Miaosen Wang, Kevin Gimpel, Manaal Faruqui, Xiance Si

    Abstract: We present a large-scale dataset for the task of rewriting an ill-formed natural language question to a well-formed one. Our multi-domain question rewriting MQR dataset is constructed from human contributed Stack Exchange question edit histories. The dataset contains 427,719 question pairs which come from 303 domains. We provide human annotations for a subset of the dataset as a quality estimate.…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: AAAI 2020

  32. arXiv:1911.02891  [pdf, other]

    cs.CL cs.LG

    Improving Joint Training of Inference Networks and Structured Prediction Energy Networks

    Authors: Lifu Tu, Richard Yuanzhe Pang, Kevin Gimpel

    Abstract: Deep energy-based models are powerful, but pose challenges for learning and inference (Belanger and McCallum, 2016). Tu and Gimpel (2018) developed an efficient framework for energy-based models by training "inference networks" to approximate structured inference instead of using gradient descent. However, their alternating optimization approach suffers from instabilities during training, requirin…

    Submitted 10 October, 2020; v1 submitted 7 November, 2019; originally announced November 2019.

    Comments: EMNLP 2020 Workshop on Structured Prediction for NLP (SPNLP)

  33. arXiv:1910.00382  [pdf, other]

    cs.CL cs.LG

    Latent-Variable Generative Models for Data-Efficient Text Classification

    Authors: Xiaoan Ding, Kevin Gimpel

    Abstract: Generative classifiers offer potential advantages over their discriminative counterparts, namely in the areas of data efficiency, robustness to data shift and adversarial examples, and zero-shot learning (Ng and Jordan, 2002; Yogatama et al., 2017; Lewis and Fan, 2019). In this paper, we improve generative text classifiers by introducing discrete latent variables into the generative story, and explo…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: 11 pages, EMNLP 2019

  34. arXiv:1909.13872  [pdf, other]

    cs.CL

    Simple and Effective Paraphrastic Similarity from Parallel Translations

    Authors: John Wieting, Kevin Gimpel, Graham Neubig, Taylor Berg-Kirkpatrick

    Abstract: We present a model and methodology for learning paraphrastic sentence embeddings directly from bitext, removing the time-consuming intermediate step of creating paraphrase corpora. Further, we show that the resulting model can be applied to cross-lingual tasks where it both outperforms and is orders of magnitude faster than more complex state-of-the-art baselines.

    Submitted 30 September, 2019; originally announced September 2019.

    Comments: Published as a short paper at ACL 2019

  35. arXiv:1909.13434  [pdf, other]

    cs.CL cs.AI

    Generating Diverse Story Continuations with Controllable Semantics

    Authors: Lifu Tu, Xiaoan Ding, Dong Yu, Kevin Gimpel

    Abstract: We propose a simple and effective modeling framework for controlled generation of multiple, diverse outputs. We focus on the setting of generating the next sentence of a story given its context. As controllable dimensions, we consider several sentence attributes, including sentiment, length, predicates, frames, and automatically-induced clusters. Our empirical results demonstrate: (1) our framewor…

    Submitted 1 June, 2020; v1 submitted 29 September, 2019; originally announced September 2019.

    Comments: WNGT 2019

  36. arXiv:1909.11942  [pdf, other]

    cs.CL cs.AI

    ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

    Authors: Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

    Abstract: Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Compr…

    Submitted 8 February, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

  37. arXiv:1909.06694  [pdf, other]

    cs.CL

    Beyond BLEU: Training Neural Machine Translation with Semantic Similarity

    Authors: John Wieting, Taylor Berg-Kirkpatrick, Kevin Gimpel, Graham Neubig

    Abstract: While most neural machine translation (NMT) systems are still trained using maximum likelihood estimation, recent work has demonstrated that optimizing systems to directly improve evaluation metrics such as BLEU can substantially improve final translation accuracy. However, training with BLEU has some limitations: it doesn't assign partial credit, it has a limited range of output values, and it ca…

    Submitted 14 September, 2019; originally announced September 2019.

    Comments: Published as a long paper at ACL 2019

  38. arXiv:1909.00142  [pdf, other]

    cs.CL

    Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations

    Authors: Mingda Chen, Zewei Chu, Kevin Gimpel

    Abstract: Prior work on pretrained sentence embeddings and benchmarks focus on the capabilities of stand-alone sentences. We propose DiscoEval, a test suite of tasks to evaluate whether sentence representations include broader context information. We also propose a variety of training objectives that makes use of natural annotations from Wikipedia to build sentence encoders capable of modeling discourse. We…

    Submitted 7 November, 2019; v1 submitted 31 August, 2019; originally announced September 2019.

    Comments: EMNLP 2019. Updated results and fixed typos

  39. arXiv:1909.00137  [pdf, other]

    cs.CL

    EntEval: A Holistic Evaluation Benchmark for Entity Representations

    Authors: Mingda Chen, Zewei Chu, Yang Chen, Karl Stratos, Kevin Gimpel

    Abstract: Rich entity representations are useful for a wide class of problems involving entities. Despite their importance, there is no standardized benchmark that evaluates the overall quality of entity representations. In this work, we propose EntEval: a test suite of diverse tasks that require nontrivial understanding of entities including entity typing, entity similarity, entity relation prediction, and…

    Submitted 11 November, 2019; v1 submitted 31 August, 2019; originally announced September 2019.

    Comments: EMNLP 2019. Fixed typo

  40. arXiv:1906.09535  [pdf, other]

    cs.CL

    Variational Sequential Labelers for Semi-Supervised Learning

    Authors: Mingda Chen, Qingming Tang, Karen Livescu, Kevin Gimpel

    Abstract: We introduce a family of multitask variational methods for semi-supervised sequence labeling. Our model family consists of a latent-variable generative model and a discriminative labeler. The generative models use latent variables to define the conditional probability of a word given its context, drawing inspiration from word prediction objectives commonly used in learning word embeddings. The lab…

    Submitted 22 June, 2019; originally announced June 2019.

    Comments: Appeared in EMNLP 2018 Long

  41. arXiv:1906.09532  [pdf, other]

    cs.CL

    Smaller Text Classifiers with Discriminative Cluster Embeddings

    Authors: Mingda Chen, Kevin Gimpel

    Abstract: Word embedding parameters often dominate overall model sizes in neural methods for natural language processing. We reduce deployed model sizes of text classifiers by learning a hard word clustering in an end-to-end manner. We use the Gumbel-Softmax distribution to maximize over the latent clustering while minimizing the task loss. We propose variations that selectively assign additional parameters…

    Submitted 22 June, 2019; originally announced June 2019.

    Comments: Appeared in NAACL 2018 short

  42. Visually Grounded Neural Syntax Acquisition

    Authors: Haoyue Shi, Jiayuan Mao, Kevin Gimpel, Karen Livescu

    Abstract: We present the Visually Grounded Neural Syntax Learner (VG-NSL), an approach for learning syntactic representations and structures without any explicit supervision. The model learns by looking at natural images and reading paired captions. VG-NSL generates constituency parse trees of texts, recursively composes representations for constituents, and matches them with images. We define concreteness…

    Submitted 24 September, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

    Comments: ACL 2019. Project page: https://ttic.uchicago.edu/~freda/project/vgnsl/

  43. arXiv:1906.00565  [pdf, other]

    cs.CL

    Controllable Paraphrase Generation with a Syntactic Exemplar

    Authors: Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel

    Abstract: Prior work on controllable text generation usually assumes that the controlled attribute can take on one of a small set of values known a priori. In this work, we propose a novel task, where the syntax of a generated sentence is controlled rather by a sentential exemplar. To evaluate quantitatively with standard metrics, we create a novel dataset with human annotations. We also develop a variation…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: ACL 2019 Long

  44. arXiv:1904.03111  [pdf, other]

    cs.CL

    PoMo: Generating Entity-Specific Post-Modifiers in Context

    Authors: Jun Seok Kang, Robert L. Logan IV, Zewei Chu, Yang Chen, Dheeru Dua, Kevin Gimpel, Sameer Singh, Niranjan Balasubramanian

    Abstract: We introduce entity post-modifier generation as an instance of a collaborative writing task. Given a sentence about a target entity, the task is to automatically generate a post-modifier phrase that provides contextually relevant information about the entity. For example, for the sentence, "Barack Obama, _______, supported the #MeToo movement.", the phrase "a father of two girls" is a contextually…

    Submitted 8 April, 2019; v1 submitted 5 April, 2019; originally announced April 2019.

    Comments: NAACL-HLT 2019

  45. arXiv:1904.01173  [pdf, other]

    cs.CL

    A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations

    Authors: Mingda Chen, Qingming Tang, Sam Wiseman, Kevin Gimpel

    Abstract: We propose a generative model for a sentence that uses two latent variables, with one intended to represent the syntax of the sentence and the other to represent its semantics. We show we can achieve better disentanglement between semantic and syntactic representations by training with multiple losses, including losses that exploit aligned paraphrastic sentences and word-order information. We also…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: NAACL 2019 Long paper

    Journal ref: NAACL 2019

  46. arXiv:1904.01138  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Benchmarking Approximate Inference Methods for Neural Structured Prediction

    Authors: Lifu Tu, Kevin Gimpel

    Abstract: Exact structured inference with neural network scoring functions is computationally challenging but several methods have been proposed for approximating inference. One approach is to perform gradient descent with respect to the output structure directly (Belanger and McCallum, 2016). Another approach, proposed recently, is to train a neural network (an "inference network") to perform inference (Tu…

    Submitted 6 July, 2019; v1 submitted 1 April, 2019; originally announced April 2019.

    Comments: NAACL 2019 camera-ready version

  47. arXiv:1810.11878  [pdf, other]

    cs.CL cs.AI

    Unsupervised Evaluation Metrics and Learning Criteria for Non-Parallel Textual Transfer

    Authors: Richard Yuanzhe Pang, Kevin Gimpel

    Abstract: We consider the problem of automatically generating textual paraphrases with modified attributes or properties, focusing on the setting without parallel data (Hu et al., 2017; Shen et al., 2017). This setting poses challenges for evaluation. We show that the metric of post-transfer classification accuracy is insufficient on its own, and propose additional metrics based on semantic preservation and…

    Submitted 30 September, 2019; v1 submitted 28 October, 2018; originally announced October 2018.

    Comments: EMNLP 2019 Workshop on Neural Generation and Translation (WNGT)

  48. arXiv:1804.06059  [pdf, other]

    cs.CL

    Adversarial Example Generation with Syntactically Controlled Paraphrase Networks

    Authors: Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer

    Abstract: We propose syntactically controlled paraphrase networks (SCPNs) and use them to generate adversarial examples. Given a sentence and a target syntactic form (e.g., a constituency parse), SCPNs are trained to produce a paraphrase of the sentence with the desired syntax. We show it is possible to create training data for this task by first doing backtranslation at a very large scale, and then using a…

    Submitted 17 April, 2018; originally announced April 2018.

    Comments: NAACL 2018

  49. arXiv:1803.03376  [pdf, other]

    cs.CL cs.LG stat.ML

    Learning Approximate Inference Networks for Structured Prediction

    Authors: Lifu Tu, Kevin Gimpel

    Abstract: Structured prediction energy networks (SPENs; Belanger & McCallum 2016) use neural network architectures to define energy functions that can capture arbitrary dependencies among parts of structured outputs. Prior work used gradient descent for inference, relaxing the structured output to a set of continuous variables and then optimizing the energy with respect to them. We replace this use of gradi…

    Submitted 8 March, 2018; originally announced March 2018.

    Comments: Accepted by ICLR 2018

  50. arXiv:1802.05300  [pdf, other]

    cs.LG cs.CL cs.CV cs.NE

    Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise

    Authors: Dan Hendrycks, Mantas Mazeika, Duncan Wilson, Kevin Gimpel

    Abstract: The growing importance of massive datasets used for deep learning makes robustness to label noise a critical property for classifiers to have. Sources of label noise include automatic labeling, non-expert labeling, and label corruption by data poisoning adversaries. Numerous previous works assume that no source of labels can be trusted. We relax this assumption and assume that a small subset of th…

    Submitted 28 January, 2019; v1 submitted 14 February, 2018; originally announced February 2018.

    Comments: NeurIPS 2018. PyTorch code available at https://github.com/mmazeika/glc