Showing 51–100 of 115 results for author: Socher, R

  1. arXiv:1910.00164  [pdf, other]

    stat.ML cs.LG

    Predicting with High Correlation Features

    Authors: Devansh Arpit, Caiming Xiong, Richard Socher

    Abstract: It has been shown that instead of learning actual object features, deep networks tend to exploit non-robust (spurious) discriminative features that are shared between training and test sets. Therefore, while they achieve state of the art performance on such test sets, they achieve poor generalization on out of distribution (OOD) samples where the IID (independent, identical distribution) assumptio…

    Submitted 16 November, 2019; v1 submitted 30 September, 2019; originally announced October 2019.

  2. arXiv:1909.05858  [pdf, other]

    cs.CL

    CTRL: A Conditional Transformer Language Model for Controllable Generation

    Authors: Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher

    Abstract: Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text. We release CTRL, a 1.63 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw t…

    Submitted 20 September, 2019; v1 submitted 11 September, 2019; originally announced September 2019.

  3. arXiv:1909.05378  [pdf, other]

    cs.CL cs.AI

    CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases

    Authors: Tao Yu, Rui Zhang, He Yang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter S Lasecki, Dragomir Radev

    Abstract: We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert re…

    Submitted 11 September, 2019; originally announced September 2019.

    Comments: Accepted to EMNLP 2019, long paper

  4. arXiv:1909.03290  [pdf, other]

    cs.CY cs.AI cs.LG

    Pretrained AI Models: Performativity, Mobility, and Change

    Authors: Lav R. Varshney, Nitish Shirish Keskar, Richard Socher

    Abstract: The paradigm of pretrained deep learning models has recently emerged in artificial intelligence practice, allowing deployment in numerous societal settings with limited computational resources, but also embedding biases and enabling unintended negative uses. In this paper, we treat pretrained models as objects of study and discuss the ethical impacts of their sociological position. We discuss how…

    Submitted 7 September, 2019; originally announced September 2019.

  5. arXiv:1909.03223  [pdf, other]

    cs.CL

    Deleter: Leveraging BERT to Perform Unsupervised Successive Text Compression

    Authors: Tong Niu, Caiming Xiong, Richard Socher

    Abstract: Text compression has diverse applications such as Summarization, Reading Comprehension and Text Editing. However, almost all existing approaches require either hand-crafted features, syntactic labels or parallel data. Even for one that achieves this task in an unsupervised setting, its architecture necessitates a task-specific autoencoder. Moreover, these models only generate one compressed senten…

    Submitted 7 September, 2019; originally announced September 2019.

    Comments: 5 pages, 1 figure (presented @ WeCNLP)

  6. arXiv:1909.00786  [pdf, other]

    cs.CL

    Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions

    Authors: Rui Zhang, Tao Yu, He Yang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, Dragomir Radev

    Abstract: We focus on the cross-domain context-dependent text-to-SQL generation task. Based on the observation that adjacent natural language questions are often linguistically dependent and their corresponding SQL queries tend to overlap, we utilize the interaction history by editing the previous predicted query to improve the generation quality. Our editing mechanism views SQL as sequences and reuses gene…

    Submitted 9 September, 2019; v1 submitted 2 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019

  7. arXiv:1909.00239  [pdf, other]

    cs.CV

    WSLLN: Weakly Supervised Natural Language Localization Networks

    Authors: Mingfei Gao, Larry S. Davis, Richard Socher, Caiming Xiong

    Abstract: We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries. To learn the correspondence between visual segments and texts, most previous methods require temporal coordinates (start and end times) of events for training, which leads to high costs of annotation. WSLLN relieves the annotation burden by training with only video…

    Submitted 31 August, 2019; originally announced September 2019.

    Comments: accepted by EMNLP2019

  8. arXiv:1908.08960  [pdf, other]

    cs.CL

    Neural Text Summarization: A Critical Evaluation

    Authors: Wojciech Kryściński, Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, Richard Socher

    Abstract: Text summarization aims at compressing long documents into a shorter form that conveys the most important parts of the original document. Despite increased interest in the community and notable research effort, progress on benchmark datasets has stagnated. We critically evaluate key ingredients of the current research setup: datasets, evaluation metrics, and models, and highlight three primary sho…

    Submitted 23 August, 2019; originally announced August 2019.

    Comments: To appear in EMNLP 2019, 13 pages, 2 figures, 6 tables

  9. arXiv:1907.00664  [pdf, other]

    cs.LG stat.ML

    Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

    Authors: Wenling Shang, Alex Trott, Stephan Zheng, Caiming Xiong, Richard Socher

    Abstract: In many real-world scenarios, an autonomous agent often encounters various tasks within a single complex environment. We propose to build a graph abstraction over the environment structure to accelerate the learning of these tasks. Here, nodes are important points of interest (pivotal states) and edges represent feasible traversals between them. Our approach has two stages. First, we jointly train…

    Submitted 1 July, 2019; originally announced July 2019.

  10. arXiv:1906.02361  [pdf, other]

    cs.CL

    Explain Yourself! Leveraging Language Models for Commonsense Reasoning

    Authors: Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, Richard Socher

    Abstract: Deep learning models perform poorly on tasks that require commonsense reasoning, which often necessitates some form of world-knowledge or reasoning over information not immediately present in the input. We collect human explanations for commonsense reasoning in the form of natural language sequences and highlighted annotations in a new dataset called Common Sense Explanations (CoS-E). We use CoS-E…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: Accepted at ACL, 11 pages total

    Journal ref: In Proceedings of the Association for Computational Linguistics (ACL), 2019. Florence, Italy

  11. arXiv:1906.02285  [pdf, other]

    cs.CL cs.AI

    SParC: Cross-Domain Semantic Parsing in Context

    Authors: Tao Yu, Rui Zhang, Michihiro Yasunaga, Yi Chern Tan, Xi Victoria Lin, Suyi Li, Heyang Er, Irene Li, Bo Pang, Tao Chen, Emily Ji, Shreya Dixit, David Proctor, Sungrok Shim, Jonathan Kraft, Vincent Zhang, Caiming Xiong, Richard Socher, Dragomir Radev

    Abstract: We present SParC, a dataset for cross-domain Semantic Parsing in Context that consists of 4,298 coherent question sequences (12k+ individual questions annotated with SQL queries). It is obtained from controlled user interactions with 200 complex databases over 138 domains. We provide an in-depth analysis of SParC and show that it introduces new challenges compared to existing datasets. SParC demonstr…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: Accepted to ACL 2019, long paper

  12. arXiv:1905.12654  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    On the Generalization Gap in Reparameterizable Reinforcement Learning

    Authors: Huan Wang, Stephan Zheng, Caiming Xiong, Richard Socher

    Abstract: Understanding generalization in reinforcement learning (RL) is a significant challenge, as many common assumptions of traditional supervised learning theory do not apply. We focus on the special class of reparameterizable RL problems, where the trajectory distribution can be decomposed using the reparametrization trick. For this problem class, estimating the expected return is efficient and the tr…

    Submitted 29 May, 2019; originally announced May 2019.

    Journal ref: Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019

  13. arXiv:1905.11471  [pdf, other]

    cs.CL cs.AI cs.LG

    XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering

    Authors: Jasdeep Singh, Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

    Abstract: While natural language processing systems often focus on a single language, multilingual transfer learning has the potential to improve performance, especially for low-resource languages. We introduce XLDA, cross-lingual data augmentation, a method that replaces a segment of the input text with its translation in another language. XLDA enhances performance of all 14 tested languages of the cross-l…

    Submitted 27 May, 2019; originally announced May 2019.

  14. arXiv:1905.08743  [pdf, other]

    cs.CL cs.AI

    Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems

    Authors: Chien-Sheng Wu, Andrea Madotto, Ehsan Hosseini-Asl, Caiming Xiong, Richard Socher, Pascale Fung

    Abstract: Over-dependence on domain ontology and lack of knowledge sharing across domains are two practical and yet less studied problems of dialogue state tracking. Existing approaches generally fall short in tracking unknown slot values during inference and often have difficulties in adapting to new domains. In this paper, we propose a Transferable Dialogue State Generator (TRADE) that generates dialogue…

    Submitted 26 May, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

    Comments: The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)

  15. arXiv:1904.09286  [pdf, other]

    cs.CL

    Unifying Question Answering, Text Classification, and Regression via Span Extraction

    Authors: Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, Richard Socher

    Abstract: Even as pre-trained language encoders such as BERT are shared across many tasks, the output layers of question answering, text classification, and regression models are significantly different. Span decoders are frequently used for question answering, fixed-class classification layers for text classification, and similarity-scoring layers for regression tasks. We show that this distinction is not…

    Submitted 20 September, 2019; v1 submitted 19 April, 2019; originally announced April 2019.

    Comments: updating paper to also include regression tasks

  16. Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands

    Authors: Giovanni Campagna, Silei Xu, Mehrad Moradshahi, Richard Socher, Monica S. Lam

    Abstract: To understand diverse natural language commands, virtual assistants today are trained with numerous labor-intensive, manually annotated sentences. This paper presents a methodology and the Genie toolkit that can handle new compound commands with significantly less manual effort. We advocate formalizing the capability of virtual assistants with a Virtual Assistant Programming Language (VAPL) and us…

    Submitted 18 April, 2019; originally announced April 2019.

    Comments: To appear in PLDI 2019

  17. arXiv:1904.00310  [pdf, other]

    cs.LG cs.CV

    Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting

    Authors: Xilai Li, Yingbo Zhou, Tianfu Wu, Richard Socher, Caiming Xiong

    Abstract: Addressing catastrophic forgetting is one of the key challenges in continual learning where machine learning systems are trained with sequential or streaming tasks. Despite recent remarkable progress in state-of-the-art deep learning, deep neural networks (DNNs) are still plagued with the catastrophic forgetting problem. This paper presents a conceptually simple yet general and effective framework…

    Submitted 21 May, 2019; v1 submitted 30 March, 2019; originally announced April 2019.

  18. arXiv:1903.09868  [pdf, other]

    cs.CV

    StartNet: Online Detection of Action Start in Untrimmed Videos

    Authors: Mingfei Gao, Mingze Xu, Larry S. Davis, Richard Socher, Caiming Xiong

    Abstract: We propose StartNet to address Online Detection of Action Start (ODAS) where action starts and their associated categories are detected in untrimmed, streaming videos. Previous methods aim to localize action starts by learning feature representations that can directly separate the start point from its preceding background. It is challenging due to the subtle appearance difference near the action s…

    Submitted 23 March, 2019; originally announced March 2019.

  19. arXiv:1902.00528  [pdf, other]

    cs.LG stat.ML

    Competitive Experience Replay

    Authors: Hao Liu, Alexander Trott, Richard Socher, Caiming Xiong

    Abstract: Deep learning has achieved remarkable successes in solving challenging reinforcement learning (RL) problems when a dense reward function is provided. However, in sparse reward environments it still often suffers from the need to carefully shape the reward function to guide policy optimization. This limits the applicability of RL in the real world since both reinforcement learning and domain-specific know…

    Submitted 16 February, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

    Comments: Published as a conference paper at the Seventh International Conference on Learning Representations (ICLR 2019)

  20. arXiv:1901.04713  [pdf, other]

    cs.CL cs.AI

    Global-to-local Memory Pointer Networks for Task-Oriented Dialogue

    Authors: Chien-Sheng Wu, Richard Socher, Caiming Xiong

    Abstract: End-to-end task-oriented dialogue is challenging since knowledge bases are usually large, dynamic and hard to incorporate into a learning framework. We propose the global-to-local memory pointer (GLMP) networks to address this issue. In our model, a global memory encoder and a local memory decoder are proposed to share external knowledge. The encoder encodes dialogue history, modifies global conte…

    Submitted 29 March, 2019; v1 submitted 15 January, 2019; originally announced January 2019.

    Comments: ICLR 2019

  21. arXiv:1901.03035  [pdf, other]

    cs.AI cs.CL cs.CV cs.RO

    Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

    Authors: Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong

    Abstract: The Vision-and-Language Navigation (VLN) task entails an agent following navigational instruction in photo-realistic unknown environments. This challenging task demands that the agent be aware of which instruction was completed, which instruction is needed next, which way to go, and its navigation progress towards the goal. In this paper, we introduce a self-monitoring agent with two complementary…

    Submitted 10 January, 2019; originally announced January 2019.

    Comments: ICLR 2019, code is available at https://github.com/chihyaoma/selfmonitoring-agent

  22. arXiv:1901.00603  [pdf, other]

    cs.CL cs.AI

    Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering

    Authors: Victor Zhong, Caiming Xiong, Nitish Shirish Keskar, Richard Socher

    Abstract: End-to-end neural models have made significant progress in question answering; however, recent studies show that these models implicitly assume that the answer and evidence appear close together in a single document. In this work, we propose the Coarse-grain Fine-grain Coattention Network (CFC), a new question answering model that combines information from evidence across multiple documents. The CF…

    Submitted 13 May, 2019; v1 submitted 2 January, 2019; originally announced January 2019.

    Comments: ICLR 2019; 9 pages, 7 figures

  23. arXiv:1811.12432  [pdf, other]

    cs.CV

    AdaFrame: Adaptive Frame Selection for Fast Video Recognition

    Authors: Zuxuan Wu, Caiming Xiong, Chih-Yao Ma, Richard Socher, Larry S. Davis

    Abstract: We present AdaFrame, a framework that adaptively selects relevant frames on a per-input basis for fast video recognition. AdaFrame contains a Long Short-Term Memory network augmented with a global memory that provides context information for searching which frames to use over time. Trained with policy gradient methods, AdaFrame generates a prediction, determines which frame to observe next, and co…

    Submitted 10 April, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: CVPR 2019

  24. arXiv:1810.13243  [pdf, other]

    cs.LG stat.ML

    A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation

    Authors: Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

    Abstract: The convergence rate and final performance of common deep learning models have significantly benefited from heuristics such as learning rate schedules, knowledge distillation, skip connections, and normalization layers. In the absence of theoretical underpinnings, controlled experiments aimed at explaining these strategies can aid our understanding of deep learning landscapes and the training dyna…

    Submitted 29 October, 2018; originally announced October 2018.

    Comments: We use empirical tools of mode connectivity and SVCCA to investigate neural network training heuristics of learning rate restarts, warmup and knowledge distillation. arXiv admin note: text overlap with arXiv:1806.06977

  25. arXiv:1809.07402  [pdf, other]

    cs.LG stat.ML

    Identifying Generalization Properties in Neural Networks

    Authors: Huan Wang, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

    Abstract: While it has not yet been proven, empirical evidence suggests that model generalization is related to local properties of the optima which can be described via the Hessian. We connect model generalization with the local property of a solution under the PAC-Bayes paradigm. In particular, we prove that model generalization ability is related to the Hessian, the higher-order "smoothness" terms charac…

    Submitted 19 September, 2018; originally announced September 2018.

    Comments: 23 pages

  26. arXiv:1808.10568  [pdf, other]

    cs.AI cs.CL cs.LG

    Multi-Hop Knowledge Graph Reasoning with Reward Shaping

    Authors: Xi Victoria Lin, Richard Socher, Caiming Xiong

    Abstract: Multi-hop reasoning is an effective approach for query answering (QA) over incomplete knowledge graphs (KGs). The problem can be formulated in a reinforcement learning (RL) setup, where a policy-based agent sequentially extends its inference path until it reaches a target. However, in an incomplete KG environment, the agent receives low-quality rewards corrupted by false negatives in the training…

    Submitted 11 September, 2018; v1 submitted 30 August, 2018; originally announced August 2018.

    Comments: Accepted to EMNLP 2018, 12 pages

  27. arXiv:1808.07913  [pdf, other]

    cs.CL

    Improving Abstraction in Text Summarization

    Authors: Wojciech Kryściński, Romain Paulus, Caiming Xiong, Richard Socher

    Abstract: Abstractive text summarization aims to shorten long text documents into a human readable form that contains the most important facts from the original document. However, the level of actual abstraction as measured by novel phrases that do not appear in the source document remains low in existing approaches. We propose two techniques to improve the level of abstraction of generated summaries. First…

    Submitted 23 August, 2018; originally announced August 2018.

  28. arXiv:1807.00374  [pdf, other]

    cs.LG stat.ML

    Augmented Cyclic Adversarial Learning for Low Resource Domain Adaptation

    Authors: Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong, Richard Socher

    Abstract: Training a model to perform a task typically requires a large amount of data from the domains in which the task will be applied. However, it is often the case that data are abundant in some domains but scarce in others. Domain adaptation deals with the challenge of adapting a model trained from a data-rich source domain to perform well in a data-poor target domain. In general, this requires learni…

    Submitted 23 January, 2019; v1 submitted 1 July, 2018; originally announced July 2018.

    Comments: 14 pages, 5 figures, 8 tables; Accepted as a conference paper at ICLR 2019

  29. arXiv:1806.08730  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    The Natural Language Decathlon: Multitask Learning as Question Answering

    Authors: Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

    Abstract: Deep learning has improved performance on many natural language processing (NLP) tasks individually. However, general NLP models cannot emerge within a paradigm that focuses on the particularities of a single metric, dataset, and task. We introduce the Natural Language Decathlon (decaNLP), a challenge that spans ten tasks: question answering, machine translation, summarization, natural language in…

    Submitted 20 June, 2018; originally announced June 2018.

  30. arXiv:1806.06977  [pdf, ps, other]

    cs.LG stat.ML

    Using Mode Connectivity for Loss Landscape Analysis

    Authors: Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

    Abstract: Mode connectivity is a recently introduced framework that empirically establishes the connectedness of minima by finding a high accuracy curve between two independently trained models. To investigate the limits of this setup, we examine the efficacy of this technique in extreme cases where the input models are trained or initialized differently. We find that the procedure is resilient to such…

    Submitted 18 June, 2018; originally announced June 2018.

    Comments: Accepted as a workshop paper at ICML's Workshop on Modern Trends in Nonconvex Optimization for Machine Learning, 2018

  31. arXiv:1805.09655  [pdf, other]

    cs.CL cs.AI

    Global-Locally Self-Attentive Dialogue State Tracker

    Authors: Victor Zhong, Caiming Xiong, Richard Socher

    Abstract: Dialogue state tracking, which estimates user goals and requests given the dialogue context, is an essential part of task-oriented dialogue systems. In this paper, we propose the Global-Locally Self-Attentive Dialogue State Tracker (GLAD), which learns representations of the user utterance and previous system actions with global-local modules. Our model uses global modules to share parameters betw…

    Submitted 6 September, 2018; v1 submitted 19 May, 2018; originally announced May 2018.

    Comments: ACL 2018. 10 pages, 5 figures. Source code: https://github.com/salesforce/glad

  32. arXiv:1805.08092  [pdf, other]

    cs.CL

    Efficient and Robust Question Answering from Minimal Context over Documents

    Authors: Sewon Min, Victor Zhong, Richard Socher, Caiming Xiong

    Abstract: Neural models for question answering (QA) over documents have achieved significant performance improvements. Although effective, these models do not scale to large corpora due to their complex modeling of interactions between the document and the question. Moreover, recent work has shown that such models are sensitive to adversarial inputs. In this paper, we study the minimal context required to a…

    Submitted 21 May, 2018; originally announced May 2018.

    Comments: Published as a conference paper at ACL 2018 (long paper)

  33. arXiv:1804.00819  [pdf, other]

    cs.CV

    End-to-End Dense Video Captioning with Masked Transformer

    Authors: Luowei Zhou, Yingbo Zhou, Jason J. Corso, Richard Socher, Caiming Xiong

    Abstract: Dense video captioning aims to generate text descriptions for all events in an untrimmed video. This involves both detecting and describing events. Therefore, all previous methods on dense video captioning tackle this problem by building two models, i.e. an event proposal and a captioning model, for these two sub-problems. The models are either trained separately or in alternation. This prevents d…

    Submitted 3 April, 2018; originally announced April 2018.

    Comments: To appear at CVPR18

  34. arXiv:1804.00522  [pdf, other]

    cs.CL cs.LG

    A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation

    Authors: Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong, Richard Socher

    Abstract: Domain adaptation plays an important role for speech recognition models, in particular, for domains that have low resources. We propose a novel generative model based on cyclic-consistent generative adversarial network (CycleGAN) for unsupervised non-parallel speech domain adaptation. The proposed model employs multiple independent discriminators on the power spectrogram, each in charge of differe…

    Submitted 9 July, 2018; v1 submitted 27 March, 2018; originally announced April 2018.

    Comments: Accepted to Interspeech 2018

  35. arXiv:1803.08493  [pdf, other]

    cs.CL

    Contextual Salience for Fast and Accurate Sentence Vectors

    Authors: Eric Zelikman, Richard Socher

    Abstract: Unsupervised vector representations of sentences or documents are a major building block for many language tasks such as sentiment classification. However, current methods are uninterpretable and slow or require large training datasets. Recent word vector-based proposals implicitly assume that distances in a word embedding space are equally important, regardless of context. We introduce contextual…

    Submitted 2 November, 2020; v1 submitted 22 March, 2018; originally announced March 2018.

    ACM Class: I.2.7

  36. arXiv:1803.08240  [pdf, other]

    cs.CL cs.AI cs.NE

    An Analysis of Neural Language Modeling at Multiple Scales

    Authors: Stephen Merity, Nitish Shirish Keskar, Richard Socher

    Abstract: Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word level language models based on LSTMs and QRNNs and extend them to both larger vocabularies as well as character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-…

    Submitted 22 March, 2018; originally announced March 2018.

  37. arXiv:1712.08697  [pdf, other]

    cs.AI cs.CL cs.CV

    Interpretable Counting for Visual Question Answering

    Authors: Alexander Trott, Caiming Xiong, Richard Socher

    Abstract: Questions that require counting a variety of objects in images remain a major challenge in visual question answering (VQA). The most common approaches to VQA involve either classifying answers based on fixed length representations of both the image and question or summing fractional counts estimated from each section of the image. In contrast, we treat counting as a sequential decision process and…

    Submitted 1 March, 2018; v1 submitted 22 December, 2017; originally announced December 2017.

    Comments: ICLR 2018

  38. arXiv:1712.07628  [pdf, other]

    cs.LG math.OC

    Improving Generalization Performance by Switching from Adam to SGD

    Authors: Nitish Shirish Keskar, Richard Socher

    Abstract: Despite superior training outcomes, adaptive optimization methods such as Adam, Adagrad or RMSprop have been found to generalize poorly compared to stochastic gradient descent (SGD). These methods tend to perform well in the initial portion of training but are outperformed by SGD at later stages of training. We investigate a hybrid strategy that begins training with an adaptive method and switches…

    Submitted 20 December, 2017; originally announced December 2017.

  39. arXiv:1712.07316  [pdf, other]

    cs.CL cs.LG stat.ML

    A Flexible Approach to Automated RNN Architecture Generation

    Authors: Martin Schrimpf, Stephen Merity, James Bradbury, Richard Socher

    Abstract: The process of designing neural architectures requires expert knowledge and extensive trial and error. While automated architecture search may simplify these requirements, the recurrent neural network (RNN) architectures generated by existing methods are limited in both flexibility and components. We propose a domain-specific language (DSL) for use in automated architecture search which can produc…

    Submitted 19 December, 2017; originally announced December 2017.

  40. arXiv:1712.07296  [pdf, other]

    cs.LG cs.AI stat.ML

    Block-diagonal Hessian-free Optimization for Training Neural Networks

    Authors: Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

    Abstract: Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are rarely applied to deep learning in practice because of high computational cost and the need for model-dependent algorithmic variations. We introduce a variant of the…

    Submitted 19 December, 2017; originally announced December 2017.

    Comments: 10 pages, 3 figures

  41. arXiv:1712.07294  [pdf, other]

    cs.AI

    Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning

    Authors: Tianmin Shu, Caiming Xiong, Richard Socher

    Abstract: Learning policies for complex tasks that require multiple different skills is a major challenge in reinforcement learning (RL). It is also a requirement for its deployment in real-world scenarios. This paper proposes a novel framework for efficient multi-task reinforcement learning. Our framework trains agents to employ hierarchical policies that decide when to use a previously learned policy and…

    Submitted 19 December, 2017; originally announced December 2017.

    Comments: 14 pages, 6 figures

  42. arXiv:1712.07108  [pdf, other]

    cs.CL cs.SD eess.AS stat.ML

    Improved Regularization Techniques for End-to-End Speech Recognition

    Authors: Yingbo Zhou, Caiming Xiong, Richard Socher

    Abstract: Regularization is important for end-to-end speech models, since the models are highly flexible and easy to overfit. Data augmentation and dropout have been important for improving end-to-end models in other domains. However, they are relatively underexplored for end-to-end speech models. Therefore, we investigate the effectiveness of both methods for end-to-end trainable, deep speech recognition m…

    Submitted 19 December, 2017; originally announced December 2017.

  43. arXiv:1712.07101  [pdf, other]

    cs.CL cs.SD eess.AS stat.ML

    Improving End-to-End Speech Recognition with Policy Learning

    Authors: Yingbo Zhou, Caiming Xiong, Richard Socher

    Abstract: Connectionist temporal classification (CTC) is widely used for maximum likelihood learning in end-to-end speech recognition models. However, there is usually a disparity between the negative maximum likelihood and the performance metric used in speech recognition, e.g., word error rate (WER). This results in a mismatch between the objective function and metric during training. We show that the abo…

    Submitted 19 December, 2017; originally announced December 2017.

  44. arXiv:1712.05483  [pdf, other]

    cs.CL

    Learning when to skim and when to read

    Authors: Alexander Rosenberg Johansen, Richard Socher

    Abstract: Many recent advances in deep learning for natural language processing have come at increasing computational cost, but the power of these state-of-the-art models is not needed for every example in a dataset. We demonstrate two approaches to reducing unnecessary computation in cases where a fast but weak baseline classifier and a stronger, slower model are both available. Applying an AUC-based metric…

    Submitted 14 December, 2017; originally announced December 2017.

    Comments: 8 pages (4 article, 1 references, 3 appendix), 11 figures, 3 tables, published at ACL2017 workshop Repl4NLP

  45. arXiv:1711.02281  [pdf, other]

    cs.CL cs.LG

    Non-Autoregressive Neural Machine Translation

    Authors: Jiatao Gu, James Bradbury, Caiming Xiong, Victor O. K. Li, Richard Socher

    Abstract: Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitude lower latency during inference. Through knowledge distillation, the use of input token fertilities as a latent variable, and policy gradient fine-tuning, we ac…

    Submitted 8 March, 2018; v1 submitted 6 November, 2017; originally announced November 2017.

    Comments: Accepted by ICLR 2018

  46. arXiv:1711.02132  [pdf, other]

    cs.AI cs.CL

    Weighted Transformer Network for Machine Translation

    Authors: Karim Ahmed, Nitish Shirish Keskar, Richard Socher

    Abstract: State-of-the-art results on neural machine translation often use attentional sequence-to-sequence models with some form of convolution or recursion. Vaswani et al. (2017) propose a new architecture that avoids recurrence and convolution completely. Instead, it uses only self-attention and feed-forward layers. While the proposed architecture achieves state-of-the-art results on several machine tran…

    Submitted 6 November, 2017; originally announced November 2017.

  47. arXiv:1711.00106  [pdf, other]

    cs.CL cs.AI

    DCN+: Mixed Objective and Deep Residual Coattention for Question Answering

    Authors: Caiming Xiong, Victor Zhong, Richard Socher

    Abstract: Traditional models for question answering optimize using cross entropy loss, which encourages exact answers at the cost of penalizing nearby or overlapping answers that are sometimes equally accurate. We propose a mixed objective that combines cross entropy loss with self-critical policy learning. The objective uses rewards derived from word overlap to solve the misalignment between evaluation met…

    Submitted 10 November, 2017; v1 submitted 31 October, 2017; originally announced November 2017.

    Comments: 10 pages, 6 figures

  48. arXiv:1709.01915  [pdf, other]

    cs.CL cs.AI

    Towards Neural Machine Translation with Latent Tree Attention

    Authors: James Bradbury, Richard Socher

    Abstract: Building models that take advantage of the hierarchical structure of language without a priori annotation is a longstanding goal in natural language processing. We introduce such a model for the task of machine translation, pairing a recurrent neural network grammar encoder with a novel attentional RNNG decoder and applying policy gradient reinforcement learning to induce unsupervised tree structu…

    Submitted 6 September, 2017; originally announced September 2017.

    Comments: Presented at SPNLP 2017

  49. arXiv:1709.00103  [pdf, other]

    cs.CL cs.AI

    Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning

    Authors: Victor Zhong, Caiming Xiong, Richard Socher

    Abstract: A significant amount of the world's knowledge is stored in relational databases. However, the ability for users to retrieve facts from a database is limited due to a lack of understanding of query languages such as SQL. We propose Seq2SQL, a deep neural network for translating natural language questions to corresponding SQL queries. Our model leverages the structure of SQL queries to significantly…

    Submitted 9 November, 2017; v1 submitted 31 August, 2017; originally announced September 2017.

    Comments: 12 pages, 5 figures

  50. arXiv:1708.02182  [pdf, ps, other]

    cs.CL cs.LG cs.NE

    Regularizing and Optimizing LSTM Language Models

    Authors: Stephen Merity, Nitish Shirish Keskar, Richard Socher

    Abstract: Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this paper, we consider the specific problem of word-level language modeling and investigate strategies for regularizing and optimizing LSTM-based models. We propose th…

    Submitted 7 August, 2017; originally announced August 2017.