Showing 101–115 of 115 results for author: Socher, R

  1. arXiv:1708.01009  [pdf, ps, other]

    cs.CL cs.NE

    Revisiting Activation Regularization for Language RNNs

    Authors: Stephen Merity, Bryan McCann, Richard Socher

    Abstract: Recurrent neural networks (RNNs) serve as a fundamental building block for many sequence tasks across natural language processing. Recent research has focused on recurrent dropout techniques or custom RNN cells in order to improve performance. Both of these can require substantial modifications to the machine learning model or to the underlying RNN configurations. We revisit traditional regulariza…

    Submitted 3 August, 2017; originally announced August 2017.

  2. arXiv:1708.00107  [pdf, other]

    cs.CL cs.AI cs.LG

    Learned in Translation: Contextualized Word Vectors

    Authors: Bryan McCann, James Bradbury, Caiming Xiong, Richard Socher

    Abstract: Computer vision has benefited from initializing multiple deep layers with weights pretrained on large supervised training sets like ImageNet. Natural language processing (NLP) typically sees initialization of only the lowest layer of deep models with pretrained word vectors. In this paper, we use a deep LSTM encoder from an attentional sequence-to-sequence model trained for machine translation (MT…

    Submitted 20 June, 2018; v1 submitted 31 July, 2017; originally announced August 2017.

  3. arXiv:1705.04304  [pdf, other]

    cs.CL

    A Deep Reinforced Model for Abstractive Summarization

    Authors: Romain Paulus, Caiming Xiong, Richard Socher

    Abstract: Attentional, RNN-based encoder-decoder models for abstractive summarization have achieved good performance on short input and output sequences. For longer documents and summaries however these models often include repetitive and incoherent phrases. We introduce a neural network model with a novel intra-attention that attends over the input and continuously generated output separately, and a new tr…

    Submitted 13 November, 2017; v1 submitted 11 May, 2017; originally announced May 2017.

  4. arXiv:1612.01887  [pdf, other]

    cs.CV cs.AI

    Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

    Authors: Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher

    Abstract: Attention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information from the image to predict non-visual words such as "the" and "of". Other words that may seem visual can often be predicted reliably just from the language mode…

    Submitted 6 June, 2017; v1 submitted 6 December, 2016; originally announced December 2016.

    Comments: 12 pages, 11 figures, CVPR2017 camera ready

  5. arXiv:1611.05104  [pdf, other]

    cs.CL cs.AI

    A Way out of the Odyssey: Analyzing and Combining Recent Insights for LSTMs

    Authors: Shayne Longpre, Sabeek Pradhan, Caiming Xiong, Richard Socher

    Abstract: LSTMs have become a basic building block for many deep NLP models. In recent years, many improvements and variations have been proposed for deep sequence models in general, and LSTMs in particular. We propose and analyze a series of augmentations and modifications to LSTM networks resulting in improved performance for text classification datasets. We observe compounding improvements on traditional…

    Submitted 17 December, 2016; v1 submitted 15 November, 2016; originally announced November 2016.

  6. arXiv:1611.01604  [pdf, other]

    cs.CL cs.AI

    Dynamic Coattention Networks For Question Answering

    Authors: Caiming Xiong, Victor Zhong, Richard Socher

    Abstract: Several deep learning models have been proposed for question answering. However, due to their single-pass nature, they have no way to recover from local maxima corresponding to incorrect answers. To address this problem, we introduce the Dynamic Coattention Network (DCN) for question answering. The DCN first fuses co-dependent representations of the question and the document in order to focus on r…

    Submitted 6 March, 2018; v1 submitted 5 November, 2016; originally announced November 2016.

    Comments: 14 pages, 7 figures, International Conference on Learning Representations 2017

  7. arXiv:1611.01587  [pdf, other]

    cs.CL cs.AI

    A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

    Authors: Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher

    Abstract: Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax and semantics would benefit each other by being trained in a single model. We introduce a joint many-task model together with a strategy for successively growing its depth to solve increasingly complex tasks. Higher layer…

    Submitted 24 July, 2017; v1 submitted 4 November, 2016; originally announced November 2016.

    Comments: Accepted as a full paper at the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017)

  8. arXiv:1611.01576  [pdf, other]

    cs.NE cs.AI cs.CL cs.LG

    Quasi-Recurrent Neural Networks

    Authors: James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher

    Abstract: Recurrent neural networks are a powerful tool for modeling sequential data, but the dependence of each timestep's computation on the previous timestep's output limits parallelism and makes RNNs unwieldy for very long sequences. We introduce quasi-recurrent neural networks (QRNNs), an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps…

    Submitted 21 November, 2016; v1 submitted 4 November, 2016; originally announced November 2016.

    Comments: Submitted to conference track at ICLR 2017

  9. arXiv:1611.01462  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling

    Authors: Hakan Inan, Khashayar Khosravi, Richard Socher

    Abstract: Recurrent neural networks have been very successful at predicting sequences of words in tasks such as language modeling. However, all such models are based on the conventional classification framework, where the model is trained against one-hot targets, and each word is represented both as an input and as an output in isolation. This causes inefficiencies in learning both in terms of utilizing all…

    Submitted 11 March, 2017; v1 submitted 4 November, 2016; originally announced November 2016.

  10. arXiv:1609.07843  [pdf, other]

    cs.CL cs.AI

    Pointer Sentinel Mixture Models

    Authors: Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher

    Abstract: Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies. Even then they struggle to predict rare or unseen words even if the context makes the prediction unambiguous. We introduce the pointer sentinel mixture architecture for neural sequence models which has the ability to either…

    Submitted 26 September, 2016; originally announced September 2016.

  11. arXiv:1603.01417  [pdf, other]

    cs.NE cs.CL cs.CV

    Dynamic Memory Networks for Visual and Textual Question Answering

    Authors: Caiming Xiong, Stephen Merity, Richard Socher

    Abstract: Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training…

    Submitted 4 March, 2016; originally announced March 2016.

  12. arXiv:1506.07285  [pdf, other]

    cs.CL cs.LG cs.NE

    Ask Me Anything: Dynamic Memory Networks for Natural Language Processing

    Authors: Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, Richard Socher

    Abstract: Most tasks in natural language processing can be cast into question answering (QA) problems over language input. We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. Questions trigger an iterative attention process which allows the model to condition its attention on the…

    Submitted 5 March, 2016; v1 submitted 24 June, 2015; originally announced June 2015.

  13. arXiv:1503.00075  [pdf, other]

    cs.CL cs.AI cs.LG

    Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

    Authors: Kai Sheng Tai, Richard Socher, Christopher D. Manning

    Abstract: Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properti…

    Submitted 30 May, 2015; v1 submitted 28 February, 2015; originally announced March 2015.

    Comments: Accepted for publication at ACL 2015

  14. arXiv:1301.3666  [pdf, other]

    cs.CV cs.LG

    Zero-Shot Learning Through Cross-Modal Transfer

    Authors: Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, Andrew Y. Ng

    Abstract: This work introduces a model that can recognize objects in images even if no training data is available for the objects. The only necessary knowledge about the unseen categories comes from unsupervised large text corpora. In our zero-shot framework distributional information in language can be seen as spanning a semantic basis for understanding what objects look like. Most previous zero-shot learn…

    Submitted 19 March, 2013; v1 submitted 16 January, 2013; originally announced January 2013.

  15. arXiv:1301.3618  [pdf, ps, other]

    cs.CL cs.LG

    Learning New Facts From Knowledge Bases With Neural Tensor Networks and Semantic Word Vectors

    Authors: Danqi Chen, Richard Socher, Christopher D. Manning, Andrew Y. Ng

    Abstract: Knowledge bases provide applications with the benefit of easily accessible, systematic relational knowledge but often suffer in practice from their incompleteness and lack of knowledge of new entities and relations. Much work has focused on building or extending them by finding patterns in large unannotated text corpora. In contrast, here we mainly aim to complete a knowledge base by predicting ad…

    Submitted 15 March, 2013; v1 submitted 16 January, 2013; originally announced January 2013.