-
Corpus-level Fine-grained Entity Ty** Using Contextual Information
Authors:
Yadollah Yaghoobzadeh,
Hinrich Schütze
Abstract:
This paper addresses the problem of corpus-level entity ty**, i.e., inferring from a large corpus that an entity is a member of a class such as "food" or "artist". The application of entity ty** we are interested in is knowledge base completion, specifically, to learn which classes an entity is a member of. We propose FIGMENT to tackle this problem. FIGMENT is embedding-based and combines (i)…
▽ More
This paper addresses the problem of corpus-level entity ty**, i.e., inferring from a large corpus that an entity is a member of a class such as "food" or "artist". The application of entity ty** we are interested in is knowledge base completion, specifically, to learn which classes an entity is a member of. We propose FIGMENT to tackle this problem. FIGMENT is embedding-based and combines (i) a global model that scores based on aggregated contextual information of an entity and (ii) a context model that first scores the individual occurrences of an entity and then aggregates the scores. In our evaluation, FIGMENT strongly outperforms an approach to entity ty** that relies on relations obtained by an open information extraction system.
△ Less
Submitted 25 June, 2016;
originally announced June 2016.
-
Simple Question Answering by Attentive Convolutional Neural Network
Authors:
Wenpeng Yin,
Mo Yu,
Bing Xiang,
Bowen Zhou,
Hinrich Schütze
Abstract:
This work focuses on answering single-relation factoid questions over Freebase. Each question can acquire the answer from a single fact of form (subject, predicate, object) in Freebase. This task, simple question answering (SimpleQA), can be addressed via a two-step pipeline: entity linking and fact selection. In fact selection, we match the subject entity in a fact candidate with the entity menti…
▽ More
This work focuses on answering single-relation factoid questions over Freebase. Each question can acquire the answer from a single fact of form (subject, predicate, object) in Freebase. This task, simple question answering (SimpleQA), can be addressed via a two-step pipeline: entity linking and fact selection. In fact selection, we match the subject entity in a fact candidate with the entity mention in the question by a character-level convolutional neural network (char-CNN), and match the predicate in that fact with the question by a word-level CNN (word-CNN). This work makes two main contributions. (i) A simple and effective entity linker over Freebase is proposed. Our entity linker outperforms the state-of-the-art entity linker over SimpleQA task. (ii) A novel attentive maxpooling is stacked over word-CNN, so that the predicate representation can be matched with the predicate-focused question representation more effectively. Experiments show that our system sets new state-of-the-art in this task.
△ Less
Submitted 11 October, 2016; v1 submitted 10 June, 2016;
originally announced June 2016.
-
Single-Model Encoder-Decoder with Explicit Morphological Representation for Reinflection
Authors:
Katharina Kann,
Hinrich Schütze
Abstract:
Morphological reinflection is the task of generating a target form given a source form, a source tag and a target tag. We propose a new way of modeling this task with neural encoder-decoder models. Our approach reduces the amount of required training data for this architecture and achieves state-of-the-art results, making encoder-decoder models applicable to morphological reinflection even for low…
▽ More
Morphological reinflection is the task of generating a target form given a source form, a source tag and a target tag. We propose a new way of modeling this task with neural encoder-decoder models. Our approach reduces the amount of required training data for this architecture and achieves state-of-the-art results, making encoder-decoder models applicable to morphological reinflection even for low-resource languages. We further present a new automatic correction method for the outputs based on edit trees.
△ Less
Submitted 2 June, 2016;
originally announced June 2016.
-
Combining Recurrent and Convolutional Neural Networks for Relation Classification
Authors:
Ngoc Thang Vu,
Heike Adel,
Pankaj Gupta,
Hinrich Schütze
Abstract:
This paper investigates two different neural architectures for the task of relation classification: convolutional neural networks and recurrent neural networks. For both models, we demonstrate the effect of different architectural choices. We present a new context representation for convolutional neural networks for relation classification (extended middle context). Furthermore, we propose connect…
▽ More
This paper investigates two different neural architectures for the task of relation classification: convolutional neural networks and recurrent neural networks. For both models, we demonstrate the effect of different architectural choices. We present a new context representation for convolutional neural networks for relation classification (extended middle context). Furthermore, we propose connectionist bi-directional recurrent neural networks and introduce ranking loss for their optimization. Finally, we show that combining convolutional and recurrent neural networks using a simple voting scheme is accurate enough to improve results. Our neural models achieve state-of-the-art results on the SemEval 2010 relation classification task.
△ Less
Submitted 24 May, 2016;
originally announced May 2016.
-
Why and How to Pay Different Attention to Phrase Alignments of Different Intensities
Authors:
Wenpeng Yin,
Hinrich Schütze
Abstract:
This work studies comparatively two typical sentence pair classification tasks: textual entailment (TE) and answer selection (AS), observing that phrase alignments of different intensities contribute differently in these tasks. We address the problems of identifying phrase alignments of flexible granularity and pooling alignments of different intensities for these tasks. Examples for flexible gran…
▽ More
This work studies comparatively two typical sentence pair classification tasks: textual entailment (TE) and answer selection (AS), observing that phrase alignments of different intensities contribute differently in these tasks. We address the problems of identifying phrase alignments of flexible granularity and pooling alignments of different intensities for these tasks. Examples for flexible granularity are alignments between two single words, between a single word and a phrase and between a short phrase and a long phrase. By intensity we roughly mean the degree of match, it ranges from identity over surface-form co-occurrence, rephrasing and other semantic relatedness to unrelated words as in lots of parenthesis text. Prior work (i) has limitations in phrase generation and representation, or (ii) conducts alignment at word and phrase levels by handcrafted features or (iii) utilizes a single attention mechanism over alignment intensities without considering the characteristics of specific tasks, which limits the system's effectiveness across tasks. We propose an architecture based on Gated Recurrent Unit that supports (i) representation learning of phrases of arbitrary granularity and (ii) task-specific focusing of phrase alignments between two sentences by attention pooling. Experimental results on TE and AS match our observation and are state-of-the-art.
△ Less
Submitted 13 June, 2016; v1 submitted 23 April, 2016;
originally announced April 2016.
-
Discriminative Phrase Embedding for Paraphrase Identification
Authors:
Wenpeng Yin,
Hinrich Schütze
Abstract:
This work, concerning paraphrase identification task, on one hand contributes to expanding deep learning embeddings to include continuous and discontinuous linguistic phrases. On the other hand, it comes up with a new scheme TF-KLD-KNN to learn the discriminative weights of words and phrases specific to paraphrase task, so that a weighted sum of embeddings can represent sentences more effectively.…
▽ More
This work, concerning paraphrase identification task, on one hand contributes to expanding deep learning embeddings to include continuous and discontinuous linguistic phrases. On the other hand, it comes up with a new scheme TF-KLD-KNN to learn the discriminative weights of words and phrases specific to paraphrase task, so that a weighted sum of embeddings can represent sentences more effectively. Based on these two innovations we get competitive state-of-the-art performance on paraphrase identification.
△ Less
Submitted 2 April, 2016;
originally announced April 2016.
-
Online Updating of Word Representations for Part-of-Speech Tagging
Authors:
Wenpeng Yin,
Tobias Schnabel,
Hinrich Schütze
Abstract:
We propose online unsupervised domain adaptation (DA), which is performed incrementally as data comes in and is applicable when batch DA is not possible. In a part-of-speech (POS) tagging evaluation, we find that online unsupervised DA performs as well as batch DA.
We propose online unsupervised domain adaptation (DA), which is performed incrementally as data comes in and is applicable when batch DA is not possible. In a part-of-speech (POS) tagging evaluation, we find that online unsupervised DA performs as well as batch DA.
△ Less
Submitted 2 April, 2016;
originally announced April 2016.
-
Comparing Convolutional Neural Networks to Traditional Models for Slot Filling
Authors:
Heike Adel,
Benjamin Roth,
Hinrich Schütze
Abstract:
We address relation classification in the context of slot filling, the task of finding and evaluating fillers like "Steve Jobs" for the slot X in "X founded Apple". We propose a convolutional neural network which splits the input sentence into three parts according to the relation arguments and compare it to state-of-the-art and traditional approaches of relation classification. Finally, we combin…
▽ More
We address relation classification in the context of slot filling, the task of finding and evaluating fillers like "Steve Jobs" for the slot X in "X founded Apple". We propose a convolutional neural network which splits the input sentence into three parts according to the relation arguments and compare it to state-of-the-art and traditional approaches of relation classification. Finally, we combine different methods and show that the combination is better than individual approaches. We also analyze the effect of genre differences on performance.
△ Less
Submitted 4 April, 2016; v1 submitted 16 March, 2016;
originally announced March 2016.
-
Multichannel Variable-Size Convolution for Sentence Classification
Authors:
Wenpeng Yin,
Hinrich Schütze
Abstract:
We propose MVCNN, a convolution neural network (CNN) architecture for sentence classification. It (i) combines diverse versions of pretrained word embeddings and (ii) extracts features of multigranular phrases with variable-size convolution filters. We also show that pretraining MVCNN is critical for good performance. MVCNN achieves state-of-the-art performance on four tasks: on small-scale binary…
▽ More
We propose MVCNN, a convolution neural network (CNN) architecture for sentence classification. It (i) combines diverse versions of pretrained word embeddings and (ii) extracts features of multigranular phrases with variable-size convolution filters. We also show that pretraining MVCNN is critical for good performance. MVCNN achieves state-of-the-art performance on four tasks: on small-scale binary, small-scale multi-class and largescale Twitter sentiment prediction and on subjectivity classification.
△ Less
Submitted 14 March, 2016;
originally announced March 2016.
-
Ultradense Word Embeddings by Orthogonal Transformation
Authors:
Sascha Rothe,
Sebastian Ebert,
Hinrich Schütze
Abstract:
Embeddings are generic representations that are useful for many NLP tasks. In this paper, we introduce DENSIFIER, a method that learns an orthogonal transformation of the embedding space that focuses the information relevant for a task in an ultradense subspace of a dimensionality that is smaller by a factor of 100 than the original space. We show that ultradense embeddings generated by DENSIFIER…
▽ More
Embeddings are generic representations that are useful for many NLP tasks. In this paper, we introduce DENSIFIER, a method that learns an orthogonal transformation of the embedding space that focuses the information relevant for a task in an ultradense subspace of a dimensionality that is smaller by a factor of 100 than the original space. We show that ultradense embeddings generated by DENSIFIER reach state of the art on a lexicon creation task in which words are annotated with three types of lexical information - sentiment, concreteness and frequency. On the SemEval2015 10B sentiment analysis task we show that no information is lost when the ultradense subspace is used, but training is an order of magnitude more efficient due to the compactness of the ultradense space.
△ Less
Submitted 8 May, 2016; v1 submitted 24 February, 2016;
originally announced February 2016.
-
Attention-Based Convolutional Neural Network for Machine Comprehension
Authors:
Wenpeng Yin,
Sebastian Ebert,
Hinrich Schütze
Abstract:
Understanding open-domain text is one of the primary challenges in natural language processing (NLP). Machine comprehension benchmarks evaluate the system's ability to understand text based on the text content only. In this work, we investigate machine comprehension on MCTest, a question answering (QA) benchmark. Prior work is mainly based on feature engineering approaches. We come up with a neura…
▽ More
Understanding open-domain text is one of the primary challenges in natural language processing (NLP). Machine comprehension benchmarks evaluate the system's ability to understand text based on the text content only. In this work, we investigate machine comprehension on MCTest, a question answering (QA) benchmark. Prior work is mainly based on feature engineering approaches. We come up with a neural network framework, named hierarchical attention-based convolutional neural network (HABCNN), to address this task without any manually designed features. Specifically, we explore HABCNN for this task by two routes, one is through traditional joint modeling of passage, question and answer, one is through textual entailment. HABCNN employs an attention mechanism to detect key phrases, key sentences and key snippets that are relevant to answering the question. Experiments show that HABCNN outperforms prior deep learning approaches by a big margin.
△ Less
Submitted 13 February, 2016;
originally announced February 2016.
-
ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs
Authors:
Wenpeng Yin,
Hinrich Schütze,
Bing Xiang,
Bowen Zhou
Abstract:
How to model a pair of sentences is a critical issue in many NLP tasks such as answer selection (AS), paraphrase identification (PI) and textual entailment (TE). Most prior work (i) deals with one individual task by fine-tuning a specific system; (ii) models each sentence's representation separately, rarely considering the impact of the other sentence; or (iii) relies fully on manually designed, t…
▽ More
How to model a pair of sentences is a critical issue in many NLP tasks such as answer selection (AS), paraphrase identification (PI) and textual entailment (TE). Most prior work (i) deals with one individual task by fine-tuning a specific system; (ii) models each sentence's representation separately, rarely considering the impact of the other sentence; or (iii) relies fully on manually designed, task-specific linguistic features. This work presents a general Attention Based Convolutional Neural Network (ABCNN) for modeling a pair of sentences. We make three contributions. (i) ABCNN can be applied to a wide variety of tasks that require modeling of sentence pairs. (ii) We propose three attention schemes that integrate mutual influence between sentences into CNN; thus, the representation of each sentence takes into consideration its counterpart. These interdependent sentence pair representations are more powerful than isolated sentence representations. (iii) ABCNN achieves state-of-the-art performance on AS, PI and TE tasks.
△ Less
Submitted 25 June, 2018; v1 submitted 16 December, 2015;
originally announced December 2015.
-
Learning Meta-Embeddings by Using Ensembles of Embedding Sets
Authors:
Wenpeng Yin,
Hinrich Schütze
Abstract:
Word embeddings -- distributed representations of words -- in deep learning are beneficial for many tasks in natural language processing (NLP). However, different embedding sets vary greatly in quality and characteristics of the captured semantics. Instead of relying on a more advanced algorithm for embedding learning, this paper proposes an ensemble approach of combining different public embeddin…
▽ More
Word embeddings -- distributed representations of words -- in deep learning are beneficial for many tasks in natural language processing (NLP). However, different embedding sets vary greatly in quality and characteristics of the captured semantics. Instead of relying on a more advanced algorithm for embedding learning, this paper proposes an ensemble approach of combining different public embedding sets with the aim of learning meta-embeddings. Experiments on word similarity and analogy tasks and on part-of-speech tagging show better performance of meta-embeddings compared to individual embedding sets. One advantage of meta-embeddings is the increased vocabulary coverage. We will release our meta-embeddings publicly.
△ Less
Submitted 30 December, 2015; v1 submitted 18 August, 2015;
originally announced August 2015.
-
AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes
Authors:
Sascha Rothe,
Hinrich Schütze
Abstract:
We present \textit{AutoExtend}, a system to learn embeddings for synsets and lexemes. It is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The synset/lexeme embeddings obtained live in the same vector space as the word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet as a lexical resour…
▽ More
We present \textit{AutoExtend}, a system to learn embeddings for synsets and lexemes. It is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The synset/lexeme embeddings obtained live in the same vector space as the word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet as a lexical resource, but AutoExtend can be easily applied to other resources like Freebase. AutoExtend achieves state-of-the-art performance on word similarity and word sense disambiguation tasks.
△ Less
Submitted 4 July, 2015;
originally announced July 2015.
-
Distributional Models and Deep Learning Embeddings: Combining the Best of Both Worlds
Authors:
Irina Sergienya,
Hinrich Schütze
Abstract:
There are two main approaches to the distributed representation of words: low-dimensional deep learning embeddings and high-dimensional distributional models, in which each dimension corresponds to a context word. In this paper, we combine these two approaches by learning embeddings based on distributional-model vectors - as opposed to one-hot vectors as is standardly done in deep learning. We sho…
▽ More
There are two main approaches to the distributed representation of words: low-dimensional deep learning embeddings and high-dimensional distributional models, in which each dimension corresponds to a context word. In this paper, we combine these two approaches by learning embeddings based on distributional-model vectors - as opposed to one-hot vectors as is standardly done in deep learning. We show that the combined approach has better performance on a word relatedness judgment task.
△ Less
Submitted 18 February, 2014; v1 submitted 19 December, 2013;
originally announced December 2013.
-
Deep Learning Embeddings for Discontinuous Linguistic Units
Authors:
Wenpeng Yin,
Hinrich Schütze
Abstract:
Deep learning embeddings have been successfully used for many natural language processing problems. Embeddings are mostly computed for word forms although a number of recent papers have extended this to other linguistic units like morphemes and phrases. In this paper, we argue that learning embeddings for discontinuous linguistic units should also be considered. In an experimental evaluation on co…
▽ More
Deep learning embeddings have been successfully used for many natural language processing problems. Embeddings are mostly computed for word forms although a number of recent papers have extended this to other linguistic units like morphemes and phrases. In this paper, we argue that learning embeddings for discontinuous linguistic units should also be considered. In an experimental evaluation on coreference resolution, we show that such embeddings perform better than word form embeddings.
△ Less
Submitted 19 December, 2013; v1 submitted 18 December, 2013;
originally announced December 2013.
-
Two SVDs produce more focal deep learning representations
Authors:
Hinrich Schuetze,
Christian Scheible
Abstract:
A key characteristic of work on deep learning and neural networks in general is that it relies on representations of the input that support generalization, robust inference, domain adaptation and other desirable functionalities. Much recent progress in the field has focused on efficient and effective methods for computing representations. In this paper, we propose an alternative method that is mor…
▽ More
A key characteristic of work on deep learning and neural networks in general is that it relies on representations of the input that support generalization, robust inference, domain adaptation and other desirable functionalities. Much recent progress in the field has focused on efficient and effective methods for computing representations. In this paper, we propose an alternative method that is more efficient than prior work and produces representations that have a property we call focality -- a property we hypothesize to be important for neural network representations. The method consists of a simple application of two consecutive SVDs and is inspired by Anandkumar (2012).
△ Less
Submitted 11 May, 2013; v1 submitted 16 January, 2013;
originally announced January 2013.
-
Cutting Recursive Autoencoder Trees
Authors:
Christian Scheible,
Hinrich Schuetze
Abstract:
Deep Learning models enjoy considerable success in Natural Language Processing. While deep architectures produce useful representations that lead to improvements in various tasks, they are often difficult to interpret. This makes the analysis of learned structures particularly difficult. In this paper, we rely on empirical tests to see whether a particular structure makes sense. We present an anal…
▽ More
Deep Learning models enjoy considerable success in Natural Language Processing. While deep architectures produce useful representations that lead to improvements in various tasks, they are often difficult to interpret. This makes the analysis of learned structures particularly difficult. In this paper, we rely on empirical tests to see whether a particular structure makes sense. We present an analysis of the Semi-Supervised Recursive Autoencoder, a well-known model that produces structural representations of text. We show that for certain tasks, the structure of the autoencoder can be significantly reduced without loss of classification accuracy and we evaluate the produced structures using human judgment.
△ Less
Submitted 26 April, 2013; v1 submitted 13 January, 2013;
originally announced January 2013.
-
Automatic Detection of Text Genre
Authors:
Brett Kessler,
Geoffrey Nunberg,
Hinrich Schuetze
Abstract:
As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of classification. We propose a theory of genres as bundles of facets, which correlate with various surface cues, and argue that genre detection based on surface cues is as successful as detection ba…
▽ More
As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of classification. We propose a theory of genres as bundles of facets, which correlate with various surface cues, and argue that genre detection based on surface cues is as successful as detection based on deeper structural properties.
△ Less
Submitted 8 July, 1997;
originally announced July 1997.
-
Distributional Part-of-Speech Tagging
Authors:
Hinrich Schuetze
Abstract:
This paper presents an algorithm for tagging words whose part-of-speech properties are unknown. Unlike previous work, the algorithm categorizes word tokens in context instead of word types. The algorithm is evaluated on the Brown Corpus.
This paper presents an algorithm for tagging words whose part-of-speech properties are unknown. Unlike previous work, the algorithm categorizes word tokens in context instead of word types. The algorithm is evaluated on the Brown Corpus.
△ Less
Submitted 8 March, 1995;
originally announced March 1995.