Search | arXiv e-print repository

Corpus-level Fine-grained Entity Typing Using Contextual Information

Authors: Yadollah Yaghoobzadeh, Hinrich Schütze

Abstract: This paper addresses the problem of corpus-level entity typing, i.e., inferring from a large corpus that an entity is a member of a class such as "food" or "artist". The application of entity typing we are interested in is knowledge base completion, specifically, to learn which classes an entity is a member of. We propose FIGMENT to tackle this problem. FIGMENT is embedding-based and combines (i) a global model that scores based on aggregated contextual information of an entity and (ii) a context model that first scores the individual occurrences of an entity and then aggregates the scores. In our evaluation, FIGMENT strongly outperforms an approach to entity typing that relies on relations obtained by an open information extraction system.

Submitted 25 June, 2016; originally announced June 2016.

Comments: Accepted at EMNLP2015, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

arXiv:1606.03391 [pdf, other]

Simple Question Answering by Attentive Convolutional Neural Network

Authors: Wenpeng Yin, Mo Yu, Bing Xiang, Bowen Zhou, Hinrich Schütze

Abstract: This work focuses on answering single-relation factoid questions over Freebase. Each question can acquire the answer from a single fact of form (subject, predicate, object) in Freebase. This task, simple question answering (SimpleQA), can be addressed via a two-step pipeline: entity linking and fact selection. In fact selection, we match the subject entity in a fact candidate with the entity mention in the question by a character-level convolutional neural network (char-CNN), and match the predicate in that fact with the question by a word-level CNN (word-CNN). This work makes two main contributions. (i) A simple and effective entity linker over Freebase is proposed. Our entity linker outperforms the state-of-the-art entity linker over SimpleQA task. (ii) A novel attentive maxpooling is stacked over word-CNN, so that the predicate representation can be matched with the predicate-focused question representation more effectively. Experiments show that our system sets new state-of-the-art in this task.

Submitted 11 October, 2016; v1 submitted 10 June, 2016; originally announced June 2016.

Comments: Accepted as an oral long paper by COLING'2016

arXiv:1606.00589 [pdf, other]

Single-Model Encoder-Decoder with Explicit Morphological Representation for Reinflection

Authors: Katharina Kann, Hinrich Schütze

Abstract: Morphological reinflection is the task of generating a target form given a source form, a source tag and a target tag. We propose a new way of modeling this task with neural encoder-decoder models. Our approach reduces the amount of required training data for this architecture and achieves state-of-the-art results, making encoder-decoder models applicable to morphological reinflection even for low-resource languages. We further present a new automatic correction method for the outputs based on edit trees.

Submitted 2 June, 2016; originally announced June 2016.

Comments: Accepted at ACL 2016

arXiv:1605.07333 [pdf, other]

Combining Recurrent and Convolutional Neural Networks for Relation Classification

Authors: Ngoc Thang Vu, Heike Adel, Pankaj Gupta, Hinrich Schütze

Abstract: This paper investigates two different neural architectures for the task of relation classification: convolutional neural networks and recurrent neural networks. For both models, we demonstrate the effect of different architectural choices. We present a new context representation for convolutional neural networks for relation classification (extended middle context). Furthermore, we propose connectionist bi-directional recurrent neural networks and introduce ranking loss for their optimization. Finally, we show that combining convolutional and recurrent neural networks using a simple voting scheme is accurate enough to improve results. Our neural models achieve state-of-the-art results on the SemEval 2010 relation classification task.

Submitted 24 May, 2016; originally announced May 2016.

Comments: NAACL 2016

arXiv:1604.06896 [pdf, other]

Why and How to Pay Different Attention to Phrase Alignments of Different Intensities

Authors: Wenpeng Yin, Hinrich Schütze

Abstract: This work studies comparatively two typical sentence pair classification tasks: textual entailment (TE) and answer selection (AS), observing that phrase alignments of different intensities contribute differently in these tasks. We address the problems of identifying phrase alignments of flexible granularity and pooling alignments of different intensities for these tasks. Examples for flexible granularity are alignments between two single words, between a single word and a phrase and between a short phrase and a long phrase. By intensity we roughly mean the degree of match, it ranges from identity over surface-form co-occurrence, rephrasing and other semantic relatedness to unrelated words as in lots of parenthesis text. Prior work (i) has limitations in phrase generation and representation, or (ii) conducts alignment at word and phrase levels by handcrafted features or (iii) utilizes a single attention mechanism over alignment intensities without considering the characteristics of specific tasks, which limits the system's effectiveness across tasks. We propose an architecture based on Gated Recurrent Unit that supports (i) representation learning of phrases of arbitrary granularity and (ii) task-specific focusing of phrase alignments between two sentences by attention pooling. Experimental results on TE and AS match our observation and are state-of-the-art.

Submitted 13 June, 2016; v1 submitted 23 April, 2016; originally announced April 2016.

Comments: 10 pages, 5 figures

arXiv:1604.00503 [pdf, ps, other]

Discriminative Phrase Embedding for Paraphrase Identification

Authors: Wenpeng Yin, Hinrich Schütze

Abstract: This work, concerning paraphrase identification task, on one hand contributes to expanding deep learning embeddings to include continuous and discontinuous linguistic phrases. On the other hand, it comes up with a new scheme TF-KLD-KNN to learn the discriminative weights of words and phrases specific to paraphrase task, so that a weighted sum of embeddings can represent sentences more effectively. Based on these two innovations we get competitive state-of-the-art performance on paraphrase identification.

Submitted 2 April, 2016; originally announced April 2016.

Comments: NAACL'2015

arXiv:1604.00502 [pdf, other]

Online Updating of Word Representations for Part-of-Speech Tagging

Authors: Wenpeng Yin, Tobias Schnabel, Hinrich Schütze

Abstract: We propose online unsupervised domain adaptation (DA), which is performed incrementally as data comes in and is applicable when batch DA is not possible. In a part-of-speech (POS) tagging evaluation, we find that online unsupervised DA performs as well as batch DA.

Submitted 2 April, 2016; originally announced April 2016.

Comments: EMNLP'2015. Released POS tagger "FLORS" for online domain adaptation

arXiv:1603.05157 [pdf, other]

Comparing Convolutional Neural Networks to Traditional Models for Slot Filling

Authors: Heike Adel, Benjamin Roth, Hinrich Schütze

Abstract: We address relation classification in the context of slot filling, the task of finding and evaluating fillers like "Steve Jobs" for the slot X in "X founded Apple". We propose a convolutional neural network which splits the input sentence into three parts according to the relation arguments and compare it to state-of-the-art and traditional approaches of relation classification. Finally, we combine different methods and show that the combination is better than individual approaches. We also analyze the effect of genre differences on performance.

Submitted 4 April, 2016; v1 submitted 16 March, 2016; originally announced March 2016.

Comments: NAACL 2016

arXiv:1603.04513 [pdf, other]

Multichannel Variable-Size Convolution for Sentence Classification

Authors: Wenpeng Yin, Hinrich Schütze

Abstract: We propose MVCNN, a convolution neural network (CNN) architecture for sentence classification. It (i) combines diverse versions of pretrained word embeddings and (ii) extracts features of multigranular phrases with variable-size convolution filters. We also show that pretraining MVCNN is critical for good performance. MVCNN achieves state-of-the-art performance on four tasks: on small-scale binary, small-scale multi-class and large-scale Twitter sentiment prediction and on subjectivity classification.

Submitted 14 March, 2016; originally announced March 2016.

Comments: In Proceedings of CoNLL 2015

arXiv:1602.07572 [pdf, other]

doi 10.18653/v1/N16-1091

Ultradense Word Embeddings by Orthogonal Transformation

Authors: Sascha Rothe, Sebastian Ebert, Hinrich Schütze

Abstract: Embeddings are generic representations that are useful for many NLP tasks. In this paper, we introduce DENSIFIER, a method that learns an orthogonal transformation of the embedding space that focuses the information relevant for a task in an ultradense subspace of a dimensionality that is smaller by a factor of 100 than the original space. We show that ultradense embeddings generated by DENSIFIER reach state of the art on a lexicon creation task in which words are annotated with three types of lexical information - sentiment, concreteness and frequency. On the SemEval2015 10B sentiment analysis task we show that no information is lost when the ultradense subspace is used, but training is an order of magnitude more efficient due to the compactness of the ultradense space.

Submitted 8 May, 2016; v1 submitted 24 February, 2016; originally announced February 2016.

arXiv:1602.04341 [pdf, other]

Attention-Based Convolutional Neural Network for Machine Comprehension

Authors: Wenpeng Yin, Sebastian Ebert, Hinrich Schütze

Abstract: Understanding open-domain text is one of the primary challenges in natural language processing (NLP). Machine comprehension benchmarks evaluate the system's ability to understand text based on the text content only. In this work, we investigate machine comprehension on MCTest, a question answering (QA) benchmark. Prior work is mainly based on feature engineering approaches. We come up with a neural network framework, named hierarchical attention-based convolutional neural network (HABCNN), to address this task without any manually designed features. Specifically, we explore HABCNN for this task by two routes, one is through traditional joint modeling of passage, question and answer, one is through textual entailment. HABCNN employs an attention mechanism to detect key phrases, key sentences and key snippets that are relevant to answering the question. Experiments show that HABCNN outperforms prior deep learning approaches by a big margin.

Submitted 13 February, 2016; originally announced February 2016.

Comments: 7 pages, 4 figures

arXiv:1512.05193 [pdf, other]

ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs

Authors: Wenpeng Yin, Hinrich Schütze, Bing Xiang, Bowen Zhou

Abstract: How to model a pair of sentences is a critical issue in many NLP tasks such as answer selection (AS), paraphrase identification (PI) and textual entailment (TE). Most prior work (i) deals with one individual task by fine-tuning a specific system; (ii) models each sentence's representation separately, rarely considering the impact of the other sentence; or (iii) relies fully on manually designed, task-specific linguistic features. This work presents a general Attention Based Convolutional Neural Network (ABCNN) for modeling a pair of sentences. We make three contributions. (i) ABCNN can be applied to a wide variety of tasks that require modeling of sentence pairs. (ii) We propose three attention schemes that integrate mutual influence between sentences into CNN; thus, the representation of each sentence takes into consideration its counterpart. These interdependent sentence pair representations are more powerful than isolated sentence representations. (iii) ABCNN achieves state-of-the-art performance on AS, PI and TE tasks.

Submitted 25 June, 2018; v1 submitted 16 December, 2015; originally announced December 2015.

Comments: TACL Camera-ready

arXiv:1508.04257 [pdf, other]

Learning Meta-Embeddings by Using Ensembles of Embedding Sets

Authors: Wenpeng Yin, Hinrich Schütze

Abstract: Word embeddings -- distributed representations of words -- in deep learning are beneficial for many tasks in natural language processing (NLP). However, different embedding sets vary greatly in quality and characteristics of the captured semantics. Instead of relying on a more advanced algorithm for embedding learning, this paper proposes an ensemble approach of combining different public embedding sets with the aim of learning meta-embeddings. Experiments on word similarity and analogy tasks and on part-of-speech tagging show better performance of meta-embeddings compared to individual embedding sets. One advantage of meta-embeddings is the increased vocabulary coverage. We will release our meta-embeddings publicly.

Submitted 30 December, 2015; v1 submitted 18 August, 2015; originally announced August 2015.

Comments: 10 pages, 6 figures

arXiv:1507.01127 [pdf, other]

doi 10.3115/v1/P15-1173

AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes

Authors: Sascha Rothe, Hinrich Schütze

Abstract: We present AutoExtend, a system to learn embeddings for synsets and lexemes. It is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The synset/lexeme embeddings obtained live in the same vector space as the word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet as a lexical resource, but AutoExtend can be easily applied to other resources like Freebase. AutoExtend achieves state-of-the-art performance on word similarity and word sense disambiguation tasks.

Submitted 4 July, 2015; originally announced July 2015.

arXiv:1312.5559 [pdf, ps, other]

Distributional Models and Deep Learning Embeddings: Combining the Best of Both Worlds

Authors: Irina Sergienya, Hinrich Schütze

Abstract: There are two main approaches to the distributed representation of words: low-dimensional deep learning embeddings and high-dimensional distributional models, in which each dimension corresponds to a context word. In this paper, we combine these two approaches by learning embeddings based on distributional-model vectors - as opposed to one-hot vectors as is standardly done in deep learning. We show that the combined approach has better performance on a word relatedness judgment task.

Submitted 18 February, 2014; v1 submitted 19 December, 2013; originally announced December 2013.

Comments: 4 pages, 1 table, ICLR Workshop; main experimental table was extended with more experimental results; related work added

ACM Class: I.2.6; I.2.7

arXiv:1312.5129 [pdf, ps, other]

Deep Learning Embeddings for Discontinuous Linguistic Units

Authors: Wenpeng Yin, Hinrich Schütze

Abstract: Deep learning embeddings have been successfully used for many natural language processing problems. Embeddings are mostly computed for word forms although a number of recent papers have extended this to other linguistic units like morphemes and phrases. In this paper, we argue that learning embeddings for discontinuous linguistic units should also be considered. In an experimental evaluation on coreference resolution, we show that such embeddings perform better than word form embeddings.

Submitted 19 December, 2013; v1 submitted 18 December, 2013; originally announced December 2013.

arXiv:1301.3627 [pdf, ps, other]

Two SVDs produce more focal deep learning representations

Authors: Hinrich Schuetze, Christian Scheible

Abstract: A key characteristic of work on deep learning and neural networks in general is that it relies on representations of the input that support generalization, robust inference, domain adaptation and other desirable functionalities. Much recent progress in the field has focused on efficient and effective methods for computing representations. In this paper, we propose an alternative method that is more efficient than prior work and produces representations that have a property we call focality -- a property we hypothesize to be important for neural network representations. The method consists of a simple application of two consecutive SVDs and is inspired by Anandkumar (2012).

Submitted 11 May, 2013; v1 submitted 16 January, 2013; originally announced January 2013.

arXiv:1301.2811 [pdf, other]

Cutting Recursive Autoencoder Trees

Authors: Christian Scheible, Hinrich Schuetze

Abstract: Deep Learning models enjoy considerable success in Natural Language Processing. While deep architectures produce useful representations that lead to improvements in various tasks, they are often difficult to interpret. This makes the analysis of learned structures particularly difficult. In this paper, we rely on empirical tests to see whether a particular structure makes sense. We present an analysis of the Semi-Supervised Recursive Autoencoder, a well-known model that produces structural representations of text. We show that for certain tasks, the structure of the autoencoder can be significantly reduced without loss of classification accuracy and we evaluate the produced structures using human judgment.

Submitted 26 April, 2013; v1 submitted 13 January, 2013; originally announced January 2013.

arXiv:cmp-lg/9707002 [pdf, ps, other]

Automatic Detection of Text Genre

Authors: Brett Kessler, Geoffrey Nunberg, Hinrich Schuetze

Abstract: As the text databases available to users become larger and more heterogeneous, genre becomes increasingly important for computational linguistics as a complement to topical and structural principles of classification. We propose a theory of genres as bundles of facets, which correlate with various surface cues, and argue that genre detection based on surface cues is as successful as detection based on deeper structural properties.

Submitted 8 July, 1997; originally announced July 1997.

Comments: 7 pages

Journal ref: Proceedings ACL/EACL 1997, Madrid, p. 32-38

arXiv:cmp-lg/9503009 [pdf, ps, other]

Distributional Part-of-Speech Tagging

Authors: Hinrich Schuetze

Abstract: This paper presents an algorithm for tagging words whose part-of-speech properties are unknown. Unlike previous work, the algorithm categorizes word tokens in context instead of word types. The algorithm is evaluated on the Brown Corpus.

Submitted 8 March, 1995; originally announced March 1995.

Comments: 8 pages

Journal ref: EACL 95

Showing 201–220 of 220 results for author: Schütze, H