-
Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings
Authors:
Yadollah Yaghoobzadeh,
Katharina Kann,
Timothy J. Hazen,
Eneko Agirre,
Hinrich Schütze
Abstract:
Word embeddings typically represent different meanings of a word in a single conflated vector. Empirical analysis of embeddings of ambiguous words is currently limited by the small size of manually annotated resources and by the fact that word senses are treated as unrelated individual concepts. We present a large dataset based on manual Wikipedia annotations and word senses, where word senses fro…
▽ More
Word embeddings typically represent different meanings of a word in a single conflated vector. Empirical analysis of embeddings of ambiguous words is currently limited by the small size of manually annotated resources and by the fact that word senses are treated as unrelated individual concepts. We present a large dataset based on manual Wikipedia annotations and word senses, where word senses from different words are related by semantic classes. This is the basis for novel diagnostic tests for an embedding's content: we probe word embeddings for semantic classes and analyze the embedding space by classifying embeddings into semantic classes. Our main findings are: (i) Information about a sense is generally represented well in a single-vector embedding - if the sense is frequent. (ii) A classifier can accurately predict whether a word is single-sense or multi-sense, based only on its embedding. (iii) Although rare senses are not well represented in single-vector embeddings, this does not have negative impact on an NLP application whose performance depends on frequent senses.
△ Less
Submitted 9 June, 2019;
originally announced June 2019.
-
A Hierarchical Decoder with Three-level Hierarchical Attention to Generate Abstractive Summaries of Interleaved Texts
Authors:
Sanjeev Kumar Karn,
Francine Chen,
Yan-Ying Chen,
Ulli Waltinger,
Hinrich Schütze
Abstract:
Interleaved texts, where posts belonging to different threads occur in one sequence, are a common occurrence, e.g., online chat conversations. To quickly obtain an overview of such texts, existing systems first disentangle the posts by threads and then extract summaries from those threads. The major issues with such systems are error propagation and non-fluent summary. To address those, we propose…
▽ More
Interleaved texts, where posts belonging to different threads occur in one sequence, are a common occurrence, e.g., online chat conversations. To quickly obtain an overview of such texts, existing systems first disentangle the posts by threads and then extract summaries from those threads. The major issues with such systems are error propagation and non-fluent summary. To address those, we propose an end-to-end trainable hierarchical encoder-decoder system. We also introduce a novel hierarchical attention mechanism which combines three levels of information from an interleaved text, i.e, posts, phrases and words, and implicitly disentangles the threads. We evaluated the proposed system on multiple interleaved text datasets, and it out-performs a SOTA two-step system by 20-40%.
△ Less
Submitted 9 April, 2020; v1 submitted 5 June, 2019;
originally announced June 2019.
-
SherLIiC: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference
Authors:
Martin Schmitt,
Hinrich Schütze
Abstract:
We present SherLIiC, a testbed for lexical inference in context (LIiC), consisting of 3985 manually annotated inference rule candidates (InfCands), accompanied by (i) ~960k unlabeled InfCands, and (ii) ~190k typed textual relations between Freebase entities extracted from the large entity-linked corpus ClueWeb09. Each InfCand consists of one of these relations, expressed as a lemmatized dependency…
▽ More
We present SherLIiC, a testbed for lexical inference in context (LIiC), consisting of 3985 manually annotated inference rule candidates (InfCands), accompanied by (i) ~960k unlabeled InfCands, and (ii) ~190k typed textual relations between Freebase entities extracted from the large entity-linked corpus ClueWeb09. Each InfCand consists of one of these relations, expressed as a lemmatized dependency path, and two argument placeholders, each linked to one or more Freebase types. Due to our candidate selection process based on strong distributional evidence, SherLIiC is much harder than existing testbeds because distributional evidence is of little utility in the classification of InfCands. We also show that, due to its construction, many of SherLIiC's correct InfCands are novel and missing from existing rule bases. We evaluate a number of strong baselines on SherLIiC, ranging from semantic vector space models to state of the art neural models of natural language inference (NLI). We show that SherLIiC poses a tough challenge to existing NLI systems.
△ Less
Submitted 4 June, 2019;
originally announced June 2019.
-
Fine-Grained Argument Unit Recognition and Classification
Authors:
Dietrich Trautmann,
Johannes Daxenberger,
Christian Stab,
Hinrich Schütze,
Iryna Gurevych
Abstract:
Prior work has commonly defined argument retrieval from heterogeneous document collections as a sentence-level classification task. Consequently, argument retrieval suffers both from low recall and from sentence segmentation errors making it difficult for humans and machines to consume the arguments. In this work, we argue that the task should be performed on a more fine-grained level of sequence…
▽ More
Prior work has commonly defined argument retrieval from heterogeneous document collections as a sentence-level classification task. Consequently, argument retrieval suffers both from low recall and from sentence segmentation errors making it difficult for humans and machines to consume the arguments. In this work, we argue that the task should be performed on a more fine-grained level of sequence labeling. For this, we define the task as Argument Unit Recognition and Classification (AURC). We present a dataset of arguments from heterogeneous sources annotated as spans of tokens within a sentence, as well as with a corresponding stance. We show that and how such difficult argument annotations can be effectively collected through crowdsourcing with high interannotator agreement. The new benchmark, AURC-8, contains up to 15% more arguments per topic as compared to annotations on the sentence level. We identify a number of methods targeted at AURC sequence labeling, achieving close to human performance on known domains. Further analysis also reveals that, contrary to previous approaches, our methods are more robust against sentence segmentation errors. We publicly release our code and the AURC-8 dataset.
△ Less
Submitted 21 November, 2019; v1 submitted 21 April, 2019;
originally announced April 2019.
-
An Unsupervised Joint System for Text Generation from Knowledge Graphs and Semantic Parsing
Authors:
Martin Schmitt,
Sahand Sharifzadeh,
Volker Tresp,
Hinrich Schütze
Abstract:
Knowledge graphs (KGs) can vary greatly from one domain to another. Therefore supervised approaches to both graph-to-text generation and text-to-graph knowledge extraction (semantic parsing) will always suffer from a shortage of domain-specific parallel graph-text data; at the same time, adapting a model trained on a different domain is often impossible due to little or no overlap in entities and…
▽ More
Knowledge graphs (KGs) can vary greatly from one domain to another. Therefore supervised approaches to both graph-to-text generation and text-to-graph knowledge extraction (semantic parsing) will always suffer from a shortage of domain-specific parallel graph-text data; at the same time, adapting a model trained on a different domain is often impossible due to little or no overlap in entities and relations. This situation calls for an approach that (1) does not need large amounts of annotated data and thus (2) does not need to rely on domain adaptation techniques to work well in different domains. To this end, we present the first approach to unsupervised text generation from KGs and show simultaneously how it can be used for unsupervised semantic parsing. We evaluate our approach on WebNLG v2.1 and a new benchmark leveraging scene graphs from Visual Genome. Our system outperforms strong baselines for both text$\leftrightarrow$graph conversion tasks without any manual adaptation from one dataset to the other. In additional experiments, we investigate the impact of using different unsupervised objectives.
△ Less
Submitted 17 November, 2020; v1 submitted 20 April, 2019;
originally announced April 2019.
-
Analytical Methods for Interpretable Ultradense Word Embeddings
Authors:
Philipp Dufter,
Hinrich Schütze
Abstract:
Word embeddings are useful for a wide variety of tasks, but they lack interpretability. By rotating word spaces, interpretable dimensions can be identified while preserving the information contained in the embeddings without any loss. In this work, we investigate three methods for making word spaces interpretable by rotation: Densifier (Rothe et al., 2016), linear SVMs and DensRay, a new method we…
▽ More
Word embeddings are useful for a wide variety of tasks, but they lack interpretability. By rotating word spaces, interpretable dimensions can be identified while preserving the information contained in the embeddings without any loss. In this work, we investigate three methods for making word spaces interpretable by rotation: Densifier (Rothe et al., 2016), linear SVMs and DensRay, a new method we propose. In contrast to Densifier, DensRay can be computed in closed form, is hyperparameter-free and thus more robust than Densifier. We evaluate the three methods on lexicon induction and set-based word analogy. In addition we provide qualitative insights as to how interpretable word spaces can be used for removing gender bias from embeddings.
△ Less
Submitted 13 September, 2019; v1 submitted 18 April, 2019;
originally announced April 2019.
-
Rare Words: A Major Problem for Contextualized Embeddings And How to Fix it by Attentive Mimicking
Authors:
Timo Schick,
Hinrich Schütze
Abstract:
Pretraining deep neural network architectures with a language modeling objective has brought large improvements for many natural language processing tasks. Exemplified by BERT, a recently proposed such architecture, we demonstrate that despite being trained on huge amounts of data, deep language models still struggle to understand rare words. To fix this problem, we adapt Attentive Mimicking, a me…
▽ More
Pretraining deep neural network architectures with a language modeling objective has brought large improvements for many natural language processing tasks. Exemplified by BERT, a recently proposed such architecture, we demonstrate that despite being trained on huge amounts of data, deep language models still struggle to understand rare words. To fix this problem, we adapt Attentive Mimicking, a method that was designed to explicitly learn embeddings for rare words, to deep language models. In order to make this possible, we introduce one-token approximation, a procedure that enables us to use Attentive Mimicking even when the underlying language model uses subword-based tokenization, i.e., it does not assign embeddings to all words. To evaluate our method, we create a novel dataset that tests the ability of language models to capture semantic properties of words without any task-specific fine-tuning. Using this dataset, we show that adding our adapted version of Attentive Mimicking to BERT does indeed substantially improve its understanding of rare words.
△ Less
Submitted 4 December, 2019; v1 submitted 14 April, 2019;
originally announced April 2019.
-
Attentive Mimicking: Better Word Embeddings by Attending to Informative Contexts
Authors:
Timo Schick,
Hinrich Schütze
Abstract:
Learning high-quality embeddings for rare words is a hard problem because of sparse context information. Mimicking (Pinter et al., 2017) has been proposed as a solution: given embeddings learned by a standard algorithm, a model is first trained to reproduce embeddings of frequent words from their surface form and then used to compute embeddings for rare words. In this paper, we introduce attentive…
▽ More
Learning high-quality embeddings for rare words is a hard problem because of sparse context information. Mimicking (Pinter et al., 2017) has been proposed as a solution: given embeddings learned by a standard algorithm, a model is first trained to reproduce embeddings of frequent words from their surface form and then used to compute embeddings for rare words. In this paper, we introduce attentive mimicking: the mimicking model is given access not only to a word's surface form, but also to all available contexts and learns to attend to the most informative and reliable contexts for computing an embedding. In an evaluation on four tasks, we show that attentive mimicking outperforms previous work for both rare and medium-frequency words. Thus, compared to previous work, attentive mimicking improves embeddings for a much larger part of the vocabulary, including the medium-frequency range.
△ Less
Submitted 5 April, 2019; v1 submitted 2 April, 2019;
originally announced April 2019.
-
Learning Semantic Representations for Novel Words: Leveraging Both Form and Context
Authors:
Timo Schick,
Hinrich Schütze
Abstract:
Word embeddings are a key component of high-performing natural language processing (NLP) systems, but it remains a challenge to learn good representations for novel words on the fly, i.e., for words that did not occur in the training data. The general problem setting is that word embeddings are induced on an unlabeled training corpus and then a model is trained that embeds novel words into this in…
▽ More
Word embeddings are a key component of high-performing natural language processing (NLP) systems, but it remains a challenge to learn good representations for novel words on the fly, i.e., for words that did not occur in the training data. The general problem setting is that word embeddings are induced on an unlabeled training corpus and then a model is trained that embeds novel words into this induced embedding space. Currently, two approaches for learning embeddings of novel words exist: (i) learning an embedding from the novel word's surface-form (e.g., subword n-grams) and (ii) learning an embedding from the context in which it occurs. In this paper, we propose an architecture that leverages both sources of information - surface-form and context - and show that it results in large increases in embedding quality. Our architecture obtains state-of-the-art results on the Definitional Nonce and Contextual Rare Words datasets. As input, we only require an embedding set and an unlabeled corpus for training our architecture to produce embeddings appropriate for the induced embedding space. Thus, our model can easily be integrated into any existing NLP system and enhance its capability to handle novel words.
△ Less
Submitted 9 November, 2018;
originally announced November 2018.
-
CIS at TAC Cold Start 2015: Neural Networks and Coreference Resolution for Slot Filling
Authors:
Heike Adel,
Hinrich Schütze
Abstract:
This paper describes the CIS slot filling system for the TAC Cold Start evaluations 2015. It extends and improves the system we have built for the evaluation last year. This paper mainly describes the changes to our last year's system. Especially, it focuses on the coreference and classification component. For coreference, we have performed several analysis and prepared a resource to simplify our…
▽ More
This paper describes the CIS slot filling system for the TAC Cold Start evaluations 2015. It extends and improves the system we have built for the evaluation last year. This paper mainly describes the changes to our last year's system. Especially, it focuses on the coreference and classification component. For coreference, we have performed several analysis and prepared a resource to simplify our end-to-end system and improve its runtime. For classification, we propose to use neural networks. We have trained convolutional and recurrent neural networks and combined them with traditional evaluation methods, namely patterns and support vector machines. Our runs for the 2015 evaluation have been designed to directly assess the effect of each network on the end-to-end performance of the system. The CIS system achieved rank 3 of all slot filling systems participating in the task.
△ Less
Submitted 6 November, 2018;
originally announced November 2018.
-
Multilingual Embeddings Jointly Induced from Contexts and Concepts: Simple, Strong and Scalable
Authors:
Philipp Dufter,
Mengjie Zhao,
Hinrich Schütze
Abstract:
Word embeddings induced from local context are prevalent in NLP. A simple and effective context-based multilingual embedding learner is Levy et al. (2017)'s S-ID (sentence ID) method. Another line of work induces high-performing multilingual embeddings from concepts (Dufter et al., 2018). In this paper, we propose Co+Co, a simple and scalable method that combines context-based and concept-based le…
▽ More
Word embeddings induced from local context are prevalent in NLP. A simple and effective context-based multilingual embedding learner is Levy et al. (2017)'s S-ID (sentence ID) method. Another line of work induces high-performing multilingual embeddings from concepts (Dufter et al., 2018). In this paper, we propose Co+Co, a simple and scalable method that combines context-based and concept-based learning. From a sentence aligned corpus, concepts are extracted via sampling; words are then associated with their concept ID and sentence ID in embedding learning. This is the first work that successfully combines context-based and concept-based embedding learning. We show that Co+Co performs well for two different application scenarios: the Parallel Bible Corpus (1000+ languages, low-resource) and EuroParl (12 languages, high-resource). Among methods applicable to both corpora, Co+Co performs best in our evaluation setup of six tasks.
△ Less
Submitted 30 April, 2020; v1 submitted 1 November, 2018;
originally announced November 2018.
-
Aligning Very Small Parallel Corpora Using Cross-Lingual Word Embeddings and a Monogamy Objective
Authors:
Nina Poerner,
Masoud Jalili Sabet,
Benjamin Roth,
Hinrich Schütze
Abstract:
Count-based word alignment methods, such as the IBM models or fast-align, struggle on very small parallel corpora. We therefore present an alternative approach based on cross-lingual word embeddings (CLWEs), which are trained on purely monolingual data. Our main contribution is an unsupervised objective to adapt CLWEs to parallel corpora. In experiments on between 25 and 500 sentences, our method…
▽ More
Count-based word alignment methods, such as the IBM models or fast-align, struggle on very small parallel corpora. We therefore present an alternative approach based on cross-lingual word embeddings (CLWEs), which are trained on purely monolingual data. Our main contribution is an unsupervised objective to adapt CLWEs to parallel corpora. In experiments on between 25 and 500 sentences, our method outperforms fast-align. We also show that our fine-tuning objective consistently improves a CLWE-only baseline.
△ Less
Submitted 31 October, 2018;
originally announced November 2018.
-
Multi-Multi-View Learning: Multilingual and Multi-Representation Entity Ty**
Authors:
Yadollah Yaghoobzadeh,
Hinrich Schütze
Abstract:
Knowledge bases (KBs) are paramount in NLP. We employ multiview learning for increasing accuracy and coverage of entity type information in KBs. We rely on two metaviews: language and representation. For language, we consider high-resource and low-resource languages from Wikipedia. For representation, we consider representations based on the context distribution of the entity (i.e., on its embeddi…
▽ More
Knowledge bases (KBs) are paramount in NLP. We employ multiview learning for increasing accuracy and coverage of entity type information in KBs. We rely on two metaviews: language and representation. For language, we consider high-resource and low-resource languages from Wikipedia. For representation, we consider representations based on the context distribution of the entity (i.e., on its embedding), on the entity's name (i.e., on its surface form) and on its description in Wikipedia. The two metaviews language and representation can be freely combined: each pair of language and representation (e.g., German embedding, English description, Spanish name) is a distinct view. Our experiments on entity ty** with fine-grained classes demonstrate the effectiveness of multiview learning. We release MVET, a large multiview - and, in particular, multilingual - entity ty** dataset we created. Mono- and multilingual fine-grained entity ty** systems can be evaluated on this dataset.
△ Less
Submitted 24 October, 2018;
originally announced October 2018.
-
Neural Relation Extraction Within and Across Sentence Boundaries
Authors:
Pankaj Gupta,
Subburam Rajaram,
Hinrich Schütze,
Bernt Andrassy,
Thomas Runkler
Abstract:
Past work in relation extraction mostly focuses on binary relation between entity pairs within single sentence. Recently, the NLP community has gained interest in relation extraction in entity pairs spanning multiple sentences. In this paper, we propose a novel architecture for this task: inter-sentential dependency-based neural networks (iDepNN). iDepNN models the shortest and augmented dependenc…
▽ More
Past work in relation extraction mostly focuses on binary relation between entity pairs within single sentence. Recently, the NLP community has gained interest in relation extraction in entity pairs spanning multiple sentences. In this paper, we propose a novel architecture for this task: inter-sentential dependency-based neural networks (iDepNN). iDepNN models the shortest and augmented dependency paths via recurrent and recursive neural networks to extract relationships within (intra-) and across (inter-) sentence boundaries. Compared to SVM and neural network baselines, iDepNN is more robust to false positives in relationships spanning sentences.
We evaluate our models on four datasets from newswire (MUC6) and medical (BioNLP shared task) domains that achieve state-of-the-art performance and show a better balance in precision and recall for inter-sentential relationships. We perform better than 11 teams participating in the BioNLP shared task 2016 and achieve a gain of 5.2% (0.587 vs 0.558) in F1 over the winning team. We also release the crosssentence annotations for MUC6.
△ Less
Submitted 14 January, 2019; v1 submitted 11 October, 2018;
originally announced October 2018.
-
textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior
Authors:
Pankaj Gupta,
Yatin Chaudhary,
Florian Buettner,
Hinrich Schütze
Abstract:
We address two challenges of probabilistic topic modelling in order to better estimate the probability of a word in a given context, i.e., P(word|context): (1) No Language Structure in Context: Probabilistic topic models ignore word order by summarizing a given context as a "bag-of-word" and consequently the semantics of words in the context is lost. The LSTM-LM learns a vector-space representatio…
▽ More
We address two challenges of probabilistic topic modelling in order to better estimate the probability of a word in a given context, i.e., P(word|context): (1) No Language Structure in Context: Probabilistic topic models ignore word order by summarizing a given context as a "bag-of-word" and consequently the semantics of words in the context is lost. The LSTM-LM learns a vector-space representation of each word by accounting for word order in local collocation patterns and models complex characteristics of language (e.g., syntax and semantics), while the TM simultaneously learns a latent representation from the entire document and discovers the underlying thematic structure. We unite two complementary paradigms of learning the meaning of word occurrences by combining a TM (e.g., DocNADE) and a LM in a unified probabilistic framework, named as ctx-DocNADE. (2) Limited Context and/or Smaller training corpus of documents: In settings with a small number of word occurrences (i.e., lack of context) in short text or data sparsity in a corpus of few documents, the application of TMs is challenging. We address this challenge by incorporating external knowledge into neural autoregressive topic models via a language modelling approach: we use word embeddings as input of a LSTM-LM with the aim to improve the word-topic map** on a smaller and/or short-text corpus. The proposed DocNADE extension is named as ctx-DocNADEe.
We present novel neural autoregressive topic model variants coupled with neural LMs and embeddings priors that consistently outperform state-of-the-art generative TMs in terms of generalization (perplexity), interpretability (topic coherence) and applicability (retrieval and classification) over 6 long-text and 8 short-text datasets from diverse domains.
△ Less
Submitted 23 February, 2019; v1 submitted 9 October, 2018;
originally announced October 2018.
-
Neural Transductive Learning and Beyond: Morphological Generation in the Minimal-Resource Setting
Authors:
Katharina Kann,
Hinrich Schütze
Abstract:
Neural state-of-the-art sequence-to-sequence (seq2seq) models often do not perform well for small training sets. We address paradigm completion, the morphological task of, given a partial paradigm, generating all missing forms. We propose two new methods for the minimal-resource setting: (i) Paradigm transduction: Since we assume only few paradigms available for training, neural seq2seq models are…
▽ More
Neural state-of-the-art sequence-to-sequence (seq2seq) models often do not perform well for small training sets. We address paradigm completion, the morphological task of, given a partial paradigm, generating all missing forms. We propose two new methods for the minimal-resource setting: (i) Paradigm transduction: Since we assume only few paradigms available for training, neural seq2seq models are able to capture relationships between paradigm cells, but are tied to the idiosyncracies of the training set. Paradigm transduction mitigates this problem by exploiting the input subset of inflected forms at test time. (ii) Source selection with high precision (SHIP): Multi-source models which learn to automatically select one or multiple sources to predict a target inflection do not perform well in the minimal-resource setting. SHIP is an alternative to identify a reliable source if training data is limited. On a 52-language benchmark dataset, we outperform the previous state of the art by up to 9.71% absolute accuracy.
△ Less
Submitted 9 May, 2019; v1 submitted 23 September, 2018;
originally announced September 2018.
-
Interpretable Textual Neuron Representations for NLP
Authors:
Nina Poerner,
Benjamin Roth,
Hinrich Schütze
Abstract:
Input optimization methods, such as Google Deep Dream, create interpretable representations of neurons for computer vision DNNs. We propose and evaluate ways of transferring this technology to NLP. Our results suggest that gradient ascent with a gumbel softmax layer produces n-gram representations that outperform naive corpus search in terms of target neuron activation. The representations highlig…
▽ More
Input optimization methods, such as Google Deep Dream, create interpretable representations of neurons for computer vision DNNs. We propose and evaluate ways of transferring this technology to NLP. Our results suggest that gradient ascent with a gumbel softmax layer produces n-gram representations that outperform naive corpus search in terms of target neuron activation. The representations highlight differences in syntax awareness between the language and visual models of the Imaginet architecture.
△ Less
Submitted 19 September, 2018;
originally announced September 2018.
-
Document Informed Neural Autoregressive Topic Models with Distributional Prior
Authors:
Pankaj Gupta,
Yatin Chaudhary,
Florian Buettner,
Hinrich Schütze
Abstract:
We address two challenges in topic models: (1) Context information around words helps in determining their actual meaning, e.g., "networks" used in the contexts "artificial neural networks" vs. "biological neuron networks". Generative topic models infer topic-word distributions, taking no or only little context into account. Here, we extend a neural autoregressive topic model to exploit the full c…
▽ More
We address two challenges in topic models: (1) Context information around words helps in determining their actual meaning, e.g., "networks" used in the contexts "artificial neural networks" vs. "biological neuron networks". Generative topic models infer topic-word distributions, taking no or only little context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document in a language modeling fashion. The proposed model is named as iDocNADE. (2) Due to the small number of word occurrences (i.e., lack of context) in short text and data sparsity in a corpus of few documents, the application of topic models is challenging on such texts. Therefore, we propose a simple and efficient way of incorporating external knowledge into neural autoregressive topic models: we use embeddings as a distributional prior. The proposed variants are named as DocNADEe and iDocNADEe.
We present novel neural autoregressive topic model variants that consistently outperform state-of-the-art generative topic models in terms of generalization, interpretability (topic coherence) and applicability (retrieval and classification) over 7 long-text and 8 short-text datasets from diverse domains.
△ Less
Submitted 14 January, 2019; v1 submitted 15 September, 2018;
originally announced September 2018.
-
Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging
Authors:
Apostolos Kemos,
Heike Adel,
Hinrich Schütze
Abstract:
Character-level models of tokens have been shown to be effective at dealing with within-token noise and out-of-vocabulary words. But these models still rely on correct token boundaries. In this paper, we propose a novel end-to-end character-level model and demonstrate its effectiveness in multilingual settings and when token boundaries are noisy. Our model is a semi-Markov conditional random field…
▽ More
Character-level models of tokens have been shown to be effective at dealing with within-token noise and out-of-vocabulary words. But these models still rely on correct token boundaries. In this paper, we propose a novel end-to-end character-level model and demonstrate its effectiveness in multilingual settings and when token boundaries are noisy. Our model is a semi-Markov conditional random field with neural networks for character and segment representation. It requires no tokenizer. The model matches state-of-the-art baselines for various languages and significantly outperforms them on a noisy English version of a part-of-speech tagging benchmark dataset. Our code and the noisy dataset are publicly available at http://cistern.cis.lmu.de/semiCRF.
△ Less
Submitted 2 January, 2020; v1 submitted 13 August, 2018;
originally announced August 2018.
-
Document Informed Neural Autoregressive Topic Models
Authors:
Pankaj Gupta,
Florian Buettner,
Hinrich Schütze
Abstract:
Context information around words helps in determining their actual meaning, for example "networks" used in contexts of artificial neural networks or biological neuron networks. Generative topic models infer topic-word distributions, taking no or only little context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document i…
▽ More
Context information around words helps in determining their actual meaning, for example "networks" used in contexts of artificial neural networks or biological neuron networks. Generative topic models infer topic-word distributions, taking no or only little context into account. Here, we extend a neural autoregressive topic model to exploit the full context information around words in a document in a language modeling fashion. This results in an improved performance in terms of generalization, interpretability and applicability. We apply our modeling approach to seven data sets from various domains and demonstrate that our approach consistently outperforms stateof-the-art generative topic models. With the learned representations, we show on an average a gain of 9.6% (0.57 Vs 0.52) in precision at retrieval fraction 0.02 and 7.2% (0.582 Vs 0.543) in F1 for text categorization.
△ Less
Submitted 11 August, 2018;
originally announced August 2018.
-
LISA: Explaining Recurrent Neural Network Judgments via Layer-wIse Semantic Accumulation and Example to Pattern Transformation
Authors:
Pankaj Gupta,
Hinrich Schütze
Abstract:
Recurrent neural networks (RNNs) are temporal networks and cumulative in nature that have shown promising results in various natural language processing tasks. Despite their success, it still remains a challenge to understand their hidden behavior. In this work, we analyze and interpret the cumulative nature of RNN via a proposed technique named as Layer-wIse-Semantic-Accumulation (LISA) for expla…
▽ More
Recurrent neural networks (RNNs) are temporal networks and cumulative in nature that have shown promising results in various natural language processing tasks. Despite their success, it still remains a challenge to understand their hidden behavior. In this work, we analyze and interpret the cumulative nature of RNN via a proposed technique named as Layer-wIse-Semantic-Accumulation (LISA) for explaining decisions and detecting the most likely (i.e., saliency) patterns that the network relies on while decision making. We demonstrate (1) LISA: "How an RNN accumulates or builds semantics during its sequential processing for a given text example and expected response" (2) Example2pattern: "How the saliency patterns look like for each category in the data according to the network in decision making". We analyse the sensitiveness of RNNs about different inputs to check the increase or decrease in prediction scores and further extract the saliency patterns learned by the network. We employ two relation classification datasets: SemEval 10 Task 8 and TAC KBP Slot Filling to explain RNN predictions via the LISA and example2pattern.
△ Less
Submitted 5 August, 2018;
originally announced August 2018.
-
News Article Teaser Tweets and How to Generate Them
Authors:
Sanjeev Kumar Karn,
Mark Buckley,
Ulli Waltinger,
Hinrich Schütze
Abstract:
In this work, we define the task of teaser generation and provide an evaluation benchmark and baseline systems for the process of generating teasers. A teaser is a short reading suggestion for an article that is illustrative and includes curiosity-arousing elements to entice potential readers to read particular news items. Teasers are one of the main vehicles for transmitting news to social media…
▽ More
In this work, we define the task of teaser generation and provide an evaluation benchmark and baseline systems for the process of generating teasers. A teaser is a short reading suggestion for an article that is illustrative and includes curiosity-arousing elements to entice potential readers to read particular news items. Teasers are one of the main vehicles for transmitting news to social media users. We compile a novel dataset of teasers by systematically accumulating tweets and selecting those that conform to the teaser definition. We have compared a number of neural abstractive architectures on the task of teaser generation and the overall best performing system is See et al.(2017)'s seq2seq with pointer network.
△ Less
Submitted 18 April, 2019; v1 submitted 30 July, 2018;
originally announced July 2018.
-
Evaluating Word Embeddings in Multi-label Classification Using Fine-grained Name Ty**
Authors:
Yadollah Yaghoobzadeh,
Katharina Kann,
Hinrich Schütze
Abstract:
Embedding models typically associate each word with a single real-valued vector, representing its different properties. Evaluation methods, therefore, need to analyze the accuracy and completeness of these properties in embeddings. This requires fine-grained analysis of embedding subspaces. Multi-label classification is an appropriate way to do so. We propose a new evaluation method for word embed…
▽ More
Embedding models typically associate each word with a single real-valued vector, representing its different properties. Evaluation methods, therefore, need to analyze the accuracy and completeness of these properties in embeddings. This requires fine-grained analysis of embedding subspaces. Multi-label classification is an appropriate way to do so. We propose a new evaluation method for word embeddings based on multi-label classification given a word embedding. The task we use is fine-grained name ty**: given a large corpus, find all types that a name can refer to based on the name embedding. Given the scale of entities in knowledge bases, we can build datasets for this task that are complementary to the current embedding evaluation datasets in: they are very large, contain fine-grained classes, and allow the direct evaluation of embeddings without confounding factors like sentence context
△ Less
Submitted 18 July, 2018;
originally announced July 2018.
-
Adaptive Hierarchical Sensing for the Efficient Sampling of Sparse and Compressible Signals
Authors:
Henry Schütze,
Erhardt Barth,
Thomas Martinetz
Abstract:
We present the novel adaptive hierarchical sensing algorithm K-AHS, which samples sparse or compressible signals with a measurement complexity equal to that of Compressed Sensing (CS). In contrast to CS, K-AHS is adaptive as sensing vectors are selected while sampling, depending on previous measurements. Prior to sampling, the user chooses a transform domain in which the signal of interest is spar…
▽ More
We present the novel adaptive hierarchical sensing algorithm K-AHS, which samples sparse or compressible signals with a measurement complexity equal to that of Compressed Sensing (CS). In contrast to CS, K-AHS is adaptive as sensing vectors are selected while sampling, depending on previous measurements. Prior to sampling, the user chooses a transform domain in which the signal of interest is sparse. The corresponding transform determines the collection of sensing vectors. K-AHS gradually refines initial coarse measurements to significant signal coefficients in the sparse transform domain based on a sensing tree which provides a natural hierarchy of sensing vectors. K-AHS directly provides significant signal coefficients in the sparse transform domain and does not require a reconstruction stage based on inverse optimization. Therefore, the K-AHS sensing vectors must not satisfy any incoherence or restricted isometry property. A mathematical analysis proves the sampling complexity of K-AHS as well as a general and sufficient condition for sampling the optimal k-term approximation, which is applied to particular signal models. The analytical findings are supported by simulations with synthetic signals and real world images. On standard benchmark images, K-AHS achieves lower reconstruction errors than CS.
△ Less
Submitted 14 July, 2018;
originally announced July 2018.
-
Replicated Siamese LSTM in Ticketing System for Similarity Learning and Retrieval in Asymmetric Texts
Authors:
Pankaj Gupta,
Bernt Andrassy,
Hinrich Schütze
Abstract:
The goal of our industrial ticketing system is to retrieve a relevant solution for an input query, by matching with historical tickets stored in knowledge base. A query is comprised of subject and description, while a historical ticket consists of subject, description and solution. To retrieve a relevant solution, we use textual similarity paradigm to learn similarity in the query and historical t…
▽ More
The goal of our industrial ticketing system is to retrieve a relevant solution for an input query, by matching with historical tickets stored in knowledge base. A query is comprised of subject and description, while a historical ticket consists of subject, description and solution. To retrieve a relevant solution, we use textual similarity paradigm to learn similarity in the query and historical tickets. The task is challenging due to significant term mismatch in the query and ticket pairs of asymmetric lengths, where subject is a short text but description and solution are multi-sentence texts. We present a novel Replicated Siamese LSTM model to learn similarity in asymmetric text pairs, that gives 22% and 7% gain (Accuracy@10) for retrieval task, respectively over unsupervised and supervised baselines. We also show that the topic and distributed semantic features for short and long texts improved both similarity learning and retrieval.
△ Less
Submitted 8 July, 2018;
originally announced July 2018.
-
Recurrent One-Hop Predictions for Reasoning over Knowledge Graphs
Authors:
Wenpeng Yin,
Yadollah Yaghoobzadeh,
Hinrich Schütze
Abstract:
Large scale knowledge graphs (KGs) such as Freebase are generally incomplete. Reasoning over multi-hop (mh) KG paths is thus an important capability that is needed for question answering or other NLP tasks that require knowledge about the world. mh-KG reasoning includes diverse scenarios, e.g., given a head entity and a relation path, predict the tail entity; or given two entities connected by som…
▽ More
Large scale knowledge graphs (KGs) such as Freebase are generally incomplete. Reasoning over multi-hop (mh) KG paths is thus an important capability that is needed for question answering or other NLP tasks that require knowledge about the world. mh-KG reasoning includes diverse scenarios, e.g., given a head entity and a relation path, predict the tail entity; or given two entities connected by some relation paths, predict the unknown relation between them. We present ROPs, recurrent one-hop predictors, that predict entities at each step of mh-KB paths by using recurrent neural networks and vector representations of entities and relations, with two benefits: (i) modeling mh-paths of arbitrary lengths while updating the entity and relation representations by the training signal at each step; (ii) handling different types of mh-KG reasoning in a unified framework. Our models show state-of-the-art for two important multi-hop KG reasoning tasks: Knowledge Base Completion and Path Query Answering.
△ Less
Submitted 12 June, 2018;
originally announced June 2018.
-
Joint Bootstrap** Machines for High Confidence Relation Extraction
Authors:
Pankaj Gupta,
Benjamin Roth,
Hinrich Schütze
Abstract:
Semi-supervised bootstrap** techniques for relationship extraction from text iteratively expand a set of initial seed instances. Due to the lack of labeled data, a key challenge in bootstrap** is semantic drift: if a false positive instance is added during an iteration, then all following iterations are contaminated. We introduce BREX, a new bootstrap** method that protects against such cont…
▽ More
Semi-supervised bootstrap** techniques for relationship extraction from text iteratively expand a set of initial seed instances. Due to the lack of labeled data, a key challenge in bootstrap** is semantic drift: if a false positive instance is added during an iteration, then all following iterations are contaminated. We introduce BREX, a new bootstrap** method that protects against such contamination by highly effective confidence assessment. This is achieved by using entity and template seeds jointly (as opposed to just one as in previous work), by expanding entities and templates in parallel and in a mutually constraining fashion in each iteration and by introducing higherquality similarity measures for templates. Experimental results show that BREX achieves an F1 that is 0.13 (0.87 vs. 0.74) better than the state of the art for four relationships.
△ Less
Submitted 1 May, 2018;
originally announced May 2018.
-
End-Task Oriented Textual Entailment via Deep Explorations of Inter-Sentence Interactions
Authors:
Wenpeng Yin,
Hinrich Schütze,
Dan Roth
Abstract:
This work deals with SciTail, a natural entailment challenge derived from a multi-choice question answering problem. The premises and hypotheses in SciTail were generated with no awareness of each other, and did not specifically aim at the entailment task. This makes it more challenging than other entailment data sets and more directly useful to the end-task -- question answering. We propose DEIST…
▽ More
This work deals with SciTail, a natural entailment challenge derived from a multi-choice question answering problem. The premises and hypotheses in SciTail were generated with no awareness of each other, and did not specifically aim at the entailment task. This makes it more challenging than other entailment data sets and more directly useful to the end-task -- question answering. We propose DEISTE (deep explorations of inter-sentence interactions for textual entailment) for this entailment task. Given word-to-word interactions between the premise-hypothesis pair ($P$, $H$), DEISTE consists of: (i) a parameter-dynamic convolution to make important words in $P$ and $H$ play a dominant role in learnt representations; and (ii) a position-aware attentive convolution to encode the representation and position information of the aligned word pairs. Experiments show that DEISTE gets $\approx$5\% improvement over prior state of the art and that the pretrained DEISTE on SciTail generalizes well on RTE-5.
△ Less
Submitted 14 May, 2018; v1 submitted 23 April, 2018;
originally announced April 2018.
-
Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages
Authors:
Katharina Kann,
Manuel Mager,
Ivan Meza-Ruiz,
Hinrich Schütze
Abstract:
Morphological segmentation for polysynthetic languages is challenging, because a word may consist of many individual morphemes and training data can be extremely scarce. Since neural sequence-to-sequence (seq2seq) models define the state of the art for morphological segmentation in high-resource settings and for (mostly) European languages, we first show that they also obtain competitive performan…
▽ More
Morphological segmentation for polysynthetic languages is challenging, because a word may consist of many individual morphemes and training data can be extremely scarce. Since neural sequence-to-sequence (seq2seq) models define the state of the art for morphological segmentation in high-resource settings and for (mostly) European languages, we first show that they also obtain competitive performance for Mexican polysynthetic languages in minimal-resource settings. We then propose two novel multi-task training approaches -one with, one without need for external unlabeled resources-, and two corresponding data augmentation methods, improving over the neural baseline for all languages. Finally, we explore cross-lingual transfer as a third way to fortify our neural model and show that we can train one single multi-lingual model for related languages while maintaining comparable or even improved performance, thus reducing the amount of parameters by close to 75%. We provide our morphological segmentation datasets for Mexicanero, Nahuatl, Wixarika and Yorem Nokki for future research.
△ Less
Submitted 16 April, 2018;
originally announced April 2018.
-
Neural Architectures for Open-Type Relation Argument Extraction
Authors:
Benjamin Roth,
Costanza Conforti,
Nina Poerner,
Sanjeev Karn,
Hinrich Schütze
Abstract:
In this work, we introduce the task of Open-Type Relation Argument Extraction (ORAE): Given a corpus, a query entity Q and a knowledge base relation (e.g.,"Q authored notable work with title X"), the model has to extract an argument of non-standard entity type (entities that cannot be extracted by a standard named entity tagger, e.g. X: the title of a book or a work of art) from the corpus. A dist…
▽ More
In this work, we introduce the task of Open-Type Relation Argument Extraction (ORAE): Given a corpus, a query entity Q and a knowledge base relation (e.g.,"Q authored notable work with title X"), the model has to extract an argument of non-standard entity type (entities that cannot be extracted by a standard named entity tagger, e.g. X: the title of a book or a work of art) from the corpus. A distantly supervised dataset based on WikiData relations is obtained and released to address the task.
We develop and compare a wide range of neural models for this task yielding large improvements over a strong baseline obtained with a neural question answering system. The impact of different sentence encoding architectures and answer extraction methods is systematically compared. An encoder based on gated recurrent units combined with a conditional random fields tagger gives the best results.
△ Less
Submitted 30 September, 2018; v1 submitted 5 March, 2018;
originally announced March 2018.
-
Embedding Learning Through Multilingual Concept Induction
Authors:
Philipp Dufter,
Mengjie Zhao,
Martin Schmitt,
Alexander Fraser,
Hinrich Schütze
Abstract:
We present a new method for estimating vector space representations of words: embedding learning by concept induction. We test this method on a highly parallel corpus and learn semantic representations of words in 1259 different languages in a single common space. An extensive experimental evaluation on crosslingual word similarity and sentiment analysis indicates that concept-based multilingual e…
▽ More
We present a new method for estimating vector space representations of words: embedding learning by concept induction. We test this method on a highly parallel corpus and learn semantic representations of words in 1259 different languages in a single common space. An extensive experimental evaluation on crosslingual word similarity and sentiment analysis indicates that concept-based multilingual embedding learning performs better than previous approaches.
△ Less
Submitted 27 June, 2018; v1 submitted 21 January, 2018;
originally announced January 2018.
-
Evaluating neural network explanation methods using hybrid documents and morphological agreement
Authors:
Nina Poerner,
Benjamin Roth,
Hinrich Schütze
Abstract:
The behavior of deep neural networks (DNNs) is hard to understand. This makes it necessary to explore post hoc explanation methods. We conduct the first comprehensive evaluation of explanation methods for NLP. To this end, we design two novel evaluation paradigms that cover two important classes of NLP problems: small context and large context problems. Both paradigms require no manual annotation…
▽ More
The behavior of deep neural networks (DNNs) is hard to understand. This makes it necessary to explore post hoc explanation methods. We conduct the first comprehensive evaluation of explanation methods for NLP. To this end, we design two novel evaluation paradigms that cover two important classes of NLP problems: small context and large context problems. Both paradigms require no manual annotation and are therefore broadly applicable. We also introduce LIMSSE, an explanation method inspired by LIME that is designed for NLP. We show empirically that LIMSSE, LRP and DeepLIFT are the most effective explanation methods and recommend them for explaining DNNs in NLP.
△ Less
Submitted 6 May, 2019; v1 submitted 19 January, 2018;
originally announced January 2018.
-
Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time
Authors:
Pankaj Gupta,
Subburam Rajaram,
Hinrich Schütze,
Bernt Andrassy
Abstract:
Dynamic topic modeling facilitates the identification of topical trends over time in temporal collections of unstructured documents. We introduce a novel unsupervised neural dynamic topic model named as Recurrent Neural Network-Replicated Softmax Model (RNNRSM), where the discovered topics at each time influence the topic discovery in the subsequent time steps. We account for the temporal ordering…
▽ More
Dynamic topic modeling facilitates the identification of topical trends over time in temporal collections of unstructured documents. We introduce a novel unsupervised neural dynamic topic model named as Recurrent Neural Network-Replicated Softmax Model (RNNRSM), where the discovered topics at each time influence the topic discovery in the subsequent time steps. We account for the temporal ordering of documents by explicitly modeling a joint distribution of latent topical dependencies over time, using distributional estimators with temporal recurrent connections. Applying RNN-RSM to 19 years of articles on NLP research, we demonstrate that compared to state-of-the art topic models, RNNRSM shows better generalization, topic interpretation, evolution and trends. We also introduce a metric (named as SPAN) to quantify the capability of dynamic topic model to capture word evolution in topics over time.
△ Less
Submitted 1 May, 2018; v1 submitted 15 November, 2017;
originally announced November 2017.
-
Impact of Coreference Resolution on Slot Filling
Authors:
Heike Adel,
Hinrich Schütze
Abstract:
In this paper, we demonstrate the importance of coreference resolution for natural language processing on the example of the TAC Slot Filling shared task. We illustrate the strengths and weaknesses of automatic coreference resolution systems and provide experimental results to show that they improve performance in the slot filling end-to-end setting. Finally, we publish KBPchains, a resource conta…
▽ More
In this paper, we demonstrate the importance of coreference resolution for natural language processing on the example of the TAC Slot Filling shared task. We illustrate the strengths and weaknesses of automatic coreference resolution systems and provide experimental results to show that they improve performance in the slot filling end-to-end setting. Finally, we publish KBPchains, a resource containing automatically extracted coreference chains from the TAC source corpus in order to support other researchers working on this topic.
△ Less
Submitted 26 October, 2017;
originally announced October 2017.
-
Attentive Convolution: Equip** CNNs with RNN-style Attention Mechanisms
Authors:
Wenpeng Yin,
Hinrich Schütze
Abstract:
In NLP, convolutional neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because the attention in CNNs has been mainly implemented as attentive pooling (i.e., it is applied to pooling) rather than as attentive convolution (i.e., it is integrated into convolution). Convolution is the differentiator of CNNs in that…
▽ More
In NLP, convolutional neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because the attention in CNNs has been mainly implemented as attentive pooling (i.e., it is applied to pooling) rather than as attentive convolution (i.e., it is integrated into convolution). Convolution is the differentiator of CNNs in that it can powerfully model the higher-level representation of a word by taking into account its local fixed-size context in the input text t^x. In this work, we propose an attentive convolution network, ATTCONV. It extends the context scope of the convolution operation, deriving higher-level features for a word not only from local context, but also information extracted from nonlocal context by the attention mechanism commonly used in RNNs. This nonlocal context can come (i) from parts of the input text t^x that are distant or (ii) from extra (i.e., external) contexts t^y. Experiments on sentence modeling with zero-context (sentiment analysis), single-context (textual entailment) and multiple-context (claim verification) demonstrate the effectiveness of ATTCONV in sentence representation learning with the incorporation of context. In particular, attentive convolution outperforms attentive pooling and is a strong competitor to popular attentive RNNs.
△ Less
Submitted 13 November, 2018; v1 submitted 2 October, 2017;
originally announced October 2017.
-
Corpus-level Fine-grained Entity Ty**
Authors:
Yadollah Yaghoobzadeh,
Heike Adel,
Hinrich Schütze
Abstract:
This paper addresses the problem of corpus-level entity ty**, i.e., inferring from a large corpus that an entity is a member of a class such as "food" or "artist". The application of entity ty** we are interested in is knowledge base completion, specifically, to learn which classes an entity is a member of. We propose FIGMENT to tackle this problem. FIGMENT is embedding- based and combines (i)…
▽ More
This paper addresses the problem of corpus-level entity ty**, i.e., inferring from a large corpus that an entity is a member of a class such as "food" or "artist". The application of entity ty** we are interested in is knowledge base completion, specifically, to learn which classes an entity is a member of. We propose FIGMENT to tackle this problem. FIGMENT is embedding- based and combines (i) a global model that scores based on aggregated contextual information of an entity and (ii) a context model that first scores the individual occurrences of an entity and then aggregates the scores. Each of the two proposed models has some specific properties. For the global model, learning high quality entity representations is crucial because it is the only source used for the predictions. Therefore, we introduce representations using name and contexts of entities on the three levels of entity, word, and character. We show each has complementary information and a multi-level representation is the best. For the context model, we need to use distant supervision since the context-level labels are not available for entities. Distant supervised labels are noisy and this harms the performance of models. Therefore, we introduce and apply new algorithms for noise mitigation using multi-instance learning. We show the effectiveness of our models in a large entity ty** dataset, built from Freebase.
△ Less
Submitted 6 June, 2018; v1 submitted 7 August, 2017;
originally announced August 2017.
-
Global Normalization of Convolutional Neural Networks for Joint Entity and Relation Classification
Authors:
Heike Adel,
Hinrich Schütze
Abstract:
We introduce globally normalized convolutional neural networks for joint entity classification and relation extraction. In particular, we propose a way to utilize a linear-chain conditional random field output layer for predicting entity types and relations between entities at the same time. Our experiments show that global normalization outperforms a locally normalized softmax layer on a benchmar…
▽ More
We introduce globally normalized convolutional neural networks for joint entity classification and relation extraction. In particular, we propose a way to utilize a linear-chain conditional random field output layer for predicting entity types and relations between entities at the same time. Our experiments show that global normalization outperforms a locally normalized softmax layer on a benchmark dataset.
△ Less
Submitted 7 August, 2018; v1 submitted 24 July, 2017;
originally announced July 2017.
-
Unlabeled Data for Morphological Generation With Character-Based Sequence-to-Sequence Models
Authors:
Katharina Kann,
Hinrich Schütze
Abstract:
We present a semi-supervised way of training a character-based encoder-decoder recurrent neural network for morphological reinflection, the task of generating one inflected word form from another. This is achieved by using unlabeled tokens or random strings as training data for an autoencoding task, adapting a network for morphological reinflection, and performing multi-task training. We thus use…
▽ More
We present a semi-supervised way of training a character-based encoder-decoder recurrent neural network for morphological reinflection, the task of generating one inflected word form from another. This is achieved by using unlabeled tokens or random strings as training data for an autoencoding task, adapting a network for morphological reinflection, and performing multi-task training. We thus use limited labeled data more effectively, obtaining up to 9.9% improvement over state-of-the-art baselines for 8 different languages.
△ Less
Submitted 21 July, 2017; v1 submitted 17 May, 2017;
originally announced May 2017.
-
Past, Present, Future: A Computational Investigation of the Typology of Tense in 1000 Languages
Authors:
Ehsaneddin Asgari,
Hinrich Schütze
Abstract:
We present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., in a corpus that contains an order of magnitude more languages than parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting - to the…
▽ More
We present SuperPivot, an analysis method for low-resource languages that occur in a superparallel corpus, i.e., in a corpus that contains an order of magnitude more languages than parallel corpora currently in use. We show that SuperPivot performs well for the crosslingual analysis of the linguistic phenomenon of tense. We produce analysis results for more than 1000 languages, conducting - to the best of our knowledge - the largest crosslingual computational study performed to date. We extend existing methodology for leveraging parallel corpora for typological analysis by overcoming a limiting assumption of earlier work: We only require that a linguistic feature is overtly marked in a few of thousands of languages as opposed to requiring that it be marked in all languages under investigation.
△ Less
Submitted 14 September, 2017; v1 submitted 28 April, 2017;
originally announced April 2017.
-
A Subjective Logic Formalisation of the Principle of Polyrepresentation for Information Needs
Authors:
Christina Lioma,
Birger Larsen,
Hinrich Schütze,
Peter Ingwersen
Abstract:
Interactive Information Retrieval refers to the branch of Information Retrieval that considers the retrieval process with respect to a wide range of contexts, which may affect the user's information seeking experience. The identification and representation of such contexts has been the object of the principle of Polyrepresentation, a theoretical framework for reasoning about different representati…
▽ More
Interactive Information Retrieval refers to the branch of Information Retrieval that considers the retrieval process with respect to a wide range of contexts, which may affect the user's information seeking experience. The identification and representation of such contexts has been the object of the principle of Polyrepresentation, a theoretical framework for reasoning about different representations arising from interactive information retrieval in a given context. Although the principle of Polyrepresentation has received attention from many researchers, not much empirical work has been done based on it. One reason may be that it has not yet been formalised mathematically. In this paper we propose an up-to-date and exible mathematical formalisation of the principle of Polyrepresentation for information needs. Specifically, we apply Subjective Logic to model different representations of information needs as beliefs marked by degrees of uncertainty. We combine such beliefs using different logical operators, and we discuss these combinations with respect to different retrieval scenarios and situations. A formal model is introduced and discussed, with illustrative applications to the modelling of information needs.
△ Less
Submitted 5 April, 2017;
originally announced April 2017.
-
One-Shot Neural Cross-Lingual Transfer for Paradigm Completion
Authors:
Katharina Kann,
Ryan Cotterell,
Hinrich Schütze
Abstract:
We present a novel cross-lingual transfer method for paradigm completion, the task of map** a lemma to its inflected forms, using a neural encoder-decoder model, the state of the art for the monolingual task. We use labeled data from a high-resource language to increase performance on a low-resource language. In experiments on 21 language pairs from four different language families, we obtain up…
▽ More
We present a novel cross-lingual transfer method for paradigm completion, the task of map** a lemma to its inflected forms, using a neural encoder-decoder model, the state of the art for the monolingual task. We use labeled data from a high-resource language to increase performance on a low-resource language. In experiments on 21 language pairs from four different language families, we obtain up to 58% higher accuracy than without transfer and show that even zero-shot and one-shot learning are possible. We further find that the degree of language relatedness strongly influences the ability to transfer morphological knowledge.
△ Less
Submitted 31 March, 2017;
originally announced April 2017.
-
Comparative Study of CNN and RNN for Natural Language Processing
Authors:
Wenpeng Yin,
Katharina Kann,
Mo Yu,
Hinrich Schütze
Abstract:
Deep neural networks (DNN) have revolutionized the field of natural language processing (NLP). Convolutional neural network (CNN) and recurrent neural network (RNN), the two main types of DNN architectures, are widely explored to handle various NLP tasks. CNN is supposed to be good at extracting position-invariant features and RNN at modeling units in sequence. The state of the art on many NLP tas…
▽ More
Deep neural networks (DNN) have revolutionized the field of natural language processing (NLP). Convolutional neural network (CNN) and recurrent neural network (RNN), the two main types of DNN architectures, are widely explored to handle various NLP tasks. CNN is supposed to be good at extracting position-invariant features and RNN at modeling units in sequence. The state of the art on many NLP tasks often switches due to the battle between CNNs and RNNs. This work is the first systematic comparison of CNN and RNN on a wide range of representative NLP tasks, aiming to give basic guidance for DNN selection.
△ Less
Submitted 7 February, 2017;
originally announced February 2017.
-
Task-Specific Attentive Pooling of Phrase Alignments Contributes to Sentence Matching
Authors:
Wenpeng Yin,
Hinrich Schütze
Abstract:
This work studies comparatively two typical sentence matching tasks: textual entailment (TE) and answer selection (AS), observing that weaker phrase alignments are more critical in TE, while stronger phrase alignments deserve more attention in AS. The key to reach this observation lies in phrase detection, phrase representation, phrase alignment, and more importantly how to connect those aligned p…
▽ More
This work studies comparatively two typical sentence matching tasks: textual entailment (TE) and answer selection (AS), observing that weaker phrase alignments are more critical in TE, while stronger phrase alignments deserve more attention in AS. The key to reach this observation lies in phrase detection, phrase representation, phrase alignment, and more importantly how to connect those aligned phrases of different matching degrees with the final classifier. Prior work (i) has limitations in phrase generation and representation, or (ii) conducts alignment at word and phrase levels by handcrafted features or (iii) utilizes a single framework of alignment without considering the characteristics of specific tasks, which limits the framework's effectiveness across tasks. We propose an architecture based on Gated Recurrent Unit that supports (i) representation learning of phrases of arbitrary granularity and (ii) task-specific attentive pooling of phrase alignments between two sentences. Experimental results on TE and AS match our observation and show the effectiveness of our approach.
△ Less
Submitted 9 January, 2017;
originally announced January 2017.
-
Multi-level Representations for Fine-Grained Ty** of Knowledge Base Entities
Authors:
Yadollah Yaghoobzadeh,
Hinrich Schütze
Abstract:
Entities are essential elements of natural language. In this paper, we present methods for learning multi-level representations of entities on three complementary levels: character (character patterns in entity names extracted, e.g., by neural networks), word (embeddings of words in entity names) and entity (entity embeddings). We investigate state-of-the-art learning methods on each level and fin…
▽ More
Entities are essential elements of natural language. In this paper, we present methods for learning multi-level representations of entities on three complementary levels: character (character patterns in entity names extracted, e.g., by neural networks), word (embeddings of words in entity names) and entity (entity embeddings). We investigate state-of-the-art learning methods on each level and find large differences, e.g., for deep learning models, traditional ngram features and the subword model of fasttext (Bojanowski et al., 2016) on the character level; for word2vec (Mikolov et al., 2013) on the word level; and for the order-aware model wang2vec (Ling et al., 2015a) on the entity level. We confirm experimentally that each level of representation contributes complementary information and a joint representation of all three levels improves the existing embedding based baseline for fine-grained entity ty** by a large margin. Additionally, we show that adding information from entity descriptions further improves multi-level representations of entities.
△ Less
Submitted 16 January, 2017; v1 submitted 8 January, 2017;
originally announced January 2017.
-
Joint Semantic Synthesis and Morphological Analysis of the Derived Word
Authors:
Ryan Cotterell,
Hinrich Schütze
Abstract:
Much like sentences are composed of words, words themselves are composed of smaller units. For example, the English word questionably can be analyzed as question+able+ly. However, this structural decomposition of the word does not directly give us a semantic representation of the word's meaning. Since morphology obeys the principle of compositionality, the semantics of the word can be systematical…
▽ More
Much like sentences are composed of words, words themselves are composed of smaller units. For example, the English word questionably can be analyzed as question+able+ly. However, this structural decomposition of the word does not directly give us a semantic representation of the word's meaning. Since morphology obeys the principle of compositionality, the semantics of the word can be systematically derived from the meaning of its parts. In this work, we propose a novel probabilistic model of word formation that captures both the analysis of a word w into its constituents segments and the synthesis of the meaning of w from the meanings of those segments. Our model jointly learns to segment words into morphemes and compose distributional semantic vectors of those morphemes. We experiment with the model on English CELEX data and German DerivBase (Zeller et al., 2013) data. We show that jointly modeling semantics increases both segmentation accuracy and morpheme F1 by between 3% and 5%. Additionally, we investigate different models of vector composition, showing that recurrent neural networks yield an improvement over simple additive models. Finally, we study the degree to which the representations correspond to a linguist's notion of morphological productivity.
△ Less
Submitted 10 November, 2018; v1 submitted 4 January, 2017;
originally announced January 2017.
-
Noise Mitigation for Neural Entity Ty** and Relation Extraction
Authors:
Yadollah Yaghoobzadeh,
Heike Adel,
Hinrich Schütze
Abstract:
In this paper, we address two different types of noise in information extraction models: noise from distant supervision and noise from pipeline input features. Our target tasks are entity ty** and relation extraction. For the first noise type, we introduce multi-instance multi-label learning algorithms using neural network models, and apply them to fine-grained entity ty** for the first time.…
▽ More
In this paper, we address two different types of noise in information extraction models: noise from distant supervision and noise from pipeline input features. Our target tasks are entity ty** and relation extraction. For the first noise type, we introduce multi-instance multi-label learning algorithms using neural network models, and apply them to fine-grained entity ty** for the first time. This gives our models comparable performance with the state-of-the-art supervised approach which uses global embeddings of entities. For the second noise type, we propose ways to improve the integration of noisy entity type predictions into relation extraction. Our experiments show that probabilistic predictions are more robust than discrete predictions and that joint training of the two tasks performs best.
△ Less
Submitted 10 January, 2017; v1 submitted 22 December, 2016;
originally announced December 2016.
-
Exploring Different Dimensions of Attention for Uncertainty Detection
Authors:
Heike Adel,
Hinrich Schütze
Abstract:
Neural networks with attention have proven effective for many natural language processing tasks. In this paper, we develop attention mechanisms for uncertainty detection. In particular, we generalize standardly used attention mechanisms by introducing external attention and sequence-preserving attention. These novel architectures differ from standard approaches in that they use external resources…
▽ More
Neural networks with attention have proven effective for many natural language processing tasks. In this paper, we develop attention mechanisms for uncertainty detection. In particular, we generalize standardly used attention mechanisms by introducing external attention and sequence-preserving attention. These novel architectures differ from standard approaches in that they use external resources to compute attention weights and preserve sequence information. We compare them to other configurations along different dimensions of attention. Our novel architectures set the new state of the art on a Wikipedia benchmark dataset and perform similar to the state-of-the-art model on a biomedical benchmark which uses a large set of linguistic features.
△ Less
Submitted 10 January, 2017; v1 submitted 20 December, 2016;
originally announced December 2016.
-
Neural Multi-Source Morphological Reinflection
Authors:
Katharina Kann,
Ryan Cotterell,
Hinrich Schütze
Abstract:
We explore the task of multi-source morphological reinflection, which generalizes the standard, single-source version. The input consists of (i) a target tag and (ii) multiple pairs of source form and source tag for a lemma. The motivation is that it is beneficial to have access to more than one source form since different source forms can provide complementary information, e.g., different stems.…
▽ More
We explore the task of multi-source morphological reinflection, which generalizes the standard, single-source version. The input consists of (i) a target tag and (ii) multiple pairs of source form and source tag for a lemma. The motivation is that it is beneficial to have access to more than one source form since different source forms can provide complementary information, e.g., different stems. We further present a novel extension to the encoder- decoder recurrent neural architecture, consisting of multiple encoders, to better solve the task. We show that our new architecture outperforms single-source reinflection models and publish our dataset for multi-source morphological reinflection to facilitate future research.
△ Less
Submitted 22 January, 2017; v1 submitted 18 December, 2016;
originally announced December 2016.
-
Nonsymbolic Text Representation
Authors:
Hinrich Schuetze,
Heike Adel,
Ehsaneddin Asgari
Abstract:
We introduce the first generic text representation model that is completely nonsymbolic, i.e., it does not require the availability of a segmentation or tokenization method that attempts to identify words or other symbolic units in text. This applies to training the parameters of the model on a training corpus as well as to applying it when computing the representation of a new text. We show that…
▽ More
We introduce the first generic text representation model that is completely nonsymbolic, i.e., it does not require the availability of a segmentation or tokenization method that attempts to identify words or other symbolic units in text. This applies to training the parameters of the model on a training corpus as well as to applying it when computing the representation of a new text. We show that our model performs better than prior work on an information extraction and a text denoising task.
△ Less
Submitted 1 May, 2017; v1 submitted 3 October, 2016;
originally announced October 2016.
-
Intrinsic Subspace Evaluation of Word Embedding Representations
Authors:
Yadollah Yaghoobzadeh,
Hinrich Schütze
Abstract:
We introduce a new methodology for intrinsic evaluation of word representations. Specifically, we identify four fundamental criteria based on the characteristics of natural language that pose difficulties to NLP systems; and develop tests that directly show whether or not representations contain the subspaces necessary to satisfy these criteria. Current intrinsic evaluations are mostly based on th…
▽ More
We introduce a new methodology for intrinsic evaluation of word representations. Specifically, we identify four fundamental criteria based on the characteristics of natural language that pose difficulties to NLP systems; and develop tests that directly show whether or not representations contain the subspaces necessary to satisfy these criteria. Current intrinsic evaluations are mostly based on the overall similarity or full-space similarity of words and thus view vector representations as points. We show the limits of these point-based intrinsic evaluations. We apply our evaluation methodology to the comparison of a count vector model and several neural network models and demonstrate important properties of these models.
△ Less
Submitted 25 June, 2016;
originally announced June 2016.