Showing 101–150 of 220 results for author: Schütze, H

  1. arXiv:2104.08829  [pdf, other]

    cs.CL cs.AI cs.SI

    Modeling Ideological Salience and Framing in Polarized Online Groups with Graph Neural Networks and Structured Sparsity

    Authors: Valentin Hofmann, Xiaowen Dong, Janet B. Pierrehumbert, Hinrich Schütze

    Abstract: The increasing polarization of online political discourse calls for computational tools that automatically detect and monitor ideological divides in social media. We introduce a minimally supervised method that leverages the network structure of online discussion forums, specifically Reddit, to detect polarized concepts. We model polarization along the dimensions of salience and framing, drawing u…

    Submitted 14 December, 2022; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: NAACL 2022 (Findings)

  2. arXiv:2104.08551  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-source Neural Topic Modeling in Multi-view Embedding Spaces

    Authors: Pankaj Gupta, Yatin Chaudhary, Hinrich Schütze

    Abstract: Though word embeddings and topics are complementary representations, several past works have only used pretrained word embeddings in (neural) topic modeling to address data sparsity in short-text or small collection of documents. This work presents a novel neural topic modeling framework using multi-view embedding spaces: (1) pretrained topic-embeddings, and (2) pretrained word-embeddings (context…

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: NAACL2021, 13 pages, 14 tables, 2 figures. arXiv admin note: substantial text overlap with arXiv:1909.06563

  3. arXiv:2104.08401  [pdf, ps, other]

    cs.CL cs.AI

    Enriching a Model's Notion of Belief using a Persistent Memory

    Authors: Nora Kassner, Oyvind Tafjord, Hinrich Schutze, Peter Clark

    Abstract: Although pretrained language models (PTLMs) have been shown to contain significant amounts of world knowledge, they can still produce inconsistent answers to questions when probed, even after using specialized training techniques to reduce inconsistency. As a result, it can be hard to identify what the model actually "believes" about the world. Our goal is to reduce this problem, so systems are mo…

    Submitted 7 October, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: This is an old and now obsolete draft. See arXiv:2109.14723 ("BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief") for the final paper

  4. arXiv:2104.07540  [pdf, other]

    cs.CL cs.LG

    Generating Datasets with Pretrained Language Models

    Authors: Timo Schick, Hinrich Schütze

    Abstract: To obtain high-quality sentence embeddings from pretrained language models (PLMs), they must either be augmented with additional pretraining objectives or finetuned on a large set of labeled text pairs. While the latter approach typically outperforms the former, it requires great human effort to generate suitable datasets of sufficient size. In this paper, we show how PLMs can be leveraged to obta…

    Submitted 4 October, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: Accepted at EMNLP2021

  5. arXiv:2104.07094  [pdf, other]

    cs.CL

    Static Embeddings as Efficient Knowledge Bases?

    Authors: Philipp Dufter, Nora Kassner, Hinrich Schütze

    Abstract: Recent research investigates factual knowledge stored in large pretrained language models (PLMs). Instead of structural knowledge base (KB) queries, masked sentences such as "Paris is the capital of [MASK]" are used as probes. The good performance on this analysis task has been interpreted as PLMs becoming potential repositories of factual knowledge. In experiments across ten linguistically divers…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: NAACL2021 CRV; first two authors contributed equally

  6. arXiv:2103.05131  [pdf, other]

    cs.CL

    Few-Shot Learning of an Interleaved Text Summarization Model by Pretraining with Synthetic Data

    Authors: Sanjeev Kumar Karn, Francine Chen, Yan-Ying Chen, Ulli Waltinger, Hinrich Schuetze

    Abstract: Interleaved texts, where posts belonging to different threads occur in a sequence, commonly occur in online chat posts, so that it can be time-consuming to quickly obtain an overview of the discussions. Existing systems first disentangle the posts by threads and then extract summaries from those threads. A major issue with such systems is error propagation from the disentanglement component. While…

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: Adapt-NLP: The Second Workshop on Domain Adaptation for NLP

  7. arXiv:2103.00453  [pdf, other]

    cs.CL

    Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP

    Authors: Timo Schick, Sahana Udupa, Hinrich Schütze

    Abstract: When trained on large, unfiltered crawls from the internet, language models pick up and reproduce all kinds of undesirable biases that can be found in the data: they often generate racist, sexist, violent or otherwise toxic language. As large models require millions of training examples to achieve good performance, it is difficult to completely prevent them from being exposed to such content. In t…

    Submitted 9 September, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

    Comments: Accepted at TACL

  8. arXiv:2102.11090  [pdf, other]

    cs.CL cs.AI

    Position Information in Transformers: An Overview

    Authors: Philipp Dufter, Martin Schmitt, Hinrich Schütze

    Abstract: Transformers are arguably the main workhorse in recent Natural Language Processing research. By definition a Transformer is invariant with respect to reordering of the input. However, language is inherently sequential and word order is essential to the semantics and syntax of an utterance. In this article, we provide an overview and theoretical comparison of existing methods to incorporate positio…

    Submitted 9 September, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

    Comments: First two authors contributed equally

  9. arXiv:2102.05331  [pdf, other]

    cs.CL cs.AI

    Language Models for Lexical Inference in Context

    Authors: Martin Schmitt, Hinrich Schütze

    Abstract: Lexical inference in context (LIiC) is the task of recognizing textual entailment between two very similar sentences, i.e., sentences that only differ in one expression. It can therefore be seen as a variant of the natural language inference task that is focused on lexical semantics. We formulate and evaluate the first approaches based on pretrained language models (LMs) for this task: (i) a few-s…

    Submitted 27 April, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

    Comments: Final version of EACL 2021 long paper

  10. arXiv:2102.04760  [pdf, other]

    cs.CV cs.AI

    Improving Scene Graph Classification by Exploiting Knowledge from Texts

    Authors: Sahand Sharifzadeh, Sina Moayed Baharlou, Martin Schmitt, Hinrich Schütze, Volker Tresp

    Abstract: Training scene graph classification models requires a large amount of annotated image data. Meanwhile, scene graphs represent relational knowledge that can be modeled with symbolic data from texts or knowledge graphs. While image annotation demands extensive labor, collecting textual descriptions of natural scenes requires less effort. In this work, we investigate whether textual scene description…

    Submitted 8 October, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

  11. arXiv:2102.03596  [pdf, other]

    cs.CL

    Does He Wink or Does He Nod? A Challenging Benchmark for Evaluating Word Understanding of Language Models

    Authors: Lutfi Kerem Senel, Hinrich Schütze

    Abstract: Recent progress in pretraining language models on large corpora has resulted in large performance gains on many NLP tasks. These large models acquire linguistic knowledge during pretraining, which helps to improve performance on downstream tasks via fine-tuning. To assess what kind of knowledge is acquired, language models are commonly probed by querying them with `fill in the blank' style cloze q…

    Submitted 6 February, 2021; originally announced February 2021.

    Comments: 5 pages, to appear in EACL 2021

  12. arXiv:2102.01017  [pdf, other]

    cs.CL

    Measuring and Improving Consistency in Pretrained Language Models

    Authors: Yanai Elazar, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Eduard Hovy, Hinrich Schütze, Yoav Goldberg

    Abstract: Consistency of a model -- that is, the invariance of its behavior under meaning-preserving alternations in its input -- is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel, a high-quality resource of cloze-style query English paraphrases…

    Submitted 29 May, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: Accepted to the TACL journal, pre-MIT Press publication version

  13. arXiv:2102.00894  [pdf, other]

    cs.CL

    Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models

    Authors: Nora Kassner, Philipp Dufter, Hinrich Schütze

    Abstract: Recently, it has been found that monolingual English language models can be used as knowledge bases. Instead of structural knowledge base queries, masked sentences such as "Paris is the capital of [MASK]" are used as probes. We translate the established benchmarks TREx and GoogleRE into 53 languages. Working with mBERT, we investigate three questions. (i) Can mBERT be used as a multilingual knowle…

    Submitted 1 February, 2021; originally announced February 2021.

    Comments: Accepted to EACL 2021

  14. arXiv:2101.00403  [pdf, other]

    cs.CL

    Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words

    Authors: Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze

    Abstract: How does the input segmentation of pretrained language models (PLMs) affect their interpretations of complex words? We present the first study investigating this question, taking BERT as the example PLM and focusing on its semantic representations of English derivatives. We show that PLMs can be interpreted as serial dual-route models, i.e., the meanings of complex words are either stored or else…

    Submitted 2 June, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: ACL 2021

  15. arXiv:2012.15682  [pdf, other]

    cs.CL

    A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters

    Authors: Mengjie Zhao, Yi Zhu, Ehsan Shareghi, Ivan Vulić, Roi Reichart, Anna Korhonen, Hinrich Schütze

    Abstract: Few-shot crosslingual transfer has been shown to outperform its zero-shot counterpart with pretrained encoders like multilingual BERT. Despite its growing popularity, little to no attention has been paid to standardizing and analyzing the design of few-shot experiments. In this work, we highlight a fundamental risk posed by this shortcoming, illustrating that the model exhibits a high degree of se…

    Submitted 2 June, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: ACL-IJCNLP 2021

  16. arXiv:2012.11926  [pdf, other]

    cs.CL cs.LG

    Few-Shot Text Generation with Pattern-Exploiting Training

    Authors: Timo Schick, Hinrich Schütze

    Abstract: Providing pretrained language models with simple task descriptions in natural language enables them to solve some tasks in a fully unsupervised fashion. Moreover, when combined with regular learning from examples, this idea yields impressive few-shot results for a wide range of text classification tasks. It is also a promising direction to improve data efficiency in generative settings, but there…

    Submitted 4 October, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

    Comments: Accepted at EMNLP2021

  17. arXiv:2012.11657  [pdf, other]

    cs.CL

    Subword Sampling for Low Resource Word Alignment

    Authors: Ehsaneddin Asgari, Masoud Jalili Sabet, Philipp Dufter, Christopher Ringlstetter, Hinrich Schütze

    Abstract: Annotation projection is an important area in NLP that can greatly contribute to creating language resources for low-resource languages. Word alignment plays a key role in this setting. However, most of the existing word alignment methods are designed for a high resource setting in machine translation where millions of parallel sentences are available. This amount reduces to a few thousands of sen…

    Submitted 15 June, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

  18. arXiv:2010.16407  [pdf, other]

    cs.CL cs.LG

    TopicBERT for Energy Efficient Document Classification

    Authors: Yatin Chaudhary, Pankaj Gupta, Khushbu Saxena, Vivek Kulkarni, Thomas Runkler, Hinrich Schütze

    Abstract: Prior research notes that BERT's computational cost grows quadratically with sequence length thus leading to longer training times, higher GPU memory constraints and carbon emissions. While recent work seeks to address these scalability issues at pre-training, these issues are also prominent in fine-tuning especially for long sequence tasks like document classification. Our work thus focuses on op…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: EMNLP2020 (Findings): 9 pages, 5 figures, 8 Tables

  19. arXiv:2010.13641  [pdf, other]

    cs.CL cs.AI cs.LG

    Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification

    Authors: Timo Schick, Helmut Schmid, Hinrich Schütze

    Abstract: A recent approach for few-shot text classification is to convert textual inputs to cloze questions that contain some form of task description, process them with a pretrained language model and map the predicted words to labels. Manually defining this mapping between words and labels requires both domain expertise and an understanding of the language model's abilities. To mitigate this issue, we de…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: To appear at COLING 2020

  20. arXiv:2010.12684  [pdf, other]

    cs.CL

    Dynamic Contextualized Word Embeddings

    Authors: Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze

    Abstract: Static word embeddings that represent words by a single vector cannot capture the variability of word meaning in different linguistic and extralinguistic contexts. Building on prior work on contextualized and dynamic word embeddings, we introduce dynamic contextualized word embeddings that represent words as a function of both linguistic and extralinguistic context. Based on a pretrained language…

    Submitted 8 June, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: ACL 2021

  21. arXiv:2009.13375  [pdf, other]

    cs.CL cs.CY cs.LG

    Identifying Automatically Generated Headlines using Transformers

    Authors: Antonis Maronikolakis, Hinrich Schutze, Mark Stevenson

    Abstract: False information spread via the internet and social media influences public opinion and user activity, while generative models enable fake content to be generated faster and more cheaply than had previously been possible. In the not so distant future, identifying fake content generated by deep learning models will play a key role in protecting users from misinformation. To this end, a dataset con…

    Submitted 25 April, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

    Comments: NLP4IF 2021 Proceedings, NAACL 2021

  22. arXiv:2009.07118  [pdf, other]

    cs.CL cs.AI cs.LG

    It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

    Authors: Timo Schick, Hinrich Schütze

    Abstract: When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous amounts of compute are required for training and applying such big models, resulting in a large carbon footprint and making it difficult for researchers and practitioners to use them. We show that performance similar to GPT-3 ca…

    Submitted 12 April, 2021; v1 submitted 15 September, 2020; originally announced September 2020.

    Comments: Accepted at NAACL2021

  23. arXiv:2007.08426  [pdf, other]

    cs.CL

    Investigating Pretrained Language Models for Graph-to-Text Generation

    Authors: Leonardo F. R. Ribeiro, Martin Schmitt, Hinrich Schütze, Iryna Gurevych

    Abstract: Graph-to-text generation aims to generate fluent texts from graph-based data. In this paper, we investigate two recently proposed pretrained language models (PLMs) and analyze the impact of different task-adaptive pretraining strategies for PLMs in graph-to-text generation. We present a study across three graph domains: meaning representations, Wikipedia knowledge graphs (KGs) and scientific KGs.…

    Submitted 27 September, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: Accepted as a long paper to NLP4ConvAI, EMNLP2021

  24. Automatic Domain Adaptation Outperforms Manual Domain Adaptation for Predicting Financial Outcomes

    Authors: Marina Sedinkina, Nikolas Breitkopf, Hinrich Schütze

    Abstract: In this paper, we automatically create sentiment dictionaries for predicting financial outcomes. We compare three approaches: (i) manual adaptation of the domain-general dictionary H4N, (ii) automatic adaptation of H4N and (iii) a combination consisting of first manual, then automatic adaptation. In our experiments, we demonstrate that the automatically adapted sentiment dictionary outperforms the…

    Submitted 25 June, 2020; originally announced June 2020.

    Comments: Accepted at ACL2019

  25. arXiv:2006.10909  [pdf, other]

    cs.CL cs.IR cs.LG cs.NE

    Neural Topic Modeling with Continual Lifelong Learning

    Authors: Pankaj Gupta, Yatin Chaudhary, Thomas Runkler, Hinrich Schütze

    Abstract: Lifelong learning has recently attracted attention in building machine learning systems that continually accumulate and transfer knowledge to help future learning. Unsupervised topic modeling has been popularly used to discover topics from document collections. However, the application of topic modeling is challenging due to data sparsity, e.g., in a small collection of (short) documents and thus,…

    Submitted 27 June, 2023; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted at ICML2020 (13 pages, 11 figures, 9 tables)

  26. arXiv:2006.10632  [pdf, other]

    cs.CL cs.AI cs.LG

    Explainable and Discourse Topic-aware Neural Language Understanding

    Authors: Yatin Chaudhary, Hinrich Schütze, Pankaj Gupta

    Abstract: Marrying topic models and language models exposes language understanding to a broader source of document-level context beyond sentences via topics. While introducing topical semantics in language models, existing approaches incorporate latent document topic proportions and ignore topical discourse in sentences of the document. This work extends the line of research by additionally introducing an e…

    Submitted 27 June, 2023; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted at ICML2020 (13 pages, 2 figures)

  27. arXiv:2006.10413  [pdf, other]

    cs.CL

    Are Pretrained Language Models Symbolic Reasoners Over Knowledge?

    Authors: Nora Kassner, Benno Krojer, Hinrich Schütze

    Abstract: How can pretrained language models (PLMs) learn factual knowledge from the training set? We investigate the two most important mechanisms: reasoning and memorization. Prior work has attempted to quantify the number of facts PLMs learn, but we present, using synthetic data, the first study that investigates the causal relation between facts present in training and facts learned by the PLM. For reas…

    Submitted 10 October, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted to CoNLL 2020

  28. arXiv:2006.09242  [pdf, other]

    cs.CL

    Modeling Graph Structure via Relative Position for Text Generation from Knowledge Graphs

    Authors: Martin Schmitt, Leonardo F. R. Ribeiro, Philipp Dufter, Iryna Gurevych, Hinrich Schütze

    Abstract: We present Graformer, a novel Transformer-based encoder-decoder architecture for graph-to-text generation. With our novel graph self-attention, the encoding of a node relies on all nodes in the input graph - not only direct neighbors - facilitating the detection of global patterns. We represent the relation between two nodes as the length of the shortest path between them. Graformer learns to weig…

    Submitted 27 April, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: Accepted as a long paper at TextGraphs 2021

  29. arXiv:2005.07979  [pdf, other]

    cs.CL cs.AI

    Unsupervised Embedding-based Detection of Lexical Semantic Changes

    Authors: Ehsaneddin Asgari, Christoph Ringlstetter, Hinrich Schütze

    Abstract: This paper describes EmbLexChange, a system introduced by the "Life-Language" team for SemEval-2020 Task 1, on unsupervised detection of lexical-semantic changes. EmbLexChange is defined as the divergence between the embedding based profiles of word w (calculated with respect to a set of reference words) in the source and the target domains (source and target domains can be simply two time frames…

    Submitted 16 May, 2020; originally announced May 2020.

  30. arXiv:2005.00766  [pdf, other]

    cs.CL

    BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA

    Authors: Nora Kassner, Hinrich Schütze

    Abstract: Khandelwal et al. (2020) use a k-nearest-neighbor (kNN) component to improve language model performance. We show that this idea is beneficial for open-domain question answering (QA). To improve the recall of facts encountered during training, we combine BERT (Devlin et al., 2019) with a traditional information retrieval step (IR) and a kNN search over a large datastore of an embedded text collecti…

    Submitted 12 October, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: to appear in EMNLP Findings

  31. arXiv:2005.00672  [pdf, other]

    cs.CL

    DagoBERT: Generating Derivational Morphology with a Pretrained Language Model

    Authors: Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze

    Abstract: Can pretrained language models (PLMs) generate derivationally complex words? We present the first study investigating this question, taking BERT as the example PLM. We examine BERT's derivational capabilities in different settings, ranging from using the unmodified pretrained model to full finetuning. Our best model, DagoBERT (Derivationally and generatively optimized BERT), clearly outperforms th…

    Submitted 7 October, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

  32. arXiv:2005.00396  [pdf, other]

    cs.CL

    Identifying Necessary Elements for BERT's Multilinguality

    Authors: Philipp Dufter, Hinrich Schütze

    Abstract: It has been shown that multilingual BERT (mBERT) yields high quality multilingual representations and enables effective zero-shot transfer. This is surprising given that mBERT does not use any crosslingual signal during training. While recent literature has studied this phenomenon, the reasons for the multilinguality are still somewhat obscure. We aim to identify architectural properties of BERT a…

    Submitted 8 February, 2021; v1 submitted 1 May, 2020; originally announced May 2020.

    Comments: EMNLP2020 CRV

  33. arXiv:2004.12406  [pdf, other]

    cs.CL

    Masking as an Efficient Alternative to Finetuning for Pretrained Language Models

    Authors: Mengjie Zhao, Tao Lin, Fei Mi, Martin Jaggi, Hinrich Schütze

    Abstract: We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning. Extensive evaluations of masking BERT and RoBERTa on a series of NLP tasks show that our masking scheme yields performance comparable to finetuning, yet has a much smaller memory footprint when several tasks need to be in…

    Submitted 11 October, 2020; v1 submitted 26 April, 2020; originally announced April 2020.

    Comments: EMNLP 2020; MZ and TL contribute equally

  34. arXiv:2004.12198  [pdf, other]

    cs.CL

    Quantifying the Contextualization of Word Representations with Semantic Class Probing

    Authors: Mengjie Zhao, Philipp Dufter, Yadollah Yaghoobzadeh, Hinrich Schütze

    Abstract: Pretrained language models have achieved a new state of the art on many NLP tasks, but there are still many open questions about how and why they work so well. We investigate the contextualization of words in BERT. We quantify the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its context…

    Submitted 11 October, 2020; v1 submitted 25 April, 2020; originally announced April 2020.

    Comments: EMNLP Findings 2020

  35. arXiv:2004.08728  [pdf, other]

    cs.CL

    SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings

    Authors: Masoud Jalili Sabet, Philipp Dufter, François Yvon, Hinrich Schütze

    Abstract: Word alignments are useful for tasks like statistical and neural machine translation (NMT) and cross-lingual annotation projection. Statistical word aligners perform well, as do methods that extract alignments jointly with translations in NMT. However, most approaches require parallel training data, and quality decreases as less training data is available. We propose word alignment methods that re…

    Submitted 16 April, 2021; v1 submitted 18 April, 2020; originally announced April 2020.

    Comments: EMNLP (Findings) 2020

  36. arXiv:2004.03354  [pdf, other]

    cs.CL

    Inexpensive Domain Adaptation of Pretrained Language Models: Case Studies on Biomedical NER and Covid-19 QA

    Authors: Nina Poerner, Ulli Waltinger, Hinrich Schütze

    Abstract: Domain adaptation of Pretrained Language Models (PTLMs) is typically achieved by unsupervised pretraining on target-domain text. While successful, this approach is expensive in terms of hardware, runtime and CO_2 emissions. Here, we propose a cheaper alternative: We train Word2Vec on target-domain text and align the resulting word vectors with the wordpiece vectors of a general-domain PTLM. We eva…

    Submitted 27 June, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

  37. arXiv:2001.07676  [pdf, other]

    cs.CL

    Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference

    Authors: Timo Schick, Hinrich Schütze

    Abstract: Some NLP tasks can be solved in a fully unsupervised fashion by providing a pretrained language model with "task descriptions" in natural language (e.g., Radford et al., 2019). While this approach underperforms its supervised counterpart, we show in this work that the two ideas can be combined: We introduce Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates i…

    Submitted 25 January, 2021; v1 submitted 21 January, 2020; originally announced January 2020.

    Comments: Accepted at EACL2021

  38. arXiv:2001.02284  [pdf, other]

    cs.CL cs.AI

    Multipurpose Intelligent Process Automation via Conversational Assistant

    Authors: Alena Moiseeva, Dietrich Trautmann, Michael Heimann, Hinrich Schütze

    Abstract: Intelligent Process Automation (IPA) is an emerging technology with a primary goal to assist the knowledge worker by taking care of repetitive, routine and low-cognitive tasks. Conversational agents that can interact with users in a natural language are potential application for IPA systems. Such intelligent agents can assist the user by answering specific questions and executing routine tasks tha…

    Submitted 21 May, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

    Comments: Presented at the AAAI-20 Workshop on Intelligent Process Automation

  39. arXiv:1912.05877  [pdf, other]

    cs.CL cs.AI

    Extending Machine Language Models toward Human-Level Language Understanding

    Authors: James L. McClelland, Felix Hill, Maja Rudolph, Jason Baldridge, Hinrich Schütze

    Abstract: Language is crucial for human intelligence, but what exactly is its role? We take language to be a part of a system for understanding and communicating about situations. The human ability to understand and communicate about situations emerges gradually from experience and depends on domain-general principles of biological neural networks: connection-based learning, distributed representation, and…

    Submitted 4 July, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

  40. arXiv:1911.04916  [pdf, ps, other]

    cs.CL

    Morphological Segmentation Inside-Out

    Authors: Ryan Cotterell, Arun Kumar, Hinrich Schütze

    Abstract: Morphological segmentation has traditionally been modeled with non-hierarchical models, which yield flat segmentations as output. In many cases, however, proper morphological analysis requires hierarchical structure -- especially in the case of derivational morphology. In this work, we introduce a discriminative, joint model of morphological segmentation along with the orthographic changes that oc…

    Submitted 12 February, 2021; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: EMNLP 2016

  41. arXiv:1911.03700  [pdf, other]

    cs.CL

    Sentence Meta-Embeddings for Unsupervised Semantic Textual Similarity

    Authors: Nina Poerner, Ulli Waltinger, Hinrich Schütze

    Abstract: We address the task of unsupervised Semantic Textual Similarity (STS) by ensembling diverse pre-trained sentence encoders into sentence meta-embeddings. We apply, extend and evaluate different meta-embedding methods from the word embedding literature at the sentence level, including dimensionality reduction (Yin and Schütze, 2016), generalized Canonical Correlation Analysis (Rastogi et al., 2015)…

    Submitted 24 June, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

  42. arXiv:1911.03681  [pdf, other]

    cs.CL

    E-BERT: Efficient-Yet-Effective Entity Embeddings for BERT

    Authors: Nina Poerner, Ulli Waltinger, Hinrich Schütze

    Abstract: We present a novel way of injecting factual knowledge about entities into the pretrained BERT model (Devlin et al., 2019): We align Wikipedia2Vec entity vectors (Yamada et al., 2016) with BERT's native wordpiece vector space and use the aligned entity vectors as if they were wordpiece vectors. The resulting entity-enhanced version of BERT (called E-BERT) is similar in spirit to ERNIE (Zhang et al.…

    Submitted 1 May, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

  43. arXiv:1911.03343  [pdf, other]

    cs.CL

    Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly

    Authors: Nora Kassner, Hinrich Schütze

    Abstract: Building on Petroni et al. (2019), we propose two new probing tasks analyzing factual knowledge stored in Pretrained Language Models (PLMs). (1) Negation. We find that PLMs do not distinguish between negated ("Birds cannot [MASK]") and non-negated ("Birds can [MASK]") cloze questions. (2) Mispriming. Inspired by priming methods in human psychology, we add "misprimes" to cloze questions ("Talk? Bir…

    Submitted 15 May, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

    Comments: ACL 2020

  44. arXiv:1910.07181  [pdf, other]

    cs.CL

    BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance

    Authors: Timo Schick, Hinrich Schütze

    Abstract: Pretraining deep language models has led to large performance gains in NLP. Despite this success, Schick and Schütze (2020) recently showed that these models struggle to understand rare words. For static word embeddings, this problem has been addressed by separately learning representations for rare words. In this work, we transfer this idea to pretrained language models: We introduce BERTRAM, a p…

    Submitted 29 April, 2020; v1 submitted 16 October, 2019; originally announced October 2019.

    Comments: Accepted at ACL2020

  45. arXiv:1910.03385  [pdf, other]

    cs.CL

    Linguistically Informed Relation Extraction and Neural Architectures for Nested Named Entity Recognition in BioNLP-OST 2019

    Authors: Usama Yaseen, Pankaj Gupta, Hinrich Schütze

    Abstract: Named Entity Recognition (NER) and Relation Extraction (RE) are essential tools in distilling knowledge from biomedical literature. This paper presents our findings from participating in BioNLP Shared Tasks 2019. We addressed Named Entity Recognition including nested entities extraction, Entity Normalization and Relation Extraction. Our proposed approach of Named Entities can be generalized to dif…

    Submitted 8 October, 2019; originally announced October 2019.

    Comments: EMNLP 2019, 11 pages, 4 figures, 8 tables

  46. Type-aware Convolutional Neural Networks for Slot Filling

    Authors: Heike Adel, Hinrich Schütze

    Abstract: The slot filling task aims at extracting answers for queries about entities from text, such as "Who founded Apple". In this paper, we focus on the relation classification component of a slot filling system. We propose type-aware convolutional neural networks to benefit from the mutual dependencies between entity and relation classification. In particular, we explore different ways of integrating t…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: Journal of Artificial Intelligence Research (JAIR), volume 66

  47. arXiv:1910.00314  [pdf, other]

    cs.LG cs.CL cs.IR stat.ML

    BioNLP-OST 2019 RDoC Tasks: Multi-grain Neural Relevance Ranking Using Topics and Attention Based Query-Document-Sentence Interactions

    Authors: Yatin Chaudhary, Pankaj Gupta, Hinrich Schütze

    Abstract: This paper presents our system details and results of participation in the RDoC Tasks of BioNLP-OST 2019. Research Domain Criteria (RDoC) construct is a multi-dimensional and broad framework to describe mental health disorders by combining knowledge from genomics to behaviour. Non-availability of RDoC labelled dataset and tedious labelling process hinders the use of RDoC framework to reach its ful…

    Submitted 2 October, 2019; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: EMNLP2019, 10 pages, 2 figures, 7 tables

  48. arXiv:1909.06563  [pdf, other]

    cs.CL cs.IR cs.LG

    Multi-view and Multi-source Transfers in Neural Topic Modeling with Pretrained Topic and Word Embeddings

    Authors: Pankaj Gupta, Yatin Chaudhary, Hinrich Schütze

    Abstract: Though word embeddings and topics are complementary representations, several past works have only used pre-trained word embeddings in (neural) topic modeling to address data sparsity problem in short text or small collection of documents. However, no prior work has employed (pre-trained latent) topics in transfer learning paradigm. In this paper, we propose an approach to (1) perform knowledge tra…

    Submitted 17 September, 2019; v1 submitted 14 September, 2019; originally announced September 2019.

  49. arXiv:1909.06162  [pdf, other]

    cs.CL cs.IR cs.LG

    Neural Architectures for Fine-Grained Propaganda Detection in News

    Authors: Pankaj Gupta, Khushbu Saxena, Usama Yaseen, Thomas Runkler, Hinrich Schütze

    Abstract: This paper describes our system (MIC-CIS) details and results of participation in the fine-grained propaganda detection shared task 2019. To address the tasks of sentence (SLC) and fragment level (FLC) propaganda detection, we explore different neural architectures (e.g., CNN, LSTM-CRF and BERT) and extract linguistic (e.g., part-of-speech, named entity, readability, sentiment, emotion, etc.), lay…

    Submitted 13 September, 2019; originally announced September 2019.

    Comments: EMNLP2019: Fine-grained propaganda detection shared task at NLP4IF workshop (EMNLP2019)

  50. arXiv:1907.02423  [pdf, other]

    cs.CL

    Morphological Word Embeddings

    Authors: Ryan Cotterell, Hinrich Schütze

    Abstract: Linguistic similarity is multi-faceted. For instance, two words may be similar with respect to semantics, syntax, or morphology inter alia. Continuous word-embeddings have been shown to capture most of these shades of similarity to some degree. This work considers guiding word-embeddings with morphologically annotated data, a form of semi-supervised learning, encouraging the vectors to encode a wo…

    Submitted 4 July, 2019; originally announced July 2019.

    Comments: Published at NAACL 2015