Showing 1–29 of 29 results for author: Walde, S S i

  1. arXiv:2404.04035

    cs.CL

    A Dataset for Physical and Abstract Plausibility and Sources of Human Disagreement

    Authors: Annerose Eichel, Sabine Schulte im Walde

    Abstract: We present a novel dataset for physical and abstract plausibility of events in English. Based on naturally occurring sentences extracted from Wikipedia, we infiltrate degrees of abstractness, and automatically generate perturbed pseudo-implausible events. We annotate a filtered and balanced subset for plausibility using crowd-sourcing, and perform extensive cleansing to ensure annotation quality.…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted at The 17th Linguistic Annotation Workshop

  2. arXiv:2404.04031

    cs.CL cs.CY

    Willkommens-Merkel, Chaos-Johnson, and Tore-Klose: Modeling the Evaluative Meaning of German Personal Name Compounds

    Authors: Annerose Eichel, Tana Deeg, André Blessing, Milena Belosevic, Sabine Arndt-Lappe, Sabine Schulte im Walde

    Abstract: We present a comprehensive computational study of the under-investigated phenomenon of personal name compounds (PNCs) in German such as Willkommens-Merkel ('Welcome-Merkel'). Prevalent in news, social media, and political discourse, PNCs are hypothesized to exhibit an evaluative function that is reflected in a more positive or negative perception as compared to the respective personal full name (s…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted at LREC-COLING 2024

  3. arXiv:2401.15393

    cs.CL

    Semantics of Multiword Expressions in Transformer-Based Models: A Survey

    Authors: Filip Miletić, Sabine Schulte im Walde

    Abstract: Multiword expressions (MWEs) are composed of multiple words and exhibit variable degrees of compositionality. As such, their meanings are notoriously difficult to model, and it is unclear to what extent this issue affects transformer architectures. Addressing this gap, we provide the first in-depth survey of MWE processing with transformer models. We overall find that they capture MWE semantics in…

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: Accepted to TACL 2024. This is a pre-MIT Press publication version

  4. arXiv:2311.12664

    cs.CL cs.AI

    The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change

    Authors: Dominik Schlechtweg, Shafqat Mumtaz Virk, Pauline Sander, Emma Sköldberg, Lukas Theuer Linke, Tuo Zhang, Nina Tahmasebi, Jonas Kuhn, Sabine Schulte im Walde

    Abstract: We present the DURel tool that implements the annotation of semantic proximity between uses of words into an online, open source interface. The tool supports standardized human annotation as well as computational annotation, building on recent advances with Word-in-Context models. Annotator judgments are clustered with automatic graph clustering techniques and visualized for analysis. This allows…

    Submitted 5 February, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: EACL Demo, 7 pages

  5. Investigating the Nature of Disagreements on Mid-Scale Ratings: A Case Study on the Abstractness-Concreteness Continuum

    Authors: Urban Knupleš, Diego Frassinelli, Sabine Schulte im Walde

    Abstract: Humans tend to strongly agree on ratings on a scale for extreme cases (e.g., a CAT is judged as very concrete), but judgements on mid-scale words exhibit more disagreement. Yet, collected rating norms are heavily exploited across disciplines. Our study focuses on concreteness ratings and (i) implements correlations and supervised classification to identify salient multi-modal characteristics of mi…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 17 pages, 13 figures, accepted to CoNLL 2023

    Journal ref: https://aclanthology.org/2023.conll-1.6

  6. arXiv:2304.14745

    cs.CL cs.AI cs.IR

    Made of Steel? Learning Plausible Materials for Components in the Vehicle Repair Domain

    Authors: Annerose Eichel, Helena Schlipf, Sabine Schulte im Walde

    Abstract: We propose a novel approach to learn domain-specific plausible materials for components in the vehicle repair domain by probing Pretrained Language Models (PLMs) in a cloze task style setting to overcome the lack of annotated datasets. We devise a new method to aggregate salient predictions from a set of cloze query templates and show that domain-adaptation using either a small, high-quality or a…

    Submitted 28 April, 2023; originally announced April 2023.

    Comments: Accepted at EACL 2023 Main Conference

  7. arXiv:2205.11113

    cs.CL

    What Drives the Use of Metaphorical Language? Negative Insights from Abstractness, Affect, Discourse Coherence and Contextualized Word Representations

    Authors: Prisca Piccirilli, Sabine Schulte im Walde

    Abstract: Given a specific discourse, which discourse properties trigger the use of metaphorical language, rather than using literal alternatives? For example, what drives people to say "grasp the meaning" rather than "understand the meaning" within a specific context? Many NLP approaches to metaphorical language rely on cognitive and (psycho-)linguistic insights and have successfully defined models of disc…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: 12 pages, 6 figures, 1 table. Accepted at *SEM2022

  8. arXiv:2205.08939

    cs.CL

    Features of Perceived Metaphoricity on the Discourse Level: Abstractness and Emotionality

    Authors: Prisca Piccirilli, Sabine Schulte im Walde

    Abstract: Research on metaphorical language has shown ties between abstractness and emotionality with regard to metaphoricity; prior work is however limited to the word and sentence levels, and up to date there is no empirical study establishing the extent to which this is also true on the discourse level. This paper explores which textual and perceptual features human annotators perceive as important for t…

    Submitted 18 May, 2022; originally announced May 2022.

    Comments: 13 pages, 3 tables, 6 figures. Accepted at LREC 2022

  9. arXiv:2106.03111

    cs.CL

    Lexical Semantic Change Discovery

    Authors: Sinan Kurtyigit, Maike Park, Dominik Schlechtweg, Jonas Kuhn, Sabine Schulte im Walde

    Abstract: While there is a large amount of research in the field of Lexical Semantic Change Detection, only few approaches go beyond a standard benchmark evaluation of existing models. In this paper, we propose a shift of focus from change detection to change discovery, i.e., discovering novel word senses over time from the full corpus vocabulary. By heavily fine-tuning a type-based and a token-based approa…

    Submitted 6 June, 2021; originally announced June 2021.

    Comments: ACL 2021, 9 pages

  10. arXiv:2106.00055

    cs.CL

    More than just Frequency? Demasking Unsupervised Hypernymy Prediction Methods

    Authors: Thomas Bott, Dominik Schlechtweg, Sabine Schulte im Walde

    Abstract: This paper presents a comparison of unsupervised methods of hypernymy prediction (i.e., to predict which word in a pair of words such as fish-cod is the hypernym and which the hyponym). Most importantly, we demonstrate across datasets for English and for German that the predictions of three methods (WeedsPrec, invCL, SLQS Row) strongly overlap and are highly correlated with frequency-based predict…

    Submitted 31 May, 2021; originally announced June 2021.

    Comments: ACL Findings, 5 pages

  11. arXiv:2103.07259

    cs.CL

    Explaining and Improving BERT Performance on Lexical Semantic Change Detection

    Authors: Severin Laicher, Sinan Kurtyigit, Dominik Schlechtweg, Jonas Kuhn, Sabine Schulte im Walde

    Abstract: Type- and token-based embedding architectures are still competing in lexical semantic change detection. The recent success of type-based models in SemEval-2020 Task 1 has raised the question why the success of token-based models on a variety of other NLP tasks does not translate to our field. We investigate the influence of a range of variables on clusterings of BERT vectors and show that its low…

    Submitted 12 March, 2021; originally announced March 2021.

    Comments: EACL SRW, 6 Pages

  12. arXiv:2011.07247

    cs.CL

    CL-IMS @ DIACR-Ita: Volente o Nolente: BERT does not outperform SGNS on Semantic Change Detection

    Authors: Severin Laicher, Gioia Baldissin, Enrique Castañeda, Dominik Schlechtweg, Sabine Schulte im Walde

    Abstract: We present the results of our participation in the DIACR-Ita shared task on lexical semantic change detection for Italian. We exploit Average Pairwise Distance of token-based BERT embeddings between time points and rank 5 (of 8) in the official ranking with an accuracy of $.72$. While we tune parameters on the English data set of SemEval-2020 Task 1 and reach high performance, this does not transl…

    Submitted 3 December, 2020; v1 submitted 14 November, 2020; originally announced November 2020.

  13. arXiv:2011.03258

    cs.CL

    OP-IMS @ DIACR-Ita: Back to the Roots: SGNS+OP+CD still rocks Semantic Change Detection

    Authors: Jens Kaiser, Dominik Schlechtweg, Sabine Schulte im Walde

    Abstract: We present the results of our participation in the DIACR-Ita shared task on lexical semantic change detection for Italian. We exploit one of the earliest and most influential semantic change detection models based on Skip-Gram with Negative Sampling, Orthogonal Procrustes alignment and Cosine Distance and obtain the winning submission of the shared task with near to perfect accuracy .94. Our resul…

    Submitted 6 November, 2020; originally announced November 2020.

  14. arXiv:2008.03164

    cs.CL

    IMS at SemEval-2020 Task 1: How low can you go? Dimensionality in Lexical Semantic Change Detection

    Authors: Jens Kaiser, Dominik Schlechtweg, Sean Papay, Sabine Schulte im Walde

    Abstract: We present the results of our system for SemEval-2020 Task 1 that exploits a commonly used lexical semantic change detection model based on Skip-Gram with Negative Sampling. Our system focuses on Vector Initialization (VI) alignment, compares VI to the currently top-ranking models for Subtask 2 and demonstrates that these can be outperformed if we optimize VI dimensionality. We demonstrate that di…

    Submitted 7 August, 2020; originally announced August 2020.

  15. arXiv:2001.03216

    cs.CL

    Simulating Lexical Semantic Change from Sense-Annotated Data

    Authors: Dominik Schlechtweg, Sabine Schulte im Walde

    Abstract: We present a novel procedure to simulate lexical semantic change from synchronic sense-annotated data, and demonstrate its usefulness for assessing lexical semantic change detection models. The induced dataset represents a stronger correspondence to empirically observed lexical semantic change than previous synthetic datasets, because it exploits the intimate relationship between synchronic polyse…

    Submitted 9 January, 2020; originally announced January 2020.

    Comments: EvoLang, 8 pages

  16. arXiv:1909.00412

    cs.CL

    You Shall Know a User by the Company It Keeps: Dynamic Representations for Social Media Users in NLP

    Authors: Marco Del Tredici, Diego Marcheggiani, Sabine Schulte im Walde, Raquel Fernández

    Abstract: Information about individuals can help to better understand what they say, particularly in social media where texts are short. Current approaches to modelling social media users pay attention to their social connections, but exploit this information in a static way, treating all connections uniformly. This ignores the fact, well known in sociolinguistics, that an individual may be part of several…

    Submitted 1 September, 2019; originally announced September 2019.

    Comments: To appear in Proceedings of EMNLP 2019

  17. arXiv:1906.02979

    cs.CL

    A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains

    Authors: Dominik Schlechtweg, Anna Hätty, Marco del Tredici, Sabine Schulte im Walde

    Abstract: We perform an interdisciplinary large-scale evaluation for detecting lexical semantic divergences in a diachronic and in a synchronic task: semantic sense changes across time, and semantic sense changes across domains. Our work addresses the superficialness and lack of comparison in assessing models of diachronic lexical change, by bringing together and extending benchmark models on a common state…

    Submitted 7 June, 2019; originally announced June 2019.

    Comments: ACL 2019, 9 pages

  18. arXiv:1906.02479

    cs.CL

    Second-order Co-occurrence Sensitivity of Skip-Gram with Negative Sampling

    Authors: Dominik Schlechtweg, Cennet Oguz, Sabine Schulte im Walde

    Abstract: We simulate first- and second-order context overlap and show that Skip-Gram with Negative Sampling is similar to Singular Value Decomposition in capturing second-order co-occurrence information, while Pointwise Mutual Information is agnostic to it. We support the results with an empirical study finding that the models react differently when provided with additional second-order information. Our fi…

    Submitted 7 June, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: BlackboxNLP 2019, 6 pages

  19. arXiv:1806.04381

    cs.CL

    Projecting Embeddings for Domain Adaptation: Joint Modeling of Sentiment Analysis in Diverse Domains

    Authors: Jeremy Barnes, Roman Klinger, Sabine Schulte im Walde

    Abstract: Domain adaptation for sentiment analysis is challenging due to the fact that supervised classifiers are very sensitive to changes in domain. The two most prominent approaches to this problem are structural correspondence learning and autoencoders. However, they either require long training times or suffer greatly on highly divergent domains. Inspired by recent advances in cross-lingual sentiment a…

    Submitted 13 June, 2018; v1 submitted 12 June, 2018; originally announced June 2018.

    Comments: Accepted to COLING 2018

  20. arXiv:1805.09016

    cs.CL

    Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across Languages

    Authors: Jeremy Barnes, Roman Klinger, Sabine Schulte im Walde

    Abstract: Sentiment analysis in low-resource languages suffers from a lack of annotated corpora to estimate high-performing models. Machine translation and bilingual word embeddings provide some relief through cross-lingual sentiment approaches. However, they either require large amounts of parallel data or do not sufficiently capture sentiment information. We introduce Bilingual Sentiment Embeddings (BLSE)…

    Submitted 23 May, 2018; originally announced May 2018.

    Comments: Accepted to ACL 2018 (Long Papers)

  21. arXiv:1804.06719

    cs.CL

    Distribution-based Prediction of the Degree of Grammaticalization for German Prepositions

    Authors: Dominik Schlechtweg, Sabine Schulte im Walde

    Abstract: We test the hypothesis that the degree of grammaticalization of German prepositions correlates with their corpus-based contextual dispersion measured by word entropy. We find that there is indeed a moderate correlation for entropy, but a stronger correlation for frequency and number of context types.

    Submitted 14 April, 2018; originally announced April 2018.

    Comments: 2 pages, EvoLang

  22. arXiv:1804.06517

    cs.CL

    Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic Change

    Authors: Dominik Schlechtweg, Sabine Schulte im Walde, Stefanie Eckmann

    Abstract: We propose a framework that extends synchronic polysemy annotation to diachronic changes in lexical meaning, to counteract the lack of resources for evaluating computational models of lexical semantic change. Our framework exploits an intuitive notion of semantic relatedness, and distinguishes between innovative and reductive meaning changes with high inter-annotator agreement. The resulting test…

    Submitted 17 April, 2018; originally announced April 2018.

    Comments: 5 pages, NAACL

  23. arXiv:1804.05388

    cs.CL

    Introducing two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness

    Authors: Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu

    Abstract: We present two novel datasets for the low-resource language Vietnamese to assess models of semantic similarity: ViCon comprises pairs of synonyms and antonyms across word classes, thus offering data to distinguish between similarity and dissimilarity. ViSim-400 provides degrees of similarity across five semantic relations, as rated by human judges. The two datasets are verified through standard co…

    Submitted 19 April, 2018; v1 submitted 15 April, 2018; originally announced April 2018.

    Comments: The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2018)

  24. arXiv:1709.04219

    cs.CL cs.AI

    Assessing State-of-the-Art Sentiment Models on State-of-the-Art Sentiment Datasets

    Authors: Jeremy Barnes, Roman Klinger, Sabine Schulte im Walde

    Abstract: There has been a good amount of progress in sentiment analysis over the past 10 years, including the proposal of new methods and the creation of benchmark datasets. In some papers, however, there is a tendency to compare models only on one or two datasets, either because of time restraints or because the model is tailored to a specific task. Accordingly, it is hard to understand how well a certain…

    Submitted 13 September, 2017; originally announced September 2017.

    Comments: Presented at WASSA 2017

    Journal ref: In Proceedings of WASSA (2017). 2 - 12

  25. arXiv:1707.07273

    cs.CL

    Hierarchical Embeddings for Hypernymy Detection and Directionality

    Authors: Kim Anh Nguyen, Maximilian Köper, Sabine Schulte im Walde, Ngoc Thang Vu

    Abstract: We present a novel neural model HyperVec to learn hierarchical embeddings for hypernymy detection and directionality. While previous embeddings have shown limitations on prototypical hypernyms, HyperVec represents an unsupervised measure where embeddings are learned in a specific order and capture the hypernym$-$hyponym distributional hierarchy. Moreover, our model is able to generalize over unsee…

    Submitted 23 July, 2017; originally announced July 2017.

    Comments: 11 pages, accepted as long paper at EMNLP 2017

  26. arXiv:1706.04971

    cs.CL

    German in Flux: Detecting Metaphoric Change via Word Entropy

    Authors: Dominik Schlechtweg, Stefanie Eckmann, Enrico Santus, Sabine Schulte im Walde, Daniel Hole

    Abstract: This paper explores the information-theoretic measure entropy to detect metaphoric change, transferring ideas from hypernym detection to research on language change. We also build the first diachronic test set for German as a standard for metaphoric change annotation. Our model shows high performance, is unsupervised, language-independent and generalizable to other processes of semantic change.

    Submitted 15 June, 2017; originally announced June 2017.

    Comments: CoNLL 2017. 9 pages

  27. arXiv:1701.02962

    cs.CL

    Distinguishing Antonyms and Synonyms in a Pattern-based Neural Network

    Authors: Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu

    Abstract: Distinguishing between antonyms and synonyms is a key task to achieve high performance in NLP systems. While they are notoriously difficult to distinguish by distributional co-occurrence models, pattern-based methods have proven effective to differentiate between the relations. In this paper, we present a novel neural network model AntSynNET that exploits lexico-syntactic patterns from syntactic p…

    Submitted 11 January, 2017; originally announced January 2017.

    Comments: EACL 2017, 10 pages

    Journal ref: EACL2017

  28. arXiv:1610.01874

    cs.CL

    Neural-based Noise Filtering from Word Embeddings

    Authors: Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu

    Abstract: Word embeddings have been demonstrated to benefit NLP tasks impressively. Yet, there is room for improvement in the vector representations, because current word embeddings typically contain unnecessary information, i.e., noise. We propose two novel models to improve word embeddings by unsupervised learning, in order to yield word denoising embeddings. The word denoising embeddings are obtained by…

    Submitted 6 October, 2016; originally announced October 2016.

    Comments: 9 pages, 4 figures, COLING 2016

  29. arXiv:1605.07766

    cs.CL

    Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction

    Authors: Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu

    Abstract: We propose a novel vector representation that integrates lexical contrast into distributional vectors and strengthens the most salient features for determining degrees of word similarity. The improved vectors significantly outperform standard models and distinguish antonyms from synonyms with an average precision of 0.66-0.76 across word classes (adjectives, nouns, verbs). Moreover, we integrate t…

    Submitted 25 May, 2016; originally announced May 2016.

    Comments: 6 pages, 4 figures, InProc ACL 2016