Showing 1–29 of 29 results for author: Groh, G

  1. arXiv:2404.17841  [pdf, other]

    cs.CL

    Toxicity Classification in Ukrainian

    Authors: Daryna Dementieva, Valeriia Khylenko, Nikolay Babakov, Georg Groh

    Abstract: The task of toxicity detection is still a relevant task, especially in the context of safe and fair LMs development. Nevertheless, labeled binary toxicity classification corpora are not available for all languages, which is understandable given the resource-intensive nature of the annotation process. Ukrainian, in particular, is among the languages lacking such resources. To our knowledge, there h…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted to WOAH, NAACL, 2024. arXiv admin note: text overlap with arXiv:2404.02043

  2. arXiv:2404.06838  [pdf, other]

    cs.CL

    Simpler becomes Harder: Do LLMs Exhibit a Coherent Behavior on Simplified Corpora?

    Authors: Miriam Anschütz, Edoardo Mosca, Georg Groh

    Abstract: Text simplification seeks to improve readability while retaining the original content and meaning. Our study investigates whether pre-trained classifiers also maintain such coherence by comparing their predictions on both original and simplified inputs. We conduct experiments using 11 pre-trained models, including BERT and OpenAI's GPT 3.5, across six datasets spanning three languages. Additionall…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Published at DeTermIt! Workshop at LREC-COLING 2024

  3. arXiv:2404.02043  [pdf, other]

    cs.CL cs.AI

    Ukrainian Texts Classification: Exploration of Cross-lingual Knowledge Transfer Approaches

    Authors: Daryna Dementieva, Valeriia Khylenko, Georg Groh

    Abstract: Despite the extensive amount of labeled datasets in the NLP text classification field, the persistent imbalance in data availability across various languages remains evident. Ukrainian, in particular, stands as a language that still can benefit from the continued refinement of cross-lingual methodologies. To our knowledge, there is a tremendous lack of Ukrainian corpora for typical text classi…

    Submitted 2 April, 2024; originally announced April 2024.

  4. arXiv:2307.13989  [pdf, other]

    cs.CL cs.LG

    This is not correct! Negation-aware Evaluation of Language Generation Systems

    Authors: Miriam Anschütz, Diego Miguel Lozano, Georg Groh

    Abstract: Large language models underestimate the impact of negations on how much they change the meaning of a sentence. Therefore, learned evaluation metrics based on these models are insensitive to negations. In this paper, we propose NegBLEURT, a negation-aware version of the BLEURT evaluation metric. For that, we designed a rule-based sentence negation tool and used it to create the CANNOT negation eval…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted to INLG 2023

  5. Language Models for German Text Simplification: Overcoming Parallel Data Scarcity through Style-specific Pre-training

    Authors: Miriam Anschütz, Joshua Oehms, Thomas Wimmer, Bartłomiej Jezierski, Georg Groh

    Abstract: Automatic text simplification systems help to reduce textual information barriers on the internet. However, for languages other than English, only little parallel data exists to train these systems. We propose a two-step approach to overcome this data scarcity issue. First, we fine-tuned language models on a corpus of German Easy Language, a specific style of German. Then, we used these models as dec…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL Findings 2023

  6. arXiv:2305.08636  [pdf, other]

    cs.CL cs.AI

    AdamR at SemEval-2023 Task 10: Solving the Class Imbalance Problem in Sexism Detection with Ensemble Learning

    Authors: Adam Rydelek, Daryna Dementieva, Georg Groh

    Abstract: The Explainable Detection of Online Sexism task presents the problem of explainable sexism detection through fine-grained categorisation of sexist cases with three subtasks. Our team experimented with different ways to combat class imbalance throughout the tasks using data augmentation and loss alteration techniques. We tackled the challenge by utilising ensembles of Transformer models trained on…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: One of the top solutions at the SemEval-2023 task "The Explainable Detection of Online Sexism"

  7. arXiv:2305.08625  [pdf, other]

    cs.CL cs.AI

    Adam-Smith at SemEval-2023 Task 4: Discovering Human Values in Arguments with Ensembles of Transformer-based Models

    Authors: Daniel Schroter, Daryna Dementieva, Georg Groh

    Abstract: This paper presents the best-performing approach alias "Adam Smith" for the SemEval-2023 Task 4: "Identification of Human Values behind Arguments". The goal of the task was to create systems that automatically identify the values within textual arguments. We train transformer-based models until they reach their loss minimum or f1-score maximum. Ensembling the models by selecting one global decisio…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: The winner of SemEval-2023 Task 4: "Identification of Human Values behind Arguments"

  8. arXiv:2303.03124  [pdf, other]

    cs.CL cs.AI

    IFAN: An Explainability-Focused Interaction Framework for Humans and NLP Models

    Authors: Edoardo Mosca, Daryna Dementieva, Tohid Ebrahim Ajdari, Maximilian Kummeth, Kirill Gringauz, Yutong Zhou, Georg Groh

    Abstract: Interpretability and human oversight are fundamental pillars of deploying complex NLP models into real-world applications. However, applying explainability and human-in-the-loop methods requires technical proficiency. Despite existing toolkits for model understanding and analysis, options to integrate human feedback are still limited. We propose IFAN, a framework for real-time explanation-based in…

    Submitted 2 October, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted to AACL 2023 Demonstration systems Track

  9. arXiv:2212.12238  [pdf, other]

    cs.CL cs.AI

    From Judgement's Premises Towards Key Points

    Authors: Oren Sultan, Rayen Dhahri, Yauheni Mardan, Tobias Eder, Georg Groh

    Abstract: Key Point Analysis (KPA) is a relatively new task in NLP that combines summarization and classification by extracting argumentative key points (KPs) for a topic from a collection of texts and categorizing their closeness to the different arguments. In our work, we focus on the legal domain and develop methods that identify and extract KPs from premises derived from texts of judgments. The first met…

    Submitted 23 December, 2022; originally announced December 2022.

  10. arXiv:2210.15377  [pdf, other]

    cs.IR cs.AI cs.CL cs.CV cs.LG

    Retrieving Users' Opinions on Social Media with Multimodal Aspect-Based Sentiment Analysis

    Authors: Miriam Anschütz, Tobias Eder, Georg Groh

    Abstract: People post their opinions and experiences on social media, yielding rich databases of end-users' sentiments. This paper shows to what extent machine learning can analyze and structure these databases. An automated data analysis pipeline is deployed to provide insights into user-generated content for researchers in other domains. First, the domain expert can select an image and a term of interest.…

    Submitted 9 January, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: 8 pages, 5 figures, published at 2023 IEEE 17th International Conference on Semantic Computing (ICSC)

  11. "That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

    Authors: Edoardo Mosca, Shreyash Agarwal, Javier Rando, Georg Groh

    Abstract: Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. Extensive research in computer vision has been carried out to develop reliable defense strategies. However, the same issue remains less explored in natural language processing. Our work pres…

    Submitted 29 June, 2023; v1 submitted 10 April, 2022; originally announced April 2022.

    Comments: ACL 2022

  12. arXiv:2112.03007  [pdf, other]

    cs.CL cs.AI

    How to Build Robust FAQ Chatbot with Controllable Question Generator?

    Authors: Yan Pan, Mingyang Ma, Bernhard Pflugfelder, Georg Groh

    Abstract: Many unanswerable adversarial questions fool the question-answer (QA) system with some plausible answers. Building a robust, frequently asked questions (FAQ) chatbot needs a large amount of diverse adversarial examples. Recent question generation methods are ineffective at generating many high-quality and diverse adversarial question-answer pairs from unstructured text. We propose the diversity co…

    Submitted 18 November, 2021; originally announced December 2021.

  13. arXiv:2111.02326  [pdf, other]

    cs.CL cs.HC cs.LG

    End-to-End Annotator Bias Approximation on Crowdsourced Single-Label Sentiment Analysis

    Authors: Gerhard Johann Hagerer, David Szabo, Andreas Koch, Maria Luisa Ripoll Dominguez, Christian Widmer, Maximilian Wich, Hannah Danner, Georg Groh

    Abstract: Sentiment analysis is often a crowdsourcing task prone to subjective labels given by many annotators. It is not yet fully understood how the annotation bias of each annotator can be modeled correctly with state-of-the-art methods. However, resolving annotator bias precisely and reliably is the key to understand annotators' labeling behavior and to successfully resolve corresponding individual misc…

    Submitted 24 July, 2023; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: 10 pages, 2 figures, 2 tables, full conference paper, peer-reviewed

    Journal ref: Proceedings of the 3rd International Conference on Natural Language and Speech Processing - ICNLSP 2021

  14. A Case Study and Qualitative Analysis of Simple Cross-Lingual Opinion Mining

    Authors: Gerhard Johann Hagerer, Wing Sheung Leung, Qiaoxi Liu, Hannah Danner, Georg Groh

    Abstract: User-generated content from social media is produced in many languages, making it technically challenging to compare the discussed themes from one domain across different cultures and regions. It is relevant for domains in a globalized world, such as market research, where people from two nations and markets might have different requirements for a product. We propose a simple, modern, and effectiv…

    Submitted 24 July, 2023; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: 10 pages, 2 tables, 5 figures, full paper, peer-reviewed, published at KDIR/IC3k 2021 conference

    Journal ref: Proceedings of the 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR 2021

  15. An Analysis of Programming Course Evaluations Before and After the Introduction of an Autograder

    Authors: Gerhard Johann Hagerer, Laura Lahesoo, Miriam Anschütz, Stephan Krusche, Georg Groh

    Abstract: Commonly, introductory programming courses in higher education institutions have hundreds of participating students eager to learn to program. The manual effort for reviewing the submitted source code and for providing feedback can no longer be managed. Manually reviewing the submitted homework can be subjective and unfair, particularly if many tutors are responsible for grading. Different autogra…

    Submitted 24 July, 2023; v1 submitted 28 October, 2021; originally announced October 2021.

    Comments: Accepted full paper article on IEEE ITHET 2021

    Journal ref: ITHET-2021

  16. arXiv:2110.10575  [pdf, other]

    cs.CL

    SocialVisTUM: An Interactive Visualization Toolkit for Correlated Neural Topic Models on Social Media Opinion Mining

    Authors: Gerhard Johann Hagerer, Martin Kirchhoff, Hannah Danner, Robert Pesch, Mainak Ghosh, Archishman Roy, Jiaxi Zhao, Georg Groh

    Abstract: Recent research in opinion mining proposed word embedding-based topic modeling methods that provide superior coherence compared to traditional topic modeling. In this paper, we demonstrate how these methods can be used to display correlated topic models on social media texts using SocialVisTUM, our proposed interactive visualization toolkit. It displays a graph with topics as nodes and their corre…

    Submitted 24 July, 2023; v1 submitted 20 October, 2021; originally announced October 2021.

    Comments: Demo paper accepted for publication on RANLP 2021; 8 pages, 5 figures, 1 table

    Journal ref: RANLP-2021

  17. arXiv:2109.07346  [pdf, other]

    cs.CL

    Introducing an Abusive Language Classification Framework for Telegram to Investigate the German Hater Community

    Authors: Maximilian Wich, Adrian Gorniak, Tobias Eder, Daniel Bartmann, Burak Enes Çakici, Georg Groh

    Abstract: Since traditional social media platforms continue to ban actors spreading hate speech or other forms of abusive languages (a process known as deplatforming), these actors migrate to alternative platforms that do not moderate users' content. One popular platform relevant for the German hater community is Telegram for which limited research efforts have been made so far. This study aims to develop a…

    Submitted 24 November, 2021; v1 submitted 15 September, 2021; originally announced September 2021.

  18. arXiv:2106.15498  [pdf, other]

    cs.LG cs.CL cs.IR

    Classification of Consumer Belief Statements From Social Media

    Authors: Gerhard Johann Hagerer, Wenbin Le, Hannah Danner, Georg Groh

    Abstract: Social media offer plenty of information to perform market research in order to meet the requirements of customers. One way how this research is conducted is that a domain expert gathers and categorizes user-generated content into a complex and fine-grained class structure. In many of such cases, little data meets complex annotations. It is not yet fully understood how this can be leveraged succes…

    Submitted 24 July, 2023; v1 submitted 29 June, 2021; originally announced June 2021.

  19. arXiv:2105.01466  [pdf, other]

    cs.CL cs.MM

    GraphTMT: Unsupervised Graph-based Topic Modeling from Video Transcripts

    Authors: Lukas Stappen, Jason Thies, Gerhard Hagerer, Björn W. Schuller, Georg Groh

    Abstract: To unfold the tremendous amount of multimedia data uploaded daily to social media platforms, effective topic modeling techniques are needed. Existing work tends to apply topic models on written text datasets. In this paper, we propose a topic extractor on video transcripts. Exploiting neural word embeddings through graph-based clustering, we aim to improve usability and semantic coherence. Unlike…

    Submitted 28 October, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: JT and LS contributed equally to this work

  20. arXiv:1902.07636  [pdf, ps, other]

    cs.SI

    Contributive Social Capital Extraction From Different Types of Online Data Sources

    Authors: Sebastian Schams, Georg Groh

    Abstract: It is a recurring problem of online communication that the properties of unknown people are hard to assess. This may lead to various issues such as the spread of 'fake news' from untrustworthy sources. In sociology the sum of (social) resources available to a person through their social network is often described as social capital. In this article, we look at social capital from a different angle.…

    Submitted 20 February, 2019; originally announced February 2019.

    Comments: 44 pages

  21. arXiv:1808.03926  [pdf, other]

    cs.CL cs.LG

    Sequence Labeling: A Practical Approach

    Authors: Adnan Akhundov, Dietrich Trautmann, Georg Groh

    Abstract: We take a practical approach to solving the sequence labeling problem assuming unavailability of domain expertise and scarcity of informational and computational resources. To this end, we utilize a universal end-to-end Bi-LSTM-based neural sequence labeling model applicable to a wide range of NLP tasks and languages. The model combines morphological, semantic, and structural cues extracted from data…

    Submitted 12 August, 2018; originally announced August 2018.

    Comments: For the source code and detailed experimental results, see http://github.com/aakhundov/sequence-labeling

  22. arXiv:1607.02062  [pdf, other]

    cs.CY cs.IR cs.SI

    Estimating the Dissemination of Social and Mobile Search in Categories of Information Needs Using Websites as Proxies

    Authors: Christoph Fuchs, Akash Nayyar, Ruth Nussbaumer, Georg Groh

    Abstract: With the increasing popularity of social means to satisfy information needs using Social Media (e.g., Social Media Question Asking, SMQA) or Social Information Retrieval approaches, this paper tries to identify types of information needs which are inherently social and therefore better suited for those techniques. We describe an experiment where prominent websites from various content categories a…

    Submitted 7 July, 2016; originally announced July 2016.

  23. arXiv:1506.07763  [pdf, other]

    cs.SI physics.soc-ph

    Mobile Homophily and Social Location Prediction

    Authors: Halgurt Bapierre, Chakajkla Jesdabodi, Georg Groh

    Abstract: The mobility behavior of human beings is predictable to a varying degree e.g. depending on the traits of their personality such as the trait extraversion - introversion: the mobility of introvert users may be more dominated by routines and habitual movement patterns, resulting in a more predictable mobility behavior on the basis of their own location history while, in contrast, extrovert users get…

    Submitted 25 June, 2015; originally announced June 2015.

  24. arXiv:1409.8028  [pdf, other]

    cs.SI cs.MA

    Reaching Consensus Among Mobile Agents: A Distributed Protocol for the Detection of Social Situations

    Authors: Daniel Raumer, Christoph Fuchs, Georg Groh

    Abstract: Physical social encounters are governed by a set of socio-psychological behavioral rules with a high degree of uniform validity. Past research has shown how these rules or the resulting properties of the encounters (e.g. the geometry of interaction) can be used for algorithmic detection of social interaction. In this paper, we present a distributed protocol to gain a common understanding of the ex…

    Submitted 29 September, 2014; originally announced September 2014.

    Comments: 16 pages, 4 figures, 1 table

  25. arXiv:1406.6012  [pdf, other]

    cs.MM cs.HC

    Designing Sound Collaboratively - Perceptually Motivated Audio Synthesis

    Authors: Niklas Klügel, Timo Becker, Georg Groh

    Abstract: In this contribution, we will discuss a prototype that allows a group of users to design sound collaboratively in real time using a multi-touch tabletop. We make use of a machine learning method to generate a mapping from perceptual audio features to synthesis parameters. This mapping is then used for visualization and interaction. Finally, we discuss the results of a comparative evaluation study.

    Submitted 23 June, 2014; originally announced June 2014.

    Comments: Extended version of submission to conference proceedings

  26. arXiv:1402.2427  [pdf, other]

    cs.SI cs.CL cs.IR

    An evaluation of keyword extraction from online communication for the characterisation of social relations

    Authors: Jan Hauffa, Tobias Lichtenberg, Georg Groh

    Abstract: The set of interpersonal relationships on a social network service or a similar online community is usually highly heterogenous. The concept of tie strength captures only one aspect of this heterogeneity. Since the unstructured text content of online communication artefacts is a salient source of information about a social relationship, we investigate the utility of keywords extracted from the mes…

    Submitted 11 February, 2014; originally announced February 2014.

  27. arXiv:1209.2868  [pdf, other]

    cs.SI cs.IR physics.soc-ph

    Spatio-Temporal Small Worlds for Decentralized Information Retrieval in Social Networking

    Authors: Georg Groh, Florian Straub, Benjamin Koster

    Abstract: We discuss foundations and options for alternative, agent-based information retrieval (IR) approaches in Social Networking, especially Decentralized and Mobile Social Networking scenarios. In addition to usual semantic contexts, these approaches make use of long-term social and spatio-temporal contexts in order to satisfy conscious as well as unconscious information needs according to Human IR heu…

    Submitted 13 September, 2012; originally announced September 2012.

  28. arXiv:1107.5654  [pdf, other]

    cs.SI cs.CY physics.soc-ph

    Interest-Based vs. Social Person-Recommenders in Social Networking Platforms

    Authors: Georg Groh, Michele Brocco, Andreas Kleemann

    Abstract: Social network based approaches to person recommendations are compared to interest based approaches with the help of an empirical study on a large German social networking platform. We assess and compare the performance of different basic variants of the two approaches by precision / recall based performance with respect to reproducing known friendship relations and by an empirical questionnaire b…

    Submitted 28 July, 2011; originally announced July 2011.

  29. arXiv:1104.2196  [pdf]

    cs.IR cs.SI physics.soc-ph

    Space and Time as a Primary Classification Criterion for Information Retrieval in Distributed Social Networking

    Authors: Georg Groh, Florian Straub, Andreas Donaubauer, Benjamin Koster

    Abstract: We discuss in a compact way how the implicit relations between spatiotemporal relatedness of information items, spatiotemporal relatedness of users, social relatedness of users and semantic relatedness of information items may be exploited for an information retrieval architecture that operates along the lines of human ways of searching. The decentralized and agent oriented architecture mirrors em…

    Submitted 12 April, 2011; originally announced April 2011.

    Comments: Short Technical Report