-
From Judgement's Premises Towards Key Points
Authors:
Oren Sultan,
Rayen Dhahri,
Yauheni Mardan,
Tobias Eder,
Georg Groh
Abstract:
Key Point Analysis(KPA) is a relatively new task in NLP that combines summarization and classification by extracting argumentative key points (KPs) for a topic from a collection of texts and categorizing their closeness to the different arguments. In our work, we focus on the legal domain and develop methods that identify and extract KPs from premises derived from texts of judgments. The first met…
▽ More
Key Point Analysis(KPA) is a relatively new task in NLP that combines summarization and classification by extracting argumentative key points (KPs) for a topic from a collection of texts and categorizing their closeness to the different arguments. In our work, we focus on the legal domain and develop methods that identify and extract KPs from premises derived from texts of judgments. The first method is an adaptation to an existing state-of-the-art method, and the two others are new methods that we developed from scratch. We present our methods and examples of their outputs, as well a comparison between them. The full evaluation of our results is done in the matching task -- match between the generated KPs to arguments (premises).
△ Less
Submitted 23 December, 2022;
originally announced December 2022.
-
Retrieving Users' Opinions on Social Media with Multimodal Aspect-Based Sentiment Analysis
Authors:
Miriam Anschütz,
Tobias Eder,
Georg Groh
Abstract:
People post their opinions and experiences on social media, yielding rich databases of end-users' sentiments. This paper shows to what extent machine learning can analyze and structure these databases. An automated data analysis pipeline is deployed to provide insights into user-generated content for researchers in other domains. First, the domain expert can select an image and a term of interest.…
▽ More
People post their opinions and experiences on social media, yielding rich databases of end-users' sentiments. This paper shows to what extent machine learning can analyze and structure these databases. An automated data analysis pipeline is deployed to provide insights into user-generated content for researchers in other domains. First, the domain expert can select an image and a term of interest. Then, the pipeline uses image retrieval to find all images showing similar content and applies aspect-based sentiment analysis to outline users' opinions about the selected term. As part of an interdisciplinary project between architecture and computer science researchers, an empirical study of Hamburg's Elbphilharmonie was conveyed. Therefore, we selected 300 thousand posts with the hashtag \enquote{\texttt{hamburg}} from the platform Flickr. Image retrieval methods generated a subset of slightly more than 1.5 thousand images displaying the Elbphilharmonie. We found that these posts mainly convey a neutral or positive sentiment towards it. With this pipeline, we suggest a new semantic computing method that offers novel insights into end-users opinions, e.g., for architecture domain experts.
△ Less
Submitted 9 January, 2023; v1 submitted 27 October, 2022;
originally announced October 2022.
-
Introducing an Abusive Language Classification Framework for Telegram to Investigate the German Hater Community
Authors:
Maximilian Wich,
Adrian Gorniak,
Tobias Eder,
Daniel Bartmann,
Burak Enes Çakici,
Georg Groh
Abstract:
Since traditional social media platforms continue to ban actors spreading hate speech or other forms of abusive languages (a process known as deplatforming), these actors migrate to alternative platforms that do not moderate users content. One popular platform relevant for the German hater community is Telegram for which limited research efforts have been made so far. This study aims to develop a…
▽ More
Since traditional social media platforms continue to ban actors spreading hate speech or other forms of abusive languages (a process known as deplatforming), these actors migrate to alternative platforms that do not moderate users content. One popular platform relevant for the German hater community is Telegram for which limited research efforts have been made so far. This study aims to develop a broad framework comprising (i) an abusive language classification model for German Telegram messages and (ii) a classification model for the hatefulness of Telegram channels. For the first part, we use existing abusive language datasets containing posts from other platforms to develop our classification models. For the channel classification model, we develop a method that combines channel-specific content information collected from a topic model with a social graph to predict the hatefulness of channels. Furthermore, we complement these two approaches for hate speech detection with insightful results on the evolution of the hater community on Telegram in Germany. We also propose methods for conducting scalable network analyses for social media platforms to the hate speech research community. As an additional output of this study, we provide an annotated abusive language dataset containing 1,149 annotated Telegram messages.
△ Less
Submitted 24 November, 2021; v1 submitted 15 September, 2021;
originally announced September 2021.
-
Anchor-based Bilingual Word Embeddings for Low-Resource Languages
Authors:
Tobias Eder,
Viktor Hangya,
Alexander Fraser
Abstract:
Good quality monolingual word embeddings (MWEs) can be built for languages which have large amounts of unlabeled text. MWEs can be aligned to bilingual spaces using only a few thousand word translation pairs. For low resource languages training MWEs monolingually results in MWEs of poor quality, and thus poor bilingual word embeddings (BWEs) as well. This paper proposes a new approach for building…
▽ More
Good quality monolingual word embeddings (MWEs) can be built for languages which have large amounts of unlabeled text. MWEs can be aligned to bilingual spaces using only a few thousand word translation pairs. For low resource languages training MWEs monolingually results in MWEs of poor quality, and thus poor bilingual word embeddings (BWEs) as well. This paper proposes a new approach for building BWEs in which the vector space of the high resource source language is used as a starting point for training an embedding space for the low resource target language. By using the source vectors as anchors the vector spaces are automatically aligned during training. We experiment on English-German, English-Hiligaynon and English-Macedonian. We show that our approach results not only in improved BWEs and bilingual lexicon induction performance, but also in improved target language MWE quality as measured using monolingual word similarity.
△ Less
Submitted 27 July, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Deep semantic gaze embedding and scanpath comparison for expertise classification during OPT viewing
Authors:
Nora Castner,
Thomas Kübler,
Katharina Scheiter,
Juilane Richter,
Thérése Eder,
Fabian Hüttig,
Constanze Keutel,
Enkelejda Kasneci
Abstract:
Modeling eye movement indicative of expertise behavior is decisive in user evaluation. However, it is indisputable that task semantics affect gaze behavior. We present a novel approach to gaze scanpath comparison that incorporates convolutional neural networks (CNN) to process scene information at the fixation level. Image patches linked to respective fixations are used as input for a CNN and the…
▽ More
Modeling eye movement indicative of expertise behavior is decisive in user evaluation. However, it is indisputable that task semantics affect gaze behavior. We present a novel approach to gaze scanpath comparison that incorporates convolutional neural networks (CNN) to process scene information at the fixation level. Image patches linked to respective fixations are used as input for a CNN and the resulting feature vectors provide the temporal and spatial gaze information necessary for scanpath similarity comparison.We evaluated our proposed approach on gaze data from expert and novice dentists interpreting dental radiographs using a local alignment similarity score. Our approach was capable of distinguishing experts from novices with 93% accuracy while incorporating the image semantics. Moreover, our scanpath comparison using image patch features has the potential to incorporate task semantics from a variety of tasks
△ Less
Submitted 31 March, 2020;
originally announced March 2020.
-
ANANAS - A Framework For Analyzing Android Applications
Authors:
Thomas Eder,
Michael Rodler,
Dieter Vymazal,
Markus Zeilinger
Abstract:
Android is an open software platform for mobile devices with a large market share in the smartphone sector. The openness of the system as well as its wide adoption lead to an increasing amount of malware developed for this platform. ANANAS is an expandable and modular framework for analyzing Android applications. It takes care of common needs for dynamic malware analysis and provides an interface…
▽ More
Android is an open software platform for mobile devices with a large market share in the smartphone sector. The openness of the system as well as its wide adoption lead to an increasing amount of malware developed for this platform. ANANAS is an expandable and modular framework for analyzing Android applications. It takes care of common needs for dynamic malware analysis and provides an interface for the development of plugins. Adaptability and expandability have been main design goals during the development process. An abstraction layer for simple user interaction and phone event simulation is also part of the framework. It allows an analyst to script the required user simulation or phone events on demand or adjust the simulation to his needs. Six plugins have been developed for ANANAS. They represent well known techniques for malware analysis, such as system call hooking and network traffic analysis. The focus clearly lies on dynamic analysis, as five of the six plugins are dynamic analysis methods.
△ Less
Submitted 20 July, 2013;
originally announced July 2013.