
Showing 1–9 of 9 results for author: Maronikolakis, A

  1. arXiv:2306.09752 [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Politeness Stereotypes and Attack Vectors: Gender Stereotypes in Japanese and Korean Language Models

    Authors: Victor Steinborn, Antonis Maronikolakis, Hinrich Schütze

    Abstract: In efforts to keep up with the rapid progress and use of large language models, gender bias research is becoming more prevalent in NLP. Non-English bias research, however, is still in its infancy with most work focusing on English. In our work, we study how grammatical gender bias relating to politeness levels manifests in Japanese and Korean language models. Linguistic studies in these languages…

    Submitted 16 June, 2023; originally announced June 2023.

  2. arXiv:2304.01890 [pdf, other]

    cs.CL cs.AI cs.LG

    Sociocultural knowledge is needed for selection of shots in hate speech detection tasks

    Authors: Antonis Maronikolakis, Abdullatif Köksal, Hinrich Schütze

    Abstract: We introduce HATELEXICON, a lexicon of slurs and targets of hate speech for the countries of Brazil, Germany, India and Kenya, to aid training and interpretability of models. We demonstrate how our lexicon can be used to interpret model predictions, showing that models developed to classify extreme speech rely heavily on target words when making predictions. Further, we propose a method to aid sho…

    Submitted 17 May, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

  3. arXiv:2210.13985 [pdf, other]

    cs.CL cs.CY

    This joke is [MASK]: Recognizing Humor and Offense with Prompting

    Authors: Junze Li, Mengjie Zhao, Yubo Xie, Antonis Maronikolakis, Pearl Pu, Hinrich Schütze

    Abstract: Humor is a magnetic component in everyday human interactions and communications. Computationally modeling humor enables NLP systems to entertain and engage with users. We investigate the effectiveness of prompting, a new transfer learning paradigm for NLP, for humor recognition. We show that prompting performs similarly to finetuning when numerous annotations are available, but gives stellar perfo…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: Transfer Learning for Natural Language Processing Workshop at NeurIPS 2022

  4. arXiv:2205.06621 [pdf, other]

    cs.CL cs.AI cs.LG

    Analyzing Hate Speech Data along Racial, Gender and Intersectional Axes

    Authors: Antonis Maronikolakis, Philip Baader, Hinrich Schütze

    Abstract: To tackle the rising phenomenon of hate speech, efforts have been made towards data curation and analysis. When it comes to analysis of bias, previous work has focused predominantly on race. In our work, we further investigate bias in hate speech datasets along racial, gender and intersectional axes. We identify strong bias against African American English (AAE), masculine and AAE+Masculine tweets…

    Submitted 18 May, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: Accepted at "4th Workshop on Gender Bias in Natural Language Processing", NAACL 2022

  5. arXiv:2203.11764 [pdf, other]

    cs.CL cs.AI

    Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments

    Authors: Antonis Maronikolakis, Axel Wisiorek, Leah Nann, Haris Jabbar, Sahana Udupa, Hinrich Schuetze

    Abstract: Building on current work on multilingual hate speech (e.g., Ousidhoum et al. (2019)) and hate speech reduction (e.g., Sap et al. (2020)), we present XTREMESPEECH, a new hate speech dataset containing 20,297 social media passages from Brazil, Germany, India and Kenya. The key novelty is that we directly involve the affected communities in collecting and annotating the data - as opposed to giving co…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: Accepted to ACL 2022 Findings

  6. arXiv:2109.09700 [pdf, other]

    cs.CL cs.AI

    BERT Cannot Align Characters

    Authors: Antonis Maronikolakis, Philipp Dufter, Hinrich Schütze

    Abstract: In previous work, it has been shown that BERT can adequately align cross-lingual sentences on the word level. Here we investigate whether BERT can also operate as a char-level aligner. The languages examined are English, Fake-English, German and Greek. We show that the closer two languages are, the better BERT can align them on the character level. BERT indeed works well in English to Fake-English…

    Submitted 20 September, 2021; originally announced September 2021.

    Comments: Second Workshop on Insights from Negative Results, EMNLP 2021

  7. arXiv:2109.05772 [pdf, other]

    cs.CL

    Wine is Not v i n. -- On the Compatibility of Tokenizations Across Languages

    Authors: Antonis Maronikolakis, Philipp Dufter, Hinrich Schütze

    Abstract: The size of the vocabulary is a central design choice in large pretrained language models, with respect to both performance and memory requirements. Typically, subword tokenization algorithms such as byte pair encoding and WordPiece are used. In this work, we investigate the compatibility of tokenizations for multilingual static and contextualized embedding spaces and propose a measure that reflec…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted at EMNLP 2021 Findings

  8. arXiv:2009.13375 [pdf, other]

    cs.CL cs.CY cs.LG

    Identifying Automatically Generated Headlines using Transformers

    Authors: Antonis Maronikolakis, Hinrich Schutze, Mark Stevenson

    Abstract: False information spread via the internet and social media influences public opinion and user activity, while generative models enable fake content to be generated faster and more cheaply than had previously been possible. In the not so distant future, identifying fake content generated by deep learning models will play a key role in protecting users from misinformation. To this end, a dataset con…

    Submitted 25 April, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

    Comments: NLP4IF 2021 Proceedings, NAACL 2021

  9. arXiv:2004.13878 [pdf, other]

    cs.CL

    Analyzing Political Parody in Social Media

    Authors: Antonis Maronikolakis, Danae Sanchez Villegas, Daniel Preotiuc-Pietro, Nikolaos Aletras

    Abstract: Parody is a figurative device used to imitate an entity for comedic or critical purposes and represents a widespread phenomenon in social media through many popular parody accounts. In this paper, we present the first computational study of parody. We introduce a new publicly available data set of tweets from real politicians and their corresponding parody accounts. We run a battery of supervised…

    Submitted 1 May, 2020; v1 submitted 28 April, 2020; originally announced April 2020.