Showing 1–11 of 11 results for author: Emmery, C

  1. arXiv:2309.06923

    cs.CL

    Native Language Identification with Big Bird Embeddings

    Authors: Sergey Kramp, Giovanni Cassani, Chris Emmery

    Abstract: Native Language Identification (NLI) intends to classify an author's native language based on their writing in another language. Historically, the task has heavily relied on time-consuming linguistic feature engineering, and transformer-based NLI models have thus far failed to offer effective, practical alternatives. The current work investigates if input size is a limiting factor, and shows that…

    Submitted 13 September, 2023; originally announced September 2023.

  2. arXiv:2304.08891

    cs.CL

    Tailoring Domain Adaptation for Machine Translation Quality Estimation

    Authors: Javad Pourmostafa Roshan Sharami, Dimitar Shterionov, Frédéric Blain, Eva Vanmassenhove, Mirella De Sisto, Chris Emmery, Pieter Spronck

    Abstract: While quality estimation (QE) can play an important role in the translation process, its effectiveness relies on the availability and quality of training data. For QE in particular, high-quality labeled data is often lacking due to the high cost and effort associated with labeling such data. Aside from the data scarcity challenge, QE models should also be generalizable, i.e., they should be able t…

    Submitted 9 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: Accepted to EAMT 2023 (main)

  3. arXiv:2301.04230

    cs.CL cs.CY

    User-Centered Security in Natural Language Processing

    Authors: Chris Emmery

    Abstract: This dissertation proposes a framework of user-centered security in Natural Language Processing (NLP), and demonstrates how it can improve the accessibility of related research. Accordingly, it focuses on two security domains within NLP that are of great public interest. First, that of author profiling, which can be employed to compromise online privacy through invasive inferences. Without access and det…

    Submitted 10 January, 2023; originally announced January 2023.

    Comments: PhD thesis, ISBN 978-94-6458-867-5

  4. arXiv:2207.06839

    cs.CL

    Neural Data-to-Text Generation Based on Small Datasets: Comparing the Added Value of Two Semi-Supervised Learning Approaches on Top of a Large Language Model

    Authors: Chris van der Lee, Thiago Castro Ferreira, Chris Emmery, Travis Wiltshire, Emiel Krahmer

    Abstract: This study discusses the effect of semi-supervised learning in combination with pretrained language models for data-to-text generation. It is not known whether semi-supervised learning is still helpful when a large-scale language model is also supplemented. This study aims to answer this question by comparing a data-to-text system only supplemented with a language model, to two data-to-text system…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: 22 pages (excluding bibliography and appendix)

  5. arXiv:2201.06384

    cs.CL cs.CY cs.SI

    Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations

    Authors: Chris Emmery, Ákos Kádár, Grzegorz Chrupała, Walter Daelemans

    Abstract: A limited number of studies investigate the role of model-agnostic adversarial behavior in toxic content classification. As toxicity classifiers predominantly rely on lexical cues, (deliberately) creative and evolving language use can be detrimental to the utility of current corpora and state-of-the-art models when they are deployed for content moderation. The less training data is available, the…

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: Submitted to LREC 2022

  6. arXiv:2109.06105

    cs.CL cs.AI

    NeuTral Rewriter: A Rule-Based and Neural Approach to Automatic Rewriting into Gender-Neutral Alternatives

    Authors: Eva Vanmassenhove, Chris Emmery, Dimitar Shterionov

    Abstract: Recent years have seen an increasing need for gender-neutral and inclusive language. Within the field of NLP, there are various mono- and bilingual use cases where gender-inclusive language is appropriate, if not preferred due to ambiguity or uncertainty in terms of the gender of referents. In this work, we present a rule-based and a neural approach to gender-neutral rewriting for English along wi…

    Submitted 13 September, 2021; originally announced September 2021.

  7. arXiv:2101.11310

    cs.CL cs.CY

    Adversarial Stylometry in the Wild: Transferable Lexical Substitution Attacks on Author Profiling

    Authors: Chris Emmery, Ákos Kádár, Grzegorz Chrupała

    Abstract: Written language contains stylistic cues that can be exploited to automatically infer a variety of potentially sensitive author information. Adversarial stylometry intends to attack such models by rewriting an author's text. Our research proposes several components to facilitate deployment of these adversarial attacks in the wild, where neither data nor target models are accessible. We introduce a…

    Submitted 27 January, 2021; originally announced January 2021.

    Comments: Accepted to EACL 2021

  8. Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity

    Authors: Chris Emmery, Ben Verhoeven, Guy De Pauw, Gilles Jacobs, Cynthia Van Hee, Els Lefever, Bart Desmet, Véronique Hoste, Walter Daelemans

    Abstract: The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datase…

    Submitted 25 October, 2019; originally announced October 2019.

  9. arXiv:1805.07143

    cs.CL

    Style Obfuscation by Invariance

    Authors: Chris Emmery, Enrique Manjavacas, Grzegorz Chrupała

    Abstract: The task of obfuscating writing style using sequence models has previously been investigated under the framework of obfuscation-by-transfer, where the input text is explicitly rewritten in another style. These approaches also often lead to major alterations to the semantic content of the input. In this work, we propose obfuscation-by-invariance, and investigate to what extent models trained to be…

    Submitted 18 May, 2018; originally announced May 2018.

    Comments: Accepted for presentation at COLING18

  10. Automatic Detection of Cyberbullying in Social Media Text

    Authors: Cynthia Van Hee, Gilles Jacobs, Chris Emmery, Bart Desmet, Els Lefever, Ben Verhoeven, Guy De Pauw, Walter Daelemans, Véronique Hoste

    Abstract: While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to iden…

    Submitted 17 January, 2018; originally announced January 2018.

    Comments: 21 pages, 9 tables, under review

    MSC Class: 68T50

    ACM Class: I.2.7; J.4

  11. arXiv:1607.00225

    cs.CL

    Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

    Authors: Stéphan Tulkens, Chris Emmery, Walter Daelemans

    Abstract: Word embeddings have recently seen a strong increase in interest as a result of strong performance gains on a variety of tasks. However, most of this research also underlined the importance of benchmark datasets, and the difficulty of constructing these for a variety of language-specific tasks. Still, many of the datasets used in these tasks could prove to be fruitful linguistic resources, allowin…

    Submitted 1 July, 2016; originally announced July 2016.

    Comments: in LREC 2016