The Sensitivity of Word Embeddings-based Author Detection Models to Semantic-preserving Adversarial Perturbations
Authors:
Jeremiah Duncan,
Fabian Fallas,
Chris Gropp,
Emily Herron,
Maria Mahbub,
Paula Olaya,
Eduardo Ponce,
Tabitha K. Samuel,
Daniel Schultz,
Sudarshan Srinivasan,
Maofeng Tang,
Viktor Zenkov,
Quan Zhou,
Edmon Begoli
Abstract:
Authorship analysis is an important subject in the field of natural language processing. It allows the detection of the most likely writer of articles, news, books, or messages. This technique has multiple uses in tasks related to authorship attribution, plagiarism detection, style analysis, identification of sources of misinformation, etc. The focus of this paper is to explore the limitations and sensitivity of established approaches to adversarial manipulations of inputs. To this end, and using those established techniques, we first developed an experimental framework for author detection and input perturbations. Next, we experimentally evaluated the performance of the authorship detection model under a collection of semantic-preserving adversarial perturbations of input narratives. Finally, we compared and analyzed the effects of different perturbation strategies and of input and model configurations on the author detection model.
Submitted 23 February, 2021;
originally announced February 2021.
Building Containerized Environments for Reproducibility and Traceability of Scientific Workflows
Authors:
Paula Olaya,
Jay Lofstead,
Michela Taufer
Abstract:
Scientists rely on simulations to study natural phenomena. Trusting the simulation results is vital to advancing science in any field. One approach to building trust is to ensure the reproducibility and traceability of the simulations through system-level annotation of executions and the generation of record trails of data moving through the simulation workflow. In this work, we present a system-level solution that leverages the intrinsic characteristics of containers (i.e., portability, isolation, encapsulation, and unique identifiers). Our solution consists of a containerized environment capable of annotating workflows, capturing provenance metadata, and building record trails. We assess our environment on four different workflows and measure containerization costs in terms of time and space. Our solution, built with a tolerable time and space overhead, enables transparent and automatic provenance metadata collection and access, an easy-to-read record trail, and tight connections between data and metadata.
Submitted 17 September, 2020;
originally announced September 2020.