Showing 1–18 of 18 results for author: Arnold, T

Searching in archive cs.
  1. arXiv:2406.02322  [pdf, other]

    cs.LG cs.AI

    A Survey of Transformer Enabled Time Series Synthesis

    Authors: Alexander Sommers, Logan Cummins, Sudip Mittal, Shahram Rahimi, Maria Seale, Joseph Jaboure, Thomas Arnold

    Abstract: Generative AI has received much attention in the image and language domains, with the transformer neural network continuing to dominate the state of the art. Application of these models to time series generation is less explored, however, and is of great utility to machine learning, privacy preservation, and explainability research. The present survey identifies this gap at the intersection of the…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2404.14183  [pdf, other]

    cs.CL

    SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

    Authors: Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Chenxi Whitehouse, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

    Abstract: We present the results and the main findings of SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection. The task featured three subtasks. Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine. This subtask has two tracks: a monolingual track focused solely on English texts and a multilingual…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 23 pages, 12 tables

    Journal ref: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

  3. arXiv:2403.15451  [pdf, other]

    cs.CL

    Towards Enabling FAIR Dataspaces Using Large Language Models

    Authors: Benedikt T. Arnold, Johannes Theissen-Lipp, Diego Collarana, Christoph Lange, Sandra Geisler, Edward Curry, Stefan Decker

    Abstract: Dataspaces have recently gained adoption across various sectors, including traditionally less digitized domains such as culture. Leveraging Semantic Web technologies helps to make dataspaces FAIR, but their complexity poses a significant challenge to the adoption of dataspaces and increases their cost. The advent of Large Language Models (LLMs) raises the question of how these models can support t…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 8 pages. Preprint. Under review

  4. arXiv:2402.11175  [pdf, other]

    cs.CL

    M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection

    Authors: Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohanned Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

    Abstract: The advent of Large Language Models (LLMs) has brought an unprecedented surge in machine-generated text (MGT) across diverse channels. This raises legitimate concerns about its potential misuse and societal implications. The need to identify and differentiate such content from genuine human-generated text is critical in combating disinformation, preserving the integrity of education and scientific…

    Submitted 27 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 29 pages

    Journal ref: ACL 2024 main

  5. arXiv:2305.14902  [pdf, other]

    cs.CL

    M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection

    Authors: Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Chenxi Whitehouse, Osama Mohammed Afzal, Tarek Mahmoud, Toru Sasaki, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov

    Abstract: Large language models (LLMs) have demonstrated remarkable capability to generate fluent responses to a wide variety of user queries. However, this has also raised concerns about the potential misuse of such texts in journalism, education, and academia. In this study, we strive to create automated systems that can detect machine-generated texts and pinpoint potential misuse. We first introduce a la…

    Submitted 9 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 41 pages

  6. arXiv:2208.10400  [pdf, other]

    cs.CL cs.CR

    DP-Rewrite: Towards Reproducibility and Transparency in Differentially Private Text Rewriting

    Authors: Timour Igamberdiev, Thomas Arnold, Ivan Habernal

    Abstract: Text rewriting with differential privacy (DP) provides concrete theoretical guarantees for protecting the privacy of individuals in textual documents. In practice, existing systems may lack the means to validate their privacy-preserving claims, leading to problems of transparency and reproducibility. We introduce DP-Rewrite, an open-source framework for differentially private text rewriting which…

    Submitted 22 August, 2022; originally announced August 2022.

    Comments: Accepted at COLING 2022

  7. Synthetic-to-Real Domain Adaptation using Contrastive Unpaired Translation

    Authors: Benedikt T. Imbusch, Max Schwarz, Sven Behnke

    Abstract: The usefulness of deep learning models in robotics is largely dependent on the availability of training data. Manual annotation of training data is often infeasible. Synthetic data is a viable alternative, but suffers from domain gap. We propose a multi-step method to obtain training data without manual annotation effort: From 3D object meshes, we generate images using a modern synthesis pipeline.…

    Submitted 28 June, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

    Journal ref: 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), 2022, pp. 595-602

  8. arXiv:2009.02242  [pdf, other]

    cs.HC

    Visualizing a Large Spatiotemporal Collection of Historic Photography with a Generous Interface

    Authors: Taylor Arnold, Nathaniel Ayers, Justin Madron, Robert Nelson, Lauren Tilton

    Abstract: Museums, libraries, and other cultural institutions continue to prioritize and build web-based visualization systems that increase access and discovery to digitized archives. Prominent examples exist that illustrate impressive visualizations of a particular feature of a collection. For example, interactive maps showing geographic spread or timelines capturing the temporal aspects of collections. B…

    Submitted 4 September, 2020; originally announced September 2020.

    Comments: Presented at 5th Workshop on Visualization for the Digital Humanities

  9. arXiv:1902.01320  [pdf]

    cs.RO cs.HC

    When Exceptions are the Norm: Exploring the Role of Consent in HRI

    Authors: Vasanth Sarathy, Thomas Arnold, Matthias Scheutz

    Abstract: HRI researchers have made major strides in developing robotic architectures that are capable of reading a limited set of social cues and producing behaviors that enhance their likeability and feeling of comfort amongst humans. However, the cues in these models are fairly direct and the interactions largely dyadic. To capture the normative qualities of interaction more robustly, we propose consent…

    Submitted 4 February, 2019; originally announced February 2019.

  10. arXiv:1807.02572  [pdf, ps, other]

    cs.AI cs.CY

    Quasi-Dilemmas for Artificial Moral Agents

    Authors: Daniel Kasenberg, Vasanth Sarathy, Thomas Arnold, Matthias Scheutz, Tom Williams

    Abstract: In this paper we describe moral quasi-dilemmas (MQDs): situations similar to moral dilemmas, but in which an agent is unsure whether exploring the plan space or the world may reveal a course of action that satisfies all moral requirements. We argue that artificial moral agents (AMAs) should be built to handle MQDs (in particular, by exploring the plan space rather than immediately accepting the in…

    Submitted 6 July, 2018; originally announced July 2018.

    Comments: Accepted to the International Conference on Robot Ethics and Standards (ICRES), 2018

  11. arXiv:1806.11183  [pdf, other]

    cs.CL

    Cross-Discourse and Multilingual Exploration of Textual Corpora with the DualNeighbors Algorithm

    Authors: Taylor Arnold, Lauren Tilton

    Abstract: Word choice is dependent on the cultural context of writers and their subjects. Different words are used to describe similar actions, objects, and features based on factors such as class, race, gender, geography and political affinity. Exploratory techniques based on locating and counting words may, therefore, lead to conclusions that reinforce culturally inflected boundaries. We offer a new metho…

    Submitted 28 June, 2018; originally announced June 2018.

    Comments: Chosen for oral presentation at 2nd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2018)

  12. arXiv:1806.11099  [pdf, ps, other]

    cs.CL

    Predicting CEFRL levels in learner English on the basis of metrics and full texts

    Authors: Taylor Arnold, Nicolas Ballier, Thomas Gaillat, Paula Lissòn

    Abstract: This paper analyses the contribution of language metrics and, potentially, of linguistic structures, to classify French learners of English according to levels of the Common European Framework of Reference for Languages (CEFRL). The purpose is to build a model for the prediction of learner levels as a function of language complexity features. We used the EFCAMDAT corpus, a database of one million…

    Submitted 28 June, 2018; originally announced June 2018.

    Comments: Conference paper presented at Conférence sur l'Apprentissage Automatique (CAp) 2018

  13. arXiv:1703.09570  [pdf]

    cs.CL stat.CO

    A Tidy Data Model for Natural Language Processing using cleanNLP

    Authors: Taylor Arnold

    Abstract: The package cleanNLP provides a set of fast tools for converting a textual corpus into a set of normalized tables. The underlying natural language processing pipeline utilizes Stanford's CoreNLP library, exposing a number of annotation tasks for text written in English, French, German, and Spanish. Annotators include tokenization, part of speech tagging, named entity recognition, entity linking, s…

    Submitted 3 May, 2018; v1 submitted 26 March, 2017; originally announced March 2017.

    Comments: 20 pages; 4 figures

    Journal ref: The R Journal, 9.2, 248-267 (2017)

  14. arXiv:1602.03814  [pdf, other]

    cs.RO cs.AI cs.HC

    Enabling Basic Normative HRI in a Cognitive Robotic Architecture

    Authors: Vasanth Sarathy, Jason R. Wilson, Thomas Arnold, Matthias Scheutz

    Abstract: Collaborative human activities are grounded in social and moral norms, which humans consciously and subconsciously use to guide and constrain their decision-making and behavior, thereby strengthening their interactions and preventing emotional and physical harm. This type of norm-based processing is also critical for robots in many human-robot interaction scenarios (e.g., when helping elderly and…

    Submitted 11 February, 2016; originally announced February 2016.

    Comments: Presented at "2nd Workshop on Cognitive Architectures for Social Human-Robot Interaction 2016 (arXiv:1602.01868)"

    Report number: CogArch4sHRI/2016/04

  15. arXiv:1510.00755  [pdf, other]

    stat.CO cs.DS

    Sparse Density Representations for Simultaneous Inference on Large Spatial Datasets

    Authors: Taylor Arnold

    Abstract: Large spatial datasets often represent a number of spatial point processes generated by distinct entities or classes of events. When crossed with covariates, such as discrete time buckets, this can quickly result in a data set with millions of individual density estimates. Applications that require simultaneous access to a substantial subset of these estimates become resource constrained when dens…

    Submitted 2 October, 2015; originally announced October 2015.

    Comments: 9 pages, 3 figures, 5 tables

  16. arXiv:1510.00041  [pdf, other]

    stat.CO cs.PF

    iotools: High-Performance I/O Tools for R

    Authors: Taylor Arnold, Michael Kane, Simon Urbanek

    Abstract: The iotools package provides a set of tools for Input/Output (I/O) intensive datasets processing in R (R Core Team, 2014). Efficient parsing methods are included which minimize copying and avoid the use of intermediate string representations whenever possible. Functions for applying chunk-wise operations allow for computing on streaming input as well as arbitrarily large files. We present a set of…

    Submitted 7 April, 2016; v1 submitted 30 September, 2015; originally announced October 2015.

    Comments: 8 pages, 2 figures

    MSC Class: 03-04

  17. arXiv:1506.05158  [pdf, other]

    cs.DB cs.DC

    An Entropy Maximizing Geohash for Distributed Spatiotemporal Database Indexing

    Authors: Taylor Arnold

    Abstract: We present a modification of the standard geohash algorithm based on maximum entropy encoding in which the data volume is approximately constant for a given hash prefix length. Distributed spatiotemporal databases, which typically require interleaving spatial and temporal elements into a single key, reap large benefits from a balanced geohash by creating a consistent ratio between spatial and temp…

    Submitted 16 June, 2015; originally announced June 2015.

    Comments: 12 pages, 4 figures

  18. arXiv:1405.3222  [pdf, other]

    stat.CO cs.LG stat.ML

    Efficient Implementations of the Generalized Lasso Dual Path Algorithm

    Authors: Taylor Arnold, Ryan Tibshirani

    Abstract: We consider efficient implementations of the generalized lasso dual path algorithm of Tibshirani and Taylor (2011). We first describe a generic approach that covers any penalty matrix D and any (full column rank) matrix X of predictor variables. We then describe fast implementations for the special cases of trend filtering problems, fused lasso problems, and sparse fused lasso problems, both with…

    Submitted 3 November, 2014; v1 submitted 13 May, 2014; originally announced May 2014.