-
Improving Interpretability and Robustness for the Detection of AI-Generated Images
Authors:
Tatiana Gaintseva,
Laida Kushnareva,
German Magai,
Irina Piontkovskaya,
Sergey Nikolenko,
Martin Benning,
Serguei Barannikov,
Gregory Slabaugh
Abstract:
With growing abilities of generative models, artificial content detection becomes an increasingly important and difficult task. However, all popular approaches to this problem suffer from poor generalization across domains and generative models. In this work, we focus on the robustness of AI-generated image (AIGI) detectors. We analyze existing state-of-the-art AIGI detection methods based on froz…
▽ More
With growing abilities of generative models, artificial content detection becomes an increasingly important and difficult task. However, all popular approaches to this problem suffer from poor generalization across domains and generative models. In this work, we focus on the robustness of AI-generated image (AIGI) detectors. We analyze existing state-of-the-art AIGI detection methods based on frozen CLIP embeddings and show how to interpret them, shedding light on how images produced by various AI generators differ from real ones. Next we propose two ways to improve robustness: based on removing harmful components of the embedding vector and based on selecting the best performing attention heads in the image encoder model. Our methods increase the mean out-of-distribution (OOD) classification score by up to 6% for cross-model transfer. We also propose a new dataset for AIGI detection and use it in our evaluation; we believe this dataset will help boost further research. The dataset and code are provided as a supplement.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
AI-generated text boundary detection with RoFT
Authors:
Laida Kushnareva,
Tatiana Gaintseva,
German Magai,
Serguei Barannikov,
Dmitry Abulkhanov,
Kristian Kuznetsov,
Eduard Tulchinskii,
Irina Piontkovskaya,
Sergey Nikolenko
Abstract:
Due to the rapid development of large language models, people increasingly often encounter texts that may start as written by a human but continue as machine-generated. Detecting the boundary between human-written and machine-generated parts of such texts is a challenging problem that has not received much attention in literature. We attempt to bridge this gap and examine several ways to adapt sta…
▽ More
Due to the rapid development of large language models, people increasingly often encounter texts that may start as written by a human but continue as machine-generated. Detecting the boundary between human-written and machine-generated parts of such texts is a challenging problem that has not received much attention in literature. We attempt to bridge this gap and examine several ways to adapt state of the art artificial text detection classifiers to the boundary detection setting. We push all detectors to their limits, using the Real or Fake text benchmark that contains short texts on several topics and includes generations of various language models. We use this diversity to deeply examine the robustness of all detectors in cross-domain and cross-model settings to provide baselines and insights for future research. In particular, we find that perplexity-based approaches to boundary detection tend to be more robust to peculiarities of domain-specific data than supervised fine-tuning of the RoBERTa model; we also find which features of the text confuse boundary detection algorithms and negatively influence their performance in cross-domain settings.
△ Less
Submitted 2 April, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts
Authors:
Eduard Tulchinskii,
Kristian Kuznetsov,
Laida Kushnareva,
Daniil Cherniavskii,
Serguei Barannikov,
Irina Piontkovskaya,
Sergey Nikolenko,
Evgeny Burnaev
Abstract:
Rapidly increasing quality of AI-generated content makes it difficult to distinguish between human and AI-generated texts, which may lead to undesirable consequences for society. Therefore, it becomes increasingly important to study the properties of human texts that are invariant over different text domains and varying proficiency of human writers, can be easily calculated for any language, and c…
▽ More
Rapidly increasing quality of AI-generated content makes it difficult to distinguish between human and AI-generated texts, which may lead to undesirable consequences for society. Therefore, it becomes increasingly important to study the properties of human texts that are invariant over different text domains and varying proficiency of human writers, can be easily calculated for any language, and can robustly separate natural and AI-generated texts regardless of the generation model and sampling method. In this work, we propose such an invariant for human-written texts, namely the intrinsic dimensionality of the manifold underlying the set of embeddings for a given text sample. We show that the average intrinsic dimensionality of fluent texts in a natural language is hovering around the value $9$ for several alphabet-based languages and around $7$ for Chinese, while the average intrinsic dimensionality of AI-generated texts for each language is $\approx 1.5$ lower, with a clear statistical separation between human-generated and AI-generated distributions. This property allows us to build a score-based artificial text detector. The proposed detector's accuracy is stable over text domains, generator models, and human writer proficiency levels, outperforming SOTA detectors in model-agnostic and cross-domain scenarios by a significant margin.
△ Less
Submitted 31 October, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Topological Data Analysis for Speech Processing
Authors:
Eduard Tulchinskii,
Kristian Kuznetsov,
Laida Kushnareva,
Daniil Cherniavskii,
Serguei Barannikov,
Irina Piontkovskaya,
Sergey Nikolenko,
Evgeny Burnaev
Abstract:
We apply topological data analysis (TDA) to speech classification problems and to the introspection of a pretrained speech model, HuBERT. To this end, we introduce a number of topological and algebraic features derived from Transformer attention maps and embeddings. We show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head. In particular, we…
▽ More
We apply topological data analysis (TDA) to speech classification problems and to the introspection of a pretrained speech model, HuBERT. To this end, we introduce a number of topological and algebraic features derived from Transformer attention maps and embeddings. We show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head. In particular, we achieve an improvement of about $9\%$ accuracy and $5\%$ ERR on four common datasets; on CREMA-D, the proposed feature set reaches a new state of the art performance with accuracy $80.155$. We also show that topological features are able to reveal functional roles of speech Transformer heads; e.g., we find the heads capable to distinguish between pairs of sample sources (natural/synthetic) or voices without any downstream fine-tuning. Our results demonstrate that TDA is a promising new approach for speech analysis, especially for tasks that require structural prediction. Appendices, an introduction to TDA, and other additional materials are available here - https://topohubert.github.io/speech-topology-webpages/
△ Less
Submitted 6 June, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Betti numbers of attention graphs is all you really need
Authors:
Laida Kushnareva,
Dmitri Piontkovski,
Irina Piontkovskaya
Abstract:
We apply methods of topological analysis to the attention graphs, calculated on the attention heads of the BERT model ( arXiv:1810.04805v2 ). Our research shows that the classifier built upon basic persistent topological features (namely, Betti numbers) of the trained neural network can achieve classification results on par with the conventional classification method. We show the relevance of such…
▽ More
We apply methods of topological analysis to the attention graphs, calculated on the attention heads of the BERT model ( arXiv:1810.04805v2 ). Our research shows that the classifier built upon basic persistent topological features (namely, Betti numbers) of the trained neural network can achieve classification results on par with the conventional classification method. We show the relevance of such topological text representation on three text classification benchmarks. For the best of our knowledge, it is the first attempt to analyze the topology of an attention-based neural network, widely used for Natural Language Processing.
△ Less
Submitted 5 July, 2022;
originally announced July 2022.
-
Acceptability Judgements via Examining the Topology of Attention Maps
Authors:
Daniil Cherniavskii,
Eduard Tulchinskii,
Vladislav Mikhailov,
Irina Proskurina,
Laida Kushnareva,
Ekaterina Artemova,
Serguei Barannikov,
Irina Piontkovskaya,
Dmitri Piontkovski,
Evgeny Burnaev
Abstract:
The role of the attention mechanism in encoding linguistic knowledge has received special interest in NLP. However, the ability of the attention heads to judge the grammatical acceptability of a sentence has been underexplored. This paper approaches the paradigm of acceptability judgments with topological data analysis (TDA), showing that the geometric properties of the attention graph can be effi…
▽ More
The role of the attention mechanism in encoding linguistic knowledge has received special interest in NLP. However, the ability of the attention heads to judge the grammatical acceptability of a sentence has been underexplored. This paper approaches the paradigm of acceptability judgments with topological data analysis (TDA), showing that the geometric properties of the attention graph can be efficiently exploited for two standard practices in linguistics: binary judgments and linguistic minimal pairs. Topological features enhance the BERT-based acceptability classifier scores by $8$%-$24$% on CoLA in three languages (English, Italian, and Swedish). By revealing the topological discrepancy between attention maps of minimal pairs, we achieve the human-level performance on the BLiMP benchmark, outperforming nine statistical and Transformer LM baselines. At the same time, TDA provides the foundation for analyzing the linguistic functions of attention heads and interpreting the correspondence between the graph features and grammatical phenomena.
△ Less
Submitted 23 October, 2022; v1 submitted 19 May, 2022;
originally announced May 2022.
-
Artificial Text Detection via Examining the Topology of Attention Maps
Authors:
Laida Kushnareva,
Daniil Cherniavskii,
Vladislav Mikhailov,
Ekaterina Artemova,
Serguei Barannikov,
Alexander Bernstein,
Irina Piontkovskaya,
Dmitri Piontkovski,
Evgeny Burnaev
Abstract:
The impressive capabilities of recent generative models to create texts that are challenging to distinguish from the human-written ones can be misused for generating fake news, product reviews, and even abusive content. Despite the prominent performance of existing methods for artificial text detection, they still lack interpretability and robustness towards unseen models. To this end, we propose…
▽ More
The impressive capabilities of recent generative models to create texts that are challenging to distinguish from the human-written ones can be misused for generating fake news, product reviews, and even abusive content. Despite the prominent performance of existing methods for artificial text detection, they still lack interpretability and robustness towards unseen models. To this end, we propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA) which is currently understudied in the field of NLP. We empirically show that the features derived from the BERT model outperform count- and neural-based baselines up to 10\% on three common datasets, and tend to be the most robust towards unseen GPT-style generation models as opposed to existing methods. The probing analysis of the features reveals their sensitivity to the surface and syntactic properties. The results demonstrate that TDA is a promising line with respect to NLP tasks, specifically the ones that incorporate surface and structural information.
△ Less
Submitted 28 April, 2022; v1 submitted 10 September, 2021;
originally announced September 2021.
-
Category-Learning with Context-Augmented Autoencoder
Authors:
Denis Kuzminykh,
Laida Kushnareva,
Timofey Grigoryev,
Alexander Zatolokin
Abstract:
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning. Biological neural networks are known to solve this problem quite well in unsupervised manner, yet unsupervised artificial neural networks either struggle to do it or require fine tuning for each task individually. We associate this with the fact that a biological brain learns in…
▽ More
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning. Biological neural networks are known to solve this problem quite well in unsupervised manner, yet unsupervised artificial neural networks either struggle to do it or require fine tuning for each task individually. We associate this with the fact that a biological brain learns in the context of the relationships between observations, while an artificial network does not. We also notice that, though a naive data augmentation technique can be very useful for supervised learning problems, autoencoders typically fail to generalize transformations from data augmentations. Thus, we believe that providing additional knowledge about relationships between data samples will improve model's capability of finding useful inner data representation. More formally, we consider a dataset not as a manifold, but as a category, where the examples are objects. Two these objects are connected by a morphism, if they actually represent different transformations of the same entity. Following this formalism, we propose a novel method of using data augmentations when training autoencoders. We train a Variational Autoencoder in such a way, that it makes transformation outcome predictable by auxiliary network in terms of the hidden representation. We believe that the classification accuracy of a linear classifier on the learned representation is a good metric to measure its interpretability. In our experiments, present approach outperforms $β$-VAE and is comparable with Gaussian-mixture VAE.
△ Less
Submitted 10 October, 2020;
originally announced October 2020.