Search | arXiv e-print repository

Quantitative Metrics for Evaluating Explanations of Video DeepFake Detectors

Authors: Federico Baldassarre, Quentin Debard, Gonzalo Fiz Pontiveros, Tri Kurniawan Wijaya

Abstract: The proliferation of DeepFake technology is a rising challenge in today's society, owing to more powerful and accessible generation methods. To counter this, the research community has developed detectors of ever-increasing accuracy. However, the ability to explain the decisions of such models to users is lagging behind and is considered an accessory in large-scale benchmarks, despite being a crucial requirement for the correct deployment of automated tools for content moderation. We attribute the issue to the reliance on qualitative comparisons and the lack of established metrics. We describe a simple set of metrics to evaluate the visual quality and informativeness of explanations of video DeepFake classifiers from a human-centric perspective. With these metrics, we compare common approaches to improve explanation quality and discuss their effect on both classification and explanation performance on the recent DFDC and DFD datasets.

Submitted 7 October, 2022; originally announced October 2022.

Comments: Accepted at BMVC 2022, code repository at https://github.com/baldassarreFe/deepfake-detection

arXiv:2203.05997 [pdf, other]

Towards Self-Supervised Learning of Global and Object-Centric Representations

Authors: Federico Baldassarre, Hossein Azizpour

Abstract: Self-supervision allows learning meaningful representations of natural images, which usually contain one central object. How well does it transfer to multi-entity scenes? We discuss key aspects of learning structured object-centric representations with self-supervision and validate our insights through several experiments on the CLEVR dataset. Regarding the architecture, we confirm the importance of competition for attention-based object discovery, where each image patch is exclusively attended by one object. For training, we show that contrastive losses equipped with matching can be applied directly in a latent space, avoiding pixel-based reconstruction. However, such an optimization objective is sensitive to false negatives (recurring objects) and false positives (matching errors). Careful consideration is thus required around data augmentation and negative sample selection.

Submitted 13 April, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

Comments: Published at the ICLR 2022 workshop on Objects, Structure and Causality. Code, datasets, and notebooks are available at https://github.com/baldassarreFe/iclr-osc-22

arXiv:2006.09562 [pdf, other]

Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks

Authors: Federico Baldassarre, Kevin Smith, Josephine Sullivan, Hossein Azizpour

Abstract: Visual relationship detection is fundamental for holistic image understanding. However, the localization and classification of (subject, predicate, object) triplets remain challenging tasks, due to the combinatorial explosion of possible relationships, their long-tailed distribution in natural images, and an expensive annotation process. This paper introduces a novel weakly-supervised method for visual relationship detection that relies on minimal image-level predicate labels. A graph neural network is trained to classify predicates in images from a graph representation of detected objects, implicitly encoding an inductive bias for pairwise relations. We then frame relationship detection as the explanation of such a predicate classifier, i.e. we obtain a complete relation by recovering the subject and object of a predicted predicate. We present results comparable to recent fully- and weakly-supervised methods on three diverse and challenging datasets: HICO-DET for human-object interaction, Visual Relationship Detection for generic object-to-object relations, and UnRel for unusual triplets, demonstrating robustness to non-comprehensive annotations and good few-shot generalization.

Submitted 17 July, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: Published at the European Conference on Computer Vision, ECCV 2020 (Poster)

arXiv:1905.13686 [pdf, other]

Explainability Techniques for Graph Convolutional Networks

Authors: Federico Baldassarre, Hossein Azizpour

Abstract: Graph Networks are used to make decisions in potentially complex scenarios but it is usually not obvious how or why they made them. In this work, we study the explainability of Graph Network decisions using two main classes of techniques, gradient-based and decomposition-based, on a toy dataset and a chemistry task. Our study sets the ground for future development as well as application to real-world problems.

Submitted 31 May, 2019; originally announced May 2019.

Comments: Accepted at the ICML 2019 Workshop "Learning and Reasoning with Graph-Structured Representations" (poster + spotlight talk)

arXiv:1712.03400 [pdf, other]

Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2

Authors: Federico Baldassarre, Diego González Morín, Lucas Rodés-Guirao

Abstract: We review some of the most recent approaches to colorize gray-scale images using deep learning methods. Inspired by these, we propose a model which combines a deep Convolutional Neural Network trained from scratch with high-level features extracted from the Inception-ResNet-v2 pre-trained model. Thanks to its fully convolutional architecture, our encoder-decoder model can process images of any size and aspect ratio. Other than presenting the training results, we assess the "public acceptance" of the generated images by means of a user study. Finally, we present a carousel of applications on different types of images, such as historical photographs.

Submitted 9 December, 2017; originally announced December 2017.

Comments: 12 pages

Showing 1–5 of 5 results for author: Baldassarre, F