Search | arXiv e-print repository

GPU-accelerated SIFT-aided source identification of stabilized videos

Authors: Andrea Montibeller, Cecilia Pasquini, Giulia Boato, Stefano Dell'Anna, Fernando Pérez-González

Abstract: Video stabilization is an in-camera processing commonly applied by modern acquisition devices. While significantly improving the visual quality of the resulting videos, it has been shown that such operation typically hinders the forensic analysis of video signals. In fact, the correct identification of the acquisition source usually based on Photo Response non-Uniformity (PRNU) is subject to the e… ▽ More Video stabilization is an in-camera processing commonly applied by modern acquisition devices. While significantly improving the visual quality of the resulting videos, it has been shown that such operation typically hinders the forensic analysis of video signals. In fact, the correct identification of the acquisition source usually based on Photo Response non-Uniformity (PRNU) is subject to the estimation of the transformation applied to each frame in the stabilization phase. A number of techniques have been proposed for dealing with this problem, which however typically suffer from a high computational burden due to the grid search in the space of inversion parameters. Our work attempts to alleviate these shortcomings by exploiting the parallelization capabilities of Graphics Processing Units (GPUs), typically used for deep learning applications, in the framework of stabilised frames inversion. Moreover, we propose to exploit SIFT features {to estimate the camera momentum and} %to identify less stabilized temporal segments, thus enabling a more accurate identification analysis, and to efficiently initialize the frame-wise parameter search of consecutive frames. Experiments on a consolidated benchmark dataset confirm the effectiveness of the proposed approach in reducing the required computational time and improving the source identification accuracy. {The code is available at \url{https://github.com/AMontiB/GPU-PRNU-SIFT}}. △ Less

Submitted 29 July, 2022; originally announced July 2022.

arXiv:2108.02515 [pdf, other]

Multi-clue reconstruction of sharing chains for social media images

Authors: Sebastiano Verde, Cecilia Pasquini, Federica Lago, Alessandro Goller, Francesco GB De Natale, Alessandro Piva, Giulia Boato

Abstract: The amount of multimedia content shared everyday, combined with the level of realism reached by recent fake-generating technologies, threatens to impair the trustworthiness of online information sources. The process of uploading and sharing data tends to hinder standard media forensic analyses, since multiple re-sharing steps progressively hide the traces of past manipulations. At the same time th… ▽ More The amount of multimedia content shared everyday, combined with the level of realism reached by recent fake-generating technologies, threatens to impair the trustworthiness of online information sources. The process of uploading and sharing data tends to hinder standard media forensic analyses, since multiple re-sharing steps progressively hide the traces of past manipulations. At the same time though, new traces are introduced by the platforms themselves, enabling the reconstruction of the sharing history of digital objects, with possible applications in information flow monitoring and source identification. In this work, we propose a supervised framework for the reconstruction of image sharing chains on social media platforms. The system is structured as a cascade of backtracking blocks, each of them tracing back one step of the sharing chain at a time. Blocks are designed as ensembles of classifiers trained to analyse the input image independently from one another by leveraging different feature representations that describe both content and container of the media object. Individual decisions are then properly combined by a late fusion strategy. Results highlight the advantages of employing multiple clues, which allow accurately tracing back up to three steps along the sharing chain. △ Less

Submitted 5 August, 2021; originally announced August 2021.

arXiv:2106.07226 [pdf, other]

doi 10.1109/MSP.2021.3120982

More Real than Real: A Study on Human Visual Perception of Synthetic Faces

Authors: Federica Lago, Cecilia Pasquini, Rainer Böhme, Hélène Dumont, Valérie Goffaux, Giulia Boato

Abstract: Deep fakes became extremely popular in the last years, also thanks to their increasing realism. Therefore, there is the need to measures human's ability to distinguish between real and synthetic face images when confronted with cutting-edge creation technologies. We describe the design and results of a perceptual experiment we have conducted, where a wide and diverse group of volunteers has been e… ▽ More Deep fakes became extremely popular in the last years, also thanks to their increasing realism. Therefore, there is the need to measures human's ability to distinguish between real and synthetic face images when confronted with cutting-edge creation technologies. We describe the design and results of a perceptual experiment we have conducted, where a wide and diverse group of volunteers has been exposed to synthetic face images produced by state-of-the-art Generative Adversarial Networks (namely, PG-GAN, StyleGAN, StyleGAN2). The experiment outcomes reveal how strongly we should call into question our human ability to discriminate real faces from synthetic ones generated through modern AI. △ Less

Submitted 20 October, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

arXiv:2007.15271 [pdf, other]

Dynamic texture analysis for detecting fake faces in video sequences

Authors: Mattia Bonomi, Cecilia Pasquini, Giulia Boato

Abstract: The creation of manipulated multimedia content involving human characters has reached in the last years unprecedented realism, calling for automated techniques to expose synthetically generated faces in images and videos. This work explores the analysis of spatio-temporal texture dynamics of the video signal, with the goal of characterizing and distinguishing real and fake sequences. We propose to… ▽ More The creation of manipulated multimedia content involving human characters has reached in the last years unprecedented realism, calling for automated techniques to expose synthetically generated faces in images and videos. This work explores the analysis of spatio-temporal texture dynamics of the video signal, with the goal of characterizing and distinguishing real and fake sequences. We propose to build a binary decision on the joint analysis of multiple temporal segments and, in contrast to previous approaches, to exploit the textural dynamics of both the spatial and temporal dimensions. This is achieved through the use of Local Derivative Patterns on Three Orthogonal Planes (LDP-TOP), a compact feature representation known to be an important asset for the detection of face spoofing attacks. Experimental analyses on state-of-the-art datasets of manipulated videos show the discriminative power of such descriptors in separating real and fake sequences, and also identifying the creation method used. Linear Support Vector Machines (SVMs) are used which, despite the lower complexity, yield comparable performance to previously proposed deep models for fake content detection. △ Less

Submitted 30 July, 2020; originally announced July 2020.

arXiv:1910.01568 [pdf, other]

Incremental learning for the detection and classification of GAN-generated images

Authors: Francesco Marra, Cristiano Saltori, Giulia Boato, Luisa Verdoliva

Abstract: Current developments in computer vision and deep learning allow to automatically generate hyper-realistic images, hardly distinguishable from real ones. In particular, human face generation achieved a stunning level of realism, opening new opportunities for the creative industry but, at the same time, new scary scenarios where such content can be maliciously misused. Therefore, it is essential to… ▽ More Current developments in computer vision and deep learning allow to automatically generate hyper-realistic images, hardly distinguishable from real ones. In particular, human face generation achieved a stunning level of realism, opening new opportunities for the creative industry but, at the same time, new scary scenarios where such content can be maliciously misused. Therefore, it is essential to develop innovative methodologies to automatically tell apart real from computer generated multimedia, possibly able to follow the evolution and continuous improvement of data in terms of quality and realism. In the last few years, several deep learning-based solutions have been proposed for this problem, mostly based on Convolutional Neural Networks (CNNs). Although results are good in controlled conditions, it is not clear how such proposals can adapt to real-world scenarios, where learning needs to continuously evolve as new types of generated data appear. In this work, we tackle this problem by proposing an approach based on incremental learning for the detection and classification of GAN-generated images. Experiments on a dataset comprising images generated by several GAN-based architectures show that the proposed method is able to correctly perform discrimination when new GANs are presented to the network △ Less

Submitted 6 October, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

arXiv:1903.01735 [pdf, other]

Hue Modification Localization By Pair Matching

Authors: Quoc-Tin Phan, Michele Vascotto, Giulia Boato

Abstract: Hue modification is the adjustment of hue property on color images. Conducting hue modification on an image is trivial, and it can be abused to falsify opinions of viewers. Since shapes, edges or textural information remains unchanged after hue modification, this type of manipulation is relatively hard to be detected and localized. Since small patches inherit the same Color Filter Array (CFA) conf… ▽ More Hue modification is the adjustment of hue property on color images. Conducting hue modification on an image is trivial, and it can be abused to falsify opinions of viewers. Since shapes, edges or textural information remains unchanged after hue modification, this type of manipulation is relatively hard to be detected and localized. Since small patches inherit the same Color Filter Array (CFA) configuration and demosaicing, any distortion made by local hue modification can be detected by patch matching within the same image. In this paper, we propose to localize hue modification by means of a Siamese neural network specifically designed for matching two inputs. By crafting the network outputs, we are able to form a heatmap which potentially highlights malicious regions. Our proposed method deals well not only with uncompressed images but also with the presence of JPEG compression, an operation usually hindering the exploitation of CFA and demosaicing artifacts. Experimental evidences corroborate the effectiveness of the proposed method. △ Less

Submitted 5 March, 2019; originally announced March 2019.

arXiv:1810.07945 [pdf, other]

Accurate and Scalable Image Clustering Based On Sparse Representation of Camera Fingerprint

Authors: Quoc-Tin Phan, Giulia Boato, Francesco G. B. De Natale

Abstract: Clustering images according to their acquisition devices is a well-known problem in multimedia forensics, which is typically faced by means of camera Sensor Pattern Noise (SPN). Such an issue is challenging since SPN is a noise-like signal, hard to be estimated and easy to be attenuated or destroyed by many factors. Moreover, the high dimensionality of SPN hinders large-scale applications. Existin… ▽ More Clustering images according to their acquisition devices is a well-known problem in multimedia forensics, which is typically faced by means of camera Sensor Pattern Noise (SPN). Such an issue is challenging since SPN is a noise-like signal, hard to be estimated and easy to be attenuated or destroyed by many factors. Moreover, the high dimensionality of SPN hinders large-scale applications. Existing approaches are typically based on the correlation among SPNs in the pixel domain, which might not be able to capture intrinsic data structure in union of vector subspaces. In this paper, we propose an accurate clustering framework, which exploits linear dependencies among SPNs in their intrinsic vector subspaces. Such dependencies are encoded under sparse representations which are obtained by solving a LASSO problem with non-negativity constraint. The proposed framework is highly accurate in number of clusters estimation and image association. Moreover, our framework is scalable to the number of images and robust against double JPEG compression as well as the presence of outliers, owning big potential for real-world applications. Experimental results on Dresden and Vision database show that our proposed framework can adapt well to both medium-scale and large-scale contexts, and outperforms state-of-the-art methods. △ Less

Submitted 30 November, 2018; v1 submitted 18 October, 2018; originally announced October 2018.

arXiv:1608.06770 [pdf, other]

Automatic Synchronization of Multi-User Photo Galleries

Authors: E. Sansone, K. Apostolidis, N. Conci, G. Boato, V. Mezaris, F. G. B. De Natale

Abstract: In this paper we address the issue of photo galleries synchronization, where pictures related to the same event are collected by different users. Existing solutions to address the problem are usually based on unrealistic assumptions, like time consistency across photo galleries, and often heavily rely on heuristics, limiting therefore the applicability to real-world scenarios. We propose a solutio… ▽ More In this paper we address the issue of photo galleries synchronization, where pictures related to the same event are collected by different users. Existing solutions to address the problem are usually based on unrealistic assumptions, like time consistency across photo galleries, and often heavily rely on heuristics, limiting therefore the applicability to real-world scenarios. We propose a solution that achieves better generalization performance for the synchronization task compared to the available literature. The method is characterized by three stages: at first, deep convolutional neural network features are used to assess the visual similarity among the photos; then, pairs of similar photos are detected across different galleries and used to construct a graph; eventually, a probabilistic graphical model is used to estimate the temporal offset of each pair of galleries, by traversing the minimum spanning tree extracted from this graph. The experimental evaluation is conducted on four publicly available datasets covering different types of events, demonstrating the strength of our proposed method. A thorough discussion of the obtained results is provided for a critical assessment of the quality in synchronization. △ Less

Submitted 16 January, 2017; v1 submitted 24 August, 2016; originally announced August 2016.

Comments: ACCEPTED to IEEE Transactions on Multimedia

Showing 1–8 of 8 results for author: Boato, G