Explainable Face Verification via Feature-Guided Gradient Backpropagation

Authors: Yuhang Lu, Zewei Xu, Touradj Ebrahimi

Abstract: Recent years have witnessed significant advancement in face recognition (FR) techniques, with their applications widely spread in people's lives and security-sensitive areas. There is a growing need for reliable interpretations of decisions of such systems. Existing studies relying on various mechanisms have investigated the usage of saliency maps as an explanation approach, but suffer from different limitations. This paper first explores the spatial relationship between face image and its deep representation via gradient backpropagation. Then a new explanation approach FGGB has been conceived, which provides precise and insightful similarity and dissimilarity saliency maps to explain the "Accept" and "Reject" decision of an FR system. Extensive visual presentation and quantitative measurement have shown that FGGB achieves superior performance in both similarity and dissimilarity maps when compared to current state-of-the-art explainable face verification approaches.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.00410 [pdf, other]

Assessing objective quality metrics for JPEG and MPEG point cloud coding

Authors: Davi Lazzarotto, Michela Testolina, Touradj Ebrahimi

Abstract: As applications using immersive media gain increasing attention from both academia and industry, research in the field of point cloud compression has greatly intensified in recent years, leading to the development of the MPEG compression standards V-PCC and G-PCC, as well as the more recent JPEG Pleno learning-based point cloud coding. Each of the above-mentioned standards is based on a different algorithm, introducing distinct types of degradation that may impair the quality of experience when high lossy compression is applied. Although the impact on perceptual quality could be accurately evaluated during subjective quality assessment experiments, objective quality metrics serve as predictors of the visually perceived quality and provide similarity scores without human intervention. Nevertheless, their accuracy can be susceptible to the characteristics of the evaluated media as well as to the type and intensity of the added distortion. While the performance of multiple state-of-the-art objective quality metrics has already been evaluated through their correlation with subjective scores obtained in the presence of artifacts produced by the MPEG standards, no study has evaluated how metrics perform with the more recent JPEG Pleno point cloud coding. In this paper, a study is conducted to benchmark the performance of a large set of objective quality metrics in a subjective dataset including distortions produced by JPEG and MPEG codecs.
The dataset also contains three different trade-offs between color and geometry compression for each codec, adding another dimension to the analysis. Performance indexes are computed over the entire dataset but also after splitting according to the codec and to the original model, resulting in detailed insights about the overall performance of each visual quality predictor as well as their cross-content and cross-codec generalization ability.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.08750 [pdf, other]

Towards the Detection of AI-Synthesized Human Face Images

Authors: Yuhang Lu, Touradj Ebrahimi

Abstract: Over the past years, image generation and manipulation have achieved remarkable progress due to the rapid development of generative AI based on deep learning. Recent studies have devoted significant efforts to address the problem of face image manipulation caused by deepfake techniques. However, the problem of detecting purely synthesized face images has been explored to a lesser extent. In particular, the recent popular Diffusion Models (DMs) have shown remarkable success in image synthesis. Existing detectors struggle to generalize between synthesized images created by different generative models. In this work, a comprehensive benchmark including human face images produced by Generative Adversarial Networks (GANs) and a variety of DMs has been established to evaluate both the generalization ability and robustness of state-of-the-art detectors. Then, the forgery traces introduced by different generative models have been analyzed in the frequency domain to draw various insights. The paper further demonstrates that a detector trained with frequency representation can generalize well to other unseen generative models.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.04760 [pdf, other]

Subjective performance evaluation of bitrate allocation strategies for MPEG and JPEG Pleno point cloud compression

Authors: Davi Lazzarotto, Michela Testolina, Touradj Ebrahimi

Abstract: The recent rise in interest in point clouds as an imaging modality has motivated standardization groups such as MPEG and JPEG Pleno to launch activities aiming at developing compression standards for point clouds. Lossy compression usually introduces visual artifacts that negatively impact the perceived quality of media, which can only be reliably measured through subjective visual quality assessment experiments. While MPEG standards have been subjectively evaluated in previous studies on multiple occasions, no work has yet assessed the performance of the recent JPEG Pleno standard in comparison to them. In this study, a comprehensive performance evaluation of MPEG and JPEG Pleno standards for point cloud compression is conducted. The impact of different configuration parameters on the performance of the codecs is first analyzed with the help of objective quality metrics. The results from this analysis are used to define three rate allocation strategies for each codec, which are employed to compress a set of point clouds at four target rates. The set of distorted point clouds is then subjectively evaluated following two subjective quality assessment protocols. Finally, the obtained results are used to compare the performance of these compression standards and draw insights about best coding practices.

Submitted 8 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

arXiv:2312.00560 [pdf]

Technical description of the EPFL submission to the JPEG DNA CfP

Authors: Davi Lazzarotto, Jorge Encinas Ramos, Michela Testolina, Touradj Ebrahimi

Abstract: This document provides a technical description of the codec proposed by EPFL to the JPEG DNA Call for Proposals. The codec, which we refer to as V-DNA for its versatility, enables the encoding of raw images and already compressed JPEG 1 bitstreams, but the underlying algorithm could be used to encode and transcode any kind of data. The codec is composed of two main modules: the image compression module, handled by the state-of-the-art JPEG XL codec, and the DNA encoding module, implemented using a modified Raptor Code implementation following the RU10 (Raptor Unsystematic) description. The code for encoding and decoding, as well as the objective metrics results, plots and biochemical constraints analysis, are available on the ISO Documents system with document number WG1M101013-ICQ-EPFL submission to the JPEG DNA CfP.

Submitted 1 December, 2023; originally announced December 2023.

arXiv:2306.00402 [pdf, other]

Discriminative Deep Feature Visualization for Explainable Face Recognition

Authors: Zewei Xu, Yuhang Lu, Touradj Ebrahimi

Abstract: Despite the huge success of deep convolutional neural networks in face recognition (FR) tasks, current methods lack explainability for their predictions because of their "black-box" nature. In recent years, studies have been carried out to give an interpretation of the decision of a deep FR system. However, the affinity between the input facial image and the extracted deep features has not been explored. This paper contributes to the problem of explainable face recognition by first conceiving a face reconstruction-based explanation module, which reveals the correspondence between the deep feature and the facial regions. To further interpret the decision of an FR model, a novel visual saliency explanation algorithm has been proposed. It provides insightful explanation by producing visual saliency maps that represent similar and dissimilar regions between input faces. A detailed analysis has been presented for the generated visual explanation to show the effectiveness of the proposed method.

Submitted 5 September, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.08546 [pdf, other]

Towards Visual Saliency Explanations of Face Verification

Authors: Yuhang Lu, Zewei Xu, Touradj Ebrahimi

Abstract: In the past years, deep convolutional neural networks have been pushing the frontier of face recognition (FR) techniques in both verification and identification scenarios. Despite the high accuracy, they are often criticized for lacking explainability. There has been an increasing demand for understanding the decision-making process of deep face recognition systems. Recent studies have investigated the usage of visual saliency maps as an explanation, but they often lack a discussion and analysis in the context of face recognition. This paper concentrates on explainable face verification tasks and conceives a new explanation framework. Firstly, a definition of the saliency-based explanation method is provided, which focuses on the decisions made by the deep FR model. Secondly, a new model-agnostic explanation method named CorrRISE is proposed to produce saliency maps, which reveal both the similar and dissimilar regions of any given pair of face images. Then, an evaluation methodology is designed to measure the performance of general visual saliency explanation methods in face verification. Finally, substantial visual and quantitative results have shown that the proposed CorrRISE method demonstrates promising results in comparison with other state-of-the-art explainable face verification approaches.

Submitted 24 October, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

arXiv:2304.06125 [pdf, other]

Assessment Framework for Deepfake Detection in Real-world Situations

Authors: Yuhang Lu, Touradj Ebrahimi

Abstract: Detecting digital face manipulation in images and video has attracted extensive attention due to the potential risk to public trust. To counteract the malicious usage of such techniques, deep learning-based deepfake detection methods have been employed and have exhibited remarkable performance. However, the performance of such detectors is often assessed on related benchmarks that hardly reflect real-world situations. For example, the impact of various image and video processing operations and typical workflow distortions on detection accuracy has not been systematically measured. In this paper, a more reliable assessment framework is proposed to evaluate the performance of learning-based deepfake detectors in more realistic settings. To the best of our knowledge, it is the first systematic assessment approach for deepfake detectors that not only reports the general performance under real-world conditions but also quantitatively measures their robustness toward different processing operations. To demonstrate the effectiveness and usage of the framework, extensive experiments and detailed analysis of three popular deepfake detection methods are further presented in this paper. In addition, a stochastic degradation-based data augmentation method driven by realistic processing operations is designed, which significantly improves the robustness of deepfake detectors.

Submitted 12 April, 2023; originally announced April 2023.

arXiv:2304.06118 [pdf, other]

Explanation of Face Recognition via Saliency Maps

Authors: Yuhang Lu, Touradj Ebrahimi

Abstract: Despite the significant progress in face recognition in the past years, face recognition systems are often treated as "black boxes" and have been criticized for lacking explainability. It becomes increasingly important to understand the characteristics and decisions of deep face recognition systems to make them more acceptable to the public. Explainable face recognition (XFR) refers to the problem of interpreting why the recognition model matches a probe face with one identity over others. Recent studies have explored the use of visual saliency maps as an explanation, but they often lack a deeper analysis in the context of face recognition. This paper starts by proposing a rigorous definition of explainable face recognition (XFR) which focuses on the decision-making process of the deep recognition model. Following the new definition, a similarity-based RISE algorithm (S-RISE) is then introduced to produce high-quality visual saliency maps. Furthermore, an evaluation approach is proposed to systematically validate the reliability and accuracy of general visual saliency-based XFR methods.

Submitted 12 April, 2023; originally announced April 2023.

arXiv:2303.17247 [pdf, other]

Impact of Video Processing Operations in Deepfake Detection

Authors: Yuhang Lu, Touradj Ebrahimi

Abstract: The detection of digital face manipulation in video has attracted extensive attention due to the increased risk to public trust. To counteract the malicious usage of such techniques, deep learning-based deepfake detection methods have been developed and have shown impressive results. However, the performance of these detectors is often evaluated using benchmarks that hardly reflect real-world situations. For example, the impact of various video processing operations on detection accuracy has not been systematically assessed. To address this gap, this paper first analyzes numerous real-world influencing factors and typical video processing operations. Then, a more systematic assessment methodology is proposed, which allows for a quantitative evaluation of a detector's robustness under the influence of different processing operations. Moreover, substantial experiments have been carried out on three popular deepfake detectors, which give detailed analyses on the impact of each operation and bring insights to foster future research.

Submitted 30 March, 2023; originally announced March 2023.

arXiv:2303.08665 [pdf, other]

Cross-resolution Face Recognition via Identity-Preserving Network and Knowledge Distillation

Authors: Yuhang Lu, Touradj Ebrahimi

Abstract: Cross-resolution face recognition has become a challenging problem for modern deep face recognition systems. It aims at matching a low-resolution probe image with high-resolution gallery images registered in a database. Existing methods mainly leverage prior information from high-resolution images by either reconstructing facial details with super-resolution techniques or learning a unified feature space. To address this challenge, this paper proposes a new approach that enforces the network to focus on the discriminative information stored in the low-frequency components of a low-resolution image. A cross-resolution knowledge distillation paradigm is first employed as the learning framework. Then, an identity-preserving network, WaveResNet, and a wavelet similarity loss are designed to capture low-frequency details and boost performance. Finally, an image degradation model is conceived to simulate more realistic low-resolution training data. Consequently, extensive experimental results show that the proposed method consistently outperforms the baseline model and other state-of-the-art methods across a variety of image resolutions.

Submitted 5 September, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

arXiv:2203.11807 [pdf, other]

A New Approach to Improve Learning-based Deepfake Detection in Realistic Conditions

Authors: Yuhang Lu, Touradj Ebrahimi

Abstract: Deep convolutional neural networks have achieved exceptional results on multiple detection and recognition tasks. However, the performance of such detectors is often evaluated in public benchmarks under constrained and non-realistic situations. The impact of conventional distortions and processing operations found in imaging workflows, such as compression, noise, and enhancement, is not sufficiently studied. Currently, only a few studies have been conducted to improve the detector robustness to unseen perturbations. This paper proposes a more effective data augmentation scheme based on a real-world image degradation process. This novel technique is deployed for deepfake detection tasks and has been evaluated by a more realistic assessment framework. Extensive experiments show that the proposed data augmentation scheme improves generalization ability to unpredictable data distortions and unseen datasets.

Submitted 22 March, 2022; originally announced March 2022.

arXiv:2203.11797 [pdf, other]

A Novel Framework for Assessment of Learning-based Detectors in Realistic Conditions with Application to Deepfake Detection

Authors: Yuhang Lu, Ruizhi Luo, Touradj Ebrahimi

Abstract: Deep convolutional neural networks have shown remarkable results on multiple detection tasks. Despite the significant progress, the performance of such detectors is often assessed in public benchmarks under non-realistic conditions. Specifically, the impact of conventional distortions and processing operations such as compression, noise, and enhancement is not sufficiently studied. This paper proposes a rigorous framework to assess the performance of learning-based detectors in more realistic situations. An illustrative example is shown in the context of deepfake detection. Inspired by the assessment results, a data augmentation strategy based on a natural image degradation process is designed, which significantly improves the generalization ability of two deepfake detectors.

Submitted 22 March, 2022; originally announced March 2022.

arXiv:2201.06935 [pdf, other]

Sampling color and geometry point clouds from ShapeNet dataset

Authors: Davi Lazzarotto, Touradj Ebrahimi

Abstract: The popularisation of acquisition devices capable of capturing volumetric information such as LiDAR scans and depth cameras has led to an increased interest in point clouds as an imaging modality. Due to the high amount of data needed for their representation, efficient compression solutions are needed to enable practical applications. Among the many techniques that have been proposed in the last years, learning-based methods are receiving large attention due to their high performance and potential for improvement. Such algorithms depend on large and diverse training sets to achieve good compression performance. ShapeNet is a large-scale dataset composed of CAD models with texture and constitutes an effective option for training such compression methods. This dataset is entirely composed of meshes, which must go through a sampling process in order to obtain point clouds with geometry and texture information. Although many existing software libraries are able to sample geometry from meshes through simple functions, obtaining an output point cloud with geometry and color of the external faces of the mesh models is not a straightforward process for the ShapeNet dataset. The main difficulty associated with this dataset is that its models are often defined with duplicated faces sharing the same vertices, but with different color values. This document describes a script for sampling the meshes from ShapeNet that circumvents this issue by excluding the internal faces of the mesh models prior to the sampling.
The script can be accessed from the following link: https://github.com/mmspg/mesh-sampling.

Submitted 18 January, 2022; originally announced January 2022.

arXiv:1908.02039 [pdf, other]

Digital Watermarking of video streams: Review of the State-Of-The-Art

Authors: Romain Artru, Alexandre Gouaillard, Touradj Ebrahimi

Abstract: Digital Watermarking is an extremely wide aspect of information security, whether in its applications, its properties, or its designs. In particular, a great deal of research has been conducted on video watermarking, which can make it quite difficult to put into perspective the various schemes available for implementing a watermarking process for a given application. This paper presents an in-depth overview of the current video watermarking technologies and how they each respond to certain criteria that may be imposed by the target application. The first goal is to define the desired equilibrium point between invisibility, robustness and efficiency for an application. Then, given this balance, the goal is to deduce the best location for the information embedding as well as the method used to embed it. The equilibrium point is to be found using the needed properties of the watermark and by studying the threat model that the scheme will have to face. The location describes whether the extra information should be added to the metadata of the video, to its frames or to specific regions of its frames. Finally, the method to embed the watermark refers to the insertion domain and its coefficients to be altered in order to insert the wanted information.

Submitted 23 August, 2019; v1 submitted 6 August, 2019; originally announced August 2019.

Comments: 33 pages, 11 figures

arXiv:1905.03951 [pdf, other]

Perceptual Quality Study on Deep Learning based Image Compression

Authors: Zhengxue Cheng, Pinar Akyazi, Heming Sun, Jiro Katto, Touradj Ebrahimi

Abstract: Recently, deep learning based image compression has made rapid advances with promising results based on objective quality metrics. However, a rigorous subjective quality evaluation of such compression schemes has rarely been reported. This paper aims at perceptual quality studies on learned compression. First, we build a general learned compression approach, and optimize the model. In total, six compression algorithms are considered for this study. Then, we perform subjective quality tests in a controlled environment using high-resolution images. Results demonstrate that learned compression optimized by MS-SSIM yields competitive results that approach the efficiency of state-of-the-art compression. The results obtained can provide a useful benchmark for future developments in learned image compression.

Submitted 10 May, 2019; originally announced May 2019.

Comments: Accepted as a conference contribution to IEEE International Conference on Image Processing (ICIP) 2019

Showing 1–16 of 16 results for author: Ebrahimi, T