-
Cool-chic video: Learned video coding with 800 parameters
Authors:
Thomas Leguay,
Théo Ladune,
Pierrick Philippe,
Olivier Déforges
Abstract:
We propose a lightweight learned video codec with 900 multiplications per decoded pixel and 800 parameters overall. To the best of our knowledge, this is one of the neural video codecs with the lowest decoding complexity. It is built upon the overfitted image codec Cool-chic and supplements it with an inter coding module to leverage the video's temporal redundancies. The proposed model is able to…
▽ More
We propose a lightweight learned video codec with 900 multiplications per decoded pixel and 800 parameters overall. To the best of our knowledge, this is one of the neural video codecs with the lowest decoding complexity. It is built upon the overfitted image codec Cool-chic and supplements it with an inter coding module to leverage the video's temporal redundancies. The proposed model is able to compress videos using both low-delay and random access configurations and achieves rate-distortion close to AVC while out-performing other overfitted codecs such as FFNeRV. The system is made open-source: orange-opensource.github.io/Cool-Chic.
△ Less
Submitted 6 February, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
NTIRE 2023 Quality Assessment of Video Enhancement Challenge
Authors:
Xiaohong Liu,
Xiongkuo Min,
Wei Sun,
Yulun Zhang,
Kai Zhang,
Radu Timofte,
Guangtao Zhai,
Yixuan Gao,
Yuqin Cao,
Tengchuan Kou,
Yunlong Dong,
Ziheng Jia,
Yilin Li,
Wei Wu,
Shuming Hu,
Sibin Deng,
Pengxiang Xiao,
Ying Chen,
Kai Li,
Kai Zhao,
Kun Yuan,
Ming Sun,
Heng Cong,
Hao Wang,
Lingzhi Fu
, et al. (47 additional authors not shown)
Abstract:
This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual…
▽ More
This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual Video Enhancement (VDPVE), which has a total of 1211 enhanced videos, including 600 videos with color, brightness, and contrast enhancements, 310 videos with deblurring, and 301 deshaked videos. The challenge has a total of 167 registered participants. 61 participating teams submitted their prediction results during the development phase, with a total of 3168 submissions. A total of 176 submissions were submitted by 37 participating teams during the final testing phase. Finally, 19 participating teams submitted their models and fact sheets, and detailed the methods they used. Some methods have achieved better results than baseline methods, and the winning methods have demonstrated superior prediction performance.
△ Less
Submitted 18 July, 2023;
originally announced July 2023.
-
Federated Adversarial Training with Transformers
Authors:
Ahmed Aldahdooh,
Wassim Hamidouche,
Olivier Déforges
Abstract:
Federated learning (FL) has emerged to enable global model training over distributed clients' data while preserving its privacy. However, the global trained model is vulnerable to the evasion attacks especially, the adversarial examples (AEs), carefully crafted samples to yield false classification. Adversarial training (AT) is found to be the most promising approach against evasion attacks and it…
▽ More
Federated learning (FL) has emerged to enable global model training over distributed clients' data while preserving its privacy. However, the global trained model is vulnerable to the evasion attacks especially, the adversarial examples (AEs), carefully crafted samples to yield false classification. Adversarial training (AT) is found to be the most promising approach against evasion attacks and it is widely studied for convolutional neural network (CNN). Recently, vision transformers have been found to be effective in many computer vision tasks. To the best of the authors' knowledge, there is no work that studied the feasibility of AT in a FL process for vision transformers. This paper investigates such feasibility with different federated model aggregation methods and different vision transformer models with different tokenization and classification head techniques. In order to improve the robust accuracy of the models with the not independent and identically distributed (Non-IID), we propose an extension to FedAvg aggregation method, called FedWAvg. By measuring the similarities between the last layer of the global model and the last layer of the client updates, FedWAvg calculates the weights to aggregate the local models updates. The experiments show that FedWAvg improves the robust accuracy when compared with other state-of-the-art aggregation methods.
△ Less
Submitted 5 June, 2022;
originally announced June 2022.
-
CAESR: Conditional Autoencoder and Super-Resolution for Learned Spatial Scalability
Authors:
Charles Bonnineau,
Wassim Hamidouche,
Jean-François Travers,
Naty Sidaty,
Jean-Yves Aubié,
Olivier Deforges
Abstract:
In this paper, we present CAESR, an hybrid learning-based coding approach for spatial scalability based on the versatile video coding (VVC) standard. Our framework considers a low-resolution signal encoded with VVC intra-mode as a base-layer (BL), and a deep conditional autoencoder with hyperprior (AE-HP) as an enhancement-layer (EL) model. The EL encoder takes as inputs both the upscaled BL recon…
▽ More
In this paper, we present CAESR, an hybrid learning-based coding approach for spatial scalability based on the versatile video coding (VVC) standard. Our framework considers a low-resolution signal encoded with VVC intra-mode as a base-layer (BL), and a deep conditional autoencoder with hyperprior (AE-HP) as an enhancement-layer (EL) model. The EL encoder takes as inputs both the upscaled BL reconstruction and the original image. Our approach relies on conditional coding that learns the optimal mixture of the source and the upscaled BL image, enabling better performance than residual coding. On the decoder side, a super-resolution (SR) module is used to recover high-resolution details and invert the conditional coding process. Experimental results have shown that our solution is competitive with the VVC full-resolution intra coding while being scalable.
△ Less
Submitted 1 February, 2022;
originally announced February 2022.
-
Reveal of Vision Transformers Robustness against Adversarial Attacks
Authors:
Ahmed Aldahdooh,
Wassim Hamidouche,
Olivier Deforges
Abstract:
The major part of the vanilla vision transformer (ViT) is the attention block that brings the power of mimicking the global context of the input image. For better performance, ViT needs large-scale training data. To overcome this data hunger limitation, many ViT-based networks, or hybrid-ViT, have been proposed to include local context during the training. The robustness of ViTs and its variants a…
▽ More
The major part of the vanilla vision transformer (ViT) is the attention block that brings the power of mimicking the global context of the input image. For better performance, ViT needs large-scale training data. To overcome this data hunger limitation, many ViT-based networks, or hybrid-ViT, have been proposed to include local context during the training. The robustness of ViTs and its variants against adversarial attacks has not been widely investigated in the literature like CNNs. This work studies the robustness of ViT variants 1) against different Lp-based adversarial attacks in comparison with CNNs, 2) under adversarial examples (AEs) after applying preprocessing defense methods and 3) under the adaptive attacks using expectation over transformation (EOT) framework. To that end, we run a set of experiments on 1000 images from ImageNet-1k and then provide an analysis that reveals that vanilla ViT or hybrid-ViT are more robust than CNNs. For instance, we found that 1) Vanilla ViTs or hybrid-ViTs are more robust than CNNs under Lp-based attacks and under adaptive attacks. 2) Unlike hybrid-ViTs, Vanilla ViTs are not responding to preprocessing defenses that mainly reduce the high frequency components. Furthermore, feature maps, attention maps, and Grad-CAM visualization jointly with image quality measures, and perturbations' energy spectrum are provided for an insight understanding of attention-based models.
△ Less
Submitted 20 September, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
-
SHD360: A Benchmark Dataset for Salient Human Detection in 360° Videos
Authors:
Yi Zhang,
Lu Zhang,
Kang Wang,
Wassim Hamidouche,
Olivier Deforges
Abstract:
Salient human detection (SHD) in dynamic 360° immersive videos is of great importance for various applications such as robotics, inter-human and human-object interaction in augmented reality. However, 360° video SHD has been seldom discussed in the computer vision community due to a lack of datasets with large-scale omnidirectional videos and rich annotations. To this end, we propose SHD360, the f…
▽ More
Salient human detection (SHD) in dynamic 360° immersive videos is of great importance for various applications such as robotics, inter-human and human-object interaction in augmented reality. However, 360° video SHD has been seldom discussed in the computer vision community due to a lack of datasets with large-scale omnidirectional videos and rich annotations. To this end, we propose SHD360, the first 360° video SHD dataset which contains various real-life daily scenes. Since so far there is no method proposed for 360° image/video SHD, we systematically benchmark 11 representative state-of-the-art salient object detection (SOD) approaches on our SHD360, and explore key issues derived from extensive experimenting results. We hope our proposed dataset and benchmark could serve as a good starting point for advancing human-centric researches towards 360° panoramic data. The dataset is available at https://github.com/PanoAsh/SHD360.
△ Less
Submitted 22 December, 2021; v1 submitted 24 May, 2021;
originally announced May 2021.
-
CMA-Net: A Cascaded Mutual Attention Network for Light Field Salient Object Detection
Authors:
Yi Zhang,
Lu Zhang,
Wassim Hamidouche,
Olivier Deforges
Abstract:
In the past few years, numerous deep learning methods have been proposed to address the task of segmenting salient objects from RGB images. However, these approaches depending on single modality fail to achieve the state-of-the-art performance on widely used light field salient object detection (SOD) datasets, which collect large-scale natural images and provide multiple modalities such as multi-v…
▽ More
In the past few years, numerous deep learning methods have been proposed to address the task of segmenting salient objects from RGB images. However, these approaches depending on single modality fail to achieve the state-of-the-art performance on widely used light field salient object detection (SOD) datasets, which collect large-scale natural images and provide multiple modalities such as multi-view, micro-lens images and depth maps. Most recently proposed light field SOD methods have acquired improving detecting accuracy, yet still predict rough objects' structures and perform slow inference speed. To this end, we propose CMA-Net, which consists of two novel cascaded mutual attention modules aiming at fusing the high level features from the modalities of all-in-focus and depth. Our proposed CMA-Net outperforms 30 SOD methods on two widely applied light field benchmark datasets. Besides, the proposed CMA-Net is able to inference at the speed of 53 fps. Extensive quantitative and qualitative experiments illustrate both the effectiveness and efficiency of our CMA-Net.
△ Less
Submitted 7 December, 2021; v1 submitted 3 May, 2021;
originally announced May 2021.
-
Adversarial Example Detection for DNN Models: A Review and Experimental Comparison
Authors:
Ahmed Aldahdooh,
Wassim Hamidouche,
Sid Ahmed Fezza,
Olivier Deforges
Abstract:
Deep learning (DL) has shown great success in many human-related tasks, which has led to its adoption in many computer vision based applications, such as security surveillance systems, autonomous vehicles and healthcare. Such safety-critical applications have to draw their path to success deployment once they have the capability to overcome safety-critical challenges. Among these challenges are th…
▽ More
Deep learning (DL) has shown great success in many human-related tasks, which has led to its adoption in many computer vision based applications, such as security surveillance systems, autonomous vehicles and healthcare. Such safety-critical applications have to draw their path to success deployment once they have the capability to overcome safety-critical challenges. Among these challenges are the defense against or/and the detection of the adversarial examples (AEs). Adversaries can carefully craft small, often imperceptible, noise called perturbations to be added to the clean image to generate the AE. The aim of AE is to fool the DL model which makes it a potential risk for DL applications. Many test-time evasion attacks and countermeasures,i.e., defense or detection methods, are proposed in the literature. Moreover, few reviews and surveys were published and theoretically showed the taxonomy of the threats and the countermeasure methods with little focus in AE detection methods. In this paper, we focus on image classification task and attempt to provide a survey for detection methods of test-time evasion attacks on neural network classifiers. A detailed discussion for such methods is provided with experimental results for eight state-of-the-art detectors under different scenarios on four datasets. We also provide potential challenges and future perspectives for this research direction.
△ Less
Submitted 7 January, 2022; v1 submitted 1 May, 2021;
originally announced May 2021.
-
Learning Synergistic Attention for Light Field Salient Object Detection
Authors:
Yi Zhang,
Geng Chen,
Qian Chen,
Yujia Sun,
Yong Xia,
Olivier Deforges,
Wassim Hamidouche,
Lu Zhang
Abstract:
We propose a novel Synergistic Attention Network (SA-Net) to address the light field salient object detection by establishing a synergistic effect between multi-modal features with advanced attention mechanisms. Our SA-Net exploits the rich information of focal stacks via 3D convolutional neural networks, decodes the high-level features of multi-modal light field data with two cascaded synergistic…
▽ More
We propose a novel Synergistic Attention Network (SA-Net) to address the light field salient object detection by establishing a synergistic effect between multi-modal features with advanced attention mechanisms. Our SA-Net exploits the rich information of focal stacks via 3D convolutional neural networks, decodes the high-level features of multi-modal light field data with two cascaded synergistic attention modules, and predicts the saliency map using an effective feature fusion module in a progressive manner. Extensive experiments on three widely-used benchmark datasets show that our SA-Net outperforms 28 state-of-the-art models, sufficiently demonstrating its effectiveness and superiority. Our code is available at https://github.com/PanoAsh/SA-Net.
△ Less
Submitted 22 October, 2021; v1 submitted 28 April, 2021;
originally announced April 2021.
-
Conditional Coding and Variable Bitrate for Practical Learned Video Coding
Authors:
Théo Ladune,
Pierrick Philippe,
Wassim Hamidouche,
Lu Zhang,
Olivier Déforges
Abstract:
This paper introduces a practical learned video codec. Conditional coding and quantization gain vectors are used to provide flexibility to a single encoder/decoder pair, which is able to compress video sequences at a variable bitrate. The flexibility is leveraged at test time by choosing the rate and GOP structure to optimize a rate-distortion cost. Using the CLIC21 video test conditions, the prop…
▽ More
This paper introduces a practical learned video codec. Conditional coding and quantization gain vectors are used to provide flexibility to a single encoder/decoder pair, which is able to compress video sequences at a variable bitrate. The flexibility is leveraged at test time by choosing the rate and GOP structure to optimize a rate-distortion cost. Using the CLIC21 video test conditions, the proposed approach shows performance on par with HEVC.
△ Less
Submitted 20 April, 2021; v1 submitted 19 April, 2021;
originally announced April 2021.
-
Multitask Learning for VVC Quality Enhancement and Super-Resolution
Authors:
Charles Bonnineau,
Wassim Hamidouche,
Jean-Francois Travers,
Naty Sidaty,
Olivier Deforges
Abstract:
The latest video coding standard, called versatile video coding (VVC), includes several novel and refined coding tools at different levels of the coding chain. These tools bring significant coding gains with respect to the previous standard, high efficiency video coding (HEVC). However, the encoder may still introduce visible coding artifacts, mainly caused by coding decisions applied to adjust th…
▽ More
The latest video coding standard, called versatile video coding (VVC), includes several novel and refined coding tools at different levels of the coding chain. These tools bring significant coding gains with respect to the previous standard, high efficiency video coding (HEVC). However, the encoder may still introduce visible coding artifacts, mainly caused by coding decisions applied to adjust the bitrate to the available bandwidth. Hence, pre and post-processing techniques are generally added to the coding pipeline to improve the quality of the decoded video. These methods have recently shown outstanding results compared to traditional approaches, thanks to the recent advances in deep learning. Generally, multiple neural networks are trained independently to perform different tasks, thus omitting to benefit from the redundancy that exists between the models. In this paper, we investigate a learning-based solution as a post-processing step to enhance the decoded VVC video quality. Our method relies on multitask learning to perform both quality enhancement and super-resolution using a single shared network optimized for multiple degradation levels. The proposed solution enables a good performance in both mitigating coding artifacts and super-resolution with fewer network parameters compared to traditional specialized architectures.
△ Less
Submitted 3 May, 2021; v1 submitted 16 April, 2021;
originally announced April 2021.
-
Revisiting Model's Uncertainty and Confidences for Adversarial Example Detection
Authors:
Ahmed Aldahdooh,
Wassim Hamidouche,
Olivier Déforges
Abstract:
Security-sensitive applications that rely on Deep Neural Networks (DNNs) are vulnerable to small perturbations that are crafted to generate Adversarial Examples(AEs). The AEs are imperceptible to humans and cause DNN to misclassify them. Many defense and detection techniques have been proposed. Model's confidences and Dropout, as a popular way to estimate the model's uncertainty, have been used fo…
▽ More
Security-sensitive applications that rely on Deep Neural Networks (DNNs) are vulnerable to small perturbations that are crafted to generate Adversarial Examples(AEs). The AEs are imperceptible to humans and cause DNN to misclassify them. Many defense and detection techniques have been proposed. Model's confidences and Dropout, as a popular way to estimate the model's uncertainty, have been used for AE detection but they showed limited success against black- and gray-box attacks. Moreover, the state-of-the-art detection techniques have been designed for specific attacks or broken by others, need knowledge about the attacks, are not consistent, increase model parameters overhead, are time-consuming, or have latency in inference time. To trade off these factors, we revisit the model's uncertainty and confidences and propose a novel unsupervised ensemble AE detection mechanism that 1) uses the uncertainty method called SelectiveNet, 2) processes model layers outputs, i.e.feature maps, to generate new confidence probabilities. The detection method is called Selective and Feature based Adversarial Detection (SFAD). Experimental results show that the proposed approach achieves better performance against black- and gray-box attacks than the state-of-the-art methods and achieves comparable performance against white-box attacks. Moreover, results show that SFAD is fully robust against High Confidence Attacks (HCAs) for MNIST and partially robust for CIFAR10 datasets.
△ Less
Submitted 21 June, 2021; v1 submitted 9 March, 2021;
originally announced March 2021.
-
Selective Encryption of the Versatile Video Coding Standard
Authors:
Guillaume Gautier,
Mousa FarajAllah,
Wassim Hamidouche,
Olivier Déforges,
Safwan El Assad
Abstract:
Versatile video coding (VVC) is the next generation video coding standard developed by the joint video experts team (JVET) and released in July 2020. VVC introduces several new coding tools providing a significant coding gain over the high efficiency video coding (HEVC) standard. It is well known that increasing the coding efficiency adds more dependencies in the video bitstream making format-comp…
▽ More
Versatile video coding (VVC) is the next generation video coding standard developed by the joint video experts team (JVET) and released in July 2020. VVC introduces several new coding tools providing a significant coding gain over the high efficiency video coding (HEVC) standard. It is well known that increasing the coding efficiency adds more dependencies in the video bitstream making format-compliant encryption with the standard more challenging. In this paper we tackle the problem of selective encryption of the VVC standard in format-compliant and constant bitrate. These two constraints ensure that the encrypted bitstream can be decoded by any VVC decoder while the bitrate remains unchanged by the encryption. The selective encryption of all possible VVC syntax elements is investigated. A new algorithm is proposed to encrypt in format-compliant and constant bitrate the transform coefficients (TCs) together with other syntax elements at the level of the entropy encoder. The proposed solution was integrated and assessed under the VVC reference software model version 6.0. Experimental results showed that the encryption drastically decreases the video quality while the encryption is robust against several types of attacks. The encryption space is estimated in the range of 15% to 26% of the bitstream size resulting in a lightweight encryption process. The web page of this work is available at https://gugautie.github.io/sevvc/.
△ Less
Submitted 6 March, 2021;
originally announced March 2021.
-
Optical Flow and Mode Selection for Learning-based Video Coding
Authors:
Théo Ladune,
Pierrick Philippe,
Wassim Hamidouche,
Lu Zhang,
Olivier Déforges
Abstract:
This paper introduces a new method for inter-frame coding based on two complementary autoencoders: MOFNet and CodecNet. MOFNet aims at computing and conveying the Optical Flow and a pixel-wise coding Mode selection. The optical flow is used to perform a prediction of the frame to code. The coding mode selection enables competition between direct copy of the prediction or transmission through Codec…
▽ More
This paper introduces a new method for inter-frame coding based on two complementary autoencoders: MOFNet and CodecNet. MOFNet aims at computing and conveying the Optical Flow and a pixel-wise coding Mode selection. The optical flow is used to perform a prediction of the frame to code. The coding mode selection enables competition between direct copy of the prediction or transmission through CodecNet. The proposed coding scheme is assessed under the Challenge on Learned Image Compression 2020 (CLIC20) P-frame coding conditions, where it is shown to perform on par with the state-of-the-art video codec ITU/MPEG HEVC. Moreover, the possibility of copying the prediction enables to learn the optical flow in an end-to-end fashion i.e. without relying on pre-training and/or a dedicated loss term.
△ Less
Submitted 6 August, 2020;
originally announced August 2020.
-
ModeNet: Mode Selection Network For Learned Video Coding
Authors:
Théo Ladune,
Pierrick Philippe,
Wassim Hamidouche,
Lu Zhang,
Olivier Déforges
Abstract:
In this paper, a mode selection network (ModeNet) is proposed to enhance deep learning-based video compression. Inspired by traditional video coding, ModeNet purpose is to enable competition among several coding modes. The proposed ModeNet learns and conveys a pixel-wise partitioning of the frame, used to assign each pixel to the most suited coding mode. ModeNet is trained alongside the different…
▽ More
In this paper, a mode selection network (ModeNet) is proposed to enhance deep learning-based video compression. Inspired by traditional video coding, ModeNet purpose is to enable competition among several coding modes. The proposed ModeNet learns and conveys a pixel-wise partitioning of the frame, used to assign each pixel to the most suited coding mode. ModeNet is trained alongside the different coding modes to minimize a rate-distortion cost. It is a flexible component which can be generalized to other systems to allow competition between different coding tools. Mod-eNet interest is studied on a P-frame coding task, where it is used to design a method for coding a frame given its prediction. ModeNet-based systems achieve compelling performance when evaluated under the Challenge on Learned Image Compression 2020 (CLIC20) P-frame coding track conditions.
△ Less
Submitted 31 July, 2020; v1 submitted 6 July, 2020;
originally announced July 2020.
-
Binary Probability Model for Learning Based Image Compression
Authors:
Théo Ladune,
Pierrick Philippe,
Wassim Hamidouche,
Lu Zhang,
Olivier Deforges
Abstract:
In this paper, we propose to enhance learned image compression systems with a richer probability model for the latent variables. Previous works model the latents with a Gaussian or a Laplace distribution. Inspired by binary arithmetic coding , we propose to signal the latents with three binary values and one integer, with different probability models. A relaxation method is designed to perform gra…
▽ More
In this paper, we propose to enhance learned image compression systems with a richer probability model for the latent variables. Previous works model the latents with a Gaussian or a Laplace distribution. Inspired by binary arithmetic coding , we propose to signal the latents with three binary values and one integer, with different probability models. A relaxation method is designed to perform gradient-based training. The richer probability model results in a better entropy coding leading to lower rate. Experiments under the Challenge on Learned Image Compression (CLIC) test conditions demonstrate that this method achieves 18% rate saving compared to Gaussian or Laplace models.
△ Less
Submitted 21 February, 2020;
originally announced February 2020.
-
A Fixation-based 360° Benchmark Dataset for Salient Object Detection
Authors:
Yi Zhang,
Lu Zhang,
Wassim Hamidouche,
Olivier Deforges
Abstract:
Fixation prediction (FP) in panoramic contents has been widely investigated along with the booming trend of virtual reality (VR) applications. However, another issue within the field of visual saliency, salient object detection (SOD), has been seldom explored in 360° (or omnidirectional) images due to the lack of datasets representative of real scenes with pixel-level annotations. Toward this end,…
▽ More
Fixation prediction (FP) in panoramic contents has been widely investigated along with the booming trend of virtual reality (VR) applications. However, another issue within the field of visual saliency, salient object detection (SOD), has been seldom explored in 360° (or omnidirectional) images due to the lack of datasets representative of real scenes with pixel-level annotations. Toward this end, we collect 107 equirectangular panoramas with challenging scenes and multiple object classes. Based on the consistency between FP and explicit saliency judgements, we further manually annotate 1,165 salient objects over the collected images with precise masks under the guidance of real human eye fixation maps. Six state-of-the-art SOD models are then benchmarked on the proposed fixation-based 360° image dataset (F-360iSOD), by applying a multiple cubic projection-based fine-tuning method. Experimental results show a limitation of the current methods when used for SOD in panoramic images, which indicates the proposed dataset is challenging. Key issues for 360° SOD is also discussed. The proposed dataset is available at https://github.com/PanoAsh/F-360iSOD.
△ Less
Submitted 19 May, 2020; v1 submitted 22 January, 2020;
originally announced January 2020.
-
Quality Assessment of DIBR-synthesized views: An Overview
Authors:
Shishun Tian,
Lu Zhang,
Wenbin Zou,
Xia Li,
Ting Su,
Luce Morin,
Olivier Deforges
Abstract:
The Depth-Image-Based-Rendering (DIBR) is one of the main fundamental technique to generate new views in 3D video applications, such as Multi-View Videos (MVV), Free-Viewpoint Videos (FVV) and Virtual Reality (VR). However, the quality assessment of DIBR-synthesized views is quite different from the traditional 2D images/videos. In recent years, several efforts have been made towards this topic, b…
▽ More
The Depth-Image-Based-Rendering (DIBR) is one of the main fundamental technique to generate new views in 3D video applications, such as Multi-View Videos (MVV), Free-Viewpoint Videos (FVV) and Virtual Reality (VR). However, the quality assessment of DIBR-synthesized views is quite different from the traditional 2D images/videos. In recent years, several efforts have been made towards this topic, but there {is a lack of} detailed survey in {the} literature. In this paper, we provide a comprehensive survey on various current approaches for DIBR-synthesized views. The current accessible datasets of DIBR-synthesized views are firstly reviewed{, followed} by a summary analysis of the representative state-of-the-art objective metrics. Then, the performances of different objective metrics are evaluated and discussed on all available datasets. Finally, we discuss the potential challenges and suggest possible directions for future research.
△ Less
Submitted 27 April, 2021; v1 submitted 16 November, 2019;
originally announced November 2019.
-
Perceptual Evaluation of Adversarial Attacks for CNN-based Image Classification
Authors:
Sid Ahmed Fezza,
Yassine Bakhti,
Wassim Hamidouche,
Olivier Déforges
Abstract:
Deep neural networks (DNNs) have recently achieved state-of-the-art performance and provide significant progress in many machine learning tasks, such as image classification, speech processing, natural language processing, etc. However, recent studies have shown that DNNs are vulnerable to adversarial attacks. For instance, in the image classification domain, adding small imperceptible perturbatio…
▽ More
Deep neural networks (DNNs) have recently achieved state-of-the-art performance and provide significant progress in many machine learning tasks, such as image classification, speech processing, natural language processing, etc. However, recent studies have shown that DNNs are vulnerable to adversarial attacks. For instance, in the image classification domain, adding small imperceptible perturbations to the input image is sufficient to fool the DNN and to cause misclassification. The perturbed image, called \textit{adversarial example}, should be visually as close as possible to the original image. However, all the works proposed in the literature for generating adversarial examples have used the $L_{p}$ norms ($L_{0}$, $L_{2}$ and $L_{\infty}$) as distance metrics to quantify the similarity between the original image and the adversarial example. Nonetheless, the $L_{p}$ norms do not correlate with human judgment, making them not suitable to reliably assess the perceptual similarity/fidelity of adversarial examples. In this paper, we present a database for visual fidelity assessment of adversarial examples. We describe the creation of the database and evaluate the performance of fifteen state-of-the-art full-reference (FR) image fidelity assessment metrics that could substitute $L_{p}$ norms. The database as well as subjective scores are publicly available to help designing new metrics for adversarial examples and to facilitate future research works.
△ Less
Submitted 1 June, 2019;
originally announced June 2019.