DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection

Authors: Ruituo Wu, Yang Chen, Jian Xiao, Bing Li, Jicong Fan, Frédéric Dufaux, Ce Zhu, Yipeng Liu

Abstract: Cooperation between temporal convolutional networks (TCN) and graph convolutional networks (GCN) as a processing module has shown promising results in skeleton-based video anomaly detection (SVAD). However, to maintain a lightweight model with low computational and storage complexity, shallow GCN and TCN blocks are constrained by small receptive fields and a lack of cross-dimension interaction capture. To tackle this limitation, we propose a lightweight module called the Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data. It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and flops. Furthermore, the proposed Dual Attention Normalizing Flow (DA-Flow) integrates the DAM as a post-processing unit after GCN within the normalizing flow framework. Simulations show that the proposed model is robust against noise and negative samples. Experimental results show that DA-Flow reaches competitive or better performance than the existing state-of-the-art (SOTA) methods in terms of the micro AUC metric with the fewest number of parameters. Moreover, we found that even without training, simply using random projection without dimensionality reduction on skeleton data enables substantial anomaly detection capabilities.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2303.08634 [pdf, other]

Quality evaluation of point clouds: a novel no-reference approach using transformer-based architecture

Authors: Marouane Tliba, Aladine Chetouani, Giuseppe Valenzise, Frederic Dufaux

Abstract: With the increased interest in immersive experiences, point cloud came to birth and was widely adopted as the first choice to represent 3D media. Besides several distortions that could affect the 3D content spanning from acquisition to rendering, efficient transmission of such volumetric content over traditional communication systems stands at the expense of the delivered perceptual quality. To estimate the magnitude of such degradation, employing quality metrics became an inevitable solution. In this work, we propose a novel deep-based no-reference quality metric that operates directly on the whole point cloud without requiring extensive pre-processing, enabling real-time evaluation over both transmission and rendering levels. To do so, we use a novel model design consisting primarily of cross and self-attention layers, in order to learn the best set of local semantic affinities while keeping the best combination of geometry and color information in multiple levels from basic features extraction to deep representation modeling.

Submitted 15 March, 2023; originally announced March 2023.

Comments: arXiv admin note: text overlap with arXiv:2211.02459

arXiv:2211.02459 [pdf, other]

PCQA-GRAPHPOINT: Efficients Deep-Based Graph Metric For Point Cloud Quality Assessment

Authors: Marouane Tliba, Aladine Chetouani, Giuseppe Valenzise, Frederic Dufaux

Abstract: Following the advent of immersive technologies and the increasing interest in representing interactive geometrical format, 3D Point Clouds (PC) have emerged as a promising solution and effective means to display 3D visual information. In addition to other challenges in immersive applications, objective and subjective quality assessments of compressed 3D content remain open problems and an area of research interest. Yet most of the efforts in the research area ignore the local geometrical structures between points representation. In this paper, we overcome this limitation by introducing a novel and efficient objective metric for Point Clouds Quality Assessment, by learning local intrinsic dependencies using Graph Neural Network (GNN). To evaluate the performance of our method, two well-known datasets have been used. The results demonstrate the effectiveness and reliability of our solution compared to state-of-the-art metrics.

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2106.12746 [pdf]

A Global Appearance and Local Coding Distortion based Fusion Framework for CNN based Filtering in Video Coding

Authors: Jian Yue, Yanbo Gao, Shuai Li, Hui Yuan, Frédéric Dufaux

Abstract: In-loop filtering is used in video coding to process the reconstructed frame in order to remove blocking artifacts. With the development of convolutional neural networks (CNNs), CNNs have been explored for in-loop filtering considering it can be treated as an image de-noising task. However, in addition to being a distorted image, the reconstructed frame is also obtained by a fixed line of block based encoding operations in video coding. It carries coding-unit based coding distortion of some similar characteristics. Therefore, in this paper, we address the filtering problem from two aspects, global appearance restoration for disrupted texture and local coding distortion restoration caused by fixed pipeline of coding. Accordingly, a three-stream global appearance and local coding distortion based fusion network is developed with a high-level global feature stream, a high-level local feature stream and a low-level local feature stream. Ablation study is conducted to validate the necessity of different features, demonstrating that the global features and local features can complement each other in filtering and achieve better performance when combined. To the best of our knowledge, we are the first one that clearly characterizes the video filtering process from the above global appearance and local coding distortion restoration aspects with experimental verification, providing a clear pathway to developing filter techniques. Experimental results demonstrate that the proposed method significantly outperforms the existing single-frame based methods and achieves 13.5%, 11.3%, 11.7% BD-Rate saving on average for AI, LDP and RA configurations, respectively, compared with the HEVC reference software.

Submitted 23 June, 2021; originally announced June 2021.

arXiv:2102.12839 [pdf, other]

A deep perceptual metric for 3D point clouds

Authors: Maurice Quach, Aladine Chetouani, Giuseppe Valenzise, Frederic Dufaux

Abstract: Point clouds are essential for storage and transmission of 3D content. As they can entail significant volumes of data, point cloud compression is crucial for practical usage. Recently, point cloud geometry compression approaches based on deep neural networks have been explored. In this paper, we evaluate the ability to predict perceptual quality of typical voxel-based loss functions employed to train these networks. We find that the commonly used focal loss and weighted binary cross entropy are poorly correlated with human perception. We thus propose a perceptual loss function for 3D point clouds which outperforms existing loss functions on the ICIP2020 subjective dataset. In addition, we propose a novel truncated distance field voxel grid representation and find that it leads to sparser latent spaces and loss functions that are more correlated with perceived visual quality compared to a binary representation. The source code is available at https://github.com/mauriceqch/2021_pc_perceptual_loss.

Submitted 25 February, 2021; originally announced February 2021.

Comments: Presented at IS&T Electronic Imaging: Image Quality and System Performance, January 2021

arXiv:2006.09043 [pdf, other]

Improved Deep Point Cloud Geometry Compression

Authors: Maurice Quach, Giuseppe Valenzise, Frederic Dufaux

Abstract: Point clouds have been recognized as a crucial data structure for 3D content and are essential in a number of applications such as virtual and mixed reality, autonomous driving, cultural heritage, etc. In this paper, we propose a set of contributions to improve deep point cloud compression, i.e.: using a scale hyperprior model for entropy coding; employing deeper transforms; a different balancing weight in the focal loss; optimal thresholding for decoding; and sequential model training. In addition, we present an extensive ablation study on the impact of each of these factors, in order to provide a better understanding about why they improve RD performance. An optimal combination of the proposed improvements achieves BD-PSNR gains over G-PCC trisoup and octree of 5.50 (6.48) dB and 6.84 (5.95) dB, respectively, when using the point-to-point (point-to-plane) metric. Code is available at https://github.com/mauriceqch/pcc_geo_cnn_v2 .

Submitted 24 June, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: Code is available at https://github.com/mauriceqch/pcc_geo_cnn_v2

arXiv:2002.04439 [pdf, other]

Folding-based compression of point cloud attributes

Authors: Maurice Quach, Giuseppe Valenzise, Frederic Dufaux

Abstract: Existing techniques to compress point cloud attributes leverage either geometric or video-based compression tools. We explore a radically different approach inspired by recent advances in point cloud representation learning. Point clouds can be interpreted as 2D manifolds in 3D space. Specifically, we fold a 2D grid onto a point cloud and we map attributes from the point cloud onto the folded 2D grid using a novel optimized mapping method. This mapping results in an image, which opens a way to apply existing image processing techniques on point cloud attributes. However, as this mapping process is lossy in nature, we propose several strategies to refine it so that attributes can be mapped to the 2D grid with minimal distortion. Moreover, this approach can be flexibly applied to point cloud patches in order to better adapt to local geometric complexity. In this work, we consider point cloud attribute compression; thus, we compress this image with a conventional 2D image codec. Our preliminary results show that the proposed folding-based coding scheme can already reach performance similar to the latest MPEG Geometry-based PCC (G-PCC) codec.

Submitted 22 June, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

Comments: Published in ICIP 2020. The source code can be found at https://github.com/mauriceqch/pcc_attr_folding

arXiv:1910.01968 [pdf, other]

Generating Relevant Counter-Examples from a Positive Unlabeled Dataset for Image Classification

Authors: Florent Chiaroni, Ghazaleh Khodabandelou, Mohamed-Cherif Rahal, Nicolas Hueber, Frederic Dufaux

Abstract: With the surge of available but unlabeled data, Positive Unlabeled (PU) learning is becoming a thriving challenge. This work deals with this demanding task for which recent GAN-based PU approaches have demonstrated promising results. Generative Adversarial Networks (GANs) are not hampered by deterministic bias or need for specific dimensionality. However, existing GAN-based PU approaches also present some drawbacks such as sensitive dependence to prior knowledge, a cumbersome architecture or first-stage overfitting. To settle these issues, we propose to incorporate a biased PU risk within the standard GAN discriminator loss function. In this manner, the discriminator is constrained to request the generator to converge towards the unlabeled samples distribution while diverging from the positive samples distribution. This enables the proposed model, referred to as D-GAN, to exclusively learn the counter-examples distribution without prior knowledge. Experiments demonstrate that our approach outperforms state-of-the-art PU methods without prior by overcoming their issues.

Submitted 4 October, 2019; originally announced October 2019.

Comments: Submitted to Pattern Recognition

arXiv:1910.01636 [pdf, other]

Self-supervised learning for autonomous vehicles perception: A conciliation between analytical and learning methods

Authors: Florent Chiaroni, Mohamed-Cherif Rahal, Nicolas Hueber, Frederic Dufaux

Abstract: Nowadays, supervised deep learning techniques yield the best state-of-the-art prediction performances for a wide variety of computer vision tasks. However, such supervised techniques generally require a large amount of manually labeled training data. In the context of autonomous vehicles perception, this requirement is critical, as the distribution of sensor data can continuously change and include several unexpected variations. It turns out that a category of learning techniques, referred to as self-supervised learning (SSL), consists of replacing the manual labeling effort by an automatic labeling process. Thanks to their ability to learn on the application time and in varying environments, state-of-the-art SSL techniques provide a valid alternative to supervised learning for a variety of different tasks, including long-range traversable area segmentation, moving obstacle instance segmentation, long-term moving obstacle tracking, or depth map prediction. In this tutorial-style article, we present an overview and a general formalization of the concept of self-supervised learning (SSL) for autonomous vehicles perception. This formalization provides helpful guidelines for developing novel frameworks based on generic SSL principles. Moreover, it enables to point out significant challenges in the design of future SSL systems.

Submitted 7 June, 2020; v1 submitted 3 October, 2019; originally announced October 2019.

arXiv:1908.04197 [pdf, other]

doi 10.1109/TIP.2019.2936649

Deep Tone Mapping Operator for High Dynamic Range Images

Authors: Aakanksha Rana, Praveer Singh, Giuseppe Valenzise, Frederic Dufaux, Nikos Komodakis, Aljosa Smolic

Abstract: A computationally fast tone mapping operator (TMO) that can quickly adapt to a wide spectrum of high dynamic range (HDR) content is quintessential for visualization on varied low dynamic range (LDR) output devices such as movie screens or standard displays. Existing TMOs can successfully tone-map only a limited number of HDR content and require an extensive parameter tuning to yield the best subjective-quality tone-mapped output. In this paper, we address this problem by proposing a fast, parameter-free and scene-adaptable deep tone mapping operator (DeepTMO) that yields a high-resolution and high-subjective quality tone mapped output. Based on conditional generative adversarial network (cGAN), DeepTMO not only learns to adapt to vast scenic-content (e.g., outdoor, indoor, human, structures, etc.) but also tackles the HDR related scene-specific challenges such as contrast and brightness, while preserving the fine-grained details. We explore 4 possible combinations of Generator-Discriminator architectural designs to specifically address some prominent issues in HDR related deep-learning frameworks like blurring, tiling patterns and saturation artifacts. By exploring different influences of scales, loss-functions and normalization layers under a cGAN setting, we conclude with adopting a multi-scale model for our task. To further leverage on the large-scale availability of unlabeled HDR data, we train our network by generating targets using an objective HDR quality metric, namely Tone Mapping Image Quality Index (TMQI). We demonstrate results both quantitatively and qualitatively, and showcase that our DeepTMO generates high-resolution, high-quality output images over a large spectrum of real-world scenes. Finally, we evaluate the perceived quality of our results by conducting a pair-wise subjective study which confirms the versatility of our method.

Submitted 12 August, 2019; originally announced August 2019.

arXiv:1903.08548 [pdf, other]

doi 10.1109/ICIP.2019.8803413

Learning Convolutional Transforms for Lossy Point Cloud Geometry Compression

Authors: Maurice Quach, Giuseppe Valenzise, Frederic Dufaux

Abstract: Efficient point cloud compression is fundamental to enable the deployment of virtual and mixed reality applications, since the number of points to code can range in the order of millions. In this paper, we present a novel data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. We perform joint optimization of both rate and distortion using a trade-off parameter. In addition, we cast the decoding process as a binary classification of the point cloud occupancy map. Our method outperforms the MPEG reference solution in terms of rate-distortion on the Microsoft Voxelized Upper Bodies dataset with 51.5% BDBR savings on average. Moreover, while octree-based methods face exponential diminution of the number of points at low bitrates, our method still produces high resolution outputs even at low bitrates. Code and supplementary material are available at https://github.com/mauriceqch/pcc_geo_cnn .

Submitted 22 May, 2019; v1 submitted 20 March, 2019; originally announced March 2019.

Comments: Published in ICIP 2019. The source code can be found at https://github.com/mauriceqch/pcc_geo_cnn and the supplementary material can be found at https://www.mauricequach.com/pcc_geo_cnn_samples

arXiv:1803.04053 [pdf, other]

Learning Local Distortion Visibility From Image Quality Data-sets

Authors: Navaneeth Kamballur Kottayil, Giuseppe Valenzise, Frederic Dufaux, Irene Cheng

Abstract: Accurate prediction of local distortion visibility thresholds is critical in many image and video processing applications. Existing methods require an accurate modeling of the human visual system, and are derived through psychophysical experiments with simple, artificial stimuli. These approaches, however, are difficult to generalize to natural images with complex types of distortion. In this paper, we explore a different perspective, and we investigate whether it is possible to learn local distortion visibility from image quality scores. We propose a convolutional neural network based optimization framework to infer local detection thresholds in a distorted image. Our model is trained on multiple quality datasets, and the results are correlated with empirical visibility thresholds collected on complex stimuli in a recent study. Our results are comparable to state-of-the-art mathematical models that were trained on psychovisual data directly. This suggests that it is possible to predict psychophysical phenomena from visibility information embedded in image quality scores.

Submitted 11 March, 2018; originally announced March 2018.

arXiv:1712.07269 [pdf, other]

doi 10.1109/TIP.2017.2778570

Blind High Dynamic Range Quality estimation by disentangling perceptual and noise features in images

Authors: Navaneeth Kamballur Kottayil, Giuseppe Valenzise, Frederic Dufaux, Irene Cheng

Abstract: Assessing the visual quality of High Dynamic Range (HDR) images is an unexplored and an interesting research topic that has become relevant with the current boom in HDR technology. We propose a new convolutional neural network based model for No reference image quality assessment (NR-IQA) on HDR data. This model predicts the amount and location of noise, perceptual influence of image pixels on the noise, and the perceived quality, of a distorted image without any reference image. The proposed model extracts numerical values corresponding to the noise present in any given distorted image, and the perceptual effects exhibited by a human eye when presented with the same. These two measures are extracted separately yet sequentially and combined in a mixing function to compute the quality of the distorted image perceived by a human eye. Our training process derives the component that computes perceptual effects from a real world image quality dataset, rather than using results of psychovisual experiments. With the proposed model, we demonstrate state of the art performance for HDR NR-IQA and our results show performance similar to HDR Full Reference Image Quality Assessment algorithms (FR-IQA).

Submitted 19 December, 2017; originally announced December 2017.

arXiv:1712.00043 [pdf, other]

doi 10.1007/s11760-016-0873-x

A Color Intensity Invariant Low Level Feature Optimization Framework for Image Quality Assessment

Authors: Navaneeth K. Kottayil, Irene Cheng, Frederic Dufaux, Anup Basu

Abstract: Image Quality Assessment (IQA) algorithms evaluate the perceptual quality of an image using evaluation scores that assess the similarity or difference between two images. We propose a new low-level feature based IQA technique, which applies filter-bank decomposition and center-surround methodology. Differing from existing methods, our model incorporates color intensity adaptation and frequency scaling optimization at each filter-bank level and spatial orientation to extract and enhance perceptually significant features. Our computational model exploits the concept of object detection and encapsulates characteristics proposed in other IQA algorithms in a unified architecture. We also propose a systematic approach to review the evolution of IQA algorithms using unbiased test datasets, instead of looking at individual scores in isolation. Experimental results demonstrate the feasibility of our approach.

Submitted 30 November, 2017; originally announced December 2017.

Journal ref: Signal, Image and Video Processing 10.6 (2016):1169-1176

arXiv:1707.09791 [pdf, ps, other]

Intra Prediction Using In-Loop Residual Coding for the post-HEVC Standard

Authors: Mohsen Abdoli, Félix Henry, Patric Brault, Pierre Duhamel, Frédéric Dufaux

Abstract: A few years after standardization of the High Efficiency Video Coding (HEVC), now the Joint Video Exploration Team (JVET) group is exploring post-HEVC video compression technologies. In the intra prediction domain, this effort has resulted in an algorithm with 67 internal modes, new filters and tools which significantly improve HEVC. However, the improved algorithm still suffers from the long distance prediction inaccuracy problem. In this paper, we propose an In-Loop Residual coding Intra Prediction (ILR-IP) algorithm which utilizes inner-block reconstructed pixels as references to reduce the distance from predicted pixels. This is done by using the ILR signal for partially reconstructing each pixel, right after its prediction and before its block-level out-loop residual calculation. The ILR signal is decided in the rate-distortion sense, by a brute-force search on a QP-dependent finite codebook that is known to the decoder. Experiments show that the proposed ILR-IP algorithm improves the existing method in the Joint Exploration Model (JEM) up to 0.45% in terms of bit rate saving, without complexity overhead at the decoder side.

Submitted 31 July, 2017; originally announced July 2017.

Comments: 6 pages, 5 figures, Conference: IEEE 19th International Workshop on Multimedia Signal Processing, Luton, UK

Showing 1–15 of 15 results for author: Dufaux, F