-
Multi-view Disparity Estimation Using a Novel Gradient Consistency Model
Authors:
James L. Gray,
Aous T. Naman,
David S. Taubman
Abstract:
Variational approaches to disparity estimation typically use a linearised brightness constancy constraint, which only applies in smooth regions and over small distances. Accordingly, current variational approaches rely on a schedule to progressively include image data. This paper proposes the use of Gradient Consistency information to assess the validity of the linearisation; this information is u…
▽ More
Variational approaches to disparity estimation typically use a linearised brightness constancy constraint, which only applies in smooth regions and over small distances. Accordingly, current variational approaches rely on a schedule to progressively include image data. This paper proposes the use of Gradient Consistency information to assess the validity of the linearisation; this information is used to determine the weights applied to the data term as part of an analytically inspired Gradient Consistency Model. The Gradient Consistency Model penalises the data term for view pairs that have a mismatch between the spatial gradients in the source view and the spatial gradients in the target view. Instead of relying on a tuned or learned schedule, the Gradient Consistency Model is self-scheduling, since the weights evolve as the algorithm progresses. We show that the Gradient Consistency Model outperforms standard coarse-to-fine schemes and the recently proposed progressive inclusion of views approach in both rate of convergence and accuracy.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
Neural Network Assisted Lifting Steps For Improved Fully Scalable Lossy Image Compression in JPEG 2000
Authors:
Xinyue Li,
Aous Naman,
David Taubman
Abstract:
This work proposes to augment the lifting steps of the conventional wavelet transform with additional neural network assisted lifting steps. These additional steps reduce residual redundancy (notably aliasing information) amongst the wavelet subbands, and also improve the visual quality of reconstructed images at reduced resolutions. The proposed approach involves two steps, a high-to-low step fol…
▽ More
This work proposes to augment the lifting steps of the conventional wavelet transform with additional neural network assisted lifting steps. These additional steps reduce residual redundancy (notably aliasing information) amongst the wavelet subbands, and also improve the visual quality of reconstructed images at reduced resolutions. The proposed approach involves two steps, a high-to-low step followed by a low-to-high step. The high-to-low step suppresses aliasing in the low-pass band by using the detail bands at the same resolution, while the low-to-high step aims to further remove redundancy from detail bands, so as to achieve higher energy compaction. The proposed two lifting steps are trained in an end-to-end fashion; we employ a backward annealing approach to overcome the non-differentiability of the quantization and cost functions during back-propagation. Importantly, the networks employed in this paper are compact and with limited non-linearities, allowing a fully scalable system; one pair of trained network parameters are applied for all levels of decomposition and for all bit-rates of interest. By employing the proposed approach within the JPEG 2000 image coding standard, our method can achieve up to 17.4% average BD bit-rate saving over a wide range of bit-rates, while retaining quality and resolution scalability features of JPEG 2000.
△ Less
Submitted 3 March, 2024;
originally announced March 2024.
-
Exploration of Learned Lifting-Based Transform Structures for Fully Scalable and Accessible Wavelet-Like Image Compression
Authors:
Xinyue Li,
Aous Naman,
David Taubman
Abstract:
This paper provides a comprehensive study on features and performance of different ways to incorporate neural networks into lifting-based wavelet-like transforms, within the context of fully scalable and accessible image compression. Specifically, we explore different arrangements of lifting steps, as well as various network architectures for learned lifting operators. Moreover, we examine the imp…
▽ More
This paper provides a comprehensive study on features and performance of different ways to incorporate neural networks into lifting-based wavelet-like transforms, within the context of fully scalable and accessible image compression. Specifically, we explore different arrangements of lifting steps, as well as various network architectures for learned lifting operators. Moreover, we examine the impact of the number of learned lifting steps, the number of channels, the number of layers and the support of kernels in each learned lifting operator. To facilitate the study, we investigate two generic training methodologies that are simultaneously appropriate to a wide variety of lifting structures considered. Experimental results ultimately suggest that retaining fixed lifting steps from the base wavelet transform is highly beneficial. Moreover, we demonstrate that employing more learned lifting steps and more layers in each learned lifting operator do not contribute strongly to the compression performance. However, benefits can be obtained by utilizing more channels in each learned lifting operator. Ultimately, the learned wavelet-like transform proposed in this paper achieves over 25% bit-rate savings compared to JPEG 2000 with compact spatial support.
△ Less
Submitted 28 February, 2024;
originally announced February 2024.
-
Welsch Based Multiview Disparity Estimation
Authors:
James L. Gray,
Aous T. Naman,
David S. Taubman
Abstract:
In this work, we explore disparity estimation from a high number of views. We experimentally identify occlusions as a key challenge for disparity estimation for applications with high numbers of views. In particular, occlusions can actually result in a degradation in accuracy as more views are added to a dataset. We propose the use of a Welsch loss function for the data term in a global variationa…
▽ More
In this work, we explore disparity estimation from a high number of views. We experimentally identify occlusions as a key challenge for disparity estimation for applications with high numbers of views. In particular, occlusions can actually result in a degradation in accuracy as more views are added to a dataset. We propose the use of a Welsch loss function for the data term in a global variational framework for disparity estimation. We also propose a disciplined war** strategy and a progressive inclusion of views strategy that can reduce the need for coarse to fine strategies that discard high spatial frequency components from the early iterations. Experimental results demonstrate that the proposed approach produces superior and/or more robust estimates than other conventional variational approaches.
△ Less
Submitted 2 October, 2021;
originally announced October 2021.
-
Human-Machine Collaborative Video Coding Through Cuboidal Partitioning
Authors:
Ashek Ahmmed,
Manoranjan Paul,
Manzur Murshed,
David Taubman
Abstract:
Video coding algorithms encode and decode an entire video frame while feature coding techniques only preserve and communicate the most critical information needed for a given application. This is because video coding targets human perception, while feature coding aims for machine vision tasks. Recently, attempts are being made to bridge the gap between these two domains. In this work, we propose a…
▽ More
Video coding algorithms encode and decode an entire video frame while feature coding techniques only preserve and communicate the most critical information needed for a given application. This is because video coding targets human perception, while feature coding aims for machine vision tasks. Recently, attempts are being made to bridge the gap between these two domains. In this work, we propose a video coding framework by leveraging on to the commonality that exists between human vision and machine vision applications using cuboids. This is because cuboids, estimated rectangular regions over a video frame, are computationally efficient, has a compact representation and object centric. Such properties are already shown to add value to traditional video coding systems. Herein cuboidal feature descriptors are extracted from the current frame and then employed for accomplishing a machine vision task in the form of object detection. Experimental results show that a trained classifier yields superior average precision when equipped with cuboidal features oriented representation of the current test frame. Additionally, this representation costs $7\%$ less in bit rate if the captured frames are need be communicated to a receiver.
△ Less
Submitted 2 September, 2021; v1 submitted 1 February, 2021;
originally announced February 2021.
-
Transform Quantization for CNN (Convolutional Neural Network) Compression
Authors:
Sean I. Young,
Wang Zhe,
David Taubman,
Bernd Girod
Abstract:
In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trai…
▽ More
In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization, and pose optimum quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation following decorrelation by the optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates (1-2 bits).
△ Less
Submitted 7 November, 2021; v1 submitted 2 September, 2020;
originally announced September 2020.