Search | arXiv e-print repository

Multi-view Disparity Estimation Using a Novel Gradient Consistency Model

Authors: James L. Gray, Aous T. Naman, David S. Taubman

Abstract: Variational approaches to disparity estimation typically use a linearised brightness constancy constraint, which only applies in smooth regions and over small distances. Accordingly, current variational approaches rely on a schedule to progressively include image data. This paper proposes the use of Gradient Consistency information to assess the validity of the linearisation; this information is u… ▽ More Variational approaches to disparity estimation typically use a linearised brightness constancy constraint, which only applies in smooth regions and over small distances. Accordingly, current variational approaches rely on a schedule to progressively include image data. This paper proposes the use of Gradient Consistency information to assess the validity of the linearisation; this information is used to determine the weights applied to the data term as part of an analytically inspired Gradient Consistency Model. The Gradient Consistency Model penalises the data term for view pairs that have a mismatch between the spatial gradients in the source view and the spatial gradients in the target view. Instead of relying on a tuned or learned schedule, the Gradient Consistency Model is self-scheduling, since the weights evolve as the algorithm progresses. We show that the Gradient Consistency Model outperforms standard coarse-to-fine schemes and the recently proposed progressive inclusion of views approach in both rate of convergence and accuracy. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 11 pages, 11 figures. Submitted to Transactions on Image Processing

arXiv:2403.01647 [pdf]

Neural Network Assisted Lifting Steps For Improved Fully Scalable Lossy Image Compression in JPEG 2000

Authors: Xinyue Li, Aous Naman, David Taubman

Abstract: This work proposes to augment the lifting steps of the conventional wavelet transform with additional neural network assisted lifting steps. These additional steps reduce residual redundancy (notably aliasing information) amongst the wavelet subbands, and also improve the visual quality of reconstructed images at reduced resolutions. The proposed approach involves two steps, a high-to-low step fol… ▽ More This work proposes to augment the lifting steps of the conventional wavelet transform with additional neural network assisted lifting steps. These additional steps reduce residual redundancy (notably aliasing information) amongst the wavelet subbands, and also improve the visual quality of reconstructed images at reduced resolutions. The proposed approach involves two steps, a high-to-low step followed by a low-to-high step. The high-to-low step suppresses aliasing in the low-pass band by using the detail bands at the same resolution, while the low-to-high step aims to further remove redundancy from detail bands, so as to achieve higher energy compaction. The proposed two lifting steps are trained in an end-to-end fashion; we employ a backward annealing approach to overcome the non-differentiability of the quantization and cost functions during back-propagation. Importantly, the networks employed in this paper are compact and with limited non-linearities, allowing a fully scalable system; one pair of trained network parameters are applied for all levels of decomposition and for all bit-rates of interest. By employing the proposed approach within the JPEG 2000 image coding standard, our method can achieve up to 17.4% average BD bit-rate saving over a wide range of bit-rates, while retaining quality and resolution scalability features of JPEG 2000. △ Less

Submitted 3 March, 2024; originally announced March 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2402.18761 [pdf, other]

Exploration of Learned Lifting-Based Transform Structures for Fully Scalable and Accessible Wavelet-Like Image Compression

Authors: Xinyue Li, Aous Naman, David Taubman

Abstract: This paper provides a comprehensive study on features and performance of different ways to incorporate neural networks into lifting-based wavelet-like transforms, within the context of fully scalable and accessible image compression. Specifically, we explore different arrangements of lifting steps, as well as various network architectures for learned lifting operators. Moreover, we examine the imp… ▽ More This paper provides a comprehensive study on features and performance of different ways to incorporate neural networks into lifting-based wavelet-like transforms, within the context of fully scalable and accessible image compression. Specifically, we explore different arrangements of lifting steps, as well as various network architectures for learned lifting operators. Moreover, we examine the impact of the number of learned lifting steps, the number of channels, the number of layers and the support of kernels in each learned lifting operator. To facilitate the study, we investigate two generic training methodologies that are simultaneously appropriate to a wide variety of lifting structures considered. Experimental results ultimately suggest that retaining fixed lifting steps from the base wavelet transform is highly beneficial. Moreover, we demonstrate that employing more learned lifting steps and more layers in each learned lifting operator do not contribute strongly to the compression performance. However, benefits can be obtained by utilizing more channels in each learned lifting operator. Ultimately, the learned wavelet-like transform proposed in this paper achieves over 25% bit-rate savings compared to JPEG 2000 with compact spatial support. △ Less

Submitted 28 February, 2024; originally announced February 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2110.00803 [pdf, other]

doi 10.1109/ICIP42928.2021.9506766

Welsch Based Multiview Disparity Estimation

Authors: James L. Gray, Aous T. Naman, David S. Taubman

Abstract: In this work, we explore disparity estimation from a high number of views. We experimentally identify occlusions as a key challenge for disparity estimation for applications with high numbers of views. In particular, occlusions can actually result in a degradation in accuracy as more views are added to a dataset. We propose the use of a Welsch loss function for the data term in a global variationa… ▽ More In this work, we explore disparity estimation from a high number of views. We experimentally identify occlusions as a key challenge for disparity estimation for applications with high numbers of views. In particular, occlusions can actually result in a degradation in accuracy as more views are added to a dataset. We propose the use of a Welsch loss function for the data term in a global variational framework for disparity estimation. We also propose a disciplined war** strategy and a progressive inclusion of views strategy that can reduce the need for coarse to fine strategies that discard high spatial frequency components from the early iterations. Experimental results demonstrate that the proposed approach produces superior and/or more robust estimates than other conventional variational approaches. △ Less

Submitted 2 October, 2021; originally announced October 2021.

Comments: Published in 2021 IEEE International Conference on Image Processing (ICIP), 5 pages

Journal ref: 2021 IEEE International Conference on Image Processing (ICIP), 2021, pp. 3223-3227,

arXiv:2102.01307 [pdf, other]

doi 10.1109/ICIP42928.2021.9506150

Human-Machine Collaborative Video Coding Through Cuboidal Partitioning

Authors: Ashek Ahmmed, Manoranjan Paul, Manzur Murshed, David Taubman

Abstract: Video coding algorithms encode and decode an entire video frame while feature coding techniques only preserve and communicate the most critical information needed for a given application. This is because video coding targets human perception, while feature coding aims for machine vision tasks. Recently, attempts are being made to bridge the gap between these two domains. In this work, we propose a… ▽ More Video coding algorithms encode and decode an entire video frame while feature coding techniques only preserve and communicate the most critical information needed for a given application. This is because video coding targets human perception, while feature coding aims for machine vision tasks. Recently, attempts are being made to bridge the gap between these two domains. In this work, we propose a video coding framework by leveraging on to the commonality that exists between human vision and machine vision applications using cuboids. This is because cuboids, estimated rectangular regions over a video frame, are computationally efficient, has a compact representation and object centric. Such properties are already shown to add value to traditional video coding systems. Herein cuboidal feature descriptors are extracted from the current frame and then employed for accomplishing a machine vision task in the form of object detection. Experimental results show that a trained classifier yields superior average precision when equipped with cuboidal features oriented representation of the current test frame. Additionally, this representation costs $7\%$ less in bit rate if the captured frames are need be communicated to a receiver. △ Less

Submitted 2 September, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

arXiv:2009.01174 [pdf]

doi 10.1109/TPAMI.2021.3084839

Transform Quantization for CNN (Convolutional Neural Network) Compression

Authors: Sean I. Young, Wang Zhe, David Taubman, Bernd Girod

Abstract: In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trai… ▽ More In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization, and pose optimum quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation following decorrelation by the optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates (1-2 bits). △ Less

Submitted 7 November, 2021; v1 submitted 2 September, 2020; originally announced September 2020.

Comments: To appear in IEEE Trans Pattern Anal Mach Intell

Showing 1–6 of 6 results for author: Taubman, D