Search | arXiv e-print repository

Memory-efficient deep end-to-end posterior network (DEEPEN) for inverse problems

Authors: Jyothi Rikhab Chand, Mathews Jacob

Abstract: End-to-End (E2E) unrolled optimization frameworks show promise for Magnetic Resonance (MR) image recovery, but suffer from high memory usage during training. In addition, these deterministic approaches do not offer opportunities for sampling from the posterior distribution. In this paper, we introduce a memory-efficient approach for E2E learning of the posterior distribution. We represent this distribution as the combination of a data-consistency-induced likelihood term and an energy model for the prior, parameterized by a Convolutional Neural Network (CNN). The CNN weights are learned from training data in an E2E fashion using maximum likelihood optimization. The learned model enables the recovery of images from undersampled measurements using Maximum A Posteriori (MAP) optimization. In addition, the posterior model can be sampled to derive uncertainty maps about the reconstruction. Experiments on parallel MR image reconstruction show that our approach performs comparably to the memory-intensive E2E unrolled algorithm, performs better than its memory-efficient counterpart, and can provide uncertainty maps. Our framework paves the way towards MR image reconstruction in 3D and higher dimensions.

Submitted 8 February, 2024; originally announced February 2024.
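
The posterior described in the DEEPEN abstract can be written out as a short sketch. The symbols are assumptions rather than the paper's notation: x is the image, b the undersampled measurements, A the forward operator, σ² the variance of an assumed Gaussian noise model, and E_θ the CNN-parameterized energy of the prior.

```latex
% Posterior = data-consistency likelihood (Gaussian noise assumed) x energy-based prior
p_\theta(x \mid b) \;\propto\;
  \exp\!\left(-\frac{\lVert A x - b \rVert_2^2}{2\sigma^2}\right)
  \exp\!\bigl(-E_\theta(x)\bigr)

% MAP recovery minimizes the negative log-posterior
\hat{x}_{\mathrm{MAP}} \;=\; \arg\min_x \;
  \frac{\lVert A x - b \rVert_2^2}{2\sigma^2} + E_\theta(x)
```

Under these assumptions, sampling from p_θ(x | b) rather than minimizing its negative logarithm is what would yield the uncertainty maps mentioned in the abstract.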

arXiv:2312.13211 [pdf, other]

DSFormer: Effective Compression of Text-Transformers by Dense-Sparse Weight Factorization

Authors: Rahul Chand, Yashoteja Prabhu, Pratyush Kumar

Abstract: With the tremendous success of large transformer models in natural language understanding, down-sizing them for cost-effective deployments has become critical. Recent studies have explored low-rank weight factorization techniques, which are efficient to train and apply out-of-the-box to any transformer architecture. Unfortunately, the low-rank assumption tends to be over-restrictive and hinders the expressiveness of the compressed model. This paper proposes DSFormer, a simple alternative factorization scheme which expresses a target weight matrix as the product of a small dense and a semi-structured sparse matrix. The resulting approximation is more faithful to the weight distribution in transformers and therefore achieves a stronger efficiency-accuracy trade-off. Another concern with existing factorizers is their dependence on a task-unaware initialization step which degrades the accuracy of the resulting model. DSFormer addresses this issue through a novel Straight-Through Factorizer (STF) algorithm that jointly learns all the weight factorizations to directly maximize the final task accuracy. Extensive experiments on multiple natural language understanding benchmarks demonstrate that DSFormer obtains up to 40% better compression than the state-of-the-art low-rank factorizers, leading semi-structured sparsity baselines and popular knowledge distillation approaches. Our approach is also orthogonal to mainstream compressors and offers up to 50% additional compression when added to popular distilled, layer-shared and quantized transformers. We empirically evaluate the benefits of STF over conventional optimization practices.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 9 page main paper. 1 page appendix
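
The dense-sparse factorization named in the DSFormer abstract can be illustrated with a minimal NumPy sketch. It only shows the shape of the idea (a small dense factor times a semi-structured sparse factor) and the resulting parameter count. The 2-of-4 sparsity pattern, the matrix sizes, the random factors, and the helper name `n_of_m_sparsify` are all assumptions for illustration; the paper's STF learning procedure is not reproduced here.

```python
import numpy as np


def n_of_m_sparsify(mat, n=2, m=4):
    """Keep the n largest-magnitude entries in every group of m
    consecutive entries along each row; zero out the rest.
    (Hypothetical helper; the N:M pattern is an assumption.)"""
    rows, cols = mat.shape
    assert cols % m == 0, "column count must be a multiple of m"
    groups = mat.reshape(rows, cols // m, m)
    # Indices of the (m - n) smallest magnitudes within each group.
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return (groups * mask).reshape(rows, cols)


rng = np.random.default_rng(0)
d_out, d_in, k = 64, 64, 16  # hypothetical layer and factor sizes

dense = rng.standard_normal((d_out, k))                    # small dense factor
sparse = n_of_m_sparsify(rng.standard_normal((k, d_in)))   # semi-structured sparse factor
approx_weight = dense @ sparse                             # stands in for the target weight

full_params = d_out * d_in
factored_params = dense.size + np.count_nonzero(sparse)
print(f"full: {full_params}, factored: {factored_params}")
```

In this sketch the factors are random, so `approx_weight` does not approximate any real weight matrix; in practice the two factors would be fitted to a trained layer.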

arXiv:2312.00386 [pdf, other]

Local monotone operator learning using non-monotone operators: MnM-MOL

Authors: Maneesh John, Jyothi Rikhab Chand, Mathews Jacob

Abstract: The recovery of magnetic resonance (MR) images from undersampled measurements is a key problem that has seen extensive research in recent years. Unrolled approaches, which rely on end-to-end training of convolutional neural network (CNN) blocks within iterative reconstruction algorithms, offer state-of-the-art performance. These algorithms require a large amount of memory during training, making them difficult to employ in high-dimensional applications. Deep equilibrium (DEQ) models and the recent monotone operator learning (MOL) approach were introduced to eliminate the need for unrolling, thus reducing the memory demand during training. Both approaches require a Lipschitz constraint on the network to ensure that the forward and backpropagation iterations converge. Unfortunately, the constraint often results in reduced performance compared to unrolled methods. The main focus of this work is to relax the constraint on the CNN block in two different ways. Inspired by convex-non-convex regularization strategies, we now impose the monotone constraint on the sum of the gradient of the data term and the CNN block, rather than constrain the CNN itself to be a monotone operator. This approach enables the CNN to learn possibly non-monotone score functions, which can translate to improved performance. In addition, we only restrict the operator to be monotone in a local neighborhood around the image manifold. Our theoretical results show that the proposed algorithm is guaranteed to converge to the fixed point and that the solution is robust to input perturbations, provided that it is initialized close to the true solution. Our empirical results show that the relaxed constraints translate to improved performance and that the approach enjoys robustness to input perturbations similar to MOL.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 10 pages, 7 figures

arXiv:2304.00306 [pdf, other]

CapsFlow: Optical Flow Estimation with Capsule Networks

Authors: Rahul Chand, Rajat Arora, K Ram Prabhakar, R Venkatesh Babu

Abstract: We present a framework that uses the recently introduced Capsule Networks to solve Optical Flow estimation, one of the fundamental computer vision tasks. Most existing state-of-the-art deep architectures either use a correlation operation to match features or rely on spatio-temporal features. While the correlation layer is sensitive to the choice of hyperparameters and does not put a prior on the underlying structure of the object, spatio-temporal features are limited by the network's receptive field. Also, we as humans look at moving objects as a whole, something which cannot be encoded by correlation or spatio-temporal features. Capsules, on the other hand, are specialized to model separate entities and their pose as a continuous matrix. Thus, we show that a simpler linear operation over the poses of the objects detected by the capsules is enough to model flow. We show results on a small toy dataset where we outperform the FlowNetC and PWC-Net models.

Submitted 1 December, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

Comments: Newer version added to correct issue in the conference name of the previous version uploaded on April 1st

arXiv:2302.11570 [pdf, other]

Plug-and-Play Deep Energy Model for Inverse problems

Authors: Jyothi Rikhab Chand, Mathews Jacob

Abstract: We introduce a novel energy formulation for Plug-and-Play (PnP) image recovery. Traditional PnP methods that use a convolutional neural network (CNN) do not have an energy-based formulation. The primary focus of this work is to introduce an energy-based PnP formulation, which relies on a CNN that learns the log of the image prior from training data. The score function is evaluated as the gradient of the energy model, which resembles a UNET with shared encoder and decoder weights. The proposed score function is thus constrained to a conservative vector field, which is the key difference from classical PnP models. The energy-based formulation offers algorithms with convergence guarantees, even when the learned score model is not a contraction. The relaxation of the contraction constraint allows the proposed model to learn more complex priors, thus offering improved performance over traditional PnP schemes. Our experiments in magnetic resonance image reconstruction demonstrate the improved performance offered by the proposed energy model over traditional PnP methods.

Submitted 15 February, 2023; originally announced February 2023.
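
The claim that a score obtained as the gradient of an energy is a conservative vector field, with a structure that reuses the same weights on the way in and out, can be checked on a toy model. The sketch below is not the paper's network: it assumes a single-layer energy E(x) = sum(softplus(W x)), whose gradient is Wᵀ sigmoid(W x), so W plays the role of both "encoder" and "decoder". The names `energy` and `energy_gradient` and all sizes are hypothetical.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def energy(x, W):
    """Toy scalar energy E(x) = sum(softplus(W x)) (an assumption)."""
    return softplus(W @ x).sum()


def energy_gradient(x, W):
    """Analytic gradient of the toy energy: W^T sigmoid(W x).
    The same W appears on both sides (shared weights)."""
    return W.T @ sigmoid(W @ x)


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 5))
x = rng.standard_normal(5)

# Central finite differences confirm the analytic gradient.
eps = 1e-6
fd = np.array([
    (energy(x + eps * e, W) - energy(x - eps * e, W)) / (2 * eps)
    for e in np.eye(5)
])
grad_error = np.max(np.abs(fd - energy_gradient(x, W)))

# Jacobian of the gradient field is W^T diag(s (1 - s)) W, which is
# symmetric: the hallmark of a conservative vector field.
s = sigmoid(W @ x)
jacobian = W.T @ np.diag(s * (1.0 - s)) @ W
jacobian_asymmetry = np.max(np.abs(jacobian - jacobian.T))
```

A generic CNN denoiser used as a score has no such symmetry guarantee, which is the contrast with classical PnP that the abstract draws.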

Showing 1–5 of 5 results for author: Chand, R