
From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers

Authors: Swaminathan Gurumurthy, Karnik Ram, Bingqing Chen, Zachary Manchester, Zico Kolter

Abstract: Various pose estimation and tracking problems in robotics can be decomposed into a correspondence estimation problem (often computed using a deep network) followed by a weighted least squares optimization problem to solve for the poses. Recent work has shown that coupling the two problems by iteratively refining one conditioned on the other's output yields SOTA results across domains. However, training these models has proved challenging, requiring a litany of tricks to stabilize and speed up training. In this work, we take the visual odometry problem as an example and identify three plausible causes: (1) flow loss interference, (2) linearization errors in the bundle adjustment (BA) layer, and (3) dependence of weight gradients on the BA residual. We show how these issues result in noisy and higher variance gradients, potentially leading to a slowdown in training and instabilities. We then propose a simple, yet effective solution to reduce the gradient variance by using the weights predicted by the network in the inner optimization loop to weight the correspondence objective in the training problem. This helps the training objective `focus' on the more important points, thereby reducing the variance and mitigating the influence of outliers. We show that the resulting method leads to faster training and can be more flexibly trained in varying training setups without sacrificing performance. In particular, we show $2$--$2.5\times$ training speedups over a baseline visual odometry model we modify.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted at CVPR 2024

arXiv:2311.18056 [pdf, other]

ReLU-QP: A GPU-Accelerated Quadratic Programming Solver for Model-Predictive Control

Authors: Arun L. Bishop, John Z. Zhang, Swaminathan Gurumurthy, Kevin Tracy, Zachary Manchester

Abstract: We present ReLU-QP, a GPU-accelerated solver for quadratic programs (QPs) that is capable of solving high-dimensional control problems at real-time rates. ReLU-QP is derived by exactly reformulating the Alternating Direction Method of Multipliers (ADMM) algorithm for solving QPs as a deep, weight-tied neural network with rectified linear unit (ReLU) activations. This reformulation enables the deployment of ReLU-QP on GPUs using standard machine-learning toolboxes. We evaluate the performance of ReLU-QP across three model-predictive control (MPC) benchmarks: stabilizing random linear dynamical systems with control limits, balancing an Atlas humanoid robot on a single foot, and tracking whole-body reference trajectories on a quadruped equipped with a six-degree-of-freedom arm. These benchmarks indicate that ReLU-QP is competitive with state-of-the-art CPU-based solvers for small-to-medium-scale problems and offers order-of-magnitude speed improvements for larger-scale problems.

Submitted 29 November, 2023; originally announced November 2023.

Comments: submitted to ICRA 2024
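
One way to see why an ADMM iteration can be written with ReLU activations is that projection onto box constraints is itself a combination of two ReLUs. The sketch below only checks that identity; it assumes the ADMM variant projects onto bounds lower <= upper, it is not the ReLU-QP implementation, and the names `relu` and `clamp_via_relu` are hypothetical.

```python
def relu(x: float) -> float:
    """Rectified linear unit on a scalar."""
    return x if x > 0.0 else 0.0


def clamp_via_relu(x: float, lower: float, upper: float) -> float:
    """Clip x to [lower, upper] using only ReLUs (assumes lower <= upper).

    Below lower both ReLUs are zero, so the result is lower.
    Inside the interval only the first ReLU is active, giving x.
    Above upper both are active and the result is upper.
    """
    return lower + relu(x - lower) - relu(x - upper)


if __name__ == "__main__":
    for value in (-3.0, 0.5, 5.0):
        print(value, "->", clamp_via_relu(value, -1.0, 2.0))
```

Because the clip is the only nonlinearity in such an iteration, the remaining steps are affine maps that can be stacked as identical (weight-tied) layers.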

arXiv:2111.13236 [pdf, other]

Joint inference and input optimization in equilibrium networks

Authors: Swaminathan Gurumurthy, Shaojie Bai, Zachary Manchester, J. Zico Kolter

Abstract: Many tasks in deep learning involve optimizing over the \emph{inputs} to a network to minimize or maximize some objective; examples include optimization over latent spaces in a generative model to match a target image, or adversarially perturbing an input to worsen classifier performance. Performing such optimization, however, is traditionally quite costly, as it involves a complete forward and backward pass through the network for each gradient step. In a separate line of work, a recent thread of research has developed the deep equilibrium (DEQ) model, a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer. In this paper, we show that there is a natural synergy between these two settings. Although naively using DEQs for these optimization problems is expensive (owing to the time needed to compute a fixed point for each gradient step), we can leverage the fact that gradient-based optimization can \emph{itself} be cast as a fixed point iteration to substantially improve the overall speed. That is, we \emph{simultaneously} both solve for the DEQ fixed point \emph{and} optimize over network inputs, all within a single ``augmented'' DEQ model that jointly encodes both the original network and the optimization process. Indeed, the procedure is fast enough that it allows us to efficiently \emph{train} DEQ models for tasks traditionally relying on an ``inner'' optimization loop. We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient-based meta-learning.

Submitted 25 November, 2021; originally announced November 2021.

Comments: NeurIPS 2021

Journal ref: NeurIPS 2021

arXiv:1911.04024 [pdf, other]

MAME : Model-Agnostic Meta-Exploration

Authors: Swaminathan Gurumurthy, Sumit Kumar, Katia Sycara

Abstract: Meta-reinforcement learning approaches aim to develop learning procedures that can adapt quickly to a distribution of tasks with the help of a few examples. Developing efficient exploration strategies capable of finding the most useful samples becomes critical in such settings. Existing approaches towards finding efficient exploration strategies add auxiliary objectives to promote exploration by the pre-update policy; however, this makes the adaptation using a few gradient steps difficult, as the pre-update (exploration) and post-update (exploitation) policies are often quite different. Instead, we propose to explicitly model a separate exploration policy for the task distribution. Having two different policies gives more flexibility in training the exploration policy and also makes adaptation to any specific task easier. We show that using self-supervised or supervised learning objectives for adaptation allows for more efficient inner-loop updates and also demonstrate the superior performance of our model compared to prior works in this domain.

Submitted 10 November, 2019; originally announced November 2019.

Comments: CoRL 2019

arXiv:1808.04359 [pdf, other]

Community Regularization of Visually-Grounded Dialog

Authors: Akshat Agarwal, Swaminathan Gurumurthy, Vasu Sharma, Mike Lewis, Katia Sycara

Abstract: The task of conducting visually grounded dialog involves learning goal-oriented cooperative dialog between autonomous agents who exchange information about a scene through several rounds of questions and answers in natural language. We posit that requiring artificial agents to adhere to the rules of human language, while also requiring them to maximize information exchange through dialog, is an ill-posed problem. We observe that humans do not stray from a common language because they are social creatures who live in communities and have to communicate with many people every day, so it is far easier to stick to a common language even at the cost of some efficiency loss. Using this as inspiration, we propose and evaluate a multi-agent community-based dialog framework where each agent interacts with, and learns from, multiple agents, and show that this community-enforced regularization results in more relevant and coherent dialog (as judged by human evaluators) without sacrificing task performance (as judged by quantitative metrics).

Submitted 6 September, 2018; v1 submitted 10 August, 2018; originally announced August 2018.

Comments: 7 pages, ICML/AAMAS Adaptive Learning Agents Workshop 2018 and CVPR Visual Dialog Workshop 2018. Code available at https://github.com/agakshat/visualdialog-pytorch

arXiv:1807.03407 [pdf, other]

High Fidelity Semantic Shape Completion for Point Clouds using Latent Optimization

Authors: Swaminathan Gurumurthy, Shubham Agrawal

Abstract: Semantic shape completion is a challenging problem in 3D computer vision where the task is to generate a complete 3D shape using a partial 3D shape as input. We propose a learning-based approach to complete incomplete 3D shapes through generative modeling and latent manifold optimization. Our algorithm works directly on point clouds. We use an autoencoder and a GAN to learn a distribution of embeddings for point clouds of object classes. An input point cloud with missing regions is first encoded to a feature vector. The representations learnt by the GAN are then used to find the best latent vector on the manifold using a combined optimization that finds a vector in the manifold of plausible vectors that is close to the original input (both in the feature space and the output space of the decoder). Experiments show that our algorithm is capable of successfully reconstructing point clouds with large missing regions with very high fidelity without having to rely on exemplar-based database retrieval.

Submitted 29 September, 2018; v1 submitted 9 July, 2018; originally announced July 2018.

arXiv:1805.05356 [pdf, other]

doi 10.1145/3209811.3209879

Exploiting Data and Human Knowledge for Predicting Wildlife Poaching

Authors: Swaminathan Gurumurthy, Lantao Yu, Chenyan Zhang, Yongchao Jin, Weiping Li, Haidong Zhang, Fei Fang

Abstract: Poaching continues to be a significant threat to the conservation of wildlife and the associated ecosystem. Estimating and predicting where the poachers have committed or would commit crimes is essential to more effective allocation of patrolling resources. The real-world data in this domain is often sparse, noisy and incomplete, consisting of a small number of positive data (poaching signs), a large number of negative data with label uncertainty, and an even larger number of unlabeled data. Fortunately, domain experts such as rangers can provide complementary information about poaching activity patterns. However, this kind of human knowledge has rarely been used in previous approaches. In this paper, we contribute new solutions to the predictive analysis of poaching patterns by exploiting both very limited data and human knowledge. We propose an approach to elicit quantitative information from domain experts through a questionnaire built upon a clustering-based division of the conservation area. In addition, we propose algorithms that exploit qualitative and quantitative information provided by the domain experts to augment the dataset and improve learning. In collaboration with the World Wide Fund for Nature, we show that incorporating human knowledge leads to better predictions in a conservation area in Northeastern China where the charismatic species is the Siberian tiger. The results show the importance of exploiting human knowledge when learning from limited data.

Submitted 14 May, 2018; originally announced May 2018.

Comments: COMPASS 2018

MSC Class: 68T10

arXiv:1706.02071 [pdf, other]

DeLiGAN : Generative Adversarial Networks for Diverse and Limited Data

Authors: Swaminathan Gurumurthy, Ravi Kiran Sarvadevabhatla, Venkatesh Babu Radhakrishnan

Abstract: A class of recent approaches for generating images, called Generative Adversarial Networks (GAN), has been used to generate impressively realistic images of objects, bedrooms, handwritten digits and a variety of other image modalities. However, typical GAN-based approaches require large amounts of training data to capture the diversity across the image modality. In this paper, we propose DeLiGAN -- a novel GAN-based architecture for diverse and limited training data scenarios. In our approach, we reparameterize the latent generative space as a mixture model and learn the mixture model's parameters along with those of the GAN. This seemingly simple modification to the GAN framework is surprisingly effective and results in models which enable diversity in generated samples even when trained with limited data. In our work, we show that DeLiGAN can generate images of handwritten digits, objects and hand-drawn sketches, all using limited amounts of data. To quantitatively characterize intra-class diversity of generated samples, we also introduce a modified version of "inception-score", a measure which has been found to correlate well with human assessment of generated samples.

Submitted 7 June, 2017; originally announced June 2017.

Comments: Accepted at CVPR-2017. Code for training the GAN models and computing modified inception-scores can be found at https://github.com/val-iisc/deligan
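
The latent reparameterization described in the DeLiGAN abstract can be sketched as sampling from a mixture whose parameters would be learned alongside the generator. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes Gaussian components with diagonal scales chosen uniformly at random, and the names `sample_mixture_latent`, `mus` and `sigmas` are hypothetical.

```python
import random


def sample_mixture_latent(mus, sigmas, rng):
    """Draw one latent vector z = mu_k + sigma_k * eps.

    mus, sigmas: lists of per-component mean and scale vectors (assumed
    to be the learnable mixture parameters).
    rng: a random.Random instance, passed in for reproducibility.
    """
    k = rng.randrange(len(mus))  # pick a component uniformly (assumption)
    # eps ~ N(0, 1) per dimension; shifting and scaling keeps the sample
    # a differentiable function of mu_k and sigma_k in an autodiff setting.
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mus[k], sigmas[k])]


if __name__ == "__main__":
    rng = random.Random(0)
    mus = [[0.0, 0.0], [5.0, 5.0]]
    sigmas = [[0.1, 0.1], [0.1, 0.1]]
    print(sample_mixture_latent(mus, sigmas, rng))
```

In a training setup, `mus` and `sigmas` would be framework tensors updated by the same optimizer as the generator weights; plain lists are used here only to keep the sketch self-contained.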

arXiv:1208.1429 [pdf]

doi 10.5121/csit.2012.2348

Deploying Health Monitoring ECU Towards Enhancing the Performance of In-Vehicle Network

Authors: Geetishree Mishra, Rajeshwari Hegde, K. S. Gurumurthy

Abstract: Electronic Control Units (ECUs) are the fundamental electronic building blocks of any automotive system. They are multi-purpose, multi-chip and multicore computer systems where more functionality is delivered in software rather than hardware. ECUs are valuable assets for vehicles, as critical time-bounded messages are communicated through them. With regard to safety criticality, already-developed mission-critical systems such as ABS and ESP rely fully on electronic components, leading to increasing requirements for more reliable and dependable electronic systems in vehicles. Hence it is essential to maintain and monitor the health of an ECU, which will enable ECUs to be followed, assessed and improved throughout their life cycle, starting from their introduction into the vehicle. In this paper, we propose a health monitoring ECU that enables early troubleshooting and servicing of the vehicle prior to any catastrophic failure.

Submitted 7 August, 2012; originally announced August 2012.

Comments: 7 pages, 4 figures, FCST 2012

arXiv:1003.5442 [pdf]

doi 10.5121/vlsic.2010.1103

Arithmetic Operations in Multi-Valued Logic

Authors: Vasundara Patel, K. S. Gurumurthy

Abstract: This paper presents arithmetic operations such as addition, subtraction and multiplication in Modulo-4 arithmetic, as well as addition and multiplication in a Galois field, using multi-valued logic (MVL). Quaternary-to-binary and binary-to-quaternary converters are designed using down literal circuits. Negation in modular arithmetic is designed with only one gate. The logic design of each operation is achieved by reducing the terms using Karnaugh diagrams, taking the minimum number of gates and the depth of the net into consideration. A quaternary multiplier circuit is proposed to achieve the required optimization. The simulation result of each operation is shown separately using Hspice.

Submitted 29 March, 2010; originally announced March 2010.

Comments: 12 Pages, VLSICS Journal 2010

Journal ref: International Journal Of VLSI Design & Communication Systems 1.1 (2010) 21-32
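
For readers unfamiliar with the operations named in the abstract above, the Modulo-4 arithmetic and the quaternary/binary conversion can be written out in software as follows. This is only a behavioural sketch: the paper describes circuit-level designs, the Galois-field operations are not covered here, and all function names are hypothetical.

```python
def add4(a: int, b: int) -> int:
    """Addition of two quaternary digits modulo 4."""
    return (a + b) % 4


def sub4(a: int, b: int) -> int:
    """Subtraction modulo 4 (Python's % always returns a value in 0..3)."""
    return (a - b) % 4


def mul4(a: int, b: int) -> int:
    """Multiplication of two quaternary digits modulo 4."""
    return (a * b) % 4


def neg4(a: int) -> int:
    """Additive inverse modulo 4, so that add4(a, neg4(a)) == 0."""
    return (-a) % 4


def quaternary_to_binary(q: int) -> tuple:
    """Split one quaternary digit (0..3) into (high bit, low bit)."""
    return (q >> 1) & 1, q & 1


def binary_to_quaternary(high: int, low: int) -> int:
    """Combine two bits into one quaternary digit."""
    return 2 * high + low


if __name__ == "__main__":
    print(add4(3, 2), sub4(1, 3), mul4(3, 3), neg4(1))
    print(quaternary_to_binary(2), binary_to_quaternary(1, 1))
```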

Showing 1–10 of 10 results for author: Gurumurthy, S