Showing 1–33 of 33 results for author: Balakrishnan, G

  1. arXiv:2406.19388

    cs.SD cs.CL cs.CV cs.MM eess.AS

    Taming Data and Transformers for Audio Generation

    Authors: Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Guha Balakrishnan, Sergey Tulyakov, Vicente Ordonez

    Abstract: Generating ambient sounds and effects is a challenging problem due to data scarcity and often insufficient caption quality, making it difficult to employ large-scale generative models for the task. In this work, we tackle the problem by introducing two new models. First, we propose AutoCap, a high-quality and efficient automatic audio captioning model. We show that by leveraging metadata available…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Project Webpage: https://snap-research.github.io/GenAU/

  2. arXiv:2404.15274

    cs.LG cs.CV eess.IV physics.med-ph

    Metric-guided Image Reconstruction Bounds via Conformal Prediction

    Authors: Matt Y Cheung, Tucker J Netherton, Laurence E Court, Ashok Veeraraghavan, Guha Balakrishnan

    Abstract: Recent advancements in machine learning have led to novel imaging systems and algorithms that address ill-posed problems. Assessing their trustworthiness and understanding how to deploy them safely at test time remains an important and open problem. We propose a method that leverages conformal prediction to retrieve upper/lower bounds and statistical inliers/outliers of reconstructions based on th…

    Submitted 23 April, 2024; originally announced April 2024.

  3. arXiv:2402.03748

    cs.DS

    Succinct Data Structure for Chordal Graphs with Bounded Vertex Leafage

    Authors: Girish Balakrishnan, Sankardeep Chakraborty, N S Narayanaswamy, Kunihiko Sadakane

    Abstract: We improve the worst-case information theoretic lower bound of Munro and Wu (ISAAC 2018) for $n$-vertex unlabeled chordal graphs when vertex leafage is bounded and leafage is unbounded. The class of unlabeled $k$-vertex leafage chordal graphs that consists of all chordal graphs with vertex leafage at most $k$ and unbounded leafage, denoted $\mathcal{G}_k$, is introduced for the first time. For…

    Submitted 11 April, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: 19 pages, 2 figures

  4. arXiv:2311.18822

    cs.CV

    ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation

    Authors: Moayed Haji-Ali, Guha Balakrishnan, Vicente Ordonez

    Abstract: Diffusion models have revolutionized image generation in recent years, yet they are still limited to a few sizes and aspect ratios. We propose ElasticDiffusion, a novel training-free decoding method that enables pretrained text-to-image diffusion models to generate images with various sizes. ElasticDiffusion attempts to decouple the generation trajectory of a pretrained model into local and global…

    Submitted 31 March, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Accepted at CVPR 2024. Project Page: https://elasticdiffusion.github.io/

  5. arXiv:2311.18064

    cs.CV

    GELDA: A generative language annotation framework to reveal visual biases in datasets

    Authors: Krish Kabra, Kathleen M. Lewis, Guha Balakrishnan

    Abstract: Bias analysis is a crucial step in the process of creating fair datasets for training and evaluating computer vision models. The bottleneck in dataset analysis is annotation, which typically requires: (1) specifying a list of attributes relevant to the dataset domain, and (2) classifying each image-attribute pair. While the second step has made rapid progress in automation, the first has remained…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 21 pages, 15 figures, 9 tables

  6. arXiv:2311.16353

    cs.LG cs.AI cs.CV eess.IV eess.SP

    Improving Denoising Diffusion Probabilistic Models via Exploiting Shared Representations

    Authors: Delaram Pirhayatifard, Mohammad Taha Toghani, Guha Balakrishnan, César A. Uribe

    Abstract: In this work, we address the challenge of multi-task image generation with limited data for denoising diffusion probabilistic models (DDPM), a class of generative models that produce high-quality images by reversing a noisy diffusion process. We propose a novel method, SR-DDPM, that leverages representation-based techniques from few-shot learning to effectively learn from fewer samples across diff…

    Submitted 27 November, 2023; originally announced November 2023.

  7. arXiv:2311.02427

    cs.DS

    Succinct Data Structure for Graphs with $d$-Dimensional $t$-Representation

    Authors: Girish Balakrishnan, Sankardeep Chakraborty, Seungbum Jo, N S Narayanaswamy, Kunihiko Sadakane

    Abstract: Erdős and West (Discrete Mathematics'85) considered the class of $n$ vertex intersection graphs which have a {\em $d$-dimensional} {\em $t$-representation}, that is, each vertex of a graph in the class has an associated set consisting of at most $t$ $d$-dimensional axis-parallel boxes. In particular, for a graph $G$ and for each $d \geq 1$, they consider $i_d(G)$ to be the minimum $t$ for which…

    Submitted 6 February, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

    Comments: 21 pages, 5 figures

  8. arXiv:2308.05441

    cs.CV

    Benchmarking Algorithmic Bias in Face Recognition: An Experimental Approach Using Synthetic Faces and Human Evaluation

    Authors: Hao Liang, Pietro Perona, Guha Balakrishnan

    Abstract: We propose an experimental method for measuring bias in face recognition systems. Existing methods to measure bias depend on benchmark datasets that are collected in the wild and annotated for protected (e.g., race, gender) and non-protected (e.g., pose, lighting) attributes. Such observational datasets only permit correlational conclusions, e.g., "Algorithm A's accuracy is different on female and…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023; 18 figures

  9. arXiv:2308.02100

    eess.IV cs.CV

    CT Reconstruction from Few Planar X-rays with Application towards Low-resource Radiotherapy

    Authors: Yiran Sun, Tucker Netherton, Laurence Court, Ashok Veeraraghavan, Guha Balakrishnan

    Abstract: CT scans are the standard-of-care for many clinical ailments, and are needed for treatments like external beam radiotherapy. Unfortunately, CT scanners are rare in low and mid-resource settings due to their costs. Planar X-ray radiography units, in comparison, are far more prevalent, but can only provide limited 2D observations of the 3D anatomy. In this work, we propose a method to generate CT vo…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: 10 pages, 5 figures

  10. arXiv:2305.20048

    cs.CV

    F?D: On understanding the role of deep feature spaces on face generation evaluation

    Authors: Krish Kabra, Guha Balakrishnan

    Abstract: Perceptual metrics, like the Fréchet Inception Distance (FID), are widely used to assess the similarity between synthetically generated and ground truth (real) images. The key idea behind these metrics is to compute errors in a deep feature space that captures perceptually and semantically rich image features. Despite their popularity, the effect that different deep features and their design choic…

    Submitted 11 August, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: Code and dataset to be released soon

  11. arXiv:2305.00149

    eess.IV cs.CV

    X-ray Recognition: Patient identification from X-rays using a contrastive objective

    Authors: Hao Liang, Kevin Ni, Guha Balakrishnan

    Abstract: Recent research demonstrates that deep learning models are capable of precisely extracting bio-information (e.g. race, gender and age) from patients' Chest X-Rays (CXRs). In this paper, we further show that deep learning models are also surprisingly accurate at recognition, i.e., distinguishing CXRs belonging to the same patient from those belonging to different patients. These findings suggest po…

    Submitted 28 April, 2023; originally announced May 2023.

  12. arXiv:2305.00147

    eess.IV cs.CV

    Visualizing chest X-ray dataset biases using GANs

    Authors: Hao Liang, Kevin Ni, Guha Balakrishnan

    Abstract: Recent work demonstrates that images from various chest X-ray datasets contain visual features that are strongly correlated with protected demographic attributes like race and gender. This finding raises issues of fairness, since some of these factors may be used by downstream algorithms for clinical predictions. In this work, we propose a framework, using generative adversarial networks (GANs), t…

    Submitted 5 September, 2023; v1 submitted 28 April, 2023; originally announced May 2023.

    Comments: Medical Imaging with Deep Learning (MIDL) 2023

  13. arXiv:2304.02101

    cs.DC cs.CV cs.NI

    MadEye: Boosting Live Video Analytics Accuracy with Adaptive Camera Configurations

    Authors: Mike Wong, Murali Ramanujam, Guha Balakrishnan, Ravi Netravali

    Abstract: Camera orientations (i.e., rotation and zoom) govern the content that a camera captures in a given scene, which in turn heavily influences the accuracy of live video analytics pipelines. However, existing analytics approaches leave this crucial adaptation knob untouched, instead opting to only alter the way that captured images from fixed orientations are encoded, streamed, and analyzed. We presen…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: 19 pages, 16 figures

  14. arXiv:2302.12828

    cs.CV cs.LG

    SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries

    Authors: Ahmed Imtiaz Humayun, Randall Balestriero, Guha Balakrishnan, Richard Baraniuk

    Abstract: Current Deep Network (DN) visualization and interpretability methods rely heavily on data space visualizations such as scoring which dimensions of the data are responsible for their associated prediction or generating new data features or samples that best match a given DN unit or representation. In this paper, we go one step further by developing the first provably exact method for computing the…

    Submitted 6 June, 2024; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: 11 pages, 20 figures

  15. arXiv:2302.03750

    cs.CV cs.LG stat.ME

    Linking convolutional kernel size to generalization bias in face analysis CNNs

    Authors: Hao Liang, Josue Ortega Caro, Vikram Maheshri, Ankit B. Patel, Guha Balakrishnan

    Abstract: Training dataset biases are by far the most scrutinized factors when explaining algorithmic biases of neural networks. In contrast, hyperparameters related to the neural network architecture have largely been ignored even though different network parameterizations are known to induce different implicit biases over learned features. For example, convolutional kernel size is known to affect the freq…

    Submitted 3 December, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: WACV 2024

  16. arXiv:2301.05187

    cs.CV cs.GR eess.IV

    WIRE: Wavelet Implicit Neural Representations

    Authors: Vishwanath Saragadam, Daniel LeJeune, Jasper Tan, Guha Balakrishnan, Ashok Veeraraghavan, Richard G. Baraniuk

    Abstract: Implicit neural representations (INRs) have recently advanced numerous vision-related areas. INR performance depends strongly on the choice of the nonlinear activation function employed in its multilayer perceptron (MLP) network. A wide range of nonlinearities have been explored, but, unfortunately, current INRs designed to have high accuracy also suffer from poor robustness (to signal noise, para…

    Submitted 5 January, 2023; originally announced January 2023.

  17. arXiv:2203.04913

    cs.CV cs.LG

    Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers

    Authors: Dominik Zietlow, Michael Lohaus, Guha Balakrishnan, Matthäus Kleindessner, Francesco Locatello, Bernhard Schölkopf, Chris Russell

    Abstract: Algorithmic fairness is frequently motivated in terms of a trade-off in which overall performance is decreased so as to improve performance on disadvantaged groups where the algorithm would otherwise be less accurate. Contrary to this, we find that applying existing fairness approaches to computer vision improve fairness by degrading the performance of classifiers across all groups (with increased…

    Submitted 31 March, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

  18. arXiv:2202.03532

    cs.CV

    MINER: Multiscale Implicit Neural Representations

    Authors: Vishwanath Saragadam, Jasper Tan, Guha Balakrishnan, Richard G. Baraniuk, Ashok Veeraraghavan

    Abstract: We introduce a new neural signal model designed for efficient high-resolution representation of large-scale signals. The key innovation in our multiscale implicit neural representation (MINER) is an internal representation via a Laplacian pyramid, which provides a sparse multiscale decomposition of the signal that captures orthogonal parts of the signal across scales. We leverage the advantages of…

    Submitted 17 July, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: 14 pages, accepted to ECCV 2022

  19. arXiv:2201.10423

    cs.CV cs.GR cs.LG

    Rayleigh EigenDirections (REDs): GAN latent space traversals for multidimensional features

    Authors: Guha Balakrishnan, Raghudeep Gadde, Aleix Martinez, Pietro Perona

    Abstract: We present a method for finding paths in a deep generative model's latent space that can maximally vary one set of image features while holding others constant. Crucially, unlike past traversal approaches, ours can manipulate multidimensional features of an image such as facial identity and pixels within a specified region. Our method is principled and conceptually simple: optimal traversal direct…

    Submitted 25 January, 2022; originally announced January 2022.

  20. arXiv:2111.04332

    cs.DS

    Succinct Data Structure for Path Graphs

    Authors: Girish Balakrishnan, Sankardeep Chakraborty, N S Narayanaswamy, Kunihiko Sadakane

    Abstract: We consider the problem of designing a succinct data structure for {\it path graphs} (which are a proper subclass of chordal graphs and a proper superclass of interval graphs) on $n$ vertices while supporting degree, adjacency, and neighborhood queries efficiently. We provide the following two solutions for this problem: - an $n \log n+o(n \log n)$-bit succinct data structure that supports adjac…

    Submitted 2 March, 2023; v1 submitted 8 November, 2021; originally announced November 2021.

    Comments: 39 pages, 5 figures, 6 sections, 2 tables

  21. arXiv:2103.13455

    cs.CV cs.AI cs.LG stat.AP

    Matched sample selection with GANs for mitigating attribute confounding

    Authors: Chandan Singh, Guha Balakrishnan, Pietro Perona

    Abstract: Measuring biases of vision systems with respect to protected attributes like gender and age is critical as these systems gain widespread use in society. However, significant correlations between attributes in benchmark datasets make it difficult to separate algorithmic bias from dataset bias. To mitigate such attribute confounding during bias analysis, we propose a matching approach that selects a…

    Submitted 24 March, 2021; originally announced March 2021.

  22. arXiv:2011.11156

    cs.CV

    Better Aggregation in Test-Time Augmentation

    Authors: Divya Shanmugam, Davis Blalock, Guha Balakrishnan, John Guttag

    Abstract: Test-time augmentation -- the aggregation of predictions across transformed versions of a test input -- is a common practice in image classification. Traditionally, predictions are combined using a simple average. In this paper, we present 1) experimental analyses that shed light on cases in which the simple average is suboptimal and 2) a method to address these shortcomings. A key finding is that…

    Submitted 11 October, 2021; v1 submitted 22 November, 2020; originally announced November 2020.

    Journal ref: ICCV 2021

  23. arXiv:2007.06570

    cs.CV cs.CY cs.LG

    Towards causal benchmarking of bias in face analysis algorithms

    Authors: Guha Balakrishnan, Yuanjun Xiong, Wei Xia, Pietro Perona

    Abstract: Measuring algorithmic bias is crucial both to assess algorithmic fairness, and to guide the improvement of algorithms. Current methods to measure algorithmic bias in computer vision, which are based on observational datasets, are inadequate for this task because they conflate algorithmic bias with dataset bias. To address this problem we develop an experimental method for measuring algorithmic b…

    Submitted 13 July, 2020; originally announced July 2020.

    Comments: Long-form version of ECCV 2020 paper

  24. arXiv:2001.01026

    cs.GR cs.CV

    Painting Many Pasts: Synthesizing Time Lapse Videos of Paintings

    Authors: Amy Zhao, Guha Balakrishnan, Kathleen M. Lewis, Frédo Durand, John V. Guttag, Adrian V. Dalca

    Abstract: We introduce a new video synthesis task: synthesizing time lapse videos depicting how a given painting might have been created. Artists paint using unique combinations of brushes, strokes, and colors. There are often many possible ways to create a given painting. Our goal is to learn to capture this rich range of possibilities. Creating distributions of long-term videos is a challenge for learni…

    Submitted 25 April, 2020; v1 submitted 3 January, 2020; originally announced January 2020.

    Comments: 10 pages, CVPR 2020

  25. arXiv:2001.00059

    cs.SE cs.CL cs.LG cs.PL

    Learning and Evaluating Contextual Embedding of Source Code

    Authors: Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi

    Abstract: Recent research has achieved impressive results on understanding and improving source code by building up on machine-learning techniques developed for natural languages. A significant advancement in natural-language understanding has come with the development of pre-trained contextual embeddings, such as BERT, which can be fine-tuned for downstream tasks with less labeled data and training budget,…

    Submitted 17 August, 2020; v1 submitted 21 December, 2019; originally announced January 2020.

    Comments: Published in ICML 2020. This version (v.3) is the final camera-ready version of the paper. It contains the re-computed results, based on the open-sourced datasets

  26. arXiv:1909.00475

    cs.CV

    Visual Deprojection: Probabilistic Recovery of Collapsed Dimensions

    Authors: Guha Balakrishnan, Adrian V. Dalca, Amy Zhao, John V. Guttag, Fredo Durand, William T. Freeman

    Abstract: We introduce visual deprojection: the task of recovering an image or video that has been collapsed along a dimension. Projections arise in various contexts, such as long-exposure photography, where a dynamic scene is collapsed in time to produce a motion-blurred image, and corner cameras, where reflected light from a scene is collapsed along a spatial dimension because of an edge occluder to yield…

    Submitted 1 September, 2019; originally announced September 2019.

    Comments: ICCV 2019

  27. Unsupervised Learning of Probabilistic Diffeomorphic Registration for Images and Surfaces

    Authors: Adrian V. Dalca, Guha Balakrishnan, John Guttag, Mert R. Sabuncu

    Abstract: Classical deformable registration techniques achieve impressive results and offer a rigorous theoretical treatment, but are computationally intensive since they solve an optimization problem for each image pair. Recently, learning-based methods have facilitated fast registration by learning spatial deformation functions. However, these approaches use restricted deformation models, require supervis…

    Submitted 23 July, 2019; v1 submitted 8 March, 2019; originally announced March 2019.

    Comments: MedIA: Medical Image Analysis (MICCAI2018 Special Issue). Expands on MICCAI 2018 paper (arXiv:1805.04605) by introducing an extension to anatomical surface registration, new experiments, and analysis of diffeomorphic implementations. Keywords: medical image registration; diffeomorphic; invertible; probabilistic modeling; variational inference. Code available at http://voxelmorph.csail.mit.edu. arXiv admin note: text overlap with arXiv:1805.04605

  28. arXiv:1902.09383

    cs.CV

    Data augmentation using learned transformations for one-shot medical image segmentation

    Authors: Amy Zhao, Guha Balakrishnan, Frédo Durand, John V. Guttag, Adrian V. Dalca

    Abstract: Image segmentation is an important task in many medical applications. Methods based on convolutional neural networks attain state-of-the-art accuracy; however, they typically rely on supervised training with large labeled datasets. Labeling medical images requires significant expertise and time, and typical hand-tuned approaches for data augmentation fail to capture the complex variations in such…

    Submitted 6 April, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

    Comments: 9 pages, CVPR 2019

  29. VoxelMorph: A Learning Framework for Deformable Medical Image Registration

    Authors: Guha Balakrishnan, Amy Zhao, Mert R. Sabuncu, John Guttag, Adrian V. Dalca

    Abstract: We present VoxelMorph, a fast learning-based framework for deformable, pairwise medical image registration. Traditional registration methods optimize an objective function for each pair of images, which can be time-consuming for large datasets or rich deformation models. In contrast to this approach, and building on recent learning-based methods, we formulate registration as a function that maps a…

    Submitted 1 September, 2019; v1 submitted 13 September, 2018; originally announced September 2018.

    Comments: Accepted to IEEE TMI (© IEEE). This manuscript expands the CVPR 2018 paper (arXiv:1802.02604) by introducing an auxiliary model that uses segmentation maps during training, an amortized optimization analysis, and extensive model analysis. Code available at http://voxelmorph.csail.mit.edu

  30. Unsupervised Learning for Fast Probabilistic Diffeomorphic Registration

    Authors: Adrian V. Dalca, Guha Balakrishnan, John Guttag, Mert R. Sabuncu

    Abstract: Traditional deformable registration techniques achieve impressive results and offer a rigorous theoretical treatment, but are computationally intensive since they solve an optimization problem for each image pair. Recently, learning-based methods have facilitated fast registration by learning spatial deformation functions. However, these approaches use restricted deformation models, require superv…

    Submitted 14 September, 2018; v1 submitted 11 May, 2018; originally announced May 2018.

    Comments: MICCAI 2018 (Oral Presentation). Proceedings: LNCS 11070, pp 729-738

    Journal ref: LNCS 11070, pp 729-738, Springer. 2018

  31. arXiv:1804.07739

    cs.CV

    Synthesizing Images of Humans in Unseen Poses

    Authors: Guha Balakrishnan, Amy Zhao, Adrian V. Dalca, Fredo Durand, John Guttag

    Abstract: We address the computational problem of novel human pose synthesis. Given an image of a person and a desired pose, we produce a depiction of that person in that pose, retaining the appearance of both the person and background. We present a modular generative neural network that synthesizes unseen poses using training pairs of images and poses taken from human action videos. Our network separates a…

    Submitted 20 April, 2018; originally announced April 2018.

    Comments: CVPR 2018

  32. An Unsupervised Learning Model for Deformable Medical Image Registration

    Authors: Guha Balakrishnan, Amy Zhao, Mert R. Sabuncu, John Guttag, Adrian V. Dalca

    Abstract: We present a fast learning-based algorithm for deformable, pairwise 3D medical image registration. Current registration methods optimize an objective function independently for each pair of images, which can be time-consuming for large data. We define registration as a parametric function, and optimize its parameters given a set of images from a collection of interest. Given a new pair of scans, w…

    Submitted 20 April, 2018; v1 submitted 7 February, 2018; originally announced February 2018.

    Comments: 9 pages, in CVPR 2018

  33. arXiv:1612.04007

    cs.CV

    A Video-Based Method for Objectively Rating Ataxia

    Authors: Ronnachai Jaroensri, Amy Zhao, Guha Balakrishnan, Derek Lo, Jeremy Schmahmann, John Guttag, Fredo Durand

    Abstract: For many movement disorders, such as Parkinson's disease and ataxia, disease progression is visually assessed by a clinician using a numerical disease rating scale. These tests are subjective, time-consuming, and must be administered by a professional. This can be problematic where specialists are not available, or when a patient is not consistently evaluated by the same clinician. We present an a…

    Submitted 7 September, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

    Comments: MLHC 2017