On the Fine-Grained Hardness of Inverting Generative Models

Authors: Feyza Duman Keles, Chinmay Hegde

Abstract: The objective of generative model inversion is to identify a size-$n$ latent vector that produces a generative model output that closely matches a given target. This operation is a core computational primitive in numerous modern applications involving computer vision and NLP. However, the problem is known to be computationally challenging and NP-hard in the worst case. This paper aims to provide a fine-grained view of the landscape of computational hardness for this problem. We establish several new hardness lower bounds for both exact and approximate model inversion. In exact inversion, the goal is to determine whether a target is contained within the range of a given generative model. Under the strong exponential time hypothesis (SETH), we demonstrate that the computational complexity of exact inversion is lower bounded by $\Omega(2^n)$ via a reduction from $k$-SAT; this is a strengthening of known results. For the more practically relevant problem of approximate inversion, the goal is to determine whether a point in the model range is close to a given target with respect to the $\ell_p$-norm. When $p$ is a positive odd integer, under SETH, we provide an $\Omega(2^n)$ complexity lower bound via a reduction from the closest vector problem (CVP). Finally, when $p$ is even, under the exponential time hypothesis (ETH), we provide a lower bound of $2^{\Omega(n)}$ via a reduction from Half-Clique and Vertex-Cover.

Submitted 11 September, 2023; originally announced September 2023.

Comments: 19 pages

arXiv:2305.02997 [pdf, other]

When Do Neural Nets Outperform Boosted Trees on Tabular Data?

Authors: Duncan McElfresh, Sujay Khandagale, Jonathan Valverde, Vishak Prasad C, Benjamin Feuer, Chinmay Hegde, Ganesh Ramakrishnan, Micah Goldblum, Colin White

Abstract: Tabular data is one of the most commonly used types of data in machine learning. Despite recent advances in neural nets (NNs) for tabular data, there is still an active discussion on whether or not NNs generally outperform gradient-boosted decision trees (GBDTs) on tabular data, with several recent works arguing either that GBDTs consistently outperform NNs on tabular data, or vice versa. In this work, we take a step back and question the importance of this debate. To this end, we conduct the largest tabular data analysis to date, comparing 19 algorithms across 176 datasets, and we find that the 'NN vs. GBDT' debate is overemphasized: for a surprisingly high number of datasets, either the performance difference between GBDTs and NNs is negligible, or light hyperparameter tuning on a GBDT is more important than choosing between NNs and GBDTs. A remarkable exception is the recently-proposed prior-data fitted network, TabPFN: although it is effectively limited to training sets of size 3000, we find that it outperforms all other algorithms on average, even when randomly sampling 3000 training datapoints. Next, we analyze dozens of metafeatures to determine what properties of a dataset make NNs or GBDTs better-suited to perform well. For example, we find that GBDTs are much better than NNs at handling skewed or heavy-tailed feature distributions and other forms of dataset irregularities. Our insights act as a guide for practitioners to determine which techniques may work best on their dataset. Finally, with the goal of accelerating tabular data research, we release the TabZilla Benchmark Suite: a collection of the 36 'hardest' of the datasets we study. Our benchmark suite, codebase, and all raw results are available at https://github.com/naszilla/tabzilla.

Submitted 30 October, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

Comments: NeurIPS Datasets and Benchmarks Track 2023

arXiv:2301.12540 [pdf, other]

Implicit Regularization for Group Sparsity

Authors: Jiangyuan Li, Thanh V. Nguyen, Chinmay Hegde, Raymond K. W. Wong

Abstract: We study the implicit regularization of gradient descent towards structured sparsity via a novel neural reparameterization, which we call a diagonally grouped linear neural network. We show the following intriguing property of our reparameterization: gradient descent over the squared regression loss, without any explicit regularization, biases towards solutions with a group sparsity structure. In contrast to many existing works in understanding implicit regularization, we prove that our training trajectory cannot be simulated by mirror descent. We analyze the gradient dynamics of the corresponding regression problem in the general noise setting and obtain minimax-optimal error rates. Compared to existing bounds for implicit sparse regularization using diagonal linear networks, our analysis with the new reparameterization shows improved sample complexity. In the degenerate case of size-one groups, our approach gives rise to a new algorithm for sparse linear regression. Finally, we demonstrate the efficacy of our approach with several numerical experiments.

Submitted 29 January, 2023; originally announced January 2023.

Comments: accepted by ICLR 2023

arXiv:2209.10105 [pdf, ps, other]

Distributed Online Non-convex Optimization with Composite Regret

Authors: Zhanhong Jiang, Aditya Balu, Xian Yeow Lee, Young M. Lee, Chinmay Hegde, Soumik Sarkar

Abstract: Regret has been widely adopted as the metric of choice for evaluating the performance of online optimization algorithms for distributed, multi-agent systems. However, data/model variations associated with agents can significantly impact decisions and require consensus among agents. Moreover, most existing works have focused on developing approaches for (either strongly or non-strongly) convex losses, and very few results have been obtained regarding regret bounds in distributed online optimization for general non-convex losses. To address these two issues, we propose a novel composite regret with a new network regret-based metric to evaluate distributed online optimization algorithms. We concretely define static and dynamic forms of the composite regret. By leveraging the dynamic form of our composite regret, we develop a consensus-based online normalized gradient (CONGD) approach for pseudo-convex losses, and it provably shows a sublinear behavior relating to a regularity term for the path variation of the optimizer. For general non-convex losses, we first shed light on the regret for the setting of distributed online non-convex learning based on recent advances such that no deterministic algorithm can achieve sublinear regret. We then develop the distributed online non-convex optimization with composite regret (DINOCO) without access to the gradients, depending on an offline optimization oracle. DINOCO is shown to achieve sublinear regret; to our knowledge, this is the first regret bound for general distributed online non-convex learning.

Submitted 21 September, 2022; originally announced September 2022.

Comments: 41 pages, presented at the Allerton Conference 2022

arXiv:2110.01532 [pdf, other]

Differentiable Spline Approximations

Authors: Minsu Cho, Aditya Balu, Ameya Joshi, Anjana Deva Prasad, Biswajit Khara, Soumik Sarkar, Baskar Ganapathysubramanian, Adarsh Krishnamurthy, Chinmay Hegde

Abstract: The paradigm of differentiable programming has significantly enhanced the scope of machine learning via the judicious use of gradient-based optimization. However, standard differentiable programming methods (such as autodiff) typically require that the machine learning models be differentiable, limiting their applicability. Our goal in this paper is to use a new, principled approach to extend gradient-based optimization to functions well modeled by splines, which encompass a large family of piecewise polynomial models. We derive the form of the (weak) Jacobian of such functions and show that it exhibits a block-sparse structure that can be computed implicitly and efficiently. Overall, we show that leveraging this redesigned Jacobian in the form of a differentiable "layer" in predictive models leads to improved performance in diverse applications such as image segmentation, 3D point cloud reconstruction, and finite element analysis.

Submitted 4 October, 2021; originally announced October 2021.

Comments: 9 pages, accepted at NeurIPS 2021

arXiv:2108.05574 [pdf, other]

Implicit Sparse Regularization: The Impact of Depth and Early Stopping

Authors: Jiangyuan Li, Thanh V. Nguyen, Chinmay Hegde, Raymond K. W. Wong

Abstract: In this paper, we study the implicit bias of gradient descent for sparse regression. We extend results on regression with quadratic parametrization, which amounts to depth-2 diagonal linear networks, to more general depth-N networks, under more realistic settings of noise and correlated designs. We show that early stopping is crucial for gradient descent to converge to a sparse model, a phenomenon that we call implicit sparse regularization. This result is in sharp contrast to known results for noiseless and uncorrelated-design cases. We characterize the impact of depth and early stopping and show that for a general depth parameter N, gradient descent with early stopping achieves minimax optimal sparse recovery with sufficiently small initialization and step size. In particular, we show that increasing depth enlarges the scale of working initialization and the early-stopping window so that this implicit sparse regularization effect is more likely to take place.

Submitted 26 October, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

Comments: 32 pages, accepted by NeurIPS 2021. arXiv admin note: text overlap with arXiv:1909.05122 by other authors
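
The role of early stopping described in the abstract above can be illustrated with a toy sketch of the depth-2 (quadratic) parametrization $w = u \odot u - v \odot v$. This is not the paper's algorithm or experimental setup: it assumes an orthonormal design so that the squared loss decouples per coordinate, and the function name `train_diagonal_net`, the initialization scale `alpha`, the step size `lr`, and the target values are all hypothetical choices made for illustration.

```python
def train_diagonal_net(targets, steps, alpha=1e-3, lr=0.1):
    """Gradient descent on w = u*u - v*v (a depth-2 diagonal linear network).

    Assumption (illustrative only): the design is orthonormal, so the loss
    decouples into 0.5 * sum_j (w_j - targets[j])**2.
    """
    u = [alpha] * len(targets)  # small initialization
    v = [alpha] * len(targets)
    for _ in range(steps):
        for j, t in enumerate(targets):
            g = (u[j] * u[j] - v[j] * v[j]) - t  # dL/dw_j
            # Chain rule: dL/du_j = 2*u_j*g and dL/dv_j = -2*v_j*g.
            u[j], v[j] = u[j] * (1 - 2 * lr * g), v[j] * (1 + 2 * lr * g)
    return [a * a - b * b for a, b in zip(u, v)]


# Two signal coordinates (+1, -1) and two small "noise" coordinates (+0.05, -0.05).
targets = [1.0, -1.0, 0.05, -0.05]
early = train_diagonal_net(targets, steps=100)   # stopped early
late = train_diagonal_net(targets, steps=3000)   # run close to convergence
print("early:", early)
print("late: ", late)
```

In this toy setting the large coordinates are fitted within the first 100 steps while the small ones are still near zero, so the early iterate is effectively sparse; running much longer lets the small coordinates grow until they fit the noise.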

arXiv:2105.06371 [pdf, other]

Provably Convergent Algorithms for Solving Inverse Problems Using Generative Models

Authors: Viraj Shah, Rakib Hyder, M. Salman Asif, Chinmay Hegde

Abstract: The traditional approach of hand-crafting priors (such as sparsity) for solving inverse problems is slowly being replaced by the use of richer learned priors (such as those modeled by deep generative networks). In this work, we study the algorithmic aspects of such a learning-based approach from a theoretical perspective. For certain generative network architectures, we establish a simple non-convex algorithmic approach that (a) theoretically enjoys linear convergence guarantees for certain linear and nonlinear inverse problems, and (b) empirically improves upon conventional techniques such as back-propagation. We support our claims with experimental results for solving various inverse problems. We also propose an extension of our approach that can handle model mismatch (i.e., situations where the generative network prior is not exactly applicable). Together, our contributions serve as building blocks towards a principled use of generative models in inverse problems with more complete algorithmic understanding.

Submitted 13 May, 2021; originally announced May 2021.

Comments: arXiv admin note: text overlap with arXiv:1810.03587, arXiv:1802.08406

arXiv:2104.14538 [pdf, other]

Distributed Multigrid Neural Solvers on Megavoxel Domains

Authors: Aditya Balu, Sergio Botelho, Biswajit Khara, Vinay Rao, Chinmay Hegde, Soumik Sarkar, Santi Adavani, Adarsh Krishnamurthy, Baskar Ganapathysubramanian

Abstract: We consider the distributed training of large-scale neural networks that serve as PDE solvers producing full field outputs. We specifically consider neural solvers for the generalized 3D Poisson equation over megavoxel domains. A scalable framework is presented that integrates two distinct advances. First, we accelerate training a large model via a method analogous to the multigrid technique used in numerical linear algebra. Here, the network is trained using a hierarchy of increasing resolution inputs in sequence, analogous to the 'V', 'W', 'F', and 'Half-V' cycles used in multigrid approaches. In conjunction with the multi-grid approach, we implement a distributed deep learning framework which significantly reduces the time to solve. We show the scalability of this approach on both GPU (Azure VMs on Cloud) and CPU clusters (PSC Bridges2). This approach is deployed to train a generalized 3D Poisson solver that scales well to predict output full-field solutions up to the resolution of 512x512x512 for a high dimensional family of inputs.

Submitted 29 April, 2021; originally announced April 2021.

arXiv:2102.12643 [pdf, other]

Provable Compressed Sensing with Generative Priors via Langevin Dynamics

Authors: Thanh V. Nguyen, Gauri Jagatap, Chinmay Hegde

Abstract: Deep generative models have emerged as a powerful class of priors for signals in various inverse problems such as compressed sensing, phase retrieval and super-resolution. Here, we assume an unknown signal to lie in the range of some pre-trained generative model. A popular approach for signal recovery is via gradient descent in the low-dimensional latent space. While gradient descent has achieved good empirical performance, its theoretical behavior is not well understood. In this paper, we introduce the use of stochastic gradient Langevin dynamics (SGLD) for compressed sensing with a generative prior. Under mild assumptions on the generative model, we prove the convergence of SGLD to the true signal. We also demonstrate competitive empirical performance to standard gradient descent.

Submitted 24 February, 2021; originally announced February 2021.
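
For orientation, a generic Langevin iteration in the latent space can be written as below. This is the textbook form of the update and not necessarily the exact scheme analyzed in the paper; the measurement matrix $A$, generator $G$, measurements $y$, step size $\eta$, and inverse temperature $\beta$ are notation assumed here, and in SGLD the gradient may be replaced by a stochastic estimate.

```latex
z_{t+1} = z_t - \eta \, \nabla_z f(z_t) + \sqrt{2 \eta \beta^{-1}} \; \xi_t,
\qquad \xi_t \sim \mathcal{N}(0, I),
\qquad f(z) = \| y - A \, G(z) \|_2^2
```

Dropping the noise term $\xi_t$ recovers plain gradient descent in the latent space, which is the baseline the abstract compares against.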

arXiv:2008.12338 [pdf, other]

Adversarially Robust Learning via Entropic Regularization

Authors: Gauri Jagatap, Ameya Joshi, Animesh Basak Chowdhury, Siddharth Garg, Chinmay Hegde

Abstract: In this paper we propose a new family of algorithms, ATENT, for training adversarially robust deep neural networks. We formulate a new loss function that is equipped with an additional entropic regularization. Our loss function considers the contribution of adversarial samples that are drawn from a specially designed distribution in the data space that assigns high probability to points with high loss and in the immediate neighborhood of training samples. Our proposed algorithms optimize this loss to seek adversarially robust valleys of the loss landscape. Our approach achieves competitive (or better) performance in terms of robust classification accuracy as compared to several state-of-the-art robust learning approaches on benchmark datasets such as MNIST and CIFAR-10.

Submitted 19 February, 2021; v1 submitted 27 August, 2020; originally announced August 2020.

arXiv:2007.12792 [pdf, other]

Deep Generative Models that Solve PDEs: Distributed Computing for Training Large Data-Free Models

Authors: Sergio Botelho, Ameya Joshi, Biswajit Khara, Soumik Sarkar, Chinmay Hegde, Santi Adavani, Baskar Ganapathysubramanian

Abstract: Recent progress in scientific machine learning (SciML) has opened up the possibility of training novel neural network architectures that solve complex partial differential equations (PDEs). Several (nearly data free) approaches have been recently reported that successfully solve PDEs, with examples including deep feed forward networks, generative networks, and deep encoder-decoder networks. However, practical adoption of these approaches is limited by the difficulty in training these models, especially to make predictions at large output resolutions ($\geq 1024 \times 1024$). Here we report on a software framework for data parallel distributed deep learning that resolves the twin challenges of training these large SciML models - training in reasonable time as well as distributing the storage requirements. Our framework provides several out-of-the-box functionalities including (a) loss integrity independent of number of processes, (b) synchronized batch normalization, and (c) distributed higher-order optimization methods. We show excellent scalability of this framework on both cloud as well as HPC clusters, and report on the interplay between bandwidth, network topology and bare metal vs cloud. We deploy this approach to train generative models of sizes hitherto not possible, showing that neural PDE solvers can be viably trained for practical applications. We also demonstrate that distributed higher-order optimization methods are $2$-$3\times$ faster than stochastic gradient-based methods and provide minimal convergence drift with higher batch-size.

Submitted 24 July, 2020; originally announced July 2020.

Comments: 10 pages, 18 figures

arXiv:2007.04087 [pdf, other]

Hyperparameter Optimization in Neural Networks via Structured Sparse Recovery

Authors: Minsu Cho, Mohammadreza Soltani, Chinmay Hegde

Abstract: In this paper, we study two important problems in the automated design of neural networks -- Hyper-parameter Optimization (HPO), and Neural Architecture Search (NAS) -- through the lens of sparse recovery methods. In the first part of this paper, we establish a novel connection between HPO and structured sparse recovery. In particular, we show that a special encoding of the hyperparameter space enables a natural group-sparse recovery formulation, which when coupled with HyperBand (a multi-armed bandit strategy), leads to improvement over existing hyperparameter optimization methods. Experimental results on image datasets such as CIFAR-10 confirm the benefits of our approach. In the second part of this paper, we establish a connection between NAS and structured sparse recovery. Building upon "one-shot" approaches in NAS, we propose a novel algorithm that we call CoNAS by merging ideas from one-shot approaches with techniques for learning low-degree sparse Boolean polynomials. We provide theoretical analysis on the number of validation error measurements. Finally, we validate our approach on several datasets and discover novel architectures hitherto unreported, achieving competitive (or better) results in both performance and search time compared to the existing NAS approaches.

Submitted 6 July, 2020; originally announced July 2020.

Comments: arXiv admin note: text overlap with arXiv:1906.02869

arXiv:2006.15741 [pdf, ps, other]

ESPN: Extremely Sparse Pruned Networks

Authors: Minsu Cho, Ameya Joshi, Chinmay Hegde

Abstract: Deep neural networks are often highly overparameterized, prohibiting their use in compute-limited systems. However, a line of recent works has shown that the size of deep networks can be considerably reduced by identifying a subset of neuron indicators (or mask) that correspond to significant weights prior to training. We demonstrate that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks. Our algorithm represents a hybrid approach between single shot network pruning methods (such as SNIP) and Lottery-Ticket type approaches. We validate our approach on several datasets and outperform several existing pruning approaches in both test accuracy and compression ratio.

Submitted 28 June, 2020; originally announced June 2020.

arXiv:1911.11983 [pdf, ps, other]

Benefits of Jointly Training Autoencoders: An Improved Neural Tangent Kernel Analysis

Authors: Thanh V. Nguyen, Raymond K. W. Wong, Chinmay Hegde

Abstract: A remarkable recent discovery in machine learning has been that deep neural networks can achieve impressive performance (in terms of both lower training error and higher generalization capacity) in the regime where they are massively over-parameterized. Consequently, over the past year, the community has devoted growing interest to analyzing optimization and generalization properties of over-parameterized networks, and several breakthrough works have led to important theoretical progress. However, the majority of existing work only applies to supervised learning scenarios and hence is limited to settings such as classification and regression. In contrast, the role of over-parameterization in the unsupervised setting has gained far less attention. In this paper, we study the gradient dynamics of two-layer over-parameterized autoencoders with ReLU activation. We make very few assumptions about the given training dataset (other than mild non-degeneracy conditions). Starting from a randomly initialized autoencoder network, we rigorously prove the linear convergence of gradient descent in two learning regimes, namely: (i) the weakly-trained regime where only the encoder is trained, and (ii) the jointly-trained regime where both the encoder and the decoder are trained. Our results indicate the considerable benefits of joint training over weak training for finding global optima, achieving a dramatic decrease in the required level of over-parameterization. We also analyze the case of weight-tied autoencoders (which is a commonly used architectural choice in practical settings) and prove that in the over-parameterized setting, training such networks from randomly initialized points leads to certain unexpected degeneracies.

Submitted 2 March, 2020; v1 submitted 27 November, 2019; originally announced November 2019.

Comments: Added Sections 3.2 and 3.4 on inductive biases. Fixed an error in deriving the neural tangent kernel in Section 3.3

arXiv:1910.06878 [pdf, other]

On Higher-order Moments in Adam

Authors: Zhanhong Jiang, Aditya Balu, Sin Yong Tan, Young M Lee, Chinmay Hegde, Soumik Sarkar

Abstract: In this paper, we investigate the popular deep learning optimization routine, Adam, from the perspective of statistical moments. While Adam is an adaptive lower-order moment based (of the stochastic gradient) method, we propose an extension, namely HAdam, which uses higher-order moments of the stochastic gradient. Our analysis and experiments reveal that certain higher-order moments of the stochastic gradient are able to achieve better performance compared to the vanilla Adam algorithm. We also provide some analysis of HAdam related to odd and even moments to explain some intriguing and seemingly non-intuitive empirical results.

Submitted 15 October, 2019; originally announced October 2019.

Comments: Accepted in Beyond First Order Methods in Machine Learning workshop in 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:1909.02583 [pdf, other]

Spatiotemporally Constrained Action Space Attacks on Deep Reinforcement Learning Agents

Authors: Xian Yeow Lee, Sambit Ghadai, Kai Liang Tan, Chinmay Hegde, Soumik Sarkar

Abstract: Robustness of Deep Reinforcement Learning (DRL) algorithms towards adversarial attacks in real world applications such as those deployed in cyber-physical systems (CPS) is of increasing concern. Numerous studies have investigated the mechanisms of attacks on the RL agent's state space. Nonetheless, attacks on the RL agent's action space (AS) (corresponding to actuators in engineering systems) are equally perverse; such attacks are relatively less studied in the ML literature. In this work, we first frame the problem as an optimization problem of minimizing the cumulative reward of an RL agent with decoupled constraints as the budget of attack. We propose a white-box Myopic Action Space (MAS) attack algorithm that distributes the attacks across the action space dimensions. Next, we reformulate the optimization problem above with the same objective function, but with a temporally coupled constraint on the attack budget to take into account the approximated dynamics of the agent. This leads to the white-box Look-ahead Action Space (LAS) attack algorithm that distributes the attacks across the action and temporal dimensions. Our results show that using the same amount of resources, the LAS attack deteriorates the agent's performance significantly more than the MAS attack. This reveals the possibility that with limited resources, an adversary can utilize the agent's dynamics to malevolently craft attacks that cause the agent to fail. Additionally, we leverage these attack strategies as a possible tool to gain insights on the potential vulnerabilities of DRL agents.

Submitted 18 November, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

Comments: Version 2 with supplementary materials

arXiv:1906.08763 [pdf, other]

Algorithmic Guarantees for Inverse Imaging with Untrained Network Priors

Authors: Gauri Jagatap, Chinmay Hegde

Abstract: Deep neural networks as image priors have been recently introduced for problems such as denoising, super-resolution and inpainting with promising performance gains over hand-crafted image priors such as sparsity and low-rank. Unlike learned generative priors they do not require any training over large datasets. However, few theoretical guarantees exist in the scope of using untrained neural network priors for inverse imaging problems. We explore new applications and theory for untrained neural network priors. Specifically, we consider the problem of solving linear inverse problems, such as compressive sensing, as well as non-linear problems, such as compressive phase retrieval. We model images to lie in the range of an untrained deep generative network with a fixed seed. We further present a projected gradient descent scheme that can be used for both compressive sensing and phase retrieval and provide rigorous theoretical guarantees for its convergence. We also show both theoretically as well as empirically that with deep network priors, one can achieve better compression rates for the same image quality compared to hand-crafted priors.

Submitted 27 March, 2020; v1 submitted 20 June, 2019; originally announced June 2019.

Comments: NeurIPS 2019 version with few modifications

Journal ref: NeurIPS 2019

arXiv:1906.02869 [pdf, other]

One-Shot Neural Architecture Search via Compressive Sensing

Authors: Minsu Cho, Mohammadreza Soltani, Chinmay Hegde

Abstract: Neural Architecture Search remains a very challenging meta-learning problem. Several recent techniques based on the parameter-sharing idea have focused on reducing the NAS running time by leveraging proxy models, leading to architectures with competitive performance compared to those with hand-crafted designs. In this paper, we propose an iterative technique for NAS, inspired by algorithms for learning low-degree sparse Boolean functions. We validate our approach on the DARTs search space (Liu et al., 2018b) and NAS-Bench-201 (Yang et al., 2020). In addition, we provide theoretical analysis via upper bounds on the number of validation error measurements needed for reliable learning, and include ablation studies to further in-depth understanding of our technique.

Submitted 7 February, 2022; v1 submitted 6 June, 2019; originally announced June 2019.

Comments: 2nd Workshop on Neural Architecture Search at ICLR 2021

arXiv:1906.01626 [pdf, other]

Encoding Invariances in Deep Generative Models

Authors: Viraj Shah, Ameya Joshi, Sambuddha Ghosal, Balaji Pokuri, Soumik Sarkar, Baskar Ganapathysubramanian, Chinmay Hegde

Abstract: Reliable training of generative adversarial networks (GANs) typically requires massive datasets in order to model complicated distributions. However, in several applications, training samples obey invariances that are known a priori; for example, in complex physics simulations, the training data obey universal laws encoded as well-defined mathematical equations. In this paper, we propose a new generative modeling approach, InvNet, that can efficiently model data spaces with known invariances. We devise an adversarial training algorithm to encode them into the data distribution. We validate our framework in three experimental settings: generating images with fixed motifs; solving nonlinear partial differential equations (PDEs); and reconstructing two-phase microstructures with desired statistical properties. We complement our experiments with several theoretical results.

Submitted 4 June, 2019; originally announced June 2019.

arXiv:1904.11095 [pdf, other]

Reducing The Search Space For Hyperparameter Optimization Using Group Sparsity

Authors: Minsu Cho, Chinmay Hegde

Abstract: We propose a new algorithm for hyperparameter selection in machine learning algorithms. The algorithm is a novel modification of Harmonica, a spectral hyperparameter selection approach using sparse recovery methods. In particular, we show that a special encoding of hyperparameter space enables a natural group-sparse recovery formulation, which when coupled with HyperBand (a multi-armed bandit strategy) leads to improvement over existing hyperparameter optimization methods such as Successive Halving and Random Search. Experimental results on image datasets such as CIFAR-10 confirm the benefits of our approach.

Submitted 24 April, 2019; originally announced April 2019.

Comments: Published at ICASSP 2019

arXiv:1812.00557 [pdf, other]

Signal Reconstruction from Modulo Observations

Authors: Viraj Shah, Chinmay Hegde

Abstract: We consider the problem of reconstructing a signal from under-determined modulo observations (or measurements). This observation model is inspired by a (relatively) less well-known imaging mechanism called modulo imaging, which can be used to extend the dynamic range of imaging systems; variations of this model have also been studied under the category of phase unwrapping. Signal reconstruction in the under-determined regime with modulo observations is a challenging ill-posed problem, and existing reconstruction methods cannot be used directly. In this paper, we propose a novel approach to solving the inverse problem limited to two modulo periods, inspired by recent advances in algorithms for phase retrieval under sparsity constraints. We show that given a sufficient number of measurements, our algorithm perfectly recovers the underlying signal and provides improved performance over other existing algorithms. We also provide experiments validating our approach on both synthetic and real data to depict its superior performance.
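Illustration (not from the paper): a minimal sketch of what under-determined modulo observations could look like, assuming Gaussian measurement vectors and a plain remainder with a hypothetical period. The function name `modulo_measurements`, the `period` parameter, and the test signal are all assumptions made for this example; the paper's exact observation model and its recovery algorithm are not reproduced here.

```python
import random

def modulo_measurements(x, m, period=1.0, seed=0):
    """Return (A, y) where y[i] = <A[i], x> mod period.

    A has m rows of i.i.d. standard Gaussian entries (an assumption);
    m smaller than len(x) gives the under-determined regime.
    """
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(m)]
    # Python's % with a positive period folds every value into [0, period].
    y = [sum(a * xi for a, xi in zip(row, x)) % period for row in A]
    return A, y

# A sparse signal of length 8 observed through only 4 folded measurements.
x_true = [0.0, 3.2, 0.0, 0.0, -1.7, 0.0, 0.0, 0.0]
A_mod, y_mod = modulo_measurements(x_true, m=4, period=1.0)
```

The folding discards how many periods each linear measurement spans, which is why standard linear recovery methods cannot be applied to `y_mod` directly.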

Submitted 16 July, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

arXiv:1811.09669 [pdf, other]

Physics-aware Deep Generative Models for Creating Synthetic Microstructures

Authors: Rahul Singh, Viraj Shah, Balaji Pokuri, Soumik Sarkar, Baskar Ganapathysubramanian, Chinmay Hegde

Abstract: A key problem in computational material science deals with understanding the effect of material distribution (i.e., microstructure) on material performance. The challenge is to synthesize microstructures, given a finite number of microstructure images, and/or some physical invariances that the microstructure exhibits. Conventional approaches are based on stochastic optimization and are computationally intensive. We introduce three generative models for the fast synthesis of binary microstructure images. The first model is a WGAN model that uses a finite number of training images to synthesize new microstructures that weakly satisfy the physical invariances respected by the original data. The second model explicitly enforces known physical invariances by replacing the traditional discriminator in a GAN with an invariance checker. Our third model combines the first two models to reconstruct microstructures that respect both explicit physics invariances as well as implicit constraints learned from the image data. We illustrate these models by reconstructing two-phase microstructures that exhibit coarsening behavior. The trained models also exhibit interesting latent variable interpolation behavior, and the results indicate considerable promise for enforcing user-defined physics constraints during microstructure synthesis.

Submitted 21 November, 2018; originally announced November 2018.

arXiv:1810.03587 [pdf, ps, other]

Algorithmic Aspects of Inverse Problems Using Generative Models

Authors: Chinmay Hegde

Abstract: The traditional approach of hand-crafting priors (such as sparsity) for solving inverse problems is slowly being replaced by the use of richer learned priors (such as those modeled by generative adversarial networks, or GANs). In this work, we study the algorithmic aspects of such a learning-based approach from a theoretical perspective. For certain generative network architectures, we establish a simple non-convex algorithmic approach that (a) theoretically enjoys linear convergence guarantees for certain inverse problems, and (b) empirically improves upon conventional techniques such as back-propagation. We also propose an extension of our approach that can handle model mismatch (i.e., situations where the generative network prior is not exactly applicable). Together, our contributions serve as building blocks towards a more complete algorithmic understanding of generative models in inverse problems.

Submitted 8 October, 2018; originally announced October 2018.

arXiv:1806.07863 [pdf, ps, other]

Learning ReLU Networks via Alternating Minimization

Authors: Gauri Jagatap, Chinmay Hegde

Abstract: We propose and analyze a new family of algorithms for training neural networks with ReLU activations. Our algorithms are based on the technique of alternating minimization: estimating the activation patterns of each ReLU for all given samples, interleaved with weight updates via a least-squares step. The main focus of our paper is 1-hidden layer networks with $k$ hidden neurons and ReLU activation. We show that under standard distributional assumptions on the $d$-dimensional input data, our algorithm provably recovers the true `ground truth' parameters in a linearly convergent fashion. This holds as long as the weights are sufficiently well initialized; furthermore, our method requires only $n=\widetilde{O}(dk^2)$ samples. We also analyze the special case of 1-hidden layer networks with skipped connections, commonly used in ResNet-type architectures, and propose a novel initialization strategy for the same. For ReLU-based ResNet-type networks, we provide the first linear convergence guarantee with an end-to-end algorithm. We also extend this framework to deeper networks and empirically demonstrate its convergence to a global minimum.

Submitted 10 October, 2018; v1 submitted 20 June, 2018; originally announced June 2018.

arXiv:1806.00572 [pdf, ps, other]

Autoencoders Learn Generative Linear Models

Authors: Thanh V. Nguyen, Raymond K. W. Wong, Chinmay Hegde

Abstract: We provide a series of results for unsupervised learning with autoencoders. Specifically, we study shallow two-layer autoencoder architectures with shared weights. We focus on three generative models for data that are common in statistical machine learning: (i) the mixture-of-Gaussians model, (ii) the sparse coding model, and (iii) the sparsity model with non-negative coefficients. For each of these models, we prove that under suitable choices of hyperparameters, architectures, and initialization, autoencoders learned by gradient descent can successfully recover the parameters of the corresponding model. To our knowledge, this is the first result that rigorously studies the dynamics of gradient descent for weight-sharing autoencoders. Our analysis can be viewed as theoretical evidence that shallow autoencoder modules indeed can be used as feature learning mechanisms for a variety of data models, and may shed insight on how to train larger stacked architectures with autoencoders as basic building blocks.

Submitted 15 February, 2019; v1 submitted 1 June, 2018; originally announced June 2018.

Comments: Experimental study on synthesis data added. Typos fixed

arXiv:1805.12120 [pdf, other]

On Consensus-Optimality Trade-offs in Collaborative Deep Learning

Authors: Zhanhong Jiang, Aditya Balu, Chinmay Hegde, Soumik Sarkar

Abstract: In distributed machine learning, where agents collaboratively learn from diverse private data sets, there is a fundamental tension between consensus and optimality. In this paper, we build on recent algorithmic progress in distributed deep learning to explore various consensus-optimality trade-offs over a fixed communication topology. First, we propose the incremental consensus-based distributed SGD (i-CDSGD) algorithm, which involves multiple consensus steps (where each agent communicates information with its neighbors) within each SGD iteration. Second, we propose the generalized consensus-based distributed SGD (g-CDSGD) algorithm that enables us to navigate the full spectrum from complete consensus (all agents agree) to complete disagreement (each agent converges to individual model parameters). We analytically establish convergence of the proposed algorithms for strongly convex and nonconvex objective functions; we also analyze the momentum variants of the algorithms for the strongly convex case. We support our algorithms via numerical experiments, and demonstrate significant improvements over existing methods for collaborative deep learning.

Submitted 30 May, 2018; originally announced May 2018.

arXiv:1804.09217 [pdf, ps, other]

On Learning Sparsely Used Dictionaries from Incomplete Samples

Authors: Thanh V. Nguyen, Akshay Soni, Chinmay Hegde

Abstract: Most existing algorithms for dictionary learning assume that all entries of the (high-dimensional) input data are fully observed. However, in several practical applications (such as hyper-spectral imaging or blood glucose monitoring), only an incomplete fraction of the data entries may be available. For incomplete settings, no provably correct and polynomial-time algorithm has been reported in the dictionary learning literature. In this paper, we provide provable approaches for learning - from incomplete samples - a family of dictionaries whose atoms have sufficiently "spread-out" mass. First, we propose a descent-style iterative algorithm that linearly converges to the true dictionary when provided a sufficiently coarse initial estimate. Second, we propose an initialization algorithm that utilizes a small number of extra fully observed samples to produce such a coarse initial estimate. Finally, we theoretically analyze their performance and provide asymptotic statistical and computational guarantees.

Submitted 24 April, 2018; originally announced April 2018.

arXiv:1802.08406 [pdf, other]

Solving Linear Inverse Problems Using GAN Priors: An Algorithm with Provable Guarantees

Authors: Viraj Shah, Chinmay Hegde

Abstract: In recent works, both sparsity-based methods as well as learning-based methods have proven to be successful in solving several challenging linear inverse problems. However, sparsity priors for natural signals and images suffer from poor discriminative capability, while learning-based methods seldom provide concrete theoretical guarantees. In this work, we advocate the idea of replacing hand-crafted priors, such as sparsity, with a Generative Adversarial Network (GAN) to solve linear inverse problems such as compressive sensing. In particular, we propose a projected gradient descent (PGD) algorithm for effective use of GAN priors for linear inverse problems, and also provide theoretical guarantees on the rate of convergence of this algorithm. Moreover, we show empirically that our algorithm demonstrates superior performance over an existing method of leveraging GANs for compressive sensing.

Submitted 23 February, 2018; originally announced February 2018.

arXiv:1712.03281 [pdf, other]

Fast Low-Rank Matrix Estimation without the Condition Number

Authors: Mohammadreza Soltani, Chinmay Hegde

Abstract: In this paper, we study the general problem of optimizing a convex function $F(L)$ over the set of $p \times p$ matrices, subject to rank constraints on $L$. However, existing first-order methods for solving such problems either are too slow to converge, or require multiple invocations of singular value decompositions. On the other hand, factorization-based non-convex algorithms, while being much faster, require stringent assumptions on the condition number of the optimum. In this paper, we provide a novel algorithmic framework that achieves the best of both worlds: asymptotically as fast as factorization methods, while requiring no dependency on the condition number. We instantiate our general framework for three important matrix estimation problems that impact several practical applications: (i) a nonlinear variant of affine rank minimization, (ii) logistic PCA, and (iii) precision matrix estimation in probabilistic graphical model learning. We then derive explicit bounds on the sample complexity as well as the running time of our approach, and show that it achieves the best possible bounds for both cases. We also provide an extensive range of experimental results, and demonstrate that our algorithm provides a very attractive tradeoff between estimation accuracy and running time.

Submitted 8 December, 2017; originally announced December 2017.

arXiv:1711.06221 [pdf, other]

A Forward-Backward Approach for Visualizing Information Flow in Deep Networks

Authors: Aditya Balu, Thanh V. Nguyen, Apurva Kokate, Chinmay Hegde, Soumik Sarkar

Abstract: We introduce a new, systematic framework for visualizing information flow in deep networks. Specifically, given any trained deep convolutional network model and a given test image, our method produces a compact support in the image domain that corresponds to a (high-resolution) feature that contributes to the given explanation. Our method is both computationally efficient as well as numerically robust. We present several preliminary numerical results that support the benefits of our framework over existing methods.

Submitted 16 November, 2017; originally announced November 2017.

Comments: Presented at NIPS 2017 Symposium on Interpretable Machine Learning

arXiv:1711.03638 [pdf, ps, other]

Provably Accurate Double-Sparse Coding

Authors: Thanh V. Nguyen, Raymond K. W. Wong, Chinmay Hegde

Abstract: Sparse coding is a crucial subroutine in algorithms for various signal processing, deep learning, and other machine learning applications. The central goal is to learn an overcomplete dictionary that can sparsely represent a given input dataset. However, a key challenge is that storage, transmission, and processing of the learned dictionary can be untenably high if the data dimension is high. In this paper, we consider the double-sparsity model introduced by Rubinstein et al. (2010b) where the dictionary itself is the product of a fixed, known basis and a data-adaptive sparse component. First, we introduce a simple algorithm for double-sparse coding that can be amenable to efficient implementation via neural architectures. Second, we theoretically analyze its performance and demonstrate asymptotic sample complexity and running time benefits over existing (provable) approaches for sparse coding. To our knowledge, our work introduces the first computationally efficient algorithm for double-sparse coding that enjoys rigorous statistical guarantees. Finally, we support our analysis via several numerical experiments on simulated data, confirming that our method can indeed be useful in problem sizes encountered in practical applications.

Submitted 12 December, 2017; v1 submitted 9 November, 2017; originally announced November 2017.

Comments: 40 pages. An abbreviated conference version appears at AAAI 2018

arXiv:1710.00109 [pdf, other]

Reconstruction from Periodic Nonlinearities, With Applications to HDR Imaging

Authors: Viraj Shah, Mohammadreza Soltani, Chinmay Hegde

Abstract: We consider the problem of reconstructing signals and images from periodic nonlinearities. For such problems, we design a measurement scheme that supports efficient reconstruction; moreover, our method can be adapted to extend to compressive sensing-based signal and image acquisition systems. Our techniques can be potentially useful for reducing the measurement complexity of high dynamic range (HDR) imaging systems, with little loss in reconstruction quality. Several numerical experiments on real data demonstrate the effectiveness of our approach.

Submitted 29 September, 2017; originally announced October 2017.

arXiv:1708.02999 [pdf, other]

Demixing Structured Superposition Signals from Periodic and Aperiodic Nonlinear Observations

Authors: Mohammadreza Soltani, Chinmay Hegde

Abstract: We consider the demixing problem of two (or more) structured high-dimensional vectors from a limited number of nonlinear observations where this nonlinearity is due to either a periodic or an aperiodic function. We study certain families of structured superposition models, and propose a method which provably recovers the components given (nearly) $m = \mathcal{O}(s)$ samples where $s$ denotes the sparsity level of the underlying components. This strictly improves upon previous nonlinear demixing techniques and asymptotically matches the best possible sample complexity. We also provide a range of simulations to illustrate the performance of the proposed algorithms.

Submitted 8 August, 2017; originally announced August 2017.

Comments: arXiv admin note: substantial text overlap with arXiv:1701.06597

arXiv:1706.08936 [pdf, other]

Fast Algorithms for Learning Latent Variables in Graphical Models

Authors: Mohammadreza Soltani, Chinmay Hegde

Abstract: We study the problem of learning latent variables in Gaussian graphical models. Existing methods for this problem assume that the precision matrix of the observed variables is the superposition of a sparse and a low-rank component. In this paper, we focus on the estimation of the low-rank component, which encodes the effect of marginalization over the latent variables. We introduce fast, proper learning algorithms for this problem. In contrast with existing approaches, our algorithms are manifestly non-convex. We support their efficacy via a rigorous theoretical analysis, and show that our algorithms match the best possible in terms of sample complexity, while achieving computational speed-ups over existing methods. We complement our theory with several numerical experiments.

Submitted 11 July, 2017; v1 submitted 27 June, 2017; originally announced June 2017.

arXiv:1706.07880 [pdf, ps, other]

Collaborative Deep Learning in Fixed Topology Networks

Authors: Zhanhong Jiang, Aditya Balu, Chinmay Hegde, Soumik Sarkar

Abstract: There is significant recent interest in parallelizing deep learning algorithms in order to handle the enormous growth in data and model sizes. While most advances focus on model parallelization and engaging multiple computing agents using a central parameter server, the aspect of data parallelization along with decentralized computation has not been explored sufficiently. In this context, this paper presents a new consensus-based distributed SGD (CDSGD) (and its momentum variant, CDMSGD) algorithm for collaborative deep learning over fixed topology networks that enables data parallelization as well as decentralized computation. Such a framework can be extremely useful for learning agents with access to only local/private data in a communication-constrained environment. We analyze the convergence properties of the proposed algorithm with strongly convex and nonconvex objective functions with fixed and diminishing step sizes using concepts of Lyapunov function construction. We demonstrate the efficacy of our algorithms in comparison with the baseline centralized SGD and the recently proposed federated averaging algorithm (that also enables data parallelism) based on benchmark datasets such as MNIST, CIFAR-10 and CIFAR-100.

Submitted 23 June, 2017; originally announced June 2017.

arXiv:1705.07469 [pdf, other]

Improved Algorithms for Matrix Recovery from Rank-One Projections

Authors: Mohammadreza Soltani, Chinmay Hegde

Abstract: We consider the problem of estimation of a low-rank matrix from a limited number of noisy rank-one projections. In particular, we propose two fast, non-convex proper algorithms for matrix recovery and support them with rigorous theoretical analysis. We show that the proposed algorithms enjoy linear convergence and that their sample complexity is independent of the condition number of the unknown true low-rank matrix. By leveraging recent advances in low-rank matrix approximation techniques, we show that our algorithms achieve computational speed-ups over existing methods. Finally, we complement our theory with some numerical experiments.

Submitted 21 May, 2017; originally announced May 2017.

arXiv:1705.06412 [pdf, other]

Sample-Efficient Algorithms for Recovering Structured Signals from Magnitude-Only Measurements

Authors: Gauri Jagatap, Chinmay Hegde

Abstract: We consider the problem of recovering a signal $\mathbf{x}^* \in \mathbf{R}^n$ from magnitude-only measurements $y_i = |\left\langle\mathbf{a}_i,\mathbf{x}^*\right\rangle|$ for $i \in [m]$. Also called phase retrieval, this is a fundamental challenge in biological and astronomical imaging and in speech processing. The problem above is ill-posed; additional assumptions on the signal and/or the measurements are necessary. In this paper we first study the case where the signal $\mathbf{x}^*$ is $s$-sparse. We develop a novel algorithm that we call Compressive Phase Retrieval with Alternating Minimization, or CoPRAM. Our algorithm is simple; it combines the classical alternating minimization approach for phase retrieval with the CoSaMP algorithm for sparse recovery. Despite its simplicity, we prove that CoPRAM achieves a sample complexity of $O(s^2\log n)$ with Gaussian measurements $\mathbf{a}_i$, matching the best known existing results; moreover, it demonstrates linear convergence in theory and practice. Additionally, it requires no extra tuning parameters other than signal sparsity $s$ and is robust to noise. When the sorted coefficients of the sparse signal exhibit a power law decay, we show that CoPRAM achieves a sample complexity of $O(s\log n)$, which is close to the information-theoretic limit. We also consider the case where the signal $\mathbf{x}^*$ arises from structured sparsity models. We specifically examine the case of block-sparse signals with uniform block size of $b$ and block sparsity $k=s/b$. For this problem, we design a recovery algorithm, Block CoPRAM, that further reduces the sample complexity to $O(ks\log n)$. For sufficiently large block lengths of $b=Θ(s)$, this bound equates to $O(s\log n)$. To our knowledge, this constitutes the first end-to-end algorithm for phase retrieval where the Gaussian sample complexity has a sub-quadratic dependence on the signal sparsity level.
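As a rough illustration of the measurement model in this abstract (not the CoPRAM algorithm itself), the following NumPy sketch generates an $s$-sparse signal and its magnitude-only Gaussian measurements $y_i = |\langle\mathbf{a}_i,\mathbf{x}^*\rangle|$. The function names, the dimensions, and the choice of Gaussian nonzero entries are assumptions made for this example only.

```python
import numpy as np

def sparse_signal(n, s, rng):
    """Return an s-sparse vector in R^n (hypothetical helper; Gaussian nonzeros)."""
    x = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)  # random support of size s
    x[support] = rng.standard_normal(s)
    return x

def magnitude_measurements(A, x):
    """Compute y_i = |<a_i, x>| for each row a_i of A."""
    return np.abs(A @ x)

rng = np.random.default_rng(0)
n, s, m = 200, 5, 60                 # illustrative sizes, not taken from the paper
x_star = sparse_signal(n, s, rng)
A = rng.standard_normal((m, n))      # Gaussian measurement vectors a_i as rows
y = magnitude_measurements(A, x_star)
```

Since the measurements discard the sign, `x_star` and `-x_star` produce the same `y`, so recovery is only possible up to a global sign. This is one reason the problem is ill-posed without further assumptions.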

Submitted 26 November, 2017; v1 submitted 18 May, 2017; originally announced May 2017.

arXiv:1701.06607 [pdf, other]

Stable Recovery Of Sparse Vectors From Random Sinusoidal Feature Maps

Authors: Mohammadreza Soltani, Chinmay Hegde

Abstract: Random sinusoidal features are a popular approach for speeding up kernel-based inference in large datasets. Prior to the inference stage, the approach suggests performing dimensionality reduction by first multiplying each data vector by a random Gaussian matrix, and then computing an element-wise sinusoid. Theoretical analysis shows that a sufficient number of such features can be reliably used for subsequent inference in kernel classification and regression. In this work, we demonstrate that with a mild increase in the dimension of the embedding, it is also possible to reconstruct the data vector from such random sinusoidal features, provided that the underlying data is sparse enough. In particular, we propose a numerically stable algorithm for reconstructing the data vector given the nonlinear features, and analyze its sample complexity. Our algorithm can be extended to other types of structured inverse problems, such as demixing a pair of sparse (but incoherent) vectors. We support the efficacy of our approach via numerical experiments.

Submitted 11 July, 2017; v1 submitted 23 January, 2017; originally announced January 2017.

arXiv:1701.06597 [pdf, other]

Iterative Thresholding for Demixing Structured Superpositions in High Dimensions

Authors: Mohammadreza Soltani, Chinmay Hegde

Abstract: We consider the demixing problem of two (or more) high-dimensional vectors from nonlinear observations when the number of such observations is far less than the ambient dimension of the underlying vectors. Specifically, we demonstrate an algorithm that stably estimates the underlying components under general structured sparsity assumptions on these components. In particular, we show that for certain types of structured superposition models, our method provably recovers the components given merely $n = \mathcal{O}(s)$ samples where $s$ denotes the number of nonzero entries in the underlying components. Moreover, our method achieves a fast (linear) convergence rate, and also exhibits fast (near-linear) per-iteration complexity for certain types of structured models. We also provide a range of simulations to illustrate the performance of the proposed algorithm.

Submitted 23 January, 2017; originally announced January 2017.

arXiv:1608.01234 [pdf, other]

doi 10.1109/TSP.2017.2706181

Fast Algorithms for Demixing Sparse Signals from Nonlinear Observations

Authors: Mohammadreza Soltani, Chinmay Hegde

Abstract: We study the problem of demixing a pair of sparse signals from noisy, nonlinear observations of their superposition. Mathematically, we consider a nonlinear signal observation model, $y_i = g(a_i^Tx) + e_i, \ i=1,\ldots,m$, where $x = Φw+Ψz$ denotes the superposition signal, $Φ$ and $Ψ$ are orthonormal bases in $\mathbb{R}^n$, $w, z\in\mathbb{R}^n$ are sparse coefficient vectors of the constituent signals, and $e_i$ represents the noise. Moreover, $g$ represents a nonlinear link function, and $a_i\in\mathbb{R}^n$ is the $i$-th row of the measurement matrix, $A\in\mathbb{R}^{m\times n}$. Problems of this nature arise in several applications, ranging from astronomy to computer vision and machine learning. In this paper, we make some concrete algorithmic progress for the above demixing problem. Specifically, we consider two scenarios: (i) the case when the demixing procedure has no knowledge of the link function, and (ii) the case when the demixing algorithm has perfect knowledge of the link function. In both cases, we provide fast algorithms for recovery of the constituents $w$ and $z$ from the observations. Moreover, we support these algorithms with a rigorous theoretical analysis, and derive (nearly) tight upper bounds on the sample complexity of the proposed algorithms for achieving stable recovery of the component signals. We also provide a range of numerical simulations to illustrate the performance of the proposed algorithms on both real and synthetic signals and images.
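The observation model in this abstract, $y_i = g(a_i^Tx) + e_i$ with $x = Φw+Ψz$, can be simulated with a short NumPy sketch. This only generates data; it does not implement the recovery algorithms. The specific choices here are assumptions for illustration: $Φ$ is the identity basis, $Ψ$ is a random orthonormal basis obtained by QR factorization, the link function $g$ is `tanh`, and the noise level and dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 128, 64, 4                 # illustrative sizes, not taken from the paper

def sparse_vector(n, s, rng):
    """Return an s-sparse coefficient vector in R^n (hypothetical helper)."""
    v = np.zeros(n)
    v[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
    return v

# Two orthonormal bases (assumed choices): identity and a random orthonormal matrix.
Phi = np.eye(n)
Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))

w = sparse_vector(n, s, rng)         # sparse coefficients in basis Phi
z = sparse_vector(n, s, rng)         # sparse coefficients in basis Psi
x = Phi @ w + Psi @ z                # superposition signal

A = rng.standard_normal((m, n))      # measurement matrix with rows a_i
e = 0.01 * rng.standard_normal(m)    # additive noise e_i (assumed noise level)
g = np.tanh                          # assumed nonlinear link function
y = g(A @ x) + e                     # observations y_i = g(a_i^T x) + e_i
```

Here $m < n$, so the observations are underdetermined. The abstract's two scenarios correspond to whether a recovery procedure is given access to `g` or not.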

Submitted 21 July, 2017; v1 submitted 3 August, 2016; originally announced August 2016.

arXiv:1202.1595 [pdf, ps, other]

Signal Recovery on Incoherent Manifolds

Authors: Chinmay Hegde, Richard G. Baraniuk

Abstract: Suppose that we observe noisy linear measurements of an unknown signal that can be modeled as the sum of two component signals, each of which arises from a nonlinear sub-manifold of a high dimensional ambient space. We introduce SPIN, a first order projected gradient method to recover the signal components. Despite the nonconvex nature of the recovery problem and the possibility of underdetermined measurements, SPIN provably recovers the signal components, provided that the signal manifolds are incoherent and that the measurement operator satisfies a certain restricted isometry property. SPIN significantly extends the scope of current recovery models and algorithms for low dimensional linear inverse problems and matches (or exceeds) the current state of the art in terms of performance.

Submitted 8 June, 2012; v1 submitted 7 February, 2012; originally announced February 2012.

Comments: 20 pages, 3 figures. Submitted to IEEE Trans. Inform. Theory. Revised version (June 2012) : fixed typos in proofs

Showing 1–41 of 41 results for author: Hegde, C